Follow-up: adding Ollama support to my open-source cursor-aware AI app - looking for beta testers with vision-capable local models

Posted by yaboyskales@reddit | LocalLLaMA | View on Reddit | 22 comments

Follow-up to my post from my latest post asking about fast vision-capable local models with reliable tool calling. Got really helpful answers from this sub. Building it out now and need beta testers before the v1.2.0 release next week.

---

Context for those who didn't see the first post:

AIPointer is an open-source desktop overlay (Mac/Win/Linux, MIT, github.com/gonemedia/aipointer). Hold a key or wiggle your mouse, a box pops up next to your cursor, you ask anything about whatever's under the pointer, get an answer. Currently routes through cloud providers (OpenRouter, Anthropic, OpenAI, Gemini). Default UX target: sub-2s time-to-first-token.

---

Based on this sub's recommendations from the earlier thread, I'm implementing Ollama as a first-class built-in provider for v1.2.0.

Initial implementation supports:

A note on prior experience

I've built another open-source desktop AI agent (Skales, also solo) which supports 15+ LLM providers including Ollama, LM Studio, KoboldCpp, vLLM, and any OpenAI-compatible endpoint. So the local-inference plumbing isn't new territory for me - the codepath, the tool-call schema handling, the streaming, the fallback logic, all of that I know from running it in production. What's new for AIPointer specifically is the vision + tools combination under a sub-2s TTFT budget. That's where I want real-world numbers from this sub.

What I cannot test alone

I have one M1 Pro and an Intel 2019 MBP. That's a single Apple Silicon data point from 2021 - says nothing about M2/M3/M4, Pro/Max/Ultra tiers, RAM scaling, RTX 3090/4090, AMD inference paths, AppImage on different distros, or Windows + NVIDIA setups. Solo dev, no test lab..

What I'm looking for

Beta testers with any of:

What I'd ask testers to do

  1. Install AIPointer (signed + notarized on Mac, NSIS on Windows, AppImage on Linux)
  2. Point it at your local Ollama, pick a vision model (Qwen2.5-VL, MiniCPM-V, Llama 3.2 Vision, Pixtral, whatever you already have running)
  3. Use it for 30-60 minutes of normal daily stuff - screenshots, region queries, tool calls
  4. Send back: TTFT numbers, model + quant + hardware, what worked, what didn't, any tool-call failures

I'll fold the feedback into the v1.2.0 release notes and credit testers/contributor if you want. If we find that one model + one inference setup consistently delivers sub-2s TTFT with reliable tool calls on consumer hardware, that becomes the recommended default in onboarding.

I'm not building this to compete with anyone. There's a Chrome-locked cursor companion from a big lab making the rounds, but I'd rather have a system-wide open one/open sourced that actually runs locally for people in this sub.

Drop a comment with hardware + model preference and I'll DM build links. Or just grab the v1.1.1 from aipointer.app today and try cloud-mode first while you wait for v1.2.0.

Source: github.com/gonemedia/aipointer (MIT)