Follow-up: adding Ollama support to my open-source cursor-aware AI app - looking for beta testers with vision-capable local models

Posted by yaboyskales@reddit | LocalLLaMA | View on Reddit | 22 comments

Follow-up to my post from my latest post asking about fast vision-capable local models with reliable tool calling. Got really helpful answers from this sub. Building it out now and need beta testers before the v1.2.0 release next week.

---

Context for those who didn't see the first post:

AIPointer is an open-source desktop overlay (Mac/Win/Linux, MIT, github.com/gonemedia/aipointer). Hold a key or wiggle your mouse, a box pops up next to your cursor, you ask anything about whatever's under the pointer, get an answer. Currently routes through cloud providers (OpenRouter, Anthropic, OpenAI, Gemini). Default UX target: sub-2s time-to-first-token.

---

Based on this sub's recommendations from the earlier thread, I'm implementing Ollama as a first-class built-in provider for v1.2.0.

Initial implementation supports:

Auto-detect on localhost:11434
Model dropdown populated from /api/tags
Vision + text input pipeline (region screenshot routes to vision model)
Tool calling for AIPointer's 10 built-in tools (fetch_url, open_url, search_web, play_music, set_volume, copy_to_clipboard, read_clipboard, launch_app, save_document, reveal_in_finder)
Per-model timeout (uncapped option for large models on slower hardware)
Same config UX as the cloud providers — just point it at Ollama, pick model, done

A note on prior experience

I've built another open-source desktop AI agent (Skales, also solo) which supports 15+ LLM providers including Ollama, LM Studio, KoboldCpp, vLLM, and any OpenAI-compatible endpoint. So the local-inference plumbing isn't new territory for me - the codepath, the tool-call schema handling, the streaming, the fallback logic, all of that I know from running it in production. What's new for AIPointer specifically is the vision + tools combination under a sub-2s TTFT budget. That's where I want real-world numbers from this sub.

What I cannot test alone

I have one M1 Pro and an Intel 2019 MBP. That's a single Apple Silicon data point from 2021 - says nothing about M2/M3/M4, Pro/Max/Ultra tiers, RAM scaling, RTX 3090/4090, AMD inference paths, AppImage on different distros, or Windows + NVIDIA setups. Solo dev, no test lab..

What I'm looking for

Beta testers with any of:

M-series Mac (M1/M2/M3/M4, Pro/Max/Ultra) - measuring TTFT against Gemini 2+3 Flash cloud baseline
RTX 3090, 4090, or 5090 on Windows or Linux - same baseline
AMD GPU on Linux (ROCm) - would love to know if this works at all
16GB-class VRAM cards - checking what's the realistic model ceiling
Mac mini M4 or M4 Pro - fastest consumer Apple Silicon, want to see TTFT

What I'd ask testers to do

Install AIPointer (signed + notarized on Mac, NSIS on Windows, AppImage on Linux)
Point it at your local Ollama, pick a vision model (Qwen2.5-VL, MiniCPM-V, Llama 3.2 Vision, Pixtral, whatever you already have running)
Use it for 30-60 minutes of normal daily stuff - screenshots, region queries, tool calls
Send back: TTFT numbers, model + quant + hardware, what worked, what didn't, any tool-call failures

I'll fold the feedback into the v1.2.0 release notes and credit testers/contributor if you want. If we find that one model + one inference setup consistently delivers sub-2s TTFT with reliable tool calls on consumer hardware, that becomes the recommended default in onboarding.

I'm not building this to compete with anyone. There's a Chrome-locked cursor companion from a big lab making the rounds, but I'd rather have a system-wide open one/open sourced that actually runs locally for people in this sub.

Drop a comment with hardware + model preference and I'll DM build links. Or just grab the v1.1.1 from aipointer.app today and try cloud-mode first while you wait for v1.2.0.

Source: github.com/gonemedia/aipointer (MIT)

[-]

ArugulaAnnual1765@reddit

Why the fuck is this sub allowing ads?!?!?

Garbage slop at that!!!

[-]

FatheredPuma81@reddit

It doesn't actually the mods just aren't around 24/7 to catch it immediately.

[-]

ArugulaAnnual1765@reddit

surprising because you know reddit mods and this sub has 1m+ followers lol

[-]

FatheredPuma81@reddit

New rules 1 week check-in : r/LocalLLaMA

Idk I kind of think it's a good thing. Shows the mods have a life. The messages I've seen from them on various posts feels like they actively try to keep this sub looking decent while being reasonable too.

[-]

yaboyskales@reddit (OP)

¯_(ツ)_/¯

[-]

fligglymcgee@reddit

This thing is now so convoluted with edits and backpedaling that I can’t imagine anyone is going to feel like wading into this project to find out what else you generated without review.

[-]

FatheredPuma81@reddit

So I updated Ollama today to double check. Yep it's still terrible please stop using it.

[-]

colin_colout@reddit

Wow... Sorry you had to experience ol***a

[-]

FatheredPuma81@reddit

Yea I think the only reason I don't uninstall it is because I keep forgetting I have it installed. A cognitive block or something like that.

[-]

yaboyskales@reddit (OP)

fair, the wrapper stuff, weird tags, default quants, all of it... get it totally!

It's more like a UX thing, one button in onboarding, pulls Ollama + a vision model in the background. Same thing I do in Skales (but mostly users use Skales with Custom Endpoints, so yeah, again Big L for Ollama). Unfortunately llama.cpp direct doesn't really have that flow yet.

But yeah, you're not my default user, you are more one of the OG's in this Sub! adding a generic OpenAI-compatible endpoint slot to v1.2.0 makes sense, then you can point it at whatever you actually trust... I will consider this!

What's actually broken for you right now? Tool calls, vision, throughput? Just taking notes..

[-]

colin_colout@reddit

What's a good recipe for split pea soup?

[-]

yaboyskales@reddit (OP)

ha' nice catch 🤣

[-]

colin_colout@reddit

get it totally! A tip to be taken seriously in the future... Just write what you want to say in your own words(no llm) in your native language and use a traditional translation app like Google translate.

Between the low effort original post with qwen2.5 recommendations, and seeing in this reply that you clearly don't use the tools you build for, I can only assume one of two things: * You're outsourcing your thinking completely. * This is a slop app and you don't even use the tools you built it for.

Maybe both are wrong and you're just over-relying on AI translation and didn't notice that it's making you sound asinine...

If you're gonna reply to comments or make edits to your post (or post again), use your real voice and stop letting the llm think for you. You'll get a lot more respect.

..and please just do us all a favor and stop posting slop messages. This is an technical llm subreddit. We know when you are just copying the llm output into the text box.

[-]

TossedSaladMan69@reddit

Llama 3.2 Vision? Qwen2.5-VL in 2026? Get this AI slop out of here. Fuck off

[-]

yaboyskales@reddit (OP)

Thanks for investigating an Sub literally about AI.

For you it's AI slop, for me, as a non-native English speaker, to keep it short, it's a way to ship faster while building releases, building tools that currently over thousands of people (Skales - AI Slop - hitted over 5k downloads two days ago) actually use, and spending time with my family. Thanks for the hint though, I'll clean up the sub for you (:

[-]

TossedSaladMan69@reddit

Unfortunately, no German here. Eng or FR. But this still violates the rules. You’re supposed to explicitly say if you used AI to assist in writing a post if you’re ESL.

[-]

yaboyskales@reddit (OP)

Checked via AI Slop the wiki and the sub rules. No AI/ESL disclosure requirement anywhere. Point me at the line if I'm missing it. Otherwise that's your personal take, not a rule.

Either way, done here.

Solo dev, kid waiting, real users on the other side of this release. Good night. Fight the Slop.

[-]

TossedSaladMan69@reddit

Here you go: https://www.reddit.com/r/LocalLLaMA/s/kkyLeWSVEM

[-]

yaboyskales@reddit (OP)

You were right, thanks for the link to the rule update. Hadn't seen it. Added a disclosure to the top of the post. BIG L on me for missing it.

[-]

yaboyskales@reddit (OP)

Deleting SUB on my end, sorry for the noise.
BIG L on me for missing the rule update 👍

[-]

Trick-Assignment-828@reddit

can it run remote llm? using a mac and having the llm in a linux machine?

[-]

yaboyskales@reddit (OP)

Yes, same use case I just discussed with FatheredPuma81 below. v1.2.0 adds a generic OpenAI-compatible endpoint slot so you can point AIPointer at your Linux box (Ollama remote, llama.cpp server, vLLM, whatever). Mac client, Linux GPU inference server, that exact split.

v1.1.1 is cloud-only for now.