Setup question
Posted by FlapableStonk89@reddit | LocalLLaMA | 3 comments
I’m currently using a combination of Gemini and Claude web chats to help with my coding project. I understand this isn’t the most efficient approach, given that I don’t want to pay for premium services and each site limits how many messages I can send.
I have already downloaded Msty Studio and run a couple of models. I find that they work okay for simple, straightforward tasks. However, if the error spans more than one or two scripts, the models aren’t able to help me solve it.
So I was wondering if anyone has a local setup or an alternative web service that can give me the same quality of coding assistance as these sites, without the message limits?
Sad-Arrival46@reddit
This is exactly the workflow problem I was hitting. Bouncing between Claude and Gemini manually, trying to stay under message limits, and local models only handling the simple stuff.
Two things that might help:
First, if you want to keep using the paid APIs but more efficiently, most providers have API access that's pay-per-token instead of message-limited. Claude API, OpenAI API, and Google's Gemini API all have free tiers or very cheap rates. A simple coding question might cost $0.001 through the API vs burning one of your 20 daily web chat messages.
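To make that cost claim concrete, here's the back-of-the-envelope arithmetic (the per-token rates below are placeholders I made up for illustration, not any provider's real pricing; check their pricing pages):

```python
# Rough cost estimate for one pay-per-token API call.
# Rates are hypothetical placeholders, not real provider pricing.
INPUT_RATE_PER_M = 0.25   # dollars per million input tokens (assumed)
OUTPUT_RATE_PER_M = 1.25  # dollars per million output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request at the rates above."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# A short coding question: ~500 tokens in, ~400 tokens out.
print(f"${estimate_cost(500, 400):.6f}")  # → $0.000625
```

Even at rates several times higher than these, a simple question stays in the sub-cent range, which is why pay-per-token beats burning a fixed daily message allowance.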
Second, the routing problem (knowing when local is good enough vs when you need Claude/Gemini) is something I built a tool to handle. It's called Nadiru, an orchestration engine where a Conductor model classifies each request and routes it to the best available model. Simple errors go to your local model for free. Multi-file debugging that needs broader context goes to Claude or GPT through their APIs. You don't decide which model to use; the Conductor learns what works over time.
With API keys for a few providers, I ran 5 requests for $0.007 total because the simple ones routed to free models automatically.
https://github.com/hlk-devs/nadiru-engine
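The core idea is just classify-then-dispatch. Here's a toy sketch of that pattern (this is not Nadiru's actual code; the keyword-free heuristic and model names are made up, and a real conductor would use a small classifier model rather than hard-coded rules):

```python
# Toy sketch of conductor-style routing: classify a request's difficulty,
# then dispatch it to a free local model or a paid API model.
# The thresholds and model names are invented for illustration.

def classify(request: str, n_files: int) -> str:
    """Crude difficulty heuristic: single-file, short requests are 'simple'."""
    if n_files <= 1 and len(request) < 500:
        return "simple"
    return "complex"

def route(request: str, n_files: int) -> str:
    """Pick a model tier based on the classification."""
    tier = classify(request, n_files)
    # Simple fixes go to the free local model; multi-file debugging
    # goes to an API model with a larger context window.
    return "local-model" if tier == "simple" else "api-model"

print(route("fix this off-by-one error", n_files=1))       # → local-model
print(route("trace this crash across modules", n_files=7)) # → api-model
```

The payoff is that cheap requests never touch a paid endpoint, which is where the savings in the numbers above come from.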
For the immediate problem of local models not handling multi-file errors: that's usually a context window issue. The model can't see enough of your codebase at once. Try feeding the relevant files together in one prompt rather than describing the error in isolation.
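A minimal sketch of that "feed the files together" approach (the function and prompt layout here are my own illustration, not a specific tool's API):

```python
# Bundle an error message plus the full text of the relevant files into
# one prompt, so the model sees cross-file context instead of a
# second-hand description of it.

def build_prompt(error_message: str, files: dict[str, str]) -> str:
    """files maps filename -> file contents."""
    parts = [f"I'm hitting this error:\n\n{error_message}"]
    for name, source in files.items():
        parts.append(f"### {name}\n{source}")
    parts.append("Explain the cause and suggest a fix.")
    return "\n\n".join(parts)

prompt = build_prompt(
    "TypeError: 'NoneType' object is not subscriptable",
    {"loader.py": "def load(): ...", "main.py": "data = load()[0]"},
)
print(prompt)
```

With a 6 GB card you'll still hit a ceiling on how many files fit, so keep the bundle to the two or three files the traceback actually touches.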
jwpbe@reddit
What graphics card / VRAM do you have? Windows or Linux?
FlapableStonk89@reddit (OP)
Windows, with ~6 GB VRAM and 32 GB RAM. I have a 5-year-old NVIDIA GPU in an equally old MSI gaming laptop. I am out right now, but I can find out my specific card in a few hours.