VSCode and agent integration
Posted by loudsound-org@reddit | LocalLLaMA | 2 comments
I've been using VSCode with GitHub Copilot for a bit (free tier) and I'm looking to try running models locally since I keep running into the limits with GHCP. I'd like as close an experience as possible, with both code autocomplete and chat integration. I know GHCP can use local models, but I think I'd still run into session limits and such. If there's a way around that, then maybe sticking with it would be best.
A few things about my setup that may make a difference. I'm running the model (primarily Qwen 3.6 35B, but I'd like the ability to switch to 27B and other models on the fly) on my Windows PC with llama.cpp. My local Linux server hosts all of my code and dev environments, and I primarily use my Windows laptop with VSCode on an SSH workspace into my server (which works fine with GHCP and any agentic tooling). I also plan to set up Hermes for non-coding use (on the Linux server), still using the Windows PC's models (the server only has a 1060 6GB GPU... I'm looking at doing embeddings and such on it once I figure that out!).
So with that setup, what is the best integration with VSCode? The Hermes extension, using Hermes for coding as well? Continue pointed directly at my llama.cpp? Cline pointed at either Hermes (is that even possible?) or llama.cpp? Running pi.dev alongside Hermes and somehow integrating that (though it seems pi is mostly for CLI dev?)? Some other option? I appreciate any advice!
ai_guy_nerd@reddit
Continue is generally the most stable path for direct llama.cpp integration since it handles the prompt templates and context windows well. If the goal is a more agentic experience, Cline is the current favorite for most developers. It can be pointed to a local OpenAI-compatible endpoint, which llama.cpp provides, allowing for much more autonomous refactoring than simple autocomplete.
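In case it helps to sanity-check that endpoint before wiring up the extensions, here's a minimal sketch that talks to llama-server the same way Continue or Cline would. The hostname, port, and model name are placeholders for your setup, not anything specific to your machines:

```python
# Minimal sketch: verify the llama.cpp OpenAI-compatible endpoint that Continue/Cline
# would talk to. Assumes llama-server is already running on the Windows box, e.g.:
#   llama-server -m your-model.gguf --host 0.0.0.0 --port 8080
# "windows-pc", port 8080, and "local-model" below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://windows-pc:8080/v1",  # llama-server's OpenAI-compatible route
    api_key="not-needed-for-local",        # llama.cpp ignores the key by default
)

resp = client.chat.completions.create(
    model="local-model",  # llama-server serves whatever model it was launched with
    messages=[{"role": "user", "content": "Write a one-line Python hello world."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

If that round-trips from the Linux server over your network, both Continue and Cline can be given the same base URL in their settings.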
Setting up a simple proxy, or a tool like LiteLLM, on the Linux server could also help manage traffic between the Windows host and the VSCode SSH session: the extensions get one stable endpoint to point at, and switching between the 27B and 35B models on the fly becomes a matter of requesting a different model name.
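As a rough sketch of that routing idea, here's LiteLLM's Python Router aliasing two backends (the standalone `litellm` proxy does the same thing from a YAML config). The hostnames, ports, and model aliases are assumptions; each alias would need a llama-server instance (or a model swap) behind it:

```python
# Rough sketch: alias two llama.cpp backends behind friendly names so that
# switching models is just a different name in the request. All hostnames,
# ports, and alias names below are placeholders for this setup.
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "local-35b",  # alias Continue/Cline would request
            "litellm_params": {
                "model": "openai/local-35b",             # generic OpenAI-compatible backend
                "api_base": "http://windows-pc:8080/v1",
                "api_key": "not-needed",
            },
        },
        {
            "model_name": "local-27b",
            "litellm_params": {
                "model": "openai/local-27b",
                "api_base": "http://windows-pc:8081/v1",  # second llama-server instance
                "api_key": "not-needed",
            },
        },
    ]
)

# Switching models is now just a different alias in the request.
resp = router.completion(
    model="local-27b",
    messages=[{"role": "user", "content": "Summarize what this proxy layer does."}],
)
print(resp.choices[0].message.content)
```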
OpenClaw is also worth looking at if you want to move beyond just VSCode and actually automate the deployment or research side of your dev cycle. Otherwise, sticking with the Continue + Cline combo provides the closest experience to Copilot while keeping everything local.
tredbert@reddit
I’ve only tried Continue and Cline. Best for me so far is Continue in VSCode (or Cursor) pointed to my llama-server. But I’m curious what others are using.