Why do the AIs I use in Continue keep trying to use tools that don't exist?

Posted by Long_Video7840@reddit | LocalLLaMA | 20 comments

Several times per interaction I get errors like this:

read_file failed because the arguments were invalid, with the following message: Cannot read properties of undefined (reading 'trim')

Please try something else or request further instructions.

read_skill failed with the message: Skill "README" not found. Available skills: none

Please try something else or request further instructions.

What is causing this and how do I fix it? This happens with almost every AI I've tested: qwen3.5, qwen3.6, Llama3.1, gemma4, and others.

idumlupinar@reddit

I started with Ollama + VS Code + Continue on Win11 and ran into similar issues. Later, I decided to switch to llama.cpp + OpenCode. I build llama.cpp from source, and Qwen models work quite well for me. During coding tasks I use a 262k context size with 24GB of VRAM. Because the GPU is second-hand, I had to reduce its power limit. After switching to this setup, things were smooth.

IsJaie55@reddit

I've read this several times but never tried it. So OpenCode is better than Continue? Because I'm really tired of Continue's tool errors.

hurdurdur7@reddit

I think you are using a low quant of the model. Only try Q6 and up; anything below that is prone to errors like these.

FullstackSensei@reddit

Why don't you tell us more about the models: which quants you are using, whether you are following the recommended parameters, and how you're running them?

Long_Video7840@reddit (OP)

I'm using qwen3.5 35b exactly as it comes from ollama. I'm using vscodium with continue version 1.2.22. I'm not sure I follow what you are asking for with recommended parameters.

TheTerrasque@reddit

Oh, ollama. That explains it.

Long_Video7840@reddit (OP)

What's wrong with ollama? What do people prefer to use?

Mountain_Station3682@reddit

https://sleepingrobots.com/dreams/stop-using-ollama/

my_name_isnt_clever@reddit

This is excellent, and honestly it should be a pinned post in this sub so we get fewer Ollama questions. People abusing FOSS culture for their own gain pisses me off.

Long_Video7840@reddit (OP)

TIL, thank you.

TheTerrasque@reddit

It's known to have issues like the ones you describe, usually from bad defaults or from being set up for other use cases.

I personally use llama.cpp, but vllm is also a popular server.

What OS and hardware do you have? Especially graphics card. I can see if I can help you try via llama.cpp instead.

Long_Video7840@reddit (OP)

I'm running CachyOS. I have 48GB of system memory with a 5800XT, a 3090, and a 3060.

TheTerrasque@reddit

Ok, I just shared a llama.cpp config that should work well with the 3090, provided not much else is hogging its video RAM. It should give you a starting point to work from.

SM8085@reddit

When I stopped using ollama, they didn't make it obvious what happened on context overflow, i.e. when you're using the default 4k token context (apparently the default for <24GiB VRAM) and Continue tries to use more than that. Back then they seemed to silently truncate the context. If that's still the case, it makes sense that the bot would be confused, since it's only getting part of the instructions.

I use llama.cpp's llama-server, but LMStudio also seems fine. You could also test increasing ollama's context to see if it helps: https://docs.ollama.com/context-length

I normally roll with full context whenever my system allows it, even though I don't use more than 8-9k for some tasks. Something like Opencode has a 10k token instruction prompt. Hermes-Agent is closer to 20k. So then you need space for those instructions + whatever documents you want the bot to examine.
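
To make that budgeting concrete, here is a minimal sketch. The token counts are just the ballpark figures from this comment, `reply_reserve` and both function names are made up for illustration, and `num_ctx` is the context-size option that Ollama accepts in a request's `options`. It only checks whether a prompt plausibly fits; it does not count real tokens.

```python
def fits_in_context(context_window: int, instruction_tokens: int,
                    document_tokens: int, reply_reserve: int = 1024) -> bool:
    """Return True if instructions + documents + room for a reply fit.

    reply_reserve is an assumed allowance for the model's answer.
    """
    needed = instruction_tokens + document_tokens + reply_reserve
    return needed <= context_window


def ollama_options(num_ctx: int) -> dict:
    """Build the 'options' fragment of an Ollama request body.

    num_ctx raises the context window for that request.
    """
    return {"options": {"num_ctx": num_ctx}}


# A 4k window cannot even hold a ~10k token instruction prompt.
small_window_ok = fits_in_context(4096, 10_000, 0)

# A 32k window leaves room for the prompt plus ~8k tokens of documents.
large_window_ok = fits_in_context(32_768, 10_000, 8_000)

request_fragment = ollama_options(32_768)
```

If the first check fails, whatever gets cut off may well include the tool definitions, which would fit the symptoms in the original post.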

FullstackSensei@reddit

There's your problem. "Exactly as it comes from ollama" means Q4, which tends to lobotomize smaller models.
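
For a feel of what moving up a quant costs in memory, here is a back-of-the-envelope sketch. The bits-per-weight values are approximate figures for common llama.cpp quant types and vary by model, the function name is hypothetical, and the estimate covers weights only (no KV cache or runtime overhead).

```python
# Approximate bits per weight; real GGUF files differ somewhat per model.
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5}


def weight_size_gb(params_billion: float, quant: str) -> float:
    """Estimate weight size in GB (10^9 bytes) for a given quant type."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # params * bits / 8 gives bytes; the 1e9 factors cancel out.
    return params_billion * bits / 8


# Rough sizes for a 35B-parameter model at each quant level.
sizes = {quant: weight_size_gb(35, quant) for quant in APPROX_BITS_PER_WEIGHT}
```

By this estimate a 35B model is roughly 21 GB at Q4_K_M and roughly 29 GB at Q6_K, so Q6 would not fit on a single 24GB card and would have to be split across GPUs or partly offloaded to system memory.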

Long_Video7840@reddit (OP)

I didn't know this! Where do people typically get their models from to use with continue then? Or do most people not use continue?

FullstackSensei@reddit

Huggingface.

The first thing I'd do is ditch ollama and learn to use llama.cpp; better yet, build it from source. For software development, you'll be doing yourself a huge favor. Search r/LocalLLaMA for both, and read some of the documentation on llama.cpp.

Second, while I haven't used continue in over 6 months, my experience with it was generally less than ideal. There are a ton more tools now, but my favorite is still roo code when working in vs code. Maybe give that a shot too.

false79@reddit

Something is present in your specific context that isn't supposed to be there, and it's screwing things up.

Whatever tool you've tested "almost every AI" with, try something else with zero customizations. Through process of elimination, you can identify what the issue is, because a lot of people are NOT experiencing tool issues these days. Tool use is actually quite good now.

Long_Video7840@reddit (OP)

I apologize. I tried to make this post several times but was locked out due to karma, and I think I accidentally dropped some of the context when I copied and pasted my first draft. I'm using continue with vscodium. The models I listed in my post are the most recent ones I tried, and they all had the same problem. I tried others before that, but I don't recall what they all were. I could try running the model through the ollama cli, but it would be nice to get something working in my IDE for more convenience.

ShengrenR@reddit

This has nothing to do with the model. Something in your harness/agent/chat app (whatever) is clearly telling these models, via the system prompt or the like, that such a tool exists. Something else is broken, not the model.
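
One way to narrow this down is to compare what the model asks for against what the harness actually advertises. The sketch below is not Continue's real code: the function name, the shape of the tool-call dict, and the `filepath` argument name are all assumptions for illustration. The 'trim' error in the post would be consistent with a string argument arriving undefined, which is the second case checked here.

```python
def check_tool_call(call: dict, advertised: dict[str, list[str]]) -> list[str]:
    """Return a list of problems with a tool call (empty list means OK).

    advertised maps each tool name to its required string arguments.
    """
    name = call.get("name")
    if name not in advertised:
        # The model asked for a tool the harness never offered.
        return [f"unknown tool: {name!r}"]

    problems = []
    args = call.get("arguments") or {}
    for required in advertised[name]:
        value = args.get(required)
        # A missing or blank string is what would break a later .trim() call.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name}: missing or empty argument {required!r}")
    return problems


# Hypothetical tool list: one tool with one required argument.
tools = {"read_file": ["filepath"]}

missing_arg = check_tool_call({"name": "read_file", "arguments": {}}, tools)
unknown_tool = check_tool_call({"name": "read_skill", "arguments": {"name": "README"}}, tools)
valid_call = check_tool_call({"name": "read_file", "arguments": {"filepath": "README.md"}}, tools)
```

If the logged calls mostly fail the first check, the system prompt is probably advertising tools that aren't wired up; if they mostly fail the second, the model is likely losing or mangling the tool schema.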