Ollama and OpenCode working very poorly with all models
Posted by Lkemb@reddit | LocalLLaMA | 29 comments
Hi All,
I'm just getting into self-hosting models. My goal is to have my home PC run the models and then any of my computers on or off my network be able to talk to my raspberry pi via CF tunnels, which also queries my home PC.
Note: I have a Radeon 7900XT with 20GB VRAM.
However, before I even get there: I've set up Ollama on my desktop PC, downloaded quite a few models, and connected it to opencode. No matter what model I try, they are almost useless at running opencode commands.
For example, I ask it to summarize a certain file on my PC, and like 10% of the time it will actually read the file, but even then, it'll just stop outputting and I have to keep coercing it to continue. Here's an example with qwen2.5-coder:14b with 64K context.
```
"C:\Users\
Read this code and tell me what the project does and summarize the important files.
---
To analyze the Listle project, I'll need to examine some key files. Let's start by reading the main files that typically define a project structure.
[
{
"name": "read",
"arguments": {
"filePath": "C:\\Users\\
}
},
{
"name": "read",
"arguments": {
"filePath": "C:\\Users\\
}
},
{
"name": "glob",
"arguments": {
"pattern": "**/*.js",
"filePath": "C:\\Users\\
}
}
]
Build · qwen2.5-coder:14b-64K · 29.0s
```
I've seen some people here with incredible results, I just can't get it to work. Am I doing something wrong? Why are all these models terrible at executing instructions in opencode? I want something more than a "text chat" bot, I want it to be able to search my PC, search the web, etc.
chibop1@reddit
Regardless of engine, you'll get poor results unless you can run at least qwen3.5-27b, gemma4-31b, or qwen3.6-35b with at least a 64K context size!
Ollama works fine. It's just a bad choice of model and context size. You need to set the context size to 64K in Ollama and run one of the models mentioned above.
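For reference, a rough sketch of bumping Ollama's context to 64K (the `-64K` tag in OP's log suggests a variant built this way; the env var and Modelfile syntax are from recent Ollama versions, so double-check them against your install):

```shell
# Option 1: set a global default context length before starting the server
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Option 2: bake the context size into a named model variant via a Modelfile
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:14b
PARAMETER num_ctx 65536
EOF
ollama create qwen2.5-coder:14b-64K -f Modelfile
```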
Lkemb@reddit (OP)
I can probably run qwen3.5-27b, but I doubt I'd have enough VRAM to also fit a 64K context. Regardless, I'll give this a shot. Thanks.
CalligrapherFar7833@reddit
LLM slop - run llama.cpp with a modern model if you are not a bot
Lkemb@reddit (OP)
I promise you I'm not a bot lol
I've seen lots of people recommend things other than ollama, so maybe that's the issue?
Opteron67@reddit
but dumb as a bot
Lkemb@reddit (OP)
Was that really necessary?
Opteron67@reddit
why on earth use lolama?
mlhher@reddit
> qwen2.5-coder:14b
This is the second time I am seeing this today. What is going on?
Also, to your question: why are you using such an old model? On top of that, you're on Windows, which alone will degrade any model heavily, I assume (they have to parse and work with Windows paths and commands). The next issue is which quant you're using, though with that model you should change the model before doing anything else.
If you just want a generic chat interface, just run llama-server. If you insist on Ollama, I'm certain they've lifted parts of it anyway and have their own chat interface.
daywalker313@reddit
Must be kind of a time machine where people travel to this timeline from 2024.
But really, the first issue is ollama (use lmstudio at least) and the second is trying to use a decade-old model when qwen 3.6 was just released.
Lkemb@reddit (OP)
I understand people are wondering why this model; I tried many models: qwen3.5, gemma4, llama3.1, etc.
Gemini and Claude both recommended qwen2.5-coder since it was "tried and tested" for opencode.
Ollama was also recommended online, and it seemed easy to set up. If there's a better alternative, or if this is why I'm getting these issues, I have no problem trying another program. But is this really a big deal? I figured it's just the "model runner"; won't models run the same across any of these programs?
colin_colout@reddit
bruh. is that what ur llm told you to do?
qwen2.5 coder is from an era BEFORE opencode and claude code existed.
qwen3.5/3.6 was RL-trained in an opencode-like environment, and it was agent-maxed for claude code, which is essentially the same.
Lkemb@reddit (OP)
Lol yes, an LLM actually told me this. This is why I'm asking the question, because I'm new to this stuff and it seemed weird that they were recommending such an old llm.
colin_colout@reddit
You need to prompt the llm to use its web search tools to look up current practices for the current month and year (give it the date explicitly), and remind it that things change daily and its training data is stale.
the issue is fixable. opus should get this easily with the right prompt
Lkemb@reddit (OP)
That model was just the most recent one I saw someone talk about online and gemini and claude recommended. I don't know much about models, but I also tried qwen3.5, gemma4, etc. They all did the same thing.
I don't want a generic chat interface. I want this for coding, document writing, summarizations, etc.
qwen_next_gguf_when@reddit
This post is AI-generated garbage, based on the model name.
Lkemb@reddit (OP)
I am not a bot, but okay.
I tried qwen3.5, llama3.1, gemma4, etc. and they all did the same thing.
The reason I tried qwen2.5-coder is because claude and gemini said they are proven compatible with opencode.
colin_colout@reddit
the reason people are short-tempered: every day someone posts in this sub saying qwen2.5-coder in ollama won't call tools.
every time it's because an llm told them to run it.
...so don't take everyone's short temper too seriously. ... but please don't use an antique model in a modern coding agent. same goes for llama3.x. these models weren't trained to call tools.
there's something going on here if you can't call tools with qwen3.5. maybe it's context length. maybe the jinja flag isn't being set. maybe ollama is still trash.
either way it's hard to know from your post since you're using a model that is known to not work and ollama which is (afaik since i haven't used it in a year) janky at best.
try with a modern model. also get your frontier llm of choice to walk you through a current-day 2026 llama.cpp install if you're brave (or maybe try lmstudio)
jonahbenton@reddit
At one point Ollama hard-coded a small 4K context, independent of what the model supported. Check that that's not still happening.
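A quick way to check (sketch; output labels vary between Ollama versions, so verify against yours):

```shell
# Model card: the "context length" line is the model's maximum,
# and the parameters section shows any num_ctx override that's set
ollama show qwen2.5-coder:14b

# What's currently loaded into memory
ollama ps
```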
Lkemb@reddit (OP)
I saw in opencode that the context window would grow over 4k, so I don't think this is the issue.
colin_colout@reddit
are you confident in that? because ollama can't grow max context after the model is loaded (unless something changed recently). it reserves the context when the model starts.
maybe try lmstudio? it's a desktop app that works similarly to ollama but it's a lot better in my experience (both use llama.cpp under the hood).
lmstudio has a good builtin chat interface, and it also can expose an endpoint to connect to with opencode.
ultimately you'll have the most control with llama.cpp directly, but you might want to start slow learning about all the settings and how it works before diving in.
crawl for a bit and you'll be walking in no time (and eventually running).
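Whichever endpoint you end up with (LM Studio, llama-server, or Ollama's OpenAI-compatible one), you can sanity-check tool calling outside opencode with a minimal hand-built request. This is a sketch: the model name and endpoint URL are placeholders, and the `read` tool here just mirrors the one from OP's log.

```python
import json
import urllib.request

def build_tool_call_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat request advertising one 'read' tool,
    to test whether a local model emits a proper tool call for it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read a file from disk",
                "parameters": {
                    "type": "object",
                    "properties": {"filePath": {"type": "string"}},
                    "required": ["filePath"],
                },
            },
        }],
    }

payload = build_tool_call_request("qwen3.5-27b", "Summarize README.md")

# To actually send it (requires a running server; URL is a placeholder):
# req = urllib.request.Request(
#     "http://127.0.0.1:8080/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```

If the response's `choices[0].message` contains `tool_calls` instead of plain `content`, tool calling works at the server level and the problem is elsewhere.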
Lkemb@reddit (OP)
I really appreciate your comment. I'll have to check to be sure, but I believe it did, as I saw 2 jumps in my VRAM usage: first when the model loads, and then a second jump shortly after (I assume for the context).
But yeah, I'm going to try out llama.cpp I think.
Velocita84@reddit
Your mistake is using ollama and a model made before tool calling was even a thing
Lkemb@reddit (OP)
What's the best alternative? Is ollama really the issue? Isn't it just running a model?
Velocita84@reddit
Yes, ollama is that bad. Use llama.cpp with qwen3.5 (or 3.6) 35B or 27B if it's not too slow on your gpu
Lkemb@reddit (OP)
Thanks, I didn't know this. I'll try llama.cpp. I appreciate your help!
Velocita84@reddit
Llama.cpp has official vulkan builds in the releases, for rocm there's lemonade's builds https://github.com/lemonade-sdk/llamacpp-rocm/releases
Lkemb@reddit (OP)
This is really helpful. Seriously, thank you. I've been scratching my head at the complexities of this stuff, and honestly the traditional LLMs have been steering me astray.
Velocita84@reddit
It's alright, just keep in mind to take what claude, gemini, and other LLMs tell you with a big grain of salt. Without access to the internet they are extremely unreliable at giving AI-related advice, because the field moves so fast that their baked-in knowledge from training data can't keep up. If set up with some way of searching the web, they become much more useful because they can look for more recent information.
Velocita84@reddit
The llama-server documentation (llama-server is what you'd replace ollama with) is here: https://github.com/ggml-org/llama.cpp/tree/master/tools/server
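To give an idea, a typical launch looks something like this (the GGUF filename is a placeholder for whichever model you download; flags per the llama.cpp docs, so verify against your build):

```shell
# -c    context size in tokens
# -ngl  number of layers to offload to the GPU
# --jinja  use the model's own chat template, needed for tool calling
llama-server -m qwen3.5-27b-Q4_K_M.gguf -c 65536 -ngl 99 --jinja \
  --host 127.0.0.1 --port 8080
```

opencode (or anything else) can then talk to it at `http://127.0.0.1:8080/v1` as an OpenAI-compatible endpoint.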