Ollama and OpenCode working very poorly with all models
Posted by Lkemb@reddit | LocalLLaMA | 29 comments
Hi All,
I'm just getting into self-hosting models. My goal is to have my home PC run the models and then any of my computers on or off my network be able to talk to my raspberry pi via CF tunnels, which also queries my home PC.
Note: I have a Radeon 7900XT with 20GB VRAM.
However, before I even get there: I've set up Ollama on my desktop PC, downloaded quite a few models, and connected it to opencode. No matter what model I try, they are almost useless at running opencode commands.
For example, I ask it to summarize a certain file on my PC, and like 10% of the time it will actually read the file, but even then, it'll just stop outputting and I have to keep coercing it to continue. Here's an example with qwen2.5-coder:14b with 64K context.
```
"C:\Users\
Read this code and tell me what the project does and summarize the important files.
---
To analyze the Listle project, I'll need to examine some key files. Let's start by reading the main files that typically define a project structure.
[
{
"name": "read",
"arguments": {
"filePath": "C:\\Users\\
}
},
{
"name": "read",
"arguments": {
"filePath": "C:\\Users\\
}
},
{
"name": "glob",
"arguments": {
"pattern": "**/*.js",
"filePath": "C:\\Users\\
}
}
]
Build · qwen2.5-coder:14b-64K · 29.0s
```
I've seen some people here with incredible results, I just can't get it to work. Am I doing something wrong? Why are all these models terrible at executing instructions in opencode? I want something more than a "text chat" bot, I want it to be able to search my PC, search the web, etc.
chibop1@reddit
Regardless of engine, you'll get poor results unless you can run at least qwen3.5-27b, gemma4-31b, or qwen3.6-35b with at least a 64K context size!
Ollama works fine. It's just a bad choice of model and context size. You need to set the context size to 64K in Ollama and run one of the models mentioned above.
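For reference, a rough sketch of bumping Ollama's context to 64K (the `-64K` tag in OP's log suggests a variant built this way; the env var and Modelfile syntax are from recent Ollama versions, so double-check them against your install):

```shell
# Option 1: set a global default context length before starting the server
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# Option 2: bake the context size into a named model variant via a Modelfile
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:14b
PARAMETER num_ctx 65536
EOF
ollama create qwen2.5-coder:14b-64K -f Modelfile
```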
Lkemb@reddit (OP)
I can probably run qwen3.5-27b, but I doubt I'd have enough VRAM to also fit a 64K context. Regardless, I'll give this a shot. Thanks.
CalligrapherFar7833@reddit
LLM slop - run llama.cpp with a modern model if you are not a bot
Lkemb@reddit (OP)
I promise you I'm not a bot lol
I've seen lots of people recommend things other than ollama, so maybe that's the issue?
Opteron67@reddit
but dumb as a bot
Lkemb@reddit (OP)
Was that really necessary?
Opteron67@reddit
why on earth use lolama?
mlhher@reddit
> qwen2.5-coder:14b
This is the second time I am seeing this today. What is going on?
Also, to your question: why are you using such an old model? On top of that, you're on Windows, which alone will degrade any model heavily, I assume (they have to parse and work with Windows paths and commands). The next issue is which quant you're using, though with that model you should change the model before doing anything else.
If you just want a generic chat interface, just run llama-server. If you insist on Ollama, I'm certain they've lifted parts of it anyway and have their own chat interface.
daywalker313@reddit
Must be kind of a time machine where people travel to this timeline from 2024.
But really, the first issue is ollama (use lmstudio at least) and the second is trying to use a decade-old model when qwen 3.6 was just released.
Lkemb@reddit (OP)
I understand people are wondering why this model; I tried many models: qwen3.5, gemma4, llama3.1, etc.
Gemini and Claude both recommended qwen2.5-coder since it was "tried and tested" for opencode.
Ollama was also recommended online, and it seemed easy to set up. If there's a better alternative, or if this is why I'm getting these issues, I have no problem trying another program. But is this really a big deal? I figured it's just the "model runner"; won't models run the same across any of these programs?
colin_colout@reddit
bruh. is that what ur llm told you to do?
qwen2.5 coder is from an era BEFORE opencode and claude code existed.
qwen3.5/3.6 was RL-trained in an opencode-like environment, and it was agent-maxed for claude code, which is essentially the same.
Lkemb@reddit (OP)
Lol yes, an LLM actually told me this. This is why I'm asking the question, because I'm new to this stuff and it seemed weird that they were recommending such an old llm.
colin_colout@reddit
You need to prompt the llm to use its web search tools to look up current practices for the current month and year (give it the date explicitly), and remind it that things change daily and its training data is stale.
the issue is fixable. opus should get this easily with the right prompt
Lkemb@reddit (OP)
That model was just the most recent one I saw someone talk about online and gemini and claude recommended. I don't know much about models, but I also tried qwen3.5, gemma4, etc. They all did the same thing.
I don't want a generic chat interface. I want this for coding, document writing, summarizations, etc.
qwen_next_gguf_when@reddit
This post is AI-generated garbage, based on the model name.
Lkemb@reddit (OP)
I am not a bot, but okay.
I tried qwen3.5, llama3.1, gemma4, etc. and they all did the same thing.
The reason I tried qwen2.5-coder is because claude and gemini said they are proven compatible with opencode.
colin_colout@reddit
the reason people are short-tempered: every day someone posts in this sub saying qwen2.5-coder in ollama won't call tools.
every time it's because an llm told them to run it.
...so don't take everyone's short temper too seriously. ... but please don't use an antique model in a modern coding agent. same goes for llama3.x. these models weren't trained to call tools.
there's something going on here if you can't call tools with qwen3.5. maybe it's context length. maybe the jinja flag isn't being set. maybe ollama is still trash.
either way it's hard to know from your post since you're using a model that is known to not work and ollama which is (afaik since i haven't used it in a year) janky at best.
try with a modern model. also get your frontier llm of choice to walk you through a current-day 2026 llama.cpp install if you're brave (or maybe try lmstudio)
jonahbenton@reddit
At one point Ollama hard-coded a small 4K context, independent of what the model supported. Check that that's not still happening.
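A quick way to check (sketch; output labels vary between Ollama versions, so verify against yours):

```shell
# Model card: the "context length" line is the model's maximum,
# and the parameters section shows any num_ctx override that's set
ollama show qwen2.5-coder:14b

# What's currently loaded into memory
ollama ps
```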
Lkemb@reddit (OP)
I saw in opencode that the context window would grow over 4k, so I don't think this is the issue.
colin_colout@reddit
are you confident in that? because ollama can't grow max context after the model is loaded (unless something changed recently). it reserves the context when the model starts.
maybe try lmstudio? it's a desktop app that works similarly to ollama but it's a lot better in my experience (both use llama.cpp under the hood).
lmstudio has a good builtin chat interface, and it also can expose an endpoint to connect to with opencode.
ultimately you'll have the most control with llama.cpp directly, but you might want to start slow learning about all the settings and how it works before diving in.
crawl for a bit and you'll be walking in no time (and eventually running).
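Whichever endpoint you end up with (LM Studio, llama-server, or Ollama's OpenAI-compatible one), you can sanity-check tool calling outside opencode with a minimal hand-built request. This is a sketch: the model name and endpoint URL are placeholders, and the `read` tool here just mirrors the one from OP's log.

```python
import json
import urllib.request

def build_tool_call_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat request advertising one 'read' tool,
    to test whether a local model emits a proper tool call for it."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read",
                "description": "Read a file from disk",
                "parameters": {
                    "type": "object",
                    "properties": {"filePath": {"type": "string"}},
                    "required": ["filePath"],
                },
            },
        }],
    }

payload = build_tool_call_request("qwen3.5-27b", "Summarize README.md")

# To actually send it (requires a running server; URL is a placeholder):
# req = urllib.request.Request(
#     "http://127.0.0.1:8080/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```

If the response's `choices[0].message` contains `tool_calls` instead of plain `content`, tool calling works at the server level and the problem is elsewhere.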
Lkemb@reddit (OP)
I really appreciate your comment. I'll have to check to be sure, but I believe it did, as I saw 2 jumps in my VRAM usage: first when the model loads, and then a second jump shortly after (I assume for the context).
But yeah, I'm going to try out llama.cpp I think.
Velocita84@reddit
Your mistake is using ollama and a model made before tool calling was even a thing
Lkemb@reddit (OP)
What's the best alternative? Is ollama really the issue? Isn't it just running a model?
Velocita84@reddit
Yes, ollama is that bad. Use llama.cpp with qwen3.5 (or 3.6) 35B or 27B if it's not too slow on your gpu
Lkemb@reddit (OP)
Thanks, I didn't know this. I'll try llama.cpp. I appreciate your help!
Velocita84@reddit
Llama.cpp has official vulkan builds in the releases, for rocm there's lemonade's builds https://github.com/lemonade-sdk/llamacpp-rocm/releases
Lkemb@reddit (OP)
This is really helpful. Seriously, thank you. I've been scratching my head at the complexities of this stuff, and honestly the traditional LLMs have been steering me astray.
Velocita84@reddit
It's alright, just keep in mind to take what claude, gemini, and other LLMs tell you with a big grain of salt. Without access to the internet they are extremely unreliable at giving AI-related advice, because the field moves so fast that their baked-in knowledge from training data can't keep up. If set up with some way of searching the web, they become much more useful because they can look for more recent information.
Velocita84@reddit
The llama-server documentation (llama-server is what you'd replace ollama with) is here: https://github.com/ggml-org/llama.cpp/tree/master/tools/server
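To give an idea, a typical launch looks something like this (the GGUF filename is a placeholder for whichever model you download; flags per the llama.cpp docs, so verify against your build):

```shell
# -c    context size in tokens
# -ngl  number of layers to offload to the GPU
# --jinja  use the model's own chat template, needed for tool calling
llama-server -m qwen3.5-27b-Q4_K_M.gguf -c 65536 -ngl 99 --jinja \
  --host 127.0.0.1 --port 8080
```

opencode (or anything else) can then talk to it at `http://127.0.0.1:8080/v1` as an OpenAI-compatible endpoint.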