Why is my ollama gemma4 replying in Japanese?
Posted by Houston_NeverMind@reddit | LocalLLaMA | 51 comments
Do I have to set some parameters or configuration? Sorry, I'm new to this.
Separate-Frosting172@reddit
To begin with, that's an odd response to "Hello."
企画書作成支援 means "support for creating proposals."
Iory1998@reddit
That might be an issue with the chat template.
Here's a piece of advice for you: ditch Ollama. Its popularity, at least on this sub, has tanked drastically in the last 6 months. There are other platforms that are better, where you can download quantized models from different providers and they'll work just fine.
You should also know that Bartowski and Unsloth updated their quantized models yesterday with fixes to the system prompt and the inclusion of SWA.
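If it does turn out to be the template, llama.cpp's server lets you override whatever is baked into the GGUF with one of its built-in templates. A rough sketch; the model filename here is just a placeholder:
```
# Force the built-in Gemma-style chat template instead of the one bundled in the GGUF
# (adjust the path to wherever your quant actually lives)
llama-server -m ./gemma-q8_0.gguf --chat-template gemma --port 8080
```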
Houston_NeverMind@reddit (OP)
Thanks! I switched to llama.cpp and it's bliss. I don't even have to care about stupid ROCm issues anymore. Vulkan is working perfectly for me.
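For anyone else who wants to skip ROCm, the Vulkan backend is just a CMake flag away. Roughly (paths and layer count are placeholders):
```
# Build llama.cpp with the Vulkan backend (assumes Vulkan drivers/SDK are already installed)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Then point the server at your GGUF
./build/bin/llama-server -m /path/to/model.gguf -ngl 99
```
`-ngl 99` just offloads as many layers as will fit on the GPU; lower it or drop it if you want to keep more on the CPU.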
Iory1998@reddit
Try LM Studio. It has a polished interface, works out of the box, and has active development.
DeepOrangeSky@reddit
LM Studio is my favorite for literally every other model I've ever used except Gemma4, since Gemma4 (especially 31b, more so than 26b) still has that weird runaway memory bloat issue in LM Studio. Not sure whether it's fixable on LM Studio's end, but I tried re-downloading the quants (Bartowski q8_0, Unsloth q6_k, and Unsloth q5_XL), making sure the runtime/version is up to date, and nothing works. Bringing max concurrent predictions down from 4 to 1 helps a little (memory cumulatively increases by "only" ~2-2.5GB per reply at q8 rather than 5-7GB per reply, but that's still pretty bad). Ejecting and reloading the model also brings it back down, but you have to keep doing it every few replies, prompt processing on the first reply after reloading takes an insane amount of time, and it's a ridiculous "solution" to have to rely on anyway.
PromptInjection_@reddit
As others have already mentioned:
Ollama's 'golden age' seems to be over. Try moving to llama.cpp or LM Studio.
droptableadventures@reddit
Hot take: its golden age was when it just tried to be a straight wrapper for llama.cpp. The more of llama.cpp they've tried to replace with their own vibe coded implementation, the worse it's become.
sersoniko@reddit
I’m so glad I migrated from Ollama to LM Studio
RottenPingu1@reddit
I get frustrated a lot with Ollama and Open WebUI. Seems like every few months I have to reinstall. LM Studio looks stable with a fantastic UI. Is privacy an issue? Why do you like it?
Nyghtbynger@reddit
Tried Open WebUI once and was shocked at the amount of bloat I had to install.
Now I use the Copilot extension in Obsidian if I have to talk to a model online and keep a history. Not as cool as a browser interface, but it does the job.
FalconX88@reddit
It's a single Docker container or a simple pip install. You need Docker or Python/pip, that's it.
It also has a completely different use case than what you're using it for.
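For reference, the two install paths look roughly like this (image and package names as the Open WebUI project publishes them; double-check their README for current details):
```
# Option 1: the Docker route (single container, data kept in a named volume)
docker run -d -p 3000:8080 -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

# Option 2: the pip route
pip install open-webui
open-webui serve
```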
Nyghtbynger@reddit
I mean, the surprise was more about the number of packages I had to install. If I'd gone the Docker route it would have been straightforward, but I don't like having containers running in the background on my PC for programs I only use intermittently.
chimph@reddit
stop the container?
elongated_argonian@reddit
LM Studio is closed-source, and apparently their privacy policy is concerning to some people? I've never read it myself, so can't confirm. I personally use Jan with llama.cpp, both of which are FOSS.
muxxington@reddit
So you switched from one wrapper to another. Why didn't you just take the next step right away?
sersoniko@reddit
Ollama isn't a wrapper around the official llama.cpp; they have their own fork, and when a fix or feature lands upstream it can take ages for Ollama to adopt it.
RobTheDude_OG@reddit
I usually get my models through LM Studio, but I've been running llama.cpp with HIP more as of late, as I do see a performance difference.
Faster prompt processing and streaming speeds with some 24B-31B models.
With smaller models like 8B and 12B it's negligible, though.
elongated_argonian@reddit
Can confirm, GPT-OSS 20B seems about 10-20% faster on llama.cpp than Ollama on my hardware.
relmny@reddit
"why is my ollama..."
Yep, that's your problem right there...
samas69420@reddit
TurnUpThe4D3D3D3@reddit
The god seed
s403bot@reddit
Could be this issue: https://github.com/ollama/ollama/issues/15261
Houston_NeverMind@reddit (OP)
Yup, this is it.
crazycomputer84@reddit
Maybe the model watched too much anime while training lol.
Houston_NeverMind@reddit (OP)
Ok, something is wrong here
Emotional-Baker-490@reddit
Yeah, you accidentally used ollama
Houston_NeverMind@reddit (OP)
You are right. I switched to llama.cpp and everything is cool.
wektor420@reddit
Use llama.cpp
LtCommanderDatum@reddit
日本語はチャンピオンの言語だからです。("Because Japanese is the language of champions.")
shipblazer420@reddit
wakarimasen ("I don't understand") lol
cryptofriday@reddit
It's the new update. Try to learn Japanese...
Iory1998@reddit
😂😂😂
Ayuzh@reddit
lol. I experienced this with ChatGPT today as well, when some words came out in a different language but were contextually correct.
Nyghtbynger@reddit
It happened to me in Kimi and DeepSeek too. Is this the solar flares?
ruuurbag@reddit
Claude randomly put a phrase in Japanese while replying to me the other day. It had no explanation for it.
Mickenfox@reddit
Language models just do that. Especially Chinese ones.
VoiceApprehensive893@reddit
DeepSeek and Kimi have stupid amounts of Chinese data fed into them, so they slip into Chinese easily.
Houston_NeverMind@reddit (OP)
Were you able to solve it for ChatGPT? Is it because of a corrupt model file?
Ayuzh@reddit
Oh, sorry for not giving you the whole context; it was happening on chatgpt.com, not a local model.
Historical-Camera972@reddit
Well, I know what triggers DeepSeek into speaking Chinese fairly consistently.
Not sure if this is related to your issue, but if you make specific typos in your text that are typical of Chinese ESL typists, the LLM locks onto the preferred outputs for that tokenized input.
AKA a probability layer that acts like an assumption layer:
>51% likely to be a Chinese ESL speaker after a common spelling/grammar error typical of that group.
>That group prefers follow-up output in Chinese.
BAM!
Then again, some people say that has nothing to do with it, but if it didn't, why can I force it to happen by deliberately making those common typos/errors?
YamataZen@reddit
I just use llama.cpp
muxxington@reddit
lolama
apparently_DMA@reddit
Not enough VRAM for the KV cache? Hard to say. What's the model exactly, and what rig are you running it on?
Try llama-server; it gives you more control.
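Something along these lines, for example, so the context window (and therefore the KV cache) is an explicit choice instead of a default; the path and numbers are just placeholders:
```
# -c caps the context window, which is what actually sizes the KV cache;
# -ngl controls how many layers go to the GPU. Tune both to your rig.
llama-server -m /path/to/model.gguf -c 8192 -ngl 99 --port 8080
```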
-dysangel-@reddit
Are you sure you didn't type "Hello" in a Japanese accent?
Houston_NeverMind@reddit (OP)
No, it was British English.
darvs7@reddit
Did you happen to say Ollamagogemmasu?
junklont@reddit
Naniiiiii?
Impossible-Hunt9117@reddit
Hello Kitty, obviously
Velocita84@reddit
Another ollama classic
pyy85@reddit
I have the same issue
Houston_NeverMind@reddit (OP)
Apparently it replies in other languages too.