Why is my ollama gemma4 replying in Japanese?
Posted by Houston_NeverMind@reddit | LocalLLaMA | 51 comments
Do I have to set some parameters or configuration? Sorry, I'm new to this.
Separate-Frosting172@reddit
To begin with, that's an odd response to "Hello."
企画書作成支援 means "support for creating proposals."
Iory1998@reddit
That might be an issue with the chat template.
Here's a piece of advice for you: ditch Ollama. Its popularity, at least on this sub, has tanked drastically in the last 6 months. There are other platforms that are better, where you can download quantized models from different providers and they'll work just fine.
You should also know that Bartowski and Unsloth updated their quantized models yesterday with fixes to the system prompt and the inclusion of SWA.
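If it does turn out to be the template, llama.cpp's server lets you override whatever is baked into the GGUF with one of its built-in templates. A rough sketch; the model filename here is just a placeholder:
```
# Force the built-in Gemma-style chat template instead of the one bundled in the GGUF
# (adjust the path to wherever your quant actually lives)
llama-server -m ./gemma-q8_0.gguf --chat-template gemma --port 8080
```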
Houston_NeverMind@reddit (OP)
Thanks! I switched to llama.cpp and it's bliss. I don't even have to care about stupid ROCm issues anymore. Vulkan is working perfectly for me.
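For anyone else who wants to skip ROCm, the Vulkan backend is just a CMake flag away. Roughly (paths and layer count are placeholders):
```
# Build llama.cpp with the Vulkan backend (assumes Vulkan drivers/SDK are already installed)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Then point the server at your GGUF
./build/bin/llama-server -m /path/to/model.gguf -ngl 99
```
`-ngl 99` just offloads as many layers as will fit on the GPU; lower it or drop it if you want to keep more on the CPU.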
Iory1998@reddit
Try LM Studio. It has a polished interface, works out of the box, and has active development.
DeepOrangeSky@reddit
LM Studio is my favorite for literally every other model I've ever used except Gemma4, since Gemma4 (especially 31b, more so than 26b) still has that weird runaway memory bloat issue in LM Studio. Not sure whether it's fixable on LM Studio's end, but I tried re-downloading the quants (Bartowski q8_0, Unsloth q6_k, and Unsloth q5_XL), making sure the runtime/version is up to date, and nothing works. Bringing max concurrent predictions down from 4 to 1 helps a little (memory cumulatively increases by "only" ~2-2.5GB per reply at q8 rather than 5-7GB per reply, but that's still pretty bad). Ejecting and reloading the model also brings it back down, but you have to keep doing it every few replies, prompt processing on the first reply after reloading takes an insane amount of time, and it's a ridiculous "solution" to have to rely on anyway.
PromptInjection_@reddit
As others have already mentioned:
Ollama's 'golden age' seems to be over. Try moving to llama.cpp or LM Studio.
droptableadventures@reddit
Hot take: its golden age was when it just tried to be a straight wrapper for llama.cpp. The more of llama.cpp they've tried to replace with their own vibe coded implementation, the worse it's become.
sersoniko@reddit
I’m so glad I migrated from Ollama to LM Studio
RottenPingu1@reddit
I get frustrated a lot with Ollama and Open WebUI. Seems like every few months I have to reinstall. LM Studio looks stable with a fantastic UI. Is privacy an issue? Why do you like it?
Nyghtbynger@reddit
Tried Open WebUI once and was shocked at the amount of bloat I had to install.
Now I use the Copilot extension in Obsidian if I have to talk to a model online and keep a history. Not as cool as a browser interface, but it does the job.
FalconX88@reddit
It's a single Docker container or a simple pip install. You need Docker or Python/pip, that's it.
It also has a completely different use case than what you're using it for.
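For reference, the two install paths look roughly like this (image and package names as the Open WebUI project publishes them; double-check their README for current details):
```
# Option 1: the Docker route (single container, data kept in a named volume)
docker run -d -p 3000:8080 -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

# Option 2: the pip route
pip install open-webui
open-webui serve
```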
Nyghtbynger@reddit
I mean, the surprise was more about the number of packages I had to install. If I'd gone the Docker route it would have been straightforward, but I don't like having containers running in the background on my PC for programs I only use intermittently.
chimph@reddit
stop the container?
elongated_argonian@reddit
LM Studio is closed-source, and apparently their privacy policy is concerning to some people? I've never read it myself, so can't confirm. I personally use Jan with llama.cpp, both of which are FOSS.
muxxington@reddit
So you switched from one wrapper to another. Why didn't you just take the next step right away?
sersoniko@reddit
Ollama isn't a wrapper around the official llama.cpp; they have their own fork, and when a fix or feature lands upstream it can take ages for Ollama to adopt it.
RobTheDude_OG@reddit
I usually get my models through LM Studio, but I've been running llama.cpp with HIP more as of late, as I do see a performance difference.
Faster prompt processing and streaming speeds with some 24B-31B models.
With smaller models like 8B and 12B it's negligible, though.
elongated_argonian@reddit
Can confirm, GPT-OSS 20B seems about 10-20% faster on llama.cpp than Ollama on my hardware.
relmny@reddit
"why is my ollama..."
Yep, that's your problem right there...
samas69420@reddit
TurnUpThe4D3D3D3@reddit
The god seed
s403bot@reddit
Could be this issue: https://github.com/ollama/ollama/issues/15261
Houston_NeverMind@reddit (OP)
Yup, this is it.
crazycomputer84@reddit
Maybe the model watched too much anime while training lol.
Houston_NeverMind@reddit (OP)
Ok, something is wrong here
Emotional-Baker-490@reddit
Yeah, you accidentally used ollama
Houston_NeverMind@reddit (OP)
You are right. I switched to llama.cpp and everything is cool.
wektor420@reddit
Use llama.cpp
LtCommanderDatum@reddit
日本語はチャンピオンの言語だからです。("Because Japanese is the language of champions.")
shipblazer420@reddit
wakarimasen ("I don't understand") lol
cryptofriday@reddit
It's the new update. Try to learn Japanese...
Iory1998@reddit
😂😂😂
Ayuzh@reddit
lol. I experienced this with ChatGPT today as well, when some words came out in a different language but were contextually correct.
Nyghtbynger@reddit
It happened to me in Kimi and DeepSeek too. Is this the solar flares?
ruuurbag@reddit
Claude randomly put a phrase in Japanese while replying to me the other day. It had no explanation for it.
Mickenfox@reddit
Language models just do that. Especially Chinese ones.
VoiceApprehensive893@reddit
DeepSeek and Kimi have stupid amounts of Chinese data fed into them, so they slip into Chinese easily.
Houston_NeverMind@reddit (OP)
Were you able to solve it for ChatGPT? Is it because of a corrupt model file?
Ayuzh@reddit
Oh, sorry for not giving you the whole context; it was happening on chatgpt.com, not a local model.
Historical-Camera972@reddit
Well, I know what triggers DeepSeek into speaking Chinese fairly consistently.
Not sure if this is related to your issue, but if you make specific typos in your text that are typical of Chinese ESL typists, the LLM locks onto the preferred outputs for that tokenized input.
AKA a probability layer that acts like an assumption layer:
>51% likely to be a Chinese ESL speaker after a common spelling/grammar error typical of that group.
>That group prefers follow-up output in Chinese.
BAM!
Then again, some people say that has nothing to do with it, but if it didn't, why can I force it to happen by deliberately making those common typos/errors?
YamataZen@reddit
I just use llama.cpp
muxxington@reddit
lolama
apparently_DMA@reddit
Not enough VRAM for the KV cache? Hard to say. What's the model exactly, and what rig are you running it on?
Try llama-server; it gives you more control.
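Something along these lines, for example, so the context window (and therefore the KV cache) is an explicit choice instead of a default; the path and numbers are just placeholders:
```
# -c caps the context window, which is what actually sizes the KV cache;
# -ngl controls how many layers go to the GPU. Tune both to your rig.
llama-server -m /path/to/model.gguf -c 8192 -ngl 99 --port 8080
```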
-dysangel-@reddit
Are you sure you didn't type "Hello" in a Japanese accent?
Houston_NeverMind@reddit (OP)
No, it was British English.
darvs7@reddit
Did you happen to say Ollamagogemmasu?
junklont@reddit
Naniiiiii?
Impossible-Hunt9117@reddit
Hello Kitty, obviously
Velocita84@reddit
Another ollama classic
pyy85@reddit
I have the same issue
Houston_NeverMind@reddit (OP)
Apparently it replies in other languages too.