Gemma 4 is great at real-time Japanese - English translation for games

Posted by KageYume@reddit | LocalLLaMA | View on Reddit | 24 comments

When Gemma 3 27B QAT IT was released last year, it was SOTA for local real-time Japanese-English translation for visual novel for a while. So I want to see how Gemma 4 handle this use case.

Model:

Softwares:

Workflow:

  1. Luna hooks the dialogue and speaker from the game.
  2. A Python script structures the data (speaker, gender, dialogue).
  3. Luna sends the structured text and a system prompt to LM Studio
  4. Luna shows the translation.

What Gemma 4 does great:

  1. Even with reasoning disabled, Gemma 4 follows instruction in system prompt very well (instruction about character names, gender, dialogue format and translation tone).
  2. With structured text, gemma 4 deals with pronunciation well (one of the biggest challenges because Japanese spoken dialogue often omit subject).
  3. (Subjective) The translated dialogue reads well. I prefer it to Qwen 3.5 27B or 35B A3B.

What I dislike:

Gemma 4 uses much more VRAM for context than Qwen 3.5. I can fit Qwen 3.5 35B A3B (Q4_K_M) at a 64K context into 24GB VRAM and get 140 t/s, but Gemma 4 (Q5_K_M) maxes out my 24GB at just 8K-9K (both model files are 20.6GB). I'd appreciate it if anyone could tell me why this is happening and what can be done about it.

--

Translation Sample

!The girl works a part-time job at a café. Her tutor (MC) is also the manager of that café. The day before, she told him that she had failed a subject and needed a make-up exam on the 25th, so she asked for a tutoring session on the 24th as an excuse to stay behind after the café closes to give him a handmade Christmas present. The scene begins after the café closes on the evening of the 24th.!<