Gemma 4 is great at real-time Japanese-English translation for games
Posted by KageYume@reddit | LocalLLaMA | View on Reddit | 24 comments
When Gemma 3 27B QAT IT was released last year, it was SOTA for local real-time Japanese-English translation of visual novels for a while, so I wanted to see how Gemma 4 handles this use case.
Model:
- Unsloth's gemma-4-26B-A4B-it-UD-Q5_K_M
- Context: 8192
- Reasoning: OFF
Software:
- Front end: Luna Translator
- Back end: LM Studio
Workflow:
- Luna hooks the dialogue and speaker from the game.
- A Python script structures the data (speaker, gender, dialogue).
- Luna sends the structured text and a system prompt to LM Studio.
- Luna shows the translation.
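The structuring step in the workflow above could be sketched roughly like this. The character table, tag format, and function name are hypothetical illustrations, not the OP's actual script (which is linked in the top post):

```python
# Hypothetical sketch of the structuring step: map the hooked speaker
# name to a known romanized name and gender, then emit one tagged line
# per utterance for the LLM to translate.
CHARACTERS = {
    "美咲": ("Misaki", "female"),
    "先生": ("Sensei", "male"),
}

def structure_line(speaker: str, dialogue: str) -> str:
    """Return a single tagged line like '[Misaki (female)]: ...'."""
    name, gender = CHARACTERS.get(speaker, (speaker, "unknown"))
    return f"[{name} ({gender})]: {dialogue}"

print(structure_line("美咲", "店長、今日はありがとうございました。"))
```

Tagging each line with a name and gender gives the model the context it needs to pick the right pronouns and tone, since the Japanese source usually leaves those implicit.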
What Gemma 4 does great:
- Even with reasoning disabled, Gemma 4 follows the instructions in the system prompt very well (instructions about character names, gender, dialogue format, and translation tone).
- With structured text, Gemma 4 handles pronouns well (one of the biggest challenges, because Japanese spoken dialogue often omits the subject).
- (Subjective) The translated dialogue reads well. I prefer it to Qwen 3.5 27B or 35B A3B.
What I dislike:
Gemma 4 uses much more VRAM for context than Qwen 3.5. I can fit Qwen 3.5 35B A3B (Q4_K_M) at a 64K context into 24GB VRAM and get 140 t/s, but Gemma 4 (Q5_K_M) maxes out my 24GB at just 8K-9K (both model files are 20.6GB). I'd appreciate it if anyone could tell me why this is happening and what can be done about it.
--
>!The girl works a part-time job at a café. Her tutor (MC) is also the manager of that café. The day before, she told him that she had failed a subject and needed a make-up exam on the 25th, so she asked for a tutoring session on the 24th as an excuse to stay behind after the café closes to give him a handmade Christmas present. The scene begins after the café closes on the evening of the 24th.!<
InuRyu@reddit
have you compared gemma 4 with translategemma? I'm curious to see which is better
KageYume@reddit (OP)
I tried TranslateGemma 27B with the same VN. In short, Gemma 4 is better, both in translation quality and ease of use.
summersss@reddit
New to this. Can that Luna program work with video players as well and translate video, or hook into an audio player to translate audiobooks? I'm trying to get something working with gemma4e4b but am lost.
KageYume@reddit (OP)
Short answer: No, it can't.
Long answer: Luna Translator supports translating text from either: 1. A game's process (it hooks into the process to extract text from the text-containing thread). 2. An area you define on the screen using OCR.
So in your use case: Video player: it can technically work using the second method, but not well, because the translated text won't be overlaid on the right area (Luna Translator's window position is static).
Audio player: it doesn't work because Luna does not support audio input.
math_AI_Japan@reddit
Just curious. I get this newsletter from a Japanese AI group. They announced a conference where one of the presenters will discuss the following. I'm not sure about the accuracy of their evaluation of current methods. I'd appreciate any comments.
The Current State of Manga Machine Translation: Shonosuke Ishiwata (CEO, Mantra Inc.)
Overview:
Advances in Large-Scale Language Models (LLMs) have significantly improved the performance of machine translation. However, translation in the entertainment field requires more than just accurate translation; it demands localization that goes beyond mere translation, accurately conveying the author's message. Manga translation, in particular, involves a complex interplay of challenges beyond translation itself, including understanding character relationships, grasping the depiction within each page, text recognition including background text, and text placement. This presentation will provide an overview of Mantra's current work on manga machine translation, which began in 2020, and will also discuss the remaining technical challenges.
SoulsCross@reddit
How would you compare it to sugoi 14B? I've been using the Q8_0 and works great most of the time, but some translations can be a bit iffy.
KageYume@reddit (OP)
I've compared Gemma 4 26B A4B (Q5_K_M) to Sugoi Ultra 14B (Q8_0). The conclusion: while Sugoi Ultra 14B is still very usable in 2026, Gemma 4 is smarter, more consistent, and follows instructions better.
Comparison video (Top: Gemma | Bottom: Sugoi)
I noticed the following:
psychohistorian8@reddit
I don't know how to resolve, but I am also having extreme context window issues with Gemma 4
I could load Qwen 3.5 27B with ~32k context, but with Gemma 4 31B I have to go down to ~8k, otherwise my Mac hard crashes/reboots.
KageYume@reddit (OP)
A runtime update in LM Studio today (llama.cpp 2.11.0) fixed the VRAM hogging issue for me.
psychohistorian8@reddit
YES!
I was able to load Gemma 4 31B with 32k context, which matches what I was able to do with Qwen 3.5 27B
Finally I can start testing!!
KageYume@reddit (OP)
Yeah, the dense 31B is one thing, but it's weird that even the MoE 26B A4B eats so much VRAM for such a short context.
Tamitami@reddit
Did you try the two smaller gemma models for this?
KageYume@reddit (OP)
No, I didn't.
Models smaller than 20B translate VNs pretty poorly in my experience.
MiningDemon@reddit
Does it beat the vntl-llama3-8b-v2 model? That model has been fine-tuned for this task.
KageYume@reddit (OP)
Yes, it does, and by far.
vntl-llama3-8b-v2 is both ancient and small, so it has been beaten by many models released in the past 2 years: Gemma 3 27B, Sugoi 14B, Qwen 3.5 27B, not to mention Gemma 4.
Another thing: for a model to work well for this purpose (VN translation), it has to follow instructions well, and I can't say that about vntl-llama3-8b-v2.
_Sub01_@reddit
I'm really hoping for the Shisa AI team to release their finetuned JP models for Gemma 4 IT, seeing how strong of a foundation model Gemma 4 is.
Educational_Grab_473@reddit
Do you think it beats gemini 3.1 flash lite? I've been using it because of the speed
KageYume@reddit (OP)
I haven't used 3.1 Flash Lite because if I have to use cloud models, I'd rather use a bigger one such as DeepSeek for a significant increase in quality over local models.
ZBoblq@reddit
Have you ever tried Qwen 3 Coder Next? I use it to translate Japanese and Russian transcripts from YouTube, either generated through Whisper or taken from the native subtitles, and it's (somewhat surprisingly) by far the best LLM translator I have used. It also rarely drops text for no reason, as all other models tend to do regardless of their size.
petuman@reddit
Remove custom batch-size args (-ub, -b) if you have any set, and add -np 1 to disable batching if you don't need parallel query processing. That saves some memory.
Gemma is just not as efficient at KV cache / context size per token. But the 26B is quite manageable; try a smaller quant -- UD-Q4_K_XL with 192K context is 21GB.
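For anyone running llama.cpp's server directly instead of LM Studio, the flags above would look something like this. The model path and context size are just examples, not the commenter's exact setup:

```shell
# Illustrative llama-server invocation (adjust paths/sizes to your setup).
# -np 1 keeps a single slot so KV-cache memory isn't multiplied across
# parallel sequences; leave -ub/-b at their defaults rather than setting
# custom batch sizes.
llama-server \
  -m gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
  -c 32768 \
  -np 1 \
  -ngl 99
```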
KageYume@reddit (OP)
Thank you. I will try -np 1 and download a Q4_K_M just in case I need longer context.
Velocita84@reddit
I just tried it, and it seems incredibly good at translating doujinshi dialogue for its size. I threw a pretty difficult transcribed bubble at it and it translated it pretty much flawlessly, unlike many other models I can run. I'm also surprised that it doesn't complain about NSFW at all.
Tenerezza@reddit
You can free up ~2-3GB of VRAM by removing the mmproj file or moving it out of the model's folder; this disables vision support. I recommend moving it rather than deleting it, so you can put it back when you need it, but it's not needed for translating text.
With that said, I'm curious about your Python script for structuring the data; I hadn't thought of doing that myself.
KageYume@reddit (OP)
I already renamed the vision files and moved them to a subfolder, so the model size is only 20.6GB, but after loading it took 23.4GB of VRAM at only 8K context length...
Regarding the structured text, it looks like this.
I've added links to both the script and the system prompt to the top post.
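For anyone wiring up something similar without Luna, the request to LM Studio's OpenAI-compatible chat endpoint would look roughly like this. The model name, system prompt, and dialogue tag format here are placeholders, not the OP's actual values:

```python
import json
import urllib.request

# Hypothetical request to LM Studio's OpenAI-compatible endpoint
# (default port 1234). System prompt and tagging are placeholders.
payload = {
    "model": "gemma-4-26b-a4b-it",
    "messages": [
        {"role": "system",
         "content": "Translate the tagged Japanese dialogue to natural "
                    "English. Keep character names; match tone and "
                    "pronouns to the speaker's gender."},
        {"role": "user",
         "content": "[Misaki (female)]: 店長、今日はありがとうございました。"},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with LM Studio running
```

A low temperature keeps the translation consistent across lines; the speaker tag in the user message is what lets the model resolve the omitted subject.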