Gemma 4 is great at real-time Japanese-English translation for games
Posted by KageYume@reddit | LocalLLaMA | View on Reddit | 24 comments
When Gemma 3 27B QAT IT was released last year, it was SOTA for local real-time Japanese-English translation of visual novels for a while, so I wanted to see how Gemma 4 handles this use case.
Model:
- Unsloth's gemma-4-26B-A4B-it-UD-Q5_K_M
- Context: 8192
- Reasoning: OFF
Software:
- Front end: Luna Translator
- Back end: LM Studio
Workflow:
- Luna hooks the dialogue and speaker from the game.
- A Python script structures the data (speaker, gender, dialogue).
- Luna sends the structured text and a system prompt to LM Studio.
- Luna shows the translation.
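The structuring step in the workflow above could be sketched roughly like this. The character table, tag format, and function name are hypothetical illustrations, not the OP's actual script (which is linked in the top post):

```python
# Hypothetical sketch of the structuring step: map the hooked speaker
# name to a known romanized name and gender, then emit one tagged line
# per utterance for the LLM to translate.
CHARACTERS = {
    "美咲": ("Misaki", "female"),
    "先生": ("Sensei", "male"),
}

def structure_line(speaker: str, dialogue: str) -> str:
    """Return a single tagged line like '[Misaki (female)]: ...'."""
    name, gender = CHARACTERS.get(speaker, (speaker, "unknown"))
    return f"[{name} ({gender})]: {dialogue}"

print(structure_line("美咲", "店長、今日はありがとうございました。"))
```

Tagging each line with a name and gender gives the model the context it needs to pick the right pronouns and tone, since the Japanese source usually leaves those implicit.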
What Gemma 4 does great:
- Even with reasoning disabled, Gemma 4 follows the instructions in the system prompt very well (instructions about character names, gender, dialogue format, and translation tone).
- With structured text, Gemma 4 handles pronouns well (one of the biggest challenges, because Japanese spoken dialogue often omits the subject).
- (Subjective) The translated dialogue reads well. I prefer it to Qwen 3.5 27B or 35B A3B.
What I dislike:
Gemma 4 uses much more VRAM for context than Qwen 3.5. I can fit Qwen 3.5 35B A3B (Q4_K_M) at a 64K context into 24GB VRAM and get 140 t/s, but Gemma 4 (Q5_K_M) maxes out my 24GB at just 8K-9K (both model files are 20.6GB). I'd appreciate it if anyone could tell me why this is happening and what can be done about it.
--
>!The girl works a part-time job at a café. Her tutor (MC) is also the manager of that café. The day before, she told him that she had failed a subject and needed a make-up exam on the 25th, so she asked for a tutoring session on the 24th as an excuse to stay behind after the café closes to give him a handmade Christmas present. The scene begins after the café closes on the evening of the 24th.!<
InuRyu@reddit
have you compared gemma 4 with translategemma? I'm curious to see which is better
KageYume@reddit (OP)
I tried TranslateGemma 27B with the same VN. In short, Gemma 4 is better, both in translation quality and ease of use.
summersss@reddit
New to this. Can that Luna program work with video players as well and translate video, or hook into an audio player to translate audiobooks? I'm trying to get something working with gemma4e4b but am lost.
KageYume@reddit (OP)
Short answer: No, it can't.
Long answer: Luna Translator supports translating text from either: 1. A game's process (it hooks into the process to extract text from the text-containing thread). 2. An area you define on the screen using OCR.
So in your use case: Video player: it can technically work using the second method, but not well, because the translated text won't be overlaid on the right area (Luna Translator's window position is static).
Audio player: it doesn't work because Luna does not support audio input.
math_AI_Japan@reddit
Just curious. I get this newsletter from a Japanese AI group. They announced a conference where one of the presenters will discuss the following. I'm not sure about the accuracy of their evaluation of current methods. I'd appreciate any comments.
The Current State of Manga Machine Translation: Shonosuke Ishiwata (CEO, Mantra Inc.)
Overview:
Advances in Large-Scale Language Models (LLMs) have significantly improved the performance of machine translation. However, translation in the entertainment field requires more than just accurate translation; it demands localization that goes beyond mere translation, accurately conveying the author's message. Manga translation, in particular, involves a complex interplay of challenges beyond translation itself, including understanding character relationships, grasping the depiction within each page, text recognition including background text, and text placement. This presentation will provide an overview of Mantra's current work on manga machine translation, which began in 2020, and will also discuss the remaining technical challenges.
SoulsCross@reddit
How would you compare it to sugoi 14B? I've been using the Q8_0 and works great most of the time, but some translations can be a bit iffy.
KageYume@reddit (OP)
I've compared Gemma 4 26B A4B (Q5_K_M) to Sugoi Ultra 14B (Q8_0). The conclusion: while Sugoi Ultra 14B is still very usable in 2026, Gemma 4 is smarter, more consistent, and follows instructions better.
Comparison video (Top: Gemma | Bottom: Sugoi)
I noticed the following:
psychohistorian8@reddit
I don't know how to resolve, but I am also having extreme context window issues with Gemma 4
I could load Qwen 3.5 27B with ~32k context, but with Gemma 4 31B I have to go down to ~8k, otherwise my Mac hard crashes/reboots.
KageYume@reddit (OP)
A runtime update in LM Studio today (llama.cpp 2.11.0) fixed the VRAM hogging issue for me.
psychohistorian8@reddit
YES!
I was able to load Gemma 4 31B with 32k context, which matches what I was able to do with Qwen 3.5 27B
Finally I can start testing!!
KageYume@reddit (OP)
Yeah, the dense 31B is one thing, but it's weird that even the MoE 26B A4B eats so much VRAM for such a short context.
Tamitami@reddit
Did you try the two smaller gemma models for this?
KageYume@reddit (OP)
No, I didn't.
Models smaller than 20B translate VNs pretty poorly in my experience.
MiningDemon@reddit
Does it beat the vntl-llama3-8b-v2 model? That model has been fine-tuned for this task.
KageYume@reddit (OP)
Yes, it does, and by far.
vntl-llama3-8b-v2 is both ancient and small, so it has been beaten by many models released in the past 2 years: Gemma 3 27B, Sugoi 14B, Qwen 3.5 27B, not to mention Gemma 4.
Another thing: for a model to work well for this purpose (VN translation), it has to follow instructions well, and I can't say that about vntl-llama3-8b-v2.
_Sub01_@reddit
I'm really hoping for the Shisa AI team to release their finetuned JP models for Gemma 4 IT, seeing how strong of a foundation model Gemma 4 is.
Educational_Grab_473@reddit
Do you think it beats gemini 3.1 flash lite? I've been using it because of the speed
KageYume@reddit (OP)
I haven't used 3.1 Flash Lite because if I have to use cloud models, I'd rather use a bigger one such as DeepSeek for a significant increase in quality over local models.
ZBoblq@reddit
Have you ever tried Qwen 3 Coder Next? I use it to translate Japanese and Russian transcripts from YouTube, either generated through Whisper or taken from the native subtitles, and it's (somewhat surprisingly) by far the best LLM translator I have used. It also rarely drops text for no reason, as all other models tend to do regardless of their size.
petuman@reddit
Remove custom batch-size args (-ub, -b) if you have any set, and add -np 1 to disable batching if you don't need parallel query processing. That saves some memory.
Gemma is just not as efficient at KV cache / context size per token. But the 26B is quite manageable; try a smaller quant -- UD-Q4_K_XL with 192K context is 21GB.
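For anyone running llama.cpp's server directly instead of LM Studio, the flags above would look something like this. The model path and context size are just examples, not the commenter's exact setup:

```shell
# Illustrative llama-server invocation (adjust paths/sizes to your setup).
# -np 1 keeps a single slot so KV-cache memory isn't multiplied across
# parallel sequences; leave -ub/-b at their defaults rather than setting
# custom batch sizes.
llama-server \
  -m gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
  -c 32768 \
  -np 1 \
  -ngl 99
```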
KageYume@reddit (OP)
Thank you. I will try -np 1 and download a Q4_K_M just in case I need longer context.
Velocita84@reddit
I just tried it, and it seems incredibly good at translating doujinshi dialogue for its size. I threw a pretty difficult transcribed bubble at it and it translated it pretty much flawlessly, unlike many other models I can run. I'm also surprised that it doesn't complain about NSFW at all.
Tenerezza@reddit
You can free up ~2-3GB of VRAM by removing the mmproj file or moving it out of the model's folder; this disables vision support. I recommend moving it rather than deleting it, so you can put it back when you need it, but it's not needed for translating text.
With that said, I'm curious about your Python script for structuring the data; I hadn't thought of doing that myself.
KageYume@reddit (OP)
I already renamed the vision files and moved them to a subfolder, so the model size is only 20.6GB, but after loading it took 23.4GB of VRAM at only 8K context length...
Regarding the structured text, it looks like this.
I've added links to both the script and the system prompt to the top post.
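For anyone wiring up something similar without Luna, the request to LM Studio's OpenAI-compatible chat endpoint would look roughly like this. The model name, system prompt, and dialogue tag format here are placeholders, not the OP's actual values:

```python
import json
import urllib.request

# Hypothetical request to LM Studio's OpenAI-compatible endpoint
# (default port 1234). System prompt and tagging are placeholders.
payload = {
    "model": "gemma-4-26b-a4b-it",
    "messages": [
        {"role": "system",
         "content": "Translate the tagged Japanese dialogue to natural "
                    "English. Keep character names; match tone and "
                    "pronouns to the speaker's gender."},
        {"role": "user",
         "content": "[Misaki (female)]: 店長、今日はありがとうございました。"},
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment with LM Studio running
```

A low temperature keeps the translation consistent across lines; the speaker tag in the user message is what lets the model resolve the omitted subject.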