Gemma 4 is a huge improvement in many European languages, including Danish, Dutch, French and Italian
Posted by Balance-@reddit | LocalLLaMA | 56 comments
The benchmarks look really impressive for such small models. Even in general, they stand up well. Gemma 4 31B is (of all tested models):
- 3rd on Dutch
- 2nd on Danish
- 3rd on English
- 1st on Finnish
- 2nd on French
- 5th on German
- 2nd on Italian
- 3rd on Swedish
Curious if real-world experience matches that.
Source: https://euroeval.com/leaderboards/
AffectionateHome3113@reddit
Only Gemma 4‑31b solved my own smol benchmark (a German exercise from a book). The other local models I tried all failed:
- Qwen 3.5‑122B (Q4_K_XL)
- Qwen 3.5‑35B (Q8)
- Qwen 3.5‑27B (Q8)
- Gemma 4‑26B (Q8)
Inevitable-Name-1701@reddit
The butchering of the Hungarian language really disappointed me.
drillmast3r@reddit
I tried to see if there had been any improvement in the Hungarian language compared to the previous model, but unfortunately, I don’t think so. And yet I was really looking forward to this model.
AssOverflow12@reddit
Shame about that :(
drillmast3r@reddit
Yes. Gemma 3 was almost good enough for me, and I had high hopes for 4, but alas...
EntertainmentOne7897@reddit
Which task isn't it good enough for? I tried the racka and puli models, but didn't have much success with them. Gemma is probably better than they are.
Healthy-Nebula-3603@reddit
Have you tested the Aya 31B model (trained for translations)?
drillmast3r@reddit
I will try it, thank you!
alamacra@reddit
Are there any models that are any good in Hungarian, even of the really large ones?
drillmast3r@reddit
I don't know the big ones. I’ve tried these: qwen3.5-35b - 27b, gpt-oss-20b, gemma-3-27b-it, glm-4.7-flash, and gemma-4-26b - 31b. In my opinion, gemma-3 produces the best Hungarian text out of these. But unfortunately, it’s not flawless either.
PunnyPandora@reddit
gemini is pretty good at them but it's not open source
pip25hu@reddit
Isn't it in second place for Hungarian, right after the latest Gemini?
Barbaricliberal@reddit
I've found Gemma 4 to be surprisingly good for Farsi/Persian translations and support.
From E4B upwards it's good (E2B leaves a lot to be desired).
ZeitgeistArchive@reddit
Is it functional now in LM Studio?
Ok_Fish_39@reddit
In one small European language, gemma-3-27b is much better than gemma-4-31B. For starters, Gemma 3 starts the answer right away in the same language, while Gemma 4 reasons in English and then translates it poorly.
That_Country_7682@reddit
1st on finnish is actually wild. small models doing multilingual this well was not on my 2026 bingo card.
FinBenton@reddit
I did some stuff with it in Finnish. It's the best I've seen, but it does make a lot of mistakes.
drallcom3@reddit
What kind of mistakes does it make? I'm curious. Does it get words wrong? Or is it more in long texts?
FinBenton@reddit
It makes pretty obvious mistakes, the kind someone who immigrated to Finland would keep making for the rest of their life. There are like 100 different variations of each word, as it's a pretty unusual language.
koloved@reddit
Is there a website that includes all languages, rather than just the ones that made the list for political reasons?
EmsMTN@reddit
Right! The second most commonly used language on the internet is conveniently absent.
tahini001@reddit
All 4000 languages?
draconisx4@reddit
Solid benchmarks on Gemma 4, but real-world testing is crucial to ensure these models don't introduce biases in different languages; that's where proper oversight pays off. What's your experience with cultural safeguards?
anotheruser323@reddit
Non-professional translation is one of the things I think LLMs are actually good for. And google seems to be the best at it currently.
PreciselyWrong@reddit
Google Translate, on the other hand, has become utter garbage compared to deepl and such
Mrfrednot@reddit
What model should I use for Ancient Greek? Is there one that is specifically good for old texts?
ambient_temp_xeno@reddit
They really just gave us a SOTA translation model.
LoafyLemon@reddit
31B can actually roleplay too, without extra fine tuning or decensoring! I am actually amazed and baffled at the same time. It really sticks to character descriptions, so if you do DnD RP, villains are actually villainous.
It also has a lot of knowledge about fantasy worlds, which is fun.
ambient_temp_xeno@reddit
Gemma 4 really showed the weaknesses of the Chinese models in terms of their censorship (not just the Tiananmen square kind, the ass-slapping kind) and lack of datasets (compared to Google).
Mistral of course is even worse.
pol_phil@reddit
The translation evals can be misleading. After testing on some lower resource EU languages for scientific document translation, Gemma4 can lose coherence and start outputting random Chinese/Hindi/Arabic.
ambient_temp_xeno@reddit
Is gemma 4 working right in what you're testing it on though? Gemma 4 was broken for me in llama.cpp in an insidious way until b8648
Mark__27@reddit
Is there a similar eval to this for Arabic/Hindi?
arbv@reddit
Unfortunately, it is worse at Ukrainian. Gemma 3 27B was near perfect, second only to Google's cloud models.
madsheepPL@reddit
I don't see qwen 3.5 27B in there... It's been a top performer for me.
Icy-Degree6161@reddit
For European languages specifically?
madsheepPL@reddit
yeah, translations from different languages into english. I'll have to review it though, going through this made me realize I need better benchmarks.
windozeFanboi@reddit
Qwen 3.5 was the first Qwen to actually be usable for many EU languages, but it's closer to Gemma 3 than 4...
I mean, as far as the less popular languages I speak go.
FlamaVadim@reddit
unfortunately qwen is not so great in translations
Tenerezza@reddit
Not for me, that's for sure. I can only verify Swedish, Norwegian and Danish to an extent, and every Qwen model is worse than even Gemma 3 when it comes to translations. And the few times I needed to use Finnish, it was basically just crap. Gemma is by far much better at it, and Gemma 4 is actually one of the best yet, even better than Claude so far.
FinBenton@reddit
At least in Finnish, Qwen is just horrible compared to Gemma.
Moreh@reddit
Many requests below are asking for similar benchmarks for non-european languages, does anyone know if such a thing exists? I know google is the best for most languages, but i am interested whether it beats qwen for asian languages like Indonesian.
Cold_Tree190@reddit
Has anyone tested it with Japanese? How well does it perform if so?
unskilledexplorer@reddit
what does the rank mean? average position of a model across various tasks? so if a model is rank 1.34, it is only good relative to other models, right? so if all models are bad at a particular language, then...
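A fractional rank like 1.34 is consistent with averaging a model's per-task placement, which does make the number purely relative to the other models tested. A minimal sketch, assuming mean-rank aggregation (the model names, tasks, and scores below are invented for illustration; EuroEval's actual method may differ):

```python
# Hypothetical illustration: a fractional "rank" such as 1.34 can arise
# from averaging a model's 1-based position across a language's tasks.
# All task names and scores here are made up for the example.

def mean_rank(scores_by_task: dict[str, dict[str, float]], model: str) -> float:
    """Average the model's position across tasks (higher score = better)."""
    positions = []
    for task, scores in scores_by_task.items():
        # Sort model names by score, best first; position is 1-based index.
        ranking = sorted(scores, key=scores.get, reverse=True)
        positions.append(ranking.index(model) + 1)
    return sum(positions) / len(positions)

scores = {
    "sentiment": {"model-a": 0.91, "model-b": 0.88, "model-c": 0.70},
    "ner":       {"model-a": 0.80, "model-b": 0.85, "model-c": 0.60},
    "qa":        {"model-a": 0.77, "model-b": 0.75, "model-c": 0.74},
}

# model-a places 1st, 2nd, 1st -> mean rank of about 1.33
print(mean_rank(scores, "model-a"))
```

Under this scheme a rank near 1.0 only says the model usually beats the others on the list, not that its absolute scores are good; if every model is bad at a language, someone still ends up ranked first.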
Mark__27@reddit
What about Arabic/Hindi?
HigherConfusion@reddit
Thanks. It confirms my own experience that Gemma 3 12B is still the best model at Danish that my machine can handle. It feels like Gemma 4 left a big gap between E4B and 26B-A4B.
alexx_kidd@reddit
Greek.. worse
Fluxx1001@reddit
Interesting leaderboard. However, it's strange that the Mistral models are way behind in this benchmark, although they are explicitly trained to be multilingual and European.
bonobomaster@reddit
As a German, this is still a good reminder to only talk to any LLM in the language it was trained on the most.
Icy-Degree6161@reddit
Is this about generic language interaction or translation specifically?
In the translation space for these languages I found TranslateGemma and EuroLLM to be great.
Mashic@reddit
In my tests for English > Arabic translation, Gemma 4 blows TranslateGemma out of the water.
Mashic@reddit
For English to Arabic too. I've really been impressed by its accuracy compared to TranslateGemma.
phido3000@reddit
One day they will develop an AI that can understand Australian.
Available_Load_5334@reddit
https://millionaire-bench.referi.de/
A benchmark using questions from the German version of "Who Wants to Be a Millionaire?".
BrightRestaurant5401@reddit
Of the local models, Gemma has always been the most impressive in Dutch, and this time is no exception.
I must say, however, that the most surprising thing to me is that Claude Sonnet has the number 1 spot in this ranking.
Middle_Bullfrog_6173@reddit
Doesn't reach the performance of the closed models on some of the smaller languages, but probably the open SOTA. Matches my experience in practice.
Zestyclose-Ad-6147@reddit
Wow, impressive! Also, nice site, I didn't know this one.