Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?

[-]

jacek2023@reddit

It's because of the hype. There are very interesting models published by Mistral and NVIDIA and people don't discuss them.

[-]

ironwroth@reddit

This sub is heavily astroturfed by Qwen team too.

[-]

JsThiago5@reddit

They make good models in all size ranges; in a sub where people run models on consumer-grade hardware, it is not that weird for them to dominate.

[-]

I don't believe it and wouldn't be surprised if the Qwen team has zero reddit presence at all. They have never even bothered to do an AMA like Z.ai and other labs. For free advertising of their product that would have been a no-brainer, especially with astroturfed plants.

Mistral kind of sucks lately and their "small" model is now over 100B. Nvidia's release cadence is spotty and their 30B-A3B Nemotron model isn't good compared to Qwen3.5-35B-A3B, let alone Qwen3.6-27B. Maybe their 100B+ models are more competitive, but who has the hardware for that?

[-]

pmttyji@reddit (OP)

Can't blame the Qwen team. They release models in almost all size ranges (0.8B, 2B, 4B, 9B, 27B, 35B, 122B, 397B, 0.6B, 1.7B, 14B, 30B, 32B, 80B, 235B, 480B, ...), so the fanbase is big.

[-]

j0j0n4th4n@reddit

Can you elaborate on the ones you found good?

[-]

Knopty@reddit

One thing that surprised me about the Granite family is that they have by far the worst multilingual capabilities among modern models. Llama 3 in 2024 was probably the last popular model family with limited language support; newer releases from Mistral, Qwen, and Google only got better with each update, improving formerly poorly covered languages like Russian, Ukrainian, Polish and others. Meanwhile Granite has stuck with a few languages, with no effort to expand support even about 1.5 years after the Granite 3 release. Nowadays even some TTS models have better language coverage than Granite LLMs.

[-]

Howie33@reddit

For my setup, Granite 4.1 30b was the best model for multi-turn agentic use…. until fixes for Gemma 4 and Qwen 3.6 chat templates came out.

[-]

Enough-Astronaut9278@reddit

ibm just doesn't do hype marketing so granite flies under the radar here. 30b dense is solid for function calling and extraction though.

[-]

DeepWisdomGuy@reddit

The actual radar:

[-]

gearcontrol@reddit

Qwen 3.5 4B and 9B that good?

[-]

giant3@reddit

Yep. Qwen 3.5 9B @ Q4_K_M is a good compromise between performance and resources like VRAM & GPU/CPU TOPS.
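For anyone wondering what that trade-off looks like in numbers, here is a rough back-of-the-envelope sketch. It only estimates the size of the weights (no KV cache, no runtime overhead), and the bits-per-weight figure for Q4_K_M is an assumption on my part: the real average varies by model because different tensors get different quant types.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: Q4_K_M averages roughly 4.8 bits per weight.
# The actual figure depends on the model and the quantization recipe.
ASSUMED_Q4_K_M_BPW = 4.8

size_9b = quantized_weight_gb(9e9, ASSUMED_Q4_K_M_BPW)
size_9b_fp16 = quantized_weight_gb(9e9, 16)

print(f"9B at ~{ASSUMED_Q4_K_M_BPW} bpw: about {size_9b:.1f} GB of weights")
print(f"9B at 16 bpw: about {size_9b_fp16:.1f} GB of weights")
```

Context length adds memory on top of this, so treat the result as a lower bound for what has to fit in VRAM.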

[-]

kwizzle@reddit

That chart is specifically for tool use only, not reasoning or general knowledge or anything like that.

[-]

whodoneit1@reddit

Maybe he meant by flying under the radar they are at the bottom

[-]

seamonn@reddit

They are not even taking off

[-]

Healthy-Nebula-3603@reddit

Digging into dirt :)

[-]

GCoderDCoder@reddit

My bad... I think this is much more clear. Agreed!

[-]

Healthy-Nebula-3603@reddit

Or it's just bad if you compare it to Qwen 3.6 or Gemma 4.

[-]

GCoderDCoder@reddit

These are some of the models I index on. I had to remove the 30b nemo, 26b gemma4, etc in order to fit granite on the graph.

I do not think you need the biggest brain model for every task. IBM uses that as a crutch for not investing heavily in making better models. Gpt oss 20b is 12gb at q4, if I remember correctly, and in its lowest thinking mode it's faster and better performing than the most recently released granite model. Qwen, gemma 4, and Nemo 30b among others would be better choices if needed.

[-]

Expensive-Paint-9490@reddit

MiniMax-2.7 and Qwen3.6-27B in my experience are not remotely comparable to Qwen3.5-397B-A17B.

[-]

Hydroskeletal@reddit

I tested it - it didn't quite perform up to snuff.

MoE would help a lot.

[-]

Embarrassed-Rich3397@reddit

Gemma and Qwen just have better dense models right now especially when qwen 3.6 27b is competing with mini frontier models.

[-]

FinalCap2680@reddit

Like many here, I have a test prompt I like to use from time to time on new models. Granite 4.1 was quite interesting in its response:

PROMPT:

create a single file classic ----- ***** ----- game using only html, vanilla JS and CSS. control the game by keyboard ----- ***** -----. use detailed 2D graphics.

GRANITE 4.1 UDQ8 ANSWER:

Creating a full-fledged classic ----- ***** ----- game with detailed 2D graphics using only HTML, vanilla JavaScript, and CSS in a single file is quite an extensive task ----- ***** -----. However, I can guide you through creating a simplified version ----- ***** ----- This example won't have detailed 2D graphics but will give you a foundational structure to build upon.

So, to sum up - it is too much work and I'm too lazy, but here are some points and you start working ;)

However, to be fair to the model, the 3 files it provided, while very basic and useless, at least rendered without code errors on the first try (there were some logic errors). For comparison: I gave up on Mistral small 119B after the first try; Nemotron 3 super 120B rendered something on the second try, but it was also useless and could not produce anything working after that; the older Qwens (including 3.5 27B and 122B and coder Next) and Gemma 4 did produce some results, but nothing close to Qwen 3.6. All models were at Q8. All models had logic errors.

PS: I get it, people may be getting sick of all the talk about qwen 3.6: https://www.reddit.com/r/LocalLLaMA/comments/1toxlog/stop_qwenllama_every_other_4th_post_in_this_sub/ but for now that is the reality. When you put a 27B model against a 120B model ( https://www.youtube.com/watch?v=H-GtrbcDqYQ ) or even something bigger ( https://www.youtube.com/watch?v=iAIlTC4m8Fw ) and it performs close, that is something....

[-]

Jayfree138@reddit

The benchmark comparison isn't looking too good. Link to Bench between qwen, gemma, and granite

[-]

Eastern_Bet678@reddit

Overall doesn't appear great but IBM might be driven by internal use cases that aren't reflected in these benchmarks.

[-]

mtomas7@reddit

A special selling point of Granite models is that IBM did not use any unlicensed data in the training, which is important for enterprise customers.

[-]

Berlodo@reddit

Yes, especially good for internal use cases ... (not sure where I read it, but those models were supposed to be very good at handling/converting legacy code, COBOL I think, and associated frameworks and tools, and updating to IBM Java frameworks)

[-]

Longjumping-Sweet818@reddit

Or, what's more likely, IBM is an enterprise circlejerk club that doesn't have the necessary technical expertise or knowhow to go head-to-head with the likes of Google. They make their money by selling overpriced enterprise "solutions" to companies that don't know any better, so naturally they can't produce something of actual worth anymore.

[-]

Eastern_Bet678@reddit

No they don't have the depth that premier AI players have. All the more reason to spend your resources carefully and tune to the problems you have.

[-]

DeepWisdomGuy@reddit

If they were a small-cap public company, this model would have sunk them.

[-]

thedogcow@reddit

I don't know if it's their architecture, or my setup, but the vram usage for Granite goes up with context much more than qwen/gemma/everything, which often pushes it outside my practical window. That being said, Granite guardian is my go to in the governance layer, if that is a thing you care about.
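If you want to sanity-check how much of that growth is just KV cache, here is a minimal sketch of the usual estimate for a standard full-attention transformer. Every config number below is hypothetical, picked only for illustration; I have not checked Granite's actual layer count, KV head count, or whether its architecture even fits this formula, so plug in the values from the model's config yourself.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for a plain attention stack.

    The leading 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical config, NOT taken from any real model card.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 8, 128

for ctx in (8192, 32768, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / 2**30
    print(f"context {ctx:>6}: {gib:.2f} GiB of KV cache")
```

The cache grows linearly with context, and the slope is set by layers x KV heads x head dim. Two models with similar parameter counts can therefore have very different memory growth if one uses fewer KV heads or a quantized cache.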

[-]

leonbollerup@reddit

Is granite even an option ?

[-]

PhotographerUSA@reddit

Qwen3.6-35B-MTP is all you need

[-]

Bulky-Priority6824@reddit

I prefer non-MTP for 35b, sir.

[-]

PhotographerUSA@reddit

You will get 10 tokens a sec

[-]

FinalCap2680@reddit

Better 10 tok/sec of gold than 1000 tok/sec of useless junk (yes, not all of it will be junk, but you will spend more time checking). Non MTP and BF16.

[-]

I1lII1l@reddit

you know his hardware?

[-]

Bulky-Priority6824@reddit

I could have 50 5090s strewn across an old smokey yellow BINGO table

[-]

Bulky-Priority6824@reddit

I don't use sys ram or cpu to load my models, sir.

[-]

DeepBlue96@reddit

granite is really bad, that's it.

[-]

silenceimpaired@reddit

I missed that it was released. I value non-reasoning models not focused on agentic use, as they can work better for creative writing/editing. I’m curious how this will perform.

[-]

Agreeable_System_785@reddit

Thank you for sharing this. I was not aware of these Granite models.

I see that Dutch is supported, so I think I will try it out. I have seen models that mix Dutch with Flemish, which makes them not useable for certain use cases.

Are there particular areas that the Granite models are strong in?

[-]

riceinmybelly@reddit

But Dutch and Flemish are the same?

[-]

Agreeable_System_785@reddit

Yes and no. The same word used in Belgium can have a different meaning in Dutch. Also Flemish sounds more formal because of the use of 'u'.

We can understand each other and yes, the most notable difference is just the accent in speaking I think.

[-]

riceinmybelly@reddit

Poepen

[-]

Kahvana@reddit

Ha!

I'm a Dutch native. Flemish is more like a dialect or its own language depending on the region. I can understand Flemish 90% of the time, but some things are quite different.

[-]

riceinmybelly@reddit

I’m Flemish and I always thought your news anchors spoke more Flemish than Dutch, but yeah, the dialects here are unforgivingly different.

[-]

GronklyTheSnerd@reddit

It can get weird and complicated. The distinctions between closely related languages and dialects are sometimes extremely unclear. Are people from Glasgow speaking a very difficult to understand dialect of English, or a very closely related language? Hard to say, even for linguists. (Honestly, Flemish may be easier for you than Glasgow English is for most British people!)

Some of my ancestors were Frisian, living in a little town just on the German side of the border. Apparently some of them could also understand, and make themselves understood in Dutch, High and Low German, and even Danish. No record of them dealing with Flemish, but I suspect they could have got by. It makes me a little sad that they seem to be disappearing in Europe, and my family assimilated into America and lost their language a century ago.

[-]

Kahvana@reddit

I only tested the model briefly in Dutch, had more success with Gemma4 31B.

It's one of the few models that is ISO (42001) certified. It's also quite lightweight to run; the 3B model can run on netbook specs (Intel Pentium Silver N5000, 8GB DDR4-2400MHz single-channel, Intel UHD Graphics 605. Runs with ~2.5 t/s TG on Vulkan iGPU with llama.cpp).

[-]

pmttyji@reddit (OP)

https://huggingface.co/blog/ibm-granite/granite-4-1

[-]

HokkaidoNights@reddit

Granite has an interesting-looking ecosystem if you look deeper; it's on my list to test out. For instance, Granite Guardian could have some interesting applications depending on the use case.

[-]

atumblingdandelion@reddit

I’m using their 4.1 3B version for local RAG and it's quite good at that.

[-]

Environmental-Metal9@reddit

This model is a godsend after working in my own training pipeline to take Gemma 4 base and do a full round of CPT and my own SFT (creative writing + reasoning) on it. This model is like what Llama 5 could have been (very similar basic architecture) and the granite 4.1 base is super efficient to train compared to Gemma 4 (roughly 2.6x more efficient due to the smaller vocab size, which may or may not be what one needs; it is what I needed).

Thank you for raising awareness of it. I stopped looking at granite for my own training because the mamba hybrid architecture of the previous model was too difficult for me to figure out how to properly train. (Skill issues on my part!)

[-]

whodoneit1@reddit

Maybe he meant by flying under the radar they are at the bottom?

[-]

olli-mac-p@reddit

If you run a business in the EU, I believe the Granite models are EU AI Act compatible (Gemma might be as well), but I didn't check. For private people, qwen is the way to go if code writing is the use case.

[-]

k_means_clusterfuck@reddit

They are overshadowed because Qwen3.6 27b and Gemma4 31b are just better.

[-]

fijasko_ultimate@reddit

imho they are overshadowed bcs of no reasoning

kinda sucks if you're gpu poor and you can load only one model

other than that, i tried 4.0 for some simple summarization tasks / tool calls - worked great

i am gonna definitely try the new 4.1 series

[-]

MomentJolly3535@reddit

i gave it a very quick try on a simple prompt (an html file where you put in 2 different texts and it is supposed to show the difference). it performed extremely poorly, even qwen 9B (reasoning off) performed way better

Maybe my settings were not good.

[-]

rawdikrik@reddit

One day, we will learn that benchmarks aren't real life.

I've used it and it is decent. Not as good as gemma or qwen 3.6, but the creative writing is decent.

It will fit some workflows.

[-]

Techie42@reddit

In my "just starting out with local" and "let's try every model, but on the wrong setup" phase, Granite didn't perform well, so I dropped it. These days I should give it another go. One of those "you don't get a second chance at a first impression" issues.

[-]

DoorStuckSickDuck@reddit

Their latest STT is very very noice, especially the plus variant. Great features and works well.

[-]

pmttyji@reddit (OP)

u/ibm Comeback with Big Bang! 30-50B Dense & MOE, 100B MOE.

[-]

dsartori@reddit

Enthusiasts are generally looking for something different than Granite models offer, I think.