Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?
Posted by pmttyji@reddit | LocalLLaMA | 63 comments
I don't see any threads on this model. Is it because it's dense and/or non-reasoning? Has anyone tried it for coding?
Capabilities
Summarization
Text classification
Text extraction
Question-answering
Retrieval Augmented Generation (RAG)
Code related tasks
Function-calling tasks
Multilingual dialog use cases
Fill-In-the-Middle (FIM) code completions
Some people prefer dense in this model size range (e.g., 27B over 35B-A3B). Still no feedback from them here.
I know that some people love Granite models. I used granite-3.3-8b myself for simple, compact tasks last year. Their granite-4.0-h-small (30B) came with A9B, which isn't friendly for the Poor GPU Club. I wish it was A3B, as A9B is slower on my 8GB VRAM.
jacek2023@reddit
It's because of the hype. There are very interesting models published by Mistral and NVIDIA and people don't discuss them.
ironwroth@reddit
This sub is heavily astroturfed by Qwen team too.
JsThiago5@reddit
They make good models in all ranges; in a sub where people run models on consumer-grade hardware, it is not that weird for them to dominate.
tengo_harambe@reddit
I don't believe it and wouldn't be surprised if the Qwen team has zero reddit presence at all. They have never even bothered to do an AMA like Z.ai and other labs. For free advertising of their product that would have been a no-brainer, especially with astroturfed plants.
Mistral kind of sucks lately and their "small" model is now over 100B. Nvidia's release cadence is spotty and their 30B-A3B Nemotron model isn't good compared to Qwen3.5-35B-A3B, let alone Qwen3.6-27B. Maybe their 100B+ models are more competitive, but who has the hardware for that?
pmttyji@reddit (OP)
Can't blame the Qwen team. They release models in almost all size ranges (0.8B, 2B, 4B, 9B, 27B, 35B, 122B, 397B, 0.6B, 1.7B, 14B, 30B, 32B, 80B, 235B, 480B, ...), so the fanbase is big.
j0j0n4th4n@reddit
Can you elaborate on the ones you found good?
Knopty@reddit
One thing that surprised me about Granite family is that they have by far the worst multilingual capabilities among modern models. Since 2024 llama3 was probably the last popular model family that had limited language support and any newer releases from Mistral, Qwen, Google only got better and better with each update, improving formerly poorly covered languages like Russian, Ukrainian, Polish and others. Meanwhile Granite just stuck with a few languages with no effort to expand support even after about 1.5 years since Granite 3 release. Nowadays even some TTS models have better language coverage than Granite LLMs.
Howie33@reddit
For my setup, Granite 4.1 30b was the best model for multi-turn agentic use…. until fixes for Gemma 4 and Qwen 3.6 chat templates came out.
Enough-Astronaut9278@reddit
IBM just doesn't do hype marketing, so Granite flies under the radar here. 30B dense is solid for function calling and extraction though.
DeepWisdomGuy@reddit
The actual radar:
gearcontrol@reddit
Qwen 3.5 4B and 9B that good?
giant3@reddit
Yep. Qwen 3.5 9B @ Q4_K_M is a good compromise between performance and resources like VRAM & GPU/CPU TOPS.
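For anyone wondering why a 4-bit quant is the usual compromise, here is a rough back-of-envelope sketch of the weight footprint. The 9e9 parameter count and the ~4.8 bits per weight average for a Q4_K_M-style quant are assumptions for illustration, not measured values, and the estimate ignores the KV cache and runtime overhead.

```python
def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# Assumed figures: 9e9 parameters, ~4.8 bits/weight for a Q4_K_M-style quant.
q4_gib = weight_size_gib(9e9, 4.8)
fp16_gib = weight_size_gib(9e9, 16)

print(f"Q4-style quant: {q4_gib:.1f} GiB")
print(f"FP16:           {fp16_gib:.1f} GiB")
```

Under those assumptions the quantized weights leave some room for context on an 8GB card, while the FP16 weights would not fit at all.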
kwizzle@reddit
That chart is specifically for tool use only, not reasoning or general knowledge or anything like that.
whodoneit1@reddit
Maybe he meant by flying under the radar they are at the bottom
seamonn@reddit
They are not even taking off
Healthy-Nebula-3603@reddit
Digging into dirt :)
GCoderDCoder@reddit
My bad... I think this is much more clear. Agreed!
Healthy-Nebula-3603@reddit
Or it's just bad if you compare it to Qwen 3.6 or Gemma 4
GCoderDCoder@reddit
These are some of the models I index on. I had to remove the 30b nemo, 26b gemma4, etc in order to fit granite on the graph.
I do not think you need the biggest brain model for every task. IBM uses that as a crutch for not investing heavily in making better models. Gpt oss 20b is 12GB at q4, if I remember correctly, and in its lowest thinking mode it's faster and better performing than the most recent Granite model that was just released. Qwen, Gemma 4, and Nemo 30b, among others, would be better choices if needed.
Expensive-Paint-9490@reddit
MiniMax-2.7 and Qwen3.6-27B in my experience are not remotely comparable to Qwen3.5-397B-A17B.
Hydroskeletal@reddit
I tested it - it didn't quite perform up to snuff.
MoE would help a lot.
Embarrassed-Rich3397@reddit
Gemma and Qwen just have better dense models right now especially when qwen 3.6 27b is competing with mini frontier models.
FinalCap2680@reddit
Like many here, I have a test prompt I like to use from time to time on new models. The Granite 4.1 response was quite interesting:
PROMPT:
create a single file classic ----- ***** ----- game using only html, vanilla JS and CSS. control the game by keyboard ----- ***** -----. use detailed 2D graphics.
GRANITE 4.1 UDQ8 ANSWER:
Creating a full-fledged classic ----- ***** ----- game with detailed 2D graphics using only HTML, vanilla JavaScript, and CSS in a single file is quite an extensive task ----- ***** -----. However, I can guide you through creating a simplified version ----- ***** ----- This example won't have detailed 2D graphics but will give you a foundational structure to build upon.
So, to sum up: it is too much work and I'm too lazy, but here are some pointers and you start working ;)
However, to be fair to the model: the 3 files it provided, while very basic and useless, at least rendered without code errors (there were some logic errors) on the first try. For comparison: I gave up on Mistral small 119B after the first try; Nemotron 3 super 120B rendered something on the second try, but it was also useless and could not produce anything working after that; the older Qwens (including 3.5 27B and 122B and coder Next) and Gemma 4 did produce some results, but nothing close to Qwen 3.6. All models were at Q8. All models had logic errors.
PS: I get it, people may be getting sick of all the talk about Qwen 3.6: https://www.reddit.com/r/LocalLLaMA/comments/1toxlog/stop_qwenllama_every_other_4th_post_in_this_sub/ but for now that is the reality. When you put a 27B model against a 120B model ( https://www.youtube.com/watch?v=H-GtrbcDqYQ ) or even something bigger ( https://www.youtube.com/watch?v=iAIlTC4m8Fw ) and it performs close, that is something....
Jayfree138@reddit
The benchmark comparison isn't looking too good. Link to Bench between qwen, gemma, and granite
Eastern_Bet678@reddit
Overall doesn't appear great but IBM might be driven by internal use cases that aren't reflected in these benchmarks.
mtomas7@reddit
A special selling point of Granite models is that IBM did not use any unlicensed data in the training, which is important for enterprise customers.
Berlodo@reddit
Yes, especially good for internal use cases... (not sure where I read it, but those models were supposed to be very good at handling/converting legacy code, COBOL I think, and associated frameworks and tools, and updating to IBM Java frameworks)
Longjumping-Sweet818@reddit
Or, what's more likely, IBM is an enterprise circlejerk club that doesn't have the necessary technical expertise or knowhow to go head-to-head with the likes of Google. They make their money by selling overpriced enterprise "solutions" to companies that don't know any better, so naturally they can't produce something of actual worth anymore.
Eastern_Bet678@reddit
No they don't have the depth that premier AI players have. All the more reason to spend your resources carefully and tune to the problems you have.
DeepWisdomGuy@reddit
If they were a small-cap public company, this model would have sunk them.
thedogcow@reddit
I don't know if it's their architecture or my setup, but the VRAM usage for Granite goes up with context much more than qwen/gemma/everything else, which often pushes it outside my practical window. That being said, Granite Guardian is my go-to in the governance layer, if that is a thing you care about.
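One plausible explanation for context-dependent VRAM growth is the KV cache, which scales linearly with context length and with the number of KV heads. A minimal sketch, assuming a plain full-attention transformer with an FP16 cache; the layer and head counts below are hypothetical and are not Granite's (or any other model's) actual configuration. Sliding-window and hybrid architectures do not follow this formula.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: one K and one V tensor per layer, per token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical models: 64 layers, head_dim 128, differing only in KV head count.
for ctx in (8_192, 32_768, 131_072):
    few_heads = kv_cache_gib(64, 4, 128, ctx)
    many_heads = kv_cache_gib(64, 16, 128, ctx)
    print(f"ctx={ctx:>7}: 4 KV heads -> {few_heads:5.1f} GiB, "
          f"16 KV heads -> {many_heads:5.1f} GiB")
```

With these made-up numbers, the model with four times as many KV heads needs four times the cache at every context length, which is the kind of gap that would push one model out of a small VRAM budget while another still fits.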
leonbollerup@reddit
Is granite even an option ?
PhotographerUSA@reddit
Qwen3.6-35B-MTP is all you need
Bulky-Priority6824@reddit
I prefer non-MTP for 35b, sir.
PhotographerUSA@reddit
You will get 10 tokens a sec
FinalCap2680@reddit
Better 10 tok/sec of gold than 1000 tok/sec of useless junk (yes, not all of it will be junk, but you will spend more time checking). Non MTP and BF16.
I1lII1l@reddit
you know his hardware?
Bulky-Priority6824@reddit
I could have 50 5090s strewn across an old smokey yellow BINGO table
Bulky-Priority6824@reddit
I don't use sys ram or cpu to load my models, sir.
DeepBlue96@reddit
Granite is really bad, that's it.
silenceimpaired@reddit
I missed that it was released. I value non-reasoning models not focused on agentic use, as they can work better for creative writing/editing. I’m curious how this will perform.
Agreeable_System_785@reddit
Thank you for sharing this. I was not aware of these Granite models.
I see that Dutch is supported, so I think I will try it out. I have seen models that mix Dutch with Flemish, which makes them not useable for certain use cases.
Are there particular areas that the Granite models are strong in?
riceinmybelly@reddit
But Dutch and Flemish are the same?
Agreeable_System_785@reddit
Yes and no. The same word used in Belgium can have a different meaning in Dutch. Also Flemish sounds more formal because of the use of 'u'.
We can understand each other and yes, the most notable difference is just the accent in speaking I think.
riceinmybelly@reddit
Poepen
Kahvana@reddit
Ha!
I'm a Dutch native. Flemish is more like a dialect or its own language, depending on the region. I can understand Flemish 90% of the time, but some things are quite different.
riceinmybelly@reddit
I’m Flemish and I always thought your news anchors spoke more Flemish than Dutch, but yeah, the dialects here are unforgivingly different
GronklyTheSnerd@reddit
It can get weird and complicated. The distinctions between closely related languages and dialects are sometimes extremely unclear. Are people from Glasgow speaking a very difficult to understand dialect of English, or a very closely related language? Hard to say, even for linguists. (Honestly, Flemish may be easier for you than Glasgow English is for most British people!)
Some of my ancestors were Frisian, living in a little town just on the German side of the border. Apparently some of them could also understand, and make themselves understood in Dutch, High and Low German, and even Danish. No record of them dealing with Flemish, but I suspect they could have got by. It makes me a little sad that they seem to be disappearing in Europe, and my family assimilated into America and lost their language a century ago.
Kahvana@reddit
I only tested the model briefly in Dutch, had more success with Gemma4 31B.
It's one of the few models that is ISO (42001) certified. It's also quite lightweight to run; the 3B model can run on netbook specs (Intel Pentium Silver N5000, 8GB DDR4-2400MHz single-channel, Intel UHD Graphics 605. Runs with ~2.5 t/s TG on Vulkan iGPU with llama.cpp).
pmttyji@reddit (OP)
https://huggingface.co/blog/ibm-granite/granite-4-1
HokkaidoNights@reddit
Granite has an interesting-looking ecosystem if you look deeper; it's on my list to test out. For instance, Granite Guardian could have some interesting applications depending on the use case.
atumblingdandelion@reddit
I’m using their 4.1 3B version for local RAG and it's quite good at that
Environmental-Metal9@reddit
This model is a godsend after working in my own training pipeline to take Gemma 4 base and do a full round of CPT and my own SFT (creative writing + reasoning) on it. This model is like what Llama 5 could have been (very similar basic architecture), and the Granite 4.1 base is super efficient to train compared to Gemma 4 (roughly 2.6x more efficient due to the smaller vocab size, which may or may not be what one needs; it is what I needed)
Thank you for raising awareness of it. I stopped looking at granite for my own training because the mamba hybrid architecture of the previous model was too difficult for me to figure out how to properly train. (Skill issues on my part!)
whodoneit1@reddit
Maybe he meant by flying under the radar they are at the bottom?
olli-mac-p@reddit
If you run a business in the EU, I believe the Granite models are EU AI Act compatible (Gemma might be as well), but I didn't check. For private individuals, though, Qwen is the way to go if code writing is the use case.
k_means_clusterfuck@reddit
They are overshadowed because Qwen3.6 27b and Gemma4 31b are just better.
fijasko_ultimate@reddit
imho they are overshadowed because of no reasoning
kinda sucks if you're gpu poor and you can load only one model
other than that, i tried 4.0 for some simple summarization tasks / tool calls - worked great
i am gonna definitely try the new 4.1 series
MomentJolly3535@reddit
I gave it a very quick try on a simple prompt (an HTML file where you put in 2 different texts and it is supposed to show the difference). It performed extremely poorly; even Qwen 9B (reasoning off) performed way better.
Maybe my settings were not good.
rawdikrik@reddit
One day, we will learn that benchmarks aren't real life.
I've used it and it is decent. Not as good as gemma or qwen 3.6, but the creative writing is decent.
It will fit some workflows.
Techie42@reddit
In my "just starting out with local" and "let's try every model, but on the wrong setup" phase, Granite didn't perform well, so I dropped it. These days I should give it another go. One of those "you don't get a second chance at a first impression" issues.
DoorStuckSickDuck@reddit
Their latest STT is very very noice, especially the plus variant. Great features and works well.
pmttyji@reddit (OP)
u/ibm Comeback with Big Bang! 30-50B Dense & MOE, 100B MOE.
dsartori@reddit
Enthusiasts are generally looking for something different than Granite models offer, I think.