Introducing the IBM Granite 4.1 family of models (3B/8B/30B)
Posted by abkibaarnsit@reddit | LocalLLaMA | View on Reddit | 35 comments
-Akos-@reddit
I used to run the previous Granite on my potato laptop (4GB 1050) at 50k tokens in LM Studio. It worked fine, but on several occasions I found the model extremely stubborn. I had it search the web via MCP for the voice actor of Montgomery Burns in The Simpsons. It searched, came back with an incorrect name, and when I tried correcting it, it just stuck with its first answer; no matter how many times I made it search (even after giving it the right name), it insisted the incorrect name was correct. This wasn't the first time it was that stubborn, so I've distrusted the model since. LFM worked much better for me, even though its general knowledge was worse (and it refused to code). I'll check this model out for sure, though I'll test the same question as before.
YourNightmar31@reddit
Web search in LM Studio? I thought that was not supported, how do you do that?
Syphari@reddit
A lot of y'all don't realize these are tier 1 for business use cases. If you're a small business getting into AI, or want to integrate a safe LLM into your product: in my testing, of all the models, the Granite series is far more likely to stay friendly, keep a low-risk tone, and avoid doing dangerous things when jailbroken.
Somehow IBM tops the GuardBench leaderboard, and Granite Guardian is legit the best standalone guardrail model I've seen in terms of overall capabilities and performance. Granite LLM with Granite Guardian does ridiculously well on AttaQ adversarial prompts, and it's also ISO certified with cryptographically signed weights, which is rare for open models.
I'm building software products, and of all the LLMs I've evaluated, IBM does the best job overall of managing risk while still giving solid performance, without going overboard.
Anyway that’s my experience as a small software company.
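The guard-then-answer pattern described above (a separate guardrail model screening prompts before the main LLM sees them) can be sketched roughly like this. Both model calls are stubs here; the function names, blocked-term heuristic, and refusal message are illustrative assumptions, not IBM's actual Granite Guardian API.

```python
# Sketch of routing every prompt through a guardrail model first.
# In practice guard_verdict() would call a local Granite Guardian
# endpoint; the keyword check below is just a stand-in.

def guard_verdict(prompt: str) -> str:
    """Stub for a guardrail-model call: returns 'safe' or 'unsafe'."""
    blocked_terms = ("ignore previous instructions", "build a weapon")
    return "unsafe" if any(t in prompt.lower() for t in blocked_terms) else "safe"

def answer(prompt: str) -> str:
    """Stub for the main LLM call."""
    return f"[model reply to: {prompt}]"

def guarded_chat(prompt: str) -> str:
    """Only forward the prompt to the main model if the guard approves."""
    if guard_verdict(prompt) != "safe":
        return "Request declined by guardrail."
    return answer(prompt)
```

The point of keeping the guard as a separate small model is that it can be updated or swapped independently of the main LLM, and it also sees the main model's *outputs* in fuller setups, not just the inputs.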
Slasher1738@reddit
The previous granite release was pretty solid in my experience
fijasko_ultimate@reddit
still no reasoning/thinking variants?
Middle_Bullfrog_6173@reddit
Very weak on benchmarks, FWIW. The 30B scored 15 on the AA index, equal to the non-reasoning scores of Gemma 4 E4B and Qwen 3.5 2B.
IrisColt@reddit
oof.gif
CommunityTough1@reddit
These models are VLMs, focused specifically on document extraction pipelines.
Beginning-Window-115@reddit
To be fair, these are multimodal models, not necessarily made to be smart.
Middle_Bullfrog_6173@reddit
As far as I can tell these aren't multimodal? Only the separate granite-vision and granite-speech models. Gemma E4B supports both visual and audio input yet manages to be decently smart for its size.
abkibaarnsit@reddit (OP)
Also released : https://huggingface.co/ibm-granite/granite-vision-4.1-4b
It claims to beat frontier models on benchmarks.
IrisColt@reddit
Hmm....
letsgoiowa@reddit
I like these kind of models with narrowly defined scopes that are really good at those specific things. This model will be awesome for converting paper to digital.
DavidAdamsAuthor@reddit
I too prefer hyper-specialized models that are lightweight but extremely good at one specific thing.
One for OCR, one for translation, even one for specialist things like identifying the tone of an email.
IHave2CatsAnAdBlock@reddit
Which model is specialized on translation ?
DavidAdamsAuthor@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1q1dg3x/i_released_polyglotr2_qwen34b_finetune/
deejeycris@reddit
Thanks to Goodhart's Law, that may well be true; it doesn't mean it actually beats frontier models.
habachilles@reddit
What is Goodhart's law?
NitinJadhav@reddit
"When a measure becomes a target, it ceases to be a good measure."
ParthProLegend@reddit
Yes. One of the truest things I've ever heard; I came across it a few years ago and have kept it in mind ever since.
iamthewhatt@reddit
As is the case with most local models, benchmaxxing is a feature. I have yet to see a local model come even close to a frontier model on real-world tasks (usually due to local hardware limitations).
Viktor_Cat_U@reddit
So is it still a Mamba/Transformer hybrid model?
Hot_Turnip_3309@reddit
this models are terrible and from indian programmers
ThePrimeClock@reddit
Hoping it will turn all that maths knowledge into a useful Lean assistant.
Great release. I find these models make really good scrutineers; I use them to cross-reference outputs and lift the quality of a leader model's work. Switching models is slow, but overall it's much faster for achieving well-scoped objectives.
RoomyRoots@reddit
Damn, companies are milking April.
yeah-ok@reddit
https://imgur.com/a/SGRwP1i
Long_comment_san@reddit
is it another 30b dense????? what a day
spencer_kw@reddit
The 8B is interesting as a lightweight fallback for agentic setups. You don't need 30B for every tool call; routers like herma or LiteLLM can send the simple stuff to the 8B and save the 30B for when the task actually needs it. IBM releasing under Apache 2.0 helps too: no license games.
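The routing idea above (cheap model for simple calls, big model for hard tasks) can be sketched as a tiny heuristic. The model names match this release, but the thresholds and the expected-tool-call signal are illustrative assumptions; real routers like LiteLLM use their own configuration for this.

```python
# Toy size-based router: short, single-tool requests go to the 8B,
# longer or multi-step tasks escalate to the 30B. Thresholds are
# made-up examples, not tuned values.

def pick_model(prompt: str, tool_calls_expected: int) -> str:
    """Return the model name to route this request to."""
    if tool_calls_expected <= 1 and len(prompt) < 500:
        return "granite-4.1-8b"
    return "granite-4.1-30b"
```

In a real deployment you would also route on context length and fall back to the larger model when the small one returns a low-confidence or malformed tool call.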
Business-Weekend-537@reddit
Which of these will run on a 3090?
ttkciar@reddit
They all should, with suitable quantization. Q4_K_M is recommended.
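A quick back-of-envelope check on why Q4_K_M fits: llama.cpp's Q4_K_M quantization averages roughly 4.8 bits per weight (an approximation; actual GGUF file sizes vary by architecture), so even the 30B lands well under a 3090's 24 GB, leaving room for KV cache.

```python
# Rough GGUF size estimate at a given average bits-per-weight.
# 4.8 bpw is a commonly cited approximation for Q4_K_M.

def gguf_size_gb(params_b: float, bits_per_weight: float = 4.8) -> float:
    """Approximate model file size in GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for size in (3, 8, 30):
    print(f"{size}B ~ {gguf_size_gb(size):.1f} GB at Q4_K_M")
```

By this estimate the 30B comes out around 18 GB, so it fits on a 24 GB card with a few GB to spare for context; the 3B and 8B fit trivially.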
Business-Weekend-537@reddit
Ty for letting me know
Ps3Dave@reddit
Interesting, I'll try the 8B for sure as it's the only one that can fit on my GPU.
Monad_Maya@reddit
The 30B might be interesting, but I only skimmed through the blog.
sine120@reddit
Interesting in a few use cases. Glad there's still some competition, hopefully they continue improving.
abkibaarnsit@reddit (OP)
Detailed Blog : https://huggingface.co/blog/ibm-granite/granite-4-1
Hugging Face Collection : https://huggingface.co/collections/ibm-granite/granite-41-language-models