Introducing the IBM Granite 4.1 family of models (3B/8B/30B)
Posted by abkibaarnsit@reddit | LocalLLaMA | View on Reddit | 35 comments
-Akos-@reddit
I used to run the previous Granite on my potato laptop (4GB 1050) at 50k tokens in LM Studio. It worked fine, but on several occasions I found the model extremely stubborn. I had it search the web via MCP for the voice actor of Montgomery Burns in The Simpsons. It searched, came back with an incorrect name, and when I tried correcting it, it just stuck with its first answer; no matter how many times I made it search (even after giving it the right name), it insisted the incorrect name was correct. This wasn't the first time it was that stubborn, so I've distrusted the model since. LFM worked much better for me, even though its general knowledge was worse (and it refused to code). I'll check this model out for sure, though I'll test the same question as before.
YourNightmar31@reddit
Web search in LM Studio? I thought that was not supported, how do you do that?
Syphari@reddit
A lot of y'all don't realize these are tier 1 for business use cases. If you're a small business getting into AI, or want to integrate a safe LLM into your product: in my testing, of all the models, the Granite series is far more likely to stay friendly, keep a low-risk tone, and avoid doing dangerous things when jailbroken.
Somehow IBM tops the GuardBench leaderboard, and Granite Guardian is legit the best standalone guardrail model I've seen in terms of overall capabilities and performance. Granite LLM with Granite Guardian does ridiculously well on AttaQ adversarial prompts, and it's also ISO certified with cryptographically signed weights, which is rare for open models.
I'm building software products, and of all the LLMs I've evaluated, IBM does the best job overall of managing risk while still giving solid performance, without going overboard.
Anyway that’s my experience as a small software company.
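The guard-then-answer pattern described above (a separate guardrail model screening prompts before the main LLM sees them) can be sketched roughly like this. Both model calls are stubs here; the function names, blocked-term heuristic, and refusal message are illustrative assumptions, not IBM's actual Granite Guardian API.

```python
# Sketch of routing every prompt through a guardrail model first.
# In practice guard_verdict() would call a local Granite Guardian
# endpoint; the keyword check below is just a stand-in.

def guard_verdict(prompt: str) -> str:
    """Stub for a guardrail-model call: returns 'safe' or 'unsafe'."""
    blocked_terms = ("ignore previous instructions", "build a weapon")
    return "unsafe" if any(t in prompt.lower() for t in blocked_terms) else "safe"

def answer(prompt: str) -> str:
    """Stub for the main LLM call."""
    return f"[model reply to: {prompt}]"

def guarded_chat(prompt: str) -> str:
    """Only forward the prompt to the main model if the guard approves."""
    if guard_verdict(prompt) != "safe":
        return "Request declined by guardrail."
    return answer(prompt)
```

The point of keeping the guard as a separate small model is that it can be updated or swapped independently of the main LLM, and it also sees the main model's *outputs* in fuller setups, not just the inputs.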
Slasher1738@reddit
The previous granite release was pretty solid in my experience
fijasko_ultimate@reddit
still no reasoning/thinking variants?
Middle_Bullfrog_6173@reddit
Very weak on benchmarks, FWIW. The 30B scored 15 on the AA index, equal to the non-reasoning scores of Gemma 4 E4B and Qwen 3.5 2B.
IrisColt@reddit
oof.gif
CommunityTough1@reddit
These models are VLMs, focused specifically on document extraction pipelines.
Beginning-Window-115@reddit
To be fair, these are multimodal models, not necessarily made to be smart.
Middle_Bullfrog_6173@reddit
As far as I can tell these aren't multimodal? Only the separate granite-vision and granite-speech models. Gemma E4B supports both visual and audio input yet manages to be decently smart for its size.
abkibaarnsit@reddit (OP)
Also released : https://huggingface.co/ibm-granite/granite-vision-4.1-4b
It claims to beat frontier models on benchmarks.
IrisColt@reddit
Hmm....
letsgoiowa@reddit
I like these kind of models with narrowly defined scopes that are really good at those specific things. This model will be awesome for converting paper to digital.
DavidAdamsAuthor@reddit
I too prefer hyper-specialized models that are lightweight but extremely good at one specific thing.
One for OCR, one for translation, even one for specialist things like identifying the tone of an email.
IHave2CatsAnAdBlock@reddit
Which model is specialized on translation ?
DavidAdamsAuthor@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1q1dg3x/i_released_polyglotr2_qwen34b_finetune/
deejeycris@reddit
Thanks to Goodhart's Law, that may well be true; it doesn't mean it actually beats frontier models.
habachilles@reddit
What is Goodhart's law?
NitinJadhav@reddit
"When a measure becomes a target, it ceases to be a good measure."
ParthProLegend@reddit
Yes. One of the truest things I've ever heard; I came across it a few years ago and have kept it in mind ever since.
iamthewhatt@reddit
As is the case with most local models, benchmaxxing is a feature. I have yet to see a local model come even close to a frontier model on real-world tasks (usually due to local hardware limitations).
Viktor_Cat_U@reddit
So is it still a Mamba/Transformer hybrid model?
Hot_Turnip_3309@reddit
this models are terrible and from indian programmers
ThePrimeClock@reddit
Hoping it will turn all that maths knowledge into a useful Lean assistant.
Great release. I find these models make really good scrutineers; I use them to cross-reference outputs and lift the quality of a leader model's work. Switching models is slow, but overall it's much faster for achieving well-scoped objectives.
RoomyRoots@reddit
Damn, companies are milking April.
yeah-ok@reddit
https://imgur.com/a/SGRwP1i
Long_comment_san@reddit
is it another 30b dense????? what a day
spencer_kw@reddit
The 8B is interesting as a lightweight fallback for agentic setups. You don't need 30B for every tool call; routers like herma or LiteLLM can send the simple stuff to the 8B and save the 30B for when the task actually needs it. IBM releasing under Apache 2.0 helps too: no license games.
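The routing idea above (cheap model for simple calls, big model for hard tasks) can be sketched as a tiny heuristic. The model names match this release, but the thresholds and the expected-tool-call signal are illustrative assumptions; real routers like LiteLLM use their own configuration for this.

```python
# Toy size-based router: short, single-tool requests go to the 8B,
# longer or multi-step tasks escalate to the 30B. Thresholds are
# made-up examples, not tuned values.

def pick_model(prompt: str, tool_calls_expected: int) -> str:
    """Return the model name to route this request to."""
    if tool_calls_expected <= 1 and len(prompt) < 500:
        return "granite-4.1-8b"
    return "granite-4.1-30b"
```

In a real deployment you would also route on context length and fall back to the larger model when the small one returns a low-confidence or malformed tool call.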
Business-Weekend-537@reddit
Which of these will run on a 3090?
ttkciar@reddit
They all should, with suitable quantization. Q4_K_M is recommended.
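A quick back-of-envelope check on why Q4_K_M fits: llama.cpp's Q4_K_M quantization averages roughly 4.8 bits per weight (an approximation; actual GGUF file sizes vary by architecture), so even the 30B lands well under a 3090's 24 GB, leaving room for KV cache.

```python
# Rough GGUF size estimate at a given average bits-per-weight.
# 4.8 bpw is a commonly cited approximation for Q4_K_M.

def gguf_size_gb(params_b: float, bits_per_weight: float = 4.8) -> float:
    """Approximate model file size in GB for params_b billion parameters."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

for size in (3, 8, 30):
    print(f"{size}B ~ {gguf_size_gb(size):.1f} GB at Q4_K_M")
```

By this estimate the 30B comes out around 18 GB, so it fits on a 24 GB card with a few GB to spare for context; the 3B and 8B fit trivially.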
Business-Weekend-537@reddit
Ty for letting me know
Ps3Dave@reddit
Interesting, I'll try the 8B for sure as it's the only one that can fit on my GPU.
Monad_Maya@reddit
The 30B might be interesting, but I only skimmed through the blog.
sine120@reddit
Interesting in a few use cases. Glad there's still some competition, hopefully they continue improving.
abkibaarnsit@reddit (OP)
Detailed Blog : https://huggingface.co/blog/ibm-granite/granite-4-1
Hugging Face Collection : https://huggingface.co/collections/ibm-granite/granite-41-language-models