Granite 4.0 Nano Language Models

[-]

thx1138inator@reddit

Members of the Granite team are frequent guests on a public IBM podcast called "Mixture of experts". It's really educational and entertaining!
https://www.ibm.com/think/podcasts/mixture-of-experts

[-]

ibm@reddit

Let us know if you have any questions about these models!

Get more details in our blog → https://ibm.biz/BdbyGk

[-]

jacek2023@reddit

Hello IBM, I have a question - what about bigger models? Like 70B or something :)

[-]

ibm@reddit

Our primary focus is on smaller, efficient, and accessible models, but we are currently training a larger model as part of the Granite 4.0 family.

- Emma, Product Marketing, Granite

[-]

ab2377@reddit

meta could have said the same ..... but they have too much money so they cant really make a small model 🙄

[-]

Particular-Way7271@reddit

If you go with a bigger model, moe pls so I can offload them to cpu pls 😂

[-]

jacek2023@reddit

could you say what is the size of the larger model?

[-]

hello_2221@reddit

For a serious answer, I believe they mentioned a granite 4.0h medium that is 210B-A30B I believe.

[-]

lemon07r@reddit

No, it’s Granite 4 H Large and Granite 4 H Big

Don't ask which one is bigger..

[-]

harrro@reddit

Granite 4 H Grande is the best on my 16xP40

[-]

Could you possible please browbeat your team, or whoever is in charge of the naming to include parameter size in the model names instead of naming things like Tiny and Small.. Or at least meet us half and do both. I'm sure there are other, better ways for the Granite models to be different from the norm or other models than having confusing naming.

[-]

RobotRobotWhatDoUSee@reddit

This IBM developer video says Granite 4 medium will be 120B A30B.

[-]

jacek2023@reddit

Thanks!

[-]

Damakoas@reddit

what is the goal of granite models? Is there a goal that IBM is working towards with the models (like a web browser with embedded granite?)

[-]

celsowm@reddit

✅ How much text in Portuguese was used to train the models?

[-]

coding_workflow@reddit

Is this tuned for tools use? What else we expect?

[-]

ibm@reddit

Yes, the models are optimized for tool and function calling. On the BFCLv3 benchmark measuring tool calling accuracy, the models outperform similar SLMs in their weight class.

In terms of what else you can expect, they are highly competitive on general knowledge, math, code, and instruction following benchmarks and industry-leading on safety benchmarks. When compared to other families like Qwen, LFM, and Gemma, the Granite 4.0 Nano models demonstrate a significant increase in capabilities that can be achieved with a minimal parameter footprint.

Be sure to look into the hybrid architecture. The Mamba-2 blocks let the models scale very efficiently to keep memory usage and latency down.

- Emma, Product Marketing, Granite

[-]

coding_workflow@reddit

I checked it and the 1B plugging in Opencode surprised me. It's not the level of GPT OSS 20B but very impressive for it's size.

128k context amazing.
This can be an intersting base model for fine tuning.

[-]

DecodeBytes@reddit

Hi Emma, what sort of chat template are you using , which trains the models in tool use? If you have any papers of blogs I could read, that would be much appreciated.

[-]

-p-e-w-@reddit

Thank you for pushing non-attention/hybrid architectures forward. You’re the only major player in that space right now, and it’s incredibly important work.

[-]

mpasila@reddit

For bigger models are you guys only gonna train MoE models because the 7B MoE is imo probably worse than the 3B dense model.. so I don't really see a point in using the bigger model. If it was a dense model that probably would have performed better. 1B active params just doesn't seem to be enough. It's been ages since Mistral's Nemo was released and I still don't have anything that replaces that 12B dense model..

[-]

wingwing124@reddit

Hey these are really cool! What does the Granite team envision as some great use cases of these models? What level of workload can they realistically handle?

I'd love to start incorporating these into my daily workflows, and would love to know what I can expect as I am building those out. Thank you for your time!

[-]

0xCODEBABE@reddit

the granite 1B model is closer to 2 billion params?

[-]

ibm@reddit

The core models in the Granite 4.0 family are our hybrid models. For the 1B Nano model, the hybrid variant is a true 1B model. However, for our smaller models we are also releasing non-hybrid variants intended to be compatibility-mode equivalents of the hybrid models for platforms where the hybrid architecture is not yet well supported. For the non-hybrid variant, it is closer to 2B, but we opted to keep the naming aligned to the hybrid variant to make the connection easily visible!

- Emma, Product Marketing, Granite

[-]

VegaKH@reddit

By the size, it looks to be slightly less than 1.5B parameters, so technically we can round it down and call it 1B. Would be a lot more accurate to call it 1.5B.

[-]

kryptkpr@reddit

Do you guys have a reasoning model in the pipeline?

[-]

ibm@reddit

Yes, we are working on thinking counterparts for several of the Granite 4.0 models!

- Emma, Product Marketing, Granite

[-]

pmttyji@reddit

Thanks for these models.

Any plan to release Coder (MOE) model like Granite-4.0-Coder-30B-A3B with bigger context? That would be awesome.

[-]

ironwroth@reddit

Any plans to release Granite 4 versions of the RAG/Security LoRAs that you guys have for Granite 3.3?

[-]

stoppableDissolution@reddit

Only 16 heads :'c

But gonna give it a shot vs old 2b. I hope it will be able to learn to the same level while being 30% smaller.

[-]

AppearanceHeavy6724@reddit

Attention or KV heads?

[-]

stoppableDissolution@reddit

16 attention 4 kv

[-]

Hopeful_Champion4736@reddit

Check: https://huggingface.co/unsloth/granite-4.0-h-1b-GGUF

[-]

triynizzles1@reddit

Will your upcoming vision models be good at providing bounding box coordinates to identify objects in an image?

[-]

FunConversation7257@reddit

Do you know any models which do this well outside of the Gemini family?

[-]

triynizzles1@reddit

Qwen 3 vl appears to be very good at this. We will have to see how it performs once it’s merged in llama cpp

[-]

ibm@reddit

This isn't currently on our roadmap, but we will pass this along to our Research team. Our Granite Docling model offers a similar capability for documents, so it is not out of the realm of possibility for our future vision models.

- Emma, Product Marketing, Granite

[-]

triynizzles1@reddit

That would be amazing to have my employer is hesitant to use non-US AI models (like qwen 3) for this case.

[-]

AppearanceHeavy6724@reddit

there is a granite 3 vlm model too.

[-]

coding_workflow@reddit

I'm impressed by 1M context while using less than 20 GB VRAM ! 1B model here.
Using GGUF from unsloth and surprised they have a model set to 1M and another set 128k.
I will try to push a bit and overload it with data but the 1B punch above it's league. I feel it's suffering a bit in tools use, using generic prompts from Opencode/Openwebui might need some fine tuning here to improve.
@ u/ibm what temperature setting do your recommend as I don't find that in the model card.
Do you recommend VLLM? Any testing validation for GGUF releases?

[-]

-dysangel-@reddit

it's evolving.. just backwards

[-]

Maleficent-Ad5999@reddit

It started from running on data centers to running locally on a smartphone. How is this backwards?

[-]

-dysangel-@reddit

because I don't want to run an efficient 300M model. I want to run an efficient 300B model

[-]

nailizarb@reddit

Sir, this ain't r/datacenterllama

[-]

-dysangel-@reddit

my Mac Studio is not a data center :P

[-]

nailizarb@reddit

That's arguable

[-]

caikenboeing727@reddit

Just wanted to add that the granite team @ IBM is extremely responsive, smart, and frankly just easy to work with. Great for enterprise use cases!

Source : a real enterprise customer who knows this team well, works with them, and appreciates their unique level of openness to engage with enterprise customers.

[-]

nic_key@reddit

This is big if true for 1b model if quality is nice and it gives consistent outputs

Function-calling tasks
Multilingual dialog use cases
Fill-In-the-Middle (FIM) code completions

[-]

Silver_Jaguar_24@reddit

The Granite Tiny is pretty good for use with web search MCP in LM studio, it's my go to for that and it does better than some Qwen models. Haven't tried Nano yet, tempted, maybe I should :)

[-]

letsgoiowa@reddit

Maybe a silly question, but I had no idea you could even do such a thing. How would you set up the model for web search? Is it a perplexity-like experience?

[-]

Silver_Jaguar_24@reddit

Try this - https://github.com/mrkrsl/web-search-mcp?tab=readme-ov-file

I use LM studio to run the LLM. My MCP.json looks like this in LM Studio:

{
  "mcpServers": {
    "web-search": {
      "command": "node",
      "args": [
        "C:\Users\USERNAME\python_scripts\web-search-mcp-v0.3.2\dist\index.js"
      ],
      "env": {
        "MAX_CONTENT_LENGTH": "10000",
        "BROWSER_HEADLESS": "true",
        "MAX_BROWSERS": "3",
        "BROWSER_FALLBACK_THRESHOLD": "3"
      }
    }
  }
}

[-]

ibm@reddit

https://i.redd.it/79by85q8awxf1.gif

[-]

ontorealist@reddit

Better than Qwen in what ways?

I want to use Tiny over Qwen3 4B as my default for web search on iOS, but I still haven’t found a system prompt to make Tiny to format sources consistently just yet.

[-]

Silver_Jaguar_24@reddit

Just structure, quality of the response and the fat that it doesn't fail or take forever to get to the answer.

[-]

stuckinmotion@reddit

Which MCP do you use for web search?

[-]

Silver_Jaguar_24@reddit

Try this - https://github.com/mrkrsl/web-search-mcp?tab=readme-ov-file

[-]

stuckinmotion@reddit

Thanks! I'm still brand new to mcp servers, I'll give that a shot

[-]

skibidimeowsie@reddit

Hi, can the granite team release a comprehensive collection of fine-tuning recipes for these models? Or are these readily compatible with the existing fine-tuning libraries?

[-]

nickguletskii200@reddit

For those struggling with tool calling with Granite models in llama.cpp, it could be this bug (or something else, I am not exactly sure).

[-]

SlowFail2433@reddit

Love the 0.3B (300M) to 0.6B (600M) category

[-]

ibm@reddit

We do too! What do you primarily use models of this size for?

[-]

SlowFail2433@reddit

Personally binary text classification or sometimes routing

[-]

one-wandering-mind@reddit

Is the training recipe and data made public ? How open is open here ?

[-]

ibm@reddit

For our Granite 3.0 family, we released an in-depth paper outlining our thorough training process as well as the complete list of data sources used for training. We are currently working on the same for Granite 4.0, but wanted to get the models out to the community ASAP and follow on with the paper as soon as it’s ready! If you have any specific questions before the paper is out, we can absolutely address them.

- Emma, Product Marketing, Granite

[-]