For Non-hallucinating work, MiMo 2.5 delivers
Posted by Beamsters@reddit | LocalLLaMA | View on Reddit | 19 comments
MIT license and fully open source. MiMo-V2.5-Pro was just 3 points shy of Opus 4.7 max, and the normal V2.5 is only a step behind SOTA. More importantly, they hit 75% and 68% non-hallucination rates respectively. Best intelligence-to-hallucination trade-off of any model yet.
V2.5 in FP8 is around 316GB; you *might* be able to squeeze a tight 3-bit quant onto a 128GB M5 Max.
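The 3-bit-on-128GB question is easy to sanity-check. A back-of-the-envelope sketch (only the 316 GB FP8 figure comes from the post; the 10% overhead factor for quantization scales and embeddings is my assumption):

```python
def quant_size_gb(fp8_size_gb, bits, overhead=1.10):
    """Approximate weight size after requantizing 8-bit weights to `bits` bits.

    `overhead` pads ~10% for quantization scales, embeddings, etc.
    (an assumed fudge factor, not a measured one).
    """
    return fp8_size_gb * (bits / 8) * overhead

print(316 * 3 / 8)                   # 118.5 GB raw weights -- why "tight" is the right word
print(round(quant_size_gb(316, 3)))  # ~130 GB with overhead -- likely over a 128 GB M5 Max
```

So whether it fits comes down to the real per-quant overhead and how much unified memory is left for the KV cache.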
From Gemma to Qwen3.6 to Kimi2.6 to Deepseek v4 to MiMo2.5, this is probably the best April yet.


InteractionSmall6778@reddit
The 75% non-hallucination rate is the headline, but the real story is what that means for retrieval and tool use in agents - models that reliably don't confabulate references unlock use cases that were too risky with most frontier models.
The 3-bit quant path for 128GB M5 Max will be worth watching.
Glittering-Call8746@reddit
Turbo3, right?
Beamsters@reddit (OP)
Maybe also good for retrieving facts, grading chapters, and summarizing characters from books.
Specter_Origin@reddit
How is the token efficiency? When they released it initially, they were heavily emphasizing how token-efficient the model is.
coder543@reddit
Token efficiency seems quite good.
nuclearbananana@reddit
That's not respectable, that's worse than k2.6
coder543@reddit
Huh? You want fewer tokens, not more. K2.6 is using twice as many tokens; mimo-v2.5 is much more efficient.
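"Twice as many tokens" is just a relative-efficiency ratio; a minimal sketch with illustrative numbers (the thread gives no exact token counts):

```python
def token_efficiency_ratio(tokens_model, tokens_baseline):
    """Tokens the baseline spends per token the model spends on the same
    task set; a ratio > 1 means the model is more token-efficient."""
    return tokens_baseline / tokens_model

# Illustrative only: if K2.6 spends 2,000 tokens where mimo-v2.5 spends 1,000...
print(token_efficiency_ratio(1_000, 2_000))  # 2.0 -- same work in half the tokens
```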
nuclearbananana@reddit
Oh I thought OP was asking about ds v4. Nvm
coder543@reddit
Also:
Beamsters@reddit (OP)
They're pretty good, I'd say; only reasoning models were selected here.
zdy132@reddit
Another interesting thing in the second graph is how bad the DeepSeek V4 models are doing. Are they particularly prone to hallucination?
Technical-Earth-3254@reddit
DS models were always prone to hallucinate. V4 is still in preview, keep that in mind (but I doubt it will surpass V3.2). Mimo is for sure completely out of reach.
Kodix@reddit
Yep. Excellent for creative writing (really impressed me, tried similar Polish language prompt on several models and Deepseek was by far the best), but kinda awful for structured work. Mimo 2.5 flagged *so many* issues that Deepseek introduced to a project that I just dropped it from consideration.
Deepseek flash is amazingly cheap per token for the quality, though.
pigeon57434@reddit
In general I'm massively disappointed in DeepSeek V4, but my coping mechanism tells me "they said it was only a preview." The likely cause is, unfortunately, just that its pretraining token count was almost zero compared to other trillion-scale OSS models.
Asleep-Dot5479@reddit
Had it happen a few times already. Even when asked if they're sure, they insist and invent proof.
sammybeta@reddit
I tried DeepSeek V4 Pro on the first night and was not impressed. MiMo 2.5 is significantly better and more efficient.
Now, with that huge 90% off discount on DeepSeek, I can tolerate it using a bit more tokens to slowly get things right.
zdy132@reddit
Same, I tried it a bit and it was meh. Not great, not terrible.
But at less than a quarter of the price of MiMo 2.5, it will stay my main agent while the discount is live.
EmotionalLock6844@reddit
I've been testing 2.5 Pro as an orchestrator and I can tell you it's at least 2x better than GPT 5.5 at that. It's insanely efficient and smart at parallel subagent orchestration: constantly running 5-8 parallel lanes in a single project across parallel worktrees with no issues. Almost flawless at merging worktrees back to main and resolving conflicts. I'm totally impressed!
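The parallel-worktree pattern described above can be reproduced with plain git; a minimal sketch using a throwaway repo (the lane/branch names are made up, and the orchestrator model itself is not shown):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=a@example.com -c user.name=agent commit -q --allow-empty -m init

# One worktree per parallel "lane"; each lane works on its own branch.
for lane in 1 2; do
  git worktree add -q "$repo-lane-$lane" -b "agent/lane-$lane"
done

# A subagent commits in its lane without touching the main checkout...
( cd "$repo-lane-1" && echo done > task.txt && git add task.txt &&
  git -c user.email=a@example.com -c user.name=agent commit -qm "lane 1 work" )

# ...and the orchestrator merges the lane back into main (a fast-forward here;
# real conflict resolution is what the model handles).
git merge -q agent/lane-1
```

Each worktree is an independent checkout sharing one object store, which is why agents can run in parallel without clobbering each other's files.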
ghgi_@reddit
MiMo is my favorite Chinese model lately, even nicer than Qwen, Kimi, and DeepSeek. It checks nearly all the boxes; its coding performance isn't as good as Claude or GPT, but that's fine for the 99% of tasks that aren't hardcore projects. It works very well alongside other models, either as a helper or an assistant, and I've had good results using it as an agent for automated tasks.