seamonn@reddit
I wish this was multimodal
Ok_Technology_5962@reddit
But the flash...
LoveMind_AI@reddit
This is a genuinely massive win for open source, and frankly, for the world. MiMo-V2.5-Pro (and even V2 behind it) genuinely feels like Opus at home. So now, for all those people asking "can I run something as good at Claude locally?" the answer is officially technically yes AND you can use it with an MIT license. It's a straight up incredible model, and being able to have access to something this powerful without corporate interference is a game changer for the kind of projects that can be produced by private groups and individuals without reliance on finicky, ethically shady frontier labs. Fuli Luo and her team are absolute legends.
ebra95@reddit
I have not used Opus, I have used many others and MiMo-V2.5-Pro is already more than enough for what most people can do with it.
It's awesome.
Accomplished-Air439@reddit
One nice thing about MiMo V2.5 Pro is that it seems to be the most token efficient open source model. It thinks, but doesn't get into a long "actually wait" cycle all the time. Love it.
ebra95@reddit
Yes, indeed, it's fast for a 1T-parameter model. I guess the infrastructure holds up well.
power97992@reddit
Maybe like Opus 4.5, not 4.6/4.7 or GPT 5.5
Icy_Butterscotch6661@reddit
Is it really that good? I've tried it, and it wasn't doing that well with some more obscure things, like Win32 API calls in a .NET console app, or reliably using Ghidra via MCP for some reverse engineering tasks.
Claude models, meanwhile, were excellent at these. I don't like Anthropic and won't pay for their services, but open models have a long way to go in terms of reaching that breadth of knowledge.
TheReedemer69@reddit
Let's connect.
real_serviceloom@reddit
Yup, it is really good with Rust in my case. Better than Kimi 2.6 and GLM 5.1.
look@reddit
I use GLM-5.1 and Kimi 2.6 for coding plan/build still, but MiMo V2.5 Pro is my favorite model so far for just about everything else. It’s the one I use to figure out what to build, then GLM/Kimi to do the implementation.
drumyum@reddit
This model (and V2 Pro before it) is incredible; I've never seen anything like it. I'm not sure what the secret is, but it feels like it had no reinforcement to be "likable" to dumb consumers: it's dry and straight to the point most of the time. It even says "I don't know". But it's kind of meh as a coding agent; I use it mainly to discuss tasks or ideas before coding.
look@reddit
Exactly. MiMo V2 Pro is the model I talk to and work through ideas with, and everything else is effectively just a subagent to it.
ComplexType568@reddit
I'm extremely happy that each lab is diverging in terms of behavior. Kimi, at least K2 (just 2), was also very MiMo-like; hope they keep that. Gemma is much more Claude-like, and Qwen... is Qwen.
I'm surprised at how fast Xiaomi caught up, tbh. It felt like the original model was clowned on, and then one release later they suddenly dominate. It feels like this phenomenon happens with almost all Chinese labs. Maybe that Yuan lab will come next. Hope Xiaomi releases a smaller (maybe 30-80B) model soon for the GPU poor, though
jnmi235@reddit
The non-pro model looks strong, but it needs 4x RTX Pro cards at its native mixed precision
silenceimpaired@reddit
I'm excited to try out MiMo-V2-Flash Quant 0.01km tomorrow ;)
ComplexType568@reddit
Heard the 0 bit quant can run on a pi
ortegaalfredo@reddit
At this point the Chinese are just rubbing their dicks in Silicon Valley's face. They don't have a SOTA AI model, they have like 10. All free. And every week they publish a new one.
smith7018@reddit
These models aren't better than Anthropic's or OpenAI's models. Yes, they're free and open-weight, which is a middle finger to them (and amazing for us), but your comment reads like these Chinese models far surpass the West's products, and that's really not true.
jld1532@reddit
But that's not how business thinks. It's a weighted model. If the raw intelligence and price are equal to a business, then these models do "win". If my business were AI-dependent, the raw intelligence gap would need to be far, far larger for me to consider paying for tokens at this point.
KeikakuAccelerator@reddit
You still have to pay for hosting, and depending on how many tokens you use, it's likely far cheaper to use OpenAI/Anthropic instead of hosting the model on your own.
jld1532@reddit
Even if I agreed (I don't, we host both Kimi and Mini, largely due to sensitivity, and it's worth the costs) that statement assumes there won't be massive price hikes. Given the money OpenAI needs to raise (nearly $1 trillion) I think people paying monthly will be experiencing sticker shock soon.
KeikakuAccelerator@reddit
I think it would be the opposite. Prices will likely come down due to competition, inference optimization, hardware development, and architecture progress.
DeepOrangeSky@reddit
I wonder if a bit of both might happen. As in, the price relative to the intelligence levels of the models will probably continue to improve over time, but on the other hand, the raw price (ignoring intelligence level) will probably go up over time.
So, higher subscription prices or price per token over time, but also significantly stronger models (with the rate of strength increasing faster than the rate of the price hikes) over time.
Well, who knows, but that's my guess, for now.
jld1532@reddit
I guess we'll see. However, I would point to token-based billing as the first price increase, just in the form of artificial scarcity.
KeikakuAccelerator@reddit
It is not really artificial scarcity. Demand is higher than supply.
StatusSociety2196@reddit
Somebody else did the math, and it costs Anthropic something like $5000 to service the $20/mo plans.
Inference costs are falling quickly, but eventually investors are going to want their investment back, so I'd guess maybe $2000/mo for a "software developer replacement" amount of tokens.
Meanwhile, Qwen 3.6 is free...
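The "$5000 to serve a $20/mo plan" claim above is really just token-volume arithmetic. Here's a minimal sketch of that calculation; the daily-token and per-million-token figures are purely illustrative assumptions, not anyone's official numbers.

```python
# Back-of-envelope: provider-side inference cost for one heavy user.
# Both input numbers below are hypothetical, chosen only to show the shape
# of the math behind claims like "$5000/mo to serve a $20/mo plan".

def monthly_inference_cost(tokens_per_day: float, usd_per_mtok: float) -> float:
    """Cost in USD for one user over a 30-day month, given a flat
    per-million-token price the provider effectively pays."""
    return tokens_per_day * 30 * usd_per_mtok / 1e6

# e.g. an agentic coding user burning 20M tokens/day at $8 per 1M tokens:
print(round(monthly_inference_cost(20e6, 8.0)))  # -> 4800
```

At those assumed rates a single heavy agentic user costs thousands per month, which is the gap the comment is pointing at.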
rhythmdev@reddit
hell yes.
NoahFect@reddit
Reddit fuzzes votes
Acu17y@reddit
They are
TechSwag@reddit
You gotta be a bot
or illiterate.
Acu17y@reddit
Have you ever tried Kimi 2.6? It costs 1/20 of Opus 4.7 and is on par
ortegaalfredo@reddit
If you think it's on par, then you don't give it hard enough problems. Opus is far better but then again, not everybody needs it.
Acu17y@reddit
It is, and my project is very big. I don't take money from anyone to say it.
Dabalam@reddit
I do wonder what the US will do besides "banning Chinese models". Even closing the gap is an existential threat to the big companies and you can't have all the dominant models produced by a single country.
ebra95@reddit
They are going to restrict access to higher models to their inner circles of companies, limiting consumer end users from direct access and pushing through various vendors, I guess
ebra95@reddit
They are only a little behind, given the time gap between the startups, and the cost difference makes it worth it.
Plus, we are almost at the finish line of what large language models are capable of, so if they get to offer the same quality, who is going to pay 10x the cost?
RedParaglider@reddit
They don't have to be better, they just have to be acceptable to set a ceiling on the profitability of AI inference providers. I personally use open models that are hosted at inference providers for all of my enterprise workflows. Not all of my employees do, but right now I don't care. In the future I probably will.
Jackw78@reddit
Can you imagine how much better Chinese models would be if they were able to buy the latest EUV systems, or even just the Nvidia chips, without worrying about sanctions? EUV is basically the combined effort of the US, EU, Japan, and SK, and China has to tackle it all by itself.
ortegaalfredo@reddit
It's true, but they are, at most, 2 months behind them. That's not enough of a moat.
nunodonato@reddit
Although you are not wrong, you miss the main point. The models aren't better, but they are close, and they are definitely good enough for the big majority of cases. What's more, they are catching up; the gap keeps closing and getting narrower every month.
Endoky@reddit
I used all the models on large and complicated code bases, and no, they are not SOTA. They can compete because of the price, but GPT 5.5 blows them out of the water in performance.
rhythmdev@reddit
5.5 is what... 10 days old? Give it a couple months.
jld1532@reddit
They're trying to collapse the AI market and given the attack on American workers by the country's own developers it's hard for me to not cheer for them a bit.
Monkey_1505@reddit
For many companies like Google, Meta, and Xiaomi, who make plenty of money in other areas, it makes business sense. Their moat is that they can afford to not make much money from it and have it commoditized. The AI super labs cannot.
ortegaalfredo@reddit
I don't think they are saints or heroes; once they succeed and collapse the competition, they will go back to charging lots of money. They do this only because it makes business sense.
throw_me_away3478@reddit
It's cool because, unless the US restricts access to Chinese models, Anthropic and OpenAI will have a hard time attracting customers to their paid plans.
Another win for open source models
Chris266@reddit
I wouldn't say Anthropic and OpenAI are having a hard time attracting people to their plans...
ortegaalfredo@reddit
I believe this works as a hard limit on how much they can charge. If it weren't for Chinese models, OpenAI could easily charge thousands for their regular models.
xienze@reddit
The existence of Chinese models is not what keeps the plans so cheap right now. They're trying to rapidly gain market share. Chinese models or not, it was always going to be cheap. Initially...
Um0therfckers@reddit
Due to cheap power and cheaper labor, plus government incentives, their initial cost is already way lower than their Western rivals'. You need to take this into account as well...
Chris266@reddit
I think the vast majority of people who use AI (I don't mean power users) have no idea that a Chinese model even exists. Or even what a model is for that matter.
In my case, my company knows what they are but still thinks Chinese = Bad.
jld1532@reddit
Universities are using open weights models. These ecosystems are actively being incorporated into higher education. It's only a matter of time.
mestresamba@reddit
People would have thought the same before Opus 4.5. Look at the current state. The masses move a little behind, but since the masses here are devs, they don't move so slowly or have much fidelity. Everyone will make the switch eventually.
xrvz@reddit
Something's bound to implode and such companies will learn the hard way.
PinkySwearNotABot@reddit
You mean another win for us. Think about what America would do to us consumers if China and their open models weren’t there to give us leverage.
ortegaalfredo@reddit
It's not really open source, more like open weights: you cannot truly recreate the tech, just use it for free.
ebra95@reddit
But you can fine-tune it; re-training from scratch would cost hundreds of thousands of dollars at least, just to re-create the same result again.
It is a huge leap to be able to skip that, spend maybe only a couple of thousand, and upgrade it for your specific niche of work, running it on your stack as you please.
Now do this with ChatGPT: you have oss20b and 120b, which are great for a... customer support job?
kevinlch@reddit
most people don't care as long as it's free
bbjurn@reddit
Still much better than what most US companies are providing.
More-Curious816@reddit
Even if they did that, Europe and other first-world countries would use the on-par, cheaper Chinese services over the absurd prices the American providers demand. Maybe even host their own.
RedParaglider@reddit
What it's really doing is basically saying that the cap on what inference providers can charge is not based on hardware+markup, it's just hardware.
Budget-Juggernaut-68@reddit
It's actually working pretty well, based on DeepSeek V4's recent report.
rageling@reddit
They don't have like 10, they have like 0. GPT 5.5 and Opus 4.7 are SOTA; China is offering nothing of comparable quality.
ortegaalfredo@reddit
Those models are 2 weeks old. They have about a 2-3 month advantage, no more.
Icy_Butterscotch6661@reddit
You think you are talking to someone who woke up a loser?!
Turbulent_Pin7635@reddit
AI will soon be like ping pong: it's easier to win the world championship than the Chinese championship
ComplexType568@reddit
oh my goodness another 1T baby. i want to see how GLM 5.X, Kimi K2.X (or 3) and DeepSeek V4 all play out...
ilintar@reddit
Gonna try it out as soon as my cat gives back the stack of 8 RTX 6000s he's hidden somewhere.
pmttyji@reddit
Attract the kitty with 8 Treats
srigi@reddit
Cat attracts him with 8x RTXs
ChocomelP@reddit
Get pussy with CUDA?
Weak_Kaleidoscope839@reddit
And a pspspspspsps
UpAndDownArrows@reddit
I mean, come on, people. It's a MoE with 49B active params. A 5090 paired with a 3090 can fit the active experts, and the rest can live in RAM.
ilintar@reddit
Yeah, cat also ate 1GB worth of DDR5 chips, mistook them for dry food.
coder543@reddit
1 gigabyte? In this economy?
ilintar@reddit
Sorry, meant 1TB obviously, cat only eats 64+ GB DDR5 wafers, he's pretty picky.
blackbird2150@reddit
64gb, nothing less!
czktcx@reddit
Active params are for evaluating performance, not VRAM usage. With RAM offloading you can put almost all the weights in RAM; VRAM is mainly for the KV cache, so technically you don't even need 2 cards.
But where's my 512GB of RAM?
UpAndDownArrows@reddit
I mean, I get it, 512 GB of RAM isn't lying on a sidewalk, but like...
Compare getting 512 GB of RAM vs. getting 8 RTX 6000s; the difference is just an order of magnitude or more.
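To make the offloading argument above concrete, here's a hypothetical back-of-envelope sketch. The 1T-total/49B-active figures and 1-byte-per-param (FP8) storage are assumptions taken from this thread, not official specs; real deployments also need headroom for the KV cache and activations.

```python
# Rough memory split for a MoE under weight offloading: the full weight set
# must fit in system RAM, while only the active experts' worth of weights
# (plus KV cache, not modeled here) must be resident on the GPU per token.

def moe_memory_gb(total_params_b: float, active_params_b: float,
                  bytes_per_param: float = 1.0) -> dict:
    """Convert billions of parameters to GB at a given storage width.
    bytes_per_param=1.0 corresponds to FP8 weights."""
    return {
        "total_weights_gb": total_params_b * bytes_per_param,
        "active_per_token_gb": active_params_b * bytes_per_param,
    }

# Assumed ~1T total / ~49B active params at FP8:
print(moe_memory_gb(1000, 49))
# -> ~1000 GB of weights for RAM, ~49 GB touched per token
```

Which is why the thread lands on "512 GB of RAM" being the bottleneck rather than a rack of GPUs: a quant around 4 bits (bytes_per_param=0.5) halves that 1000 GB again.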
grumd@reddit
Wake me up when V2.5-Flash drops
__Maximum__@reddit
You meant the REAP-ed 1bit quants of the flash
ambient_temp_xeno@reddit
Good news, bad news situation:
https://huggingface.co/XiaomiMiMo/MiMo-V2.5
pkmxtw@reddit
Why are these labs capable of training multi-million-dollar models and yet so terrible at making charts lol
smallDeltaBigEffect@reddit
You should read some papers by the smartest people currently alive.
If it weren't for some design-affine PhD students, you would get hand-drawn PowerPoint slides.
zdy132@reddit
Yann LeCun has given some amazing talks, available online.
And it's very clear the dude has never bothered to change his PPT template in years. It even looks like he's just editing the text and replacing images in the same PPT file.
Dany0@reddit
That skill is not transferable. They could train a model to make charts for them, but they can't be bothered.
a_beautiful_rhind@reddit
That's the one, because I didn't buy enough RAM, lol.
segmond@reddit
Apple needs to hurry the fuck up and let us know the specs for the M5 Ultra Studio so we can decide if we are buying it or a horde of Nvidia GPUs.
power97992@reddit
It should be out by the end of May or June
rhythmdev@reddit
Me, waiting for a 100B-A15B or 200B-A24B or similar model that never comes... it's either a 35B/27B or a 1T-A50B model. They are trying hard to force 5090 rig users onto 6000 Pros
SnooPaintings8639@reddit
But the smaller version is 300B-A15, so close enough. Go lower with a quant and have fun. With only 15B active params it should be usable.
rhythmdev@reddit
Yeah, either that or DeepSeek Flash. I'll try my chances with those, but I don't have high hopes. Or I'll wait for a distilled smaller version.
GreedyWorking1499@reddit
What is “F8_E4M3”?
fantasticsid@reddit
An FP8 format with 4 exponent bits, 3 mantissa bits, and a sign bit.
GreedyWorking1499@reddit
Is that how all FP8 is?
fantasticsid@reddit
Nah, there's also E5M2 - which trades precision for range - and a bunch of specialised stuff like MXFP8 (same trick as MXFP4, just twice as wide.)
GreedyWorking1499@reddit
How does that work if a weight is big enough to be stored in E5M2 but too big for E4M3? What is the range of weights in actual GGUFs?
fantasticsid@reddit
No idea tbh - I assume the weights in question don't exceed the range of E4M3 so they used that format for the extra precision it provides?
GreedyWorking1499@reddit
Interesting. I wonder why people would use E5M2 then
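The E4M3/E5M2 trade-off above is easy to see by decoding the bit patterns directly. Here's a minimal sketch of a generic FP8 decoder; for simplicity it ignores the special values (the all-ones NaN codes in E4M3, and Inf/NaN in E5M2), so it only covers normal and subnormal numbers.

```python
# Decode one FP8 byte given its exponent/mantissa split.
# E4M3 (4 exp bits, bias 7) reaches at most 448 but has finer steps;
# E5M2 (5 exp bits, bias 15) reaches 57344 with coarser steps.

def decode_fp8(byte: int, exp_bits: int, man_bits: int) -> float:
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

print(decode_fp8(0b01111110, 4, 3))  # max finite E4M3 -> 448.0
print(decode_fp8(0b01111011, 5, 2))  # max finite E5M2 -> 57344.0
```

So E5M2 exists for values (gradients, mostly) whose dynamic range blows past 448, while weights, which are typically normalized to a small range, get E4M3's extra mantissa bit of precision.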
SnooPaintings8639@reddit
Image, video, audio, long context... the smaller version of 310B A15 might end up being a MiniMax 2.5 replacement.
ThePixelHunter@reddit
Should be about 140B in Q3. Can't wait to run it.
Dramatic-Rub-7654@reddit
1-bit gguf when?
LegacyRemaster@reddit
Another 2x rtx 6000 needed
RedParaglider@reddit
YOU MUST CONSTRUCT ADDITIONAL PYLONS
onewheeldoin200@reddit
😂
datbackup@reddit
WE REQUIRE MORE VIDEO RAM
mindwip@reddit
Omg yes
LegacyRemaster@reddit
for Adune
Dany0@reddit
Why do I get spam enticing me with women. Entice me with H200s, damn it!
Eyelbee@reddit
More like 16x
LegacyRemaster@reddit
q4 :D
SeaDisk6624@reddit
8 for nvfp4
LegacyRemaster@reddit
Yes, but I already have 96+48+48 GB VRAM + 128 GB RAM :D
lendo93@reddit
Most underrated model of this release cycle. We have it performing better than DeepSeek and comparable to Kimi 2.6, without the rate limits.
nullmove@reddit
Can't run either. At least this was funny: https://xcancel.com/XiaomiMiMo/status/2048803550562844727#m
adumdumonreddit@reddit
I know it probably isn't, but it would be so funny if one guy was uploading the weights and running the Twitter page from the same machine, frantically tabbing between the SSH session and the browser to live-tweet it
RnRau@reddit
Did they use QAT for the FP8 weights?
I guess this snippet from the model card suggests that's the case: "Trained on 27T tokens using FP8 mixed precision"
jake_schurch@reddit
Fantastic that my hardware is only off by 1 order of magnitude instead of multiple
kyleboddy@reddit
MiMo-V2.5-Pro benchmarks extremely well on biomech-bench, which we launched today, though it consumes a ton of tokens comparatively.
https://x.com/drivelinekyle/status/2048775424621396472
Zestyclose-Ad-6147@reddit
Whoo! 🙌
jzn21@reddit
These new MiMo V2.5 (both pro and non-pro) perform extremely strongly in my own benchmarks and are heavily underrated. This is truly a huge gain for the open-source community!
funding__secured@reddit
Can't run it on my RTX 6000 Pros yet 😞 Needs FlashAttention 3.
vinigrae@reddit
No way they open sourced that
JC1DA@reddit
Is this better than Qwen-3.5-397B? It's smaller, but it lacks vision capability.
pigeon57434@reddit
It does have vision? What the fuck are you talking about? It also has audio input too.
pigeon57434@reddit
I'm surprised they open-sourced the Pro version; I just thought they would do the regular 2.5. That's very nice.
And it seems this may be the new most capable OSS base in the world, and possibly the most capable model period, but K2.6 might still edge it out for me. Then again, it still uses K2-base, so you can't build off it as easily, if that's something you do.
maxpayne07@reddit
Just made a Linux server app with a GUI using mimo pro2 and opencode. Spent 8 euros on code: fast, smart, fast debugging. Happy. Maybe I will try DeepSeek; it's even less expensive.
_hephaestus@reddit
How many Mac Studios do I need to run this?
myreala@reddit
Just one 512GB one would work for V2.5; maybe 3 for the Pro version.
Kiedrola@reddit
Long live China! Thank you!
jochenboele@reddit
I've already been testing the Pro version for a few days. Can't wait until they open-source it
coder543@reddit
What do you think this post is that you're commenting on?
jochenboele@reddit
😂😂 I saw the model name and jumped to conclusions. I searched for the open-source release this morning, but it wasn't there yet 😂
SeaDisk6624@reddit
The non-Pro will be great with 4x 6000s
Long_comment_san@reddit
*laughs maniacally*
ghgi_@reddit
This is what I have been waiting for! I love this model on the API; it's debatably better than K2.6 across my testing. A tad worse at coding, but the 1M context and low hallucination rates make it nicer to use in almost all aspects!
jacek2023@reddit
too big for my 72/84GB of VRAM
j_osb@reddit
An MIT license is not bad