Tectorumiris@reddit
Anybody know how to make MiniMax M2 produce structured output, like JSON format?
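One generic fallback when an endpoint has no native JSON mode is to ask for JSON in the prompt and parse the reply defensively; a minimal sketch (the reply string is hypothetical, and this assumes the interleaved `<think>` block is stripped before parsing):

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def extract_json(reply: str) -> dict:
    """Parse the first JSON object found in a model reply.

    Removes any interleaved-thinking block first, then grabs the
    outermost {...} span and hands it to json.loads.
    """
    visible = THINK_RE.sub("", reply)
    match = re.search(r"\{.*\}", visible, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# Hypothetical model reply, for illustration only:
reply = '<think>format as JSON</think>Sure: {"status": "ok", "files": 3}'
print(extract_json(reply))  # -> {'status': 'ok', 'files': 3}
```

Note this only strips thinking content for local parsing; per the MiniMax docs, the raw reply (including the think block) should still be what goes back into the conversation history.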
zenmagnets@reddit
MiniMax M2 is getting lots of news, but from my tests it's worse than Qwen3 Coder 30B. Maybe the free version on OpenRouter is dumbed down or something?
vandesa003@reddit
A dumb question: I wonder why they didn't include Qwen3 in their benchmarks?
Kamal965@reddit
Yeah, the openrouter one has issues, apparently. See the Minimax Engineer's post here: Link
Thin_Yoghurt_6483@reddit
The official MiniMax API is free until 11/07, and it makes a big difference in code quality and speed compared to OpenRouter; it's also more stable for long-running tasks. I did a lot of testing today and it performed better than GLM 4.6. It still doesn't compare to GPT-5 Codex high or Sonnet 4.5, but in my opinion it handily beats the other models I've tested, especially the open-source ones. I used it on several somewhat more complex debugging tasks (given the size of the codebase) and it did well, especially in tool calls.
sudochmod@reddit
I hope we get to see a reaped version :D
Dark_Fire_12@reddit (OP)
Highlights
Superior Intelligence. According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks #1 among open-source models globally.
Advanced Coding. Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.
Agent Performance. MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.
Efficient Design. With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling—perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks.
idkwhattochoo@reddit
"Its composite score ranks #1 among open-source models globally" are we that blind?
It failed on the majority of simple debugging cases for my project, and somehow I don't find it as good as its benchmark scores suggest. GLM 4.5 Air, or heck, even Qwen Coder REAP performed much better for my debugging use case.
Simple_Split5074@reddit
I found it to be rather good at bug fixing Python in Roo Code, likely better than full GLM 4.6
Apart-River475@reddit
I found it really bad in my task
this_is_a_long_nickn@reddit
Care to share more details? E.g., language, project size, task type, etc. You know the drill :-)
idkwhattochoo@reddit
Rust and Golang; I use crush cli
Finanzamt_kommt@reddit
Might be a wrong implementation by the provider?
Such_Advantage_6949@reddit
Or the model could simply be benchmaxing
Finanzamt_kommt@reddit
Might be, but on all benchmarks at once?
Educational_Sun_8813@reddit
just checked yesterday REAP for glm-4.5-air and it works pretty well
OccasionNo6699@reddit
Hi, I'm an engineer from MiniMax. May I know which endpoint you used? There's a problem with OpenRouter's endpoint for M2; we're still working with them on it.
We recommend using M2 via the Anthropic endpoint, with a tool like Claude Code. You can grab an API key from our official API endpoint and use M2 for free.
https://platform.minimax.io/docs/guides/text-ai-coding-tools
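For the Claude Code route, pointing the CLI at a third-party Anthropic-compatible endpoint is typically done via environment variables. A sketch, assuming the standard Claude Code overrides; the MiniMax base URL here is an assumption, so confirm it against the docs link above:

```shell
# ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN are standard Claude Code overrides.
# The base URL below is assumed; check MiniMax's docs for the exact value.
export ANTHROPIC_BASE_URL="https://api.minimax.io/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your MiniMax API key>"
claude   # start Claude Code, now backed by the MiniMax endpoint
```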
SilentLennie@reddit
Looking at how it's working, you folks seem to have made a pretty complete system. The model and the chat system at https://agent.minimax.io/
The model is testing the script I asked for, seeing what mistakes it made, and automatically fixing them.
I think the model might be worse than some, but as part of the complete solution it is working.
nullmove@reddit
Will there be a technical report?
Worthstream@reddit
What do you mean for free? What are the limits?
idkwhattochoo@reddit
Thank you for the response, indeed I was using openrouter endpoint; I'll use official API endpoint then
pmttyji@reddit
Hi, Any small/medium models(MOE would be awesome) coming? Thanks
Baldur-Norddahl@reddit
Maybe you were having this problem?
"IMPORTANT: MiniMax-M2 is an interleaved thinking model. Therefore, when using it, it is important to retain the thinking content from the assistant's turns within the historical messages. In the model's output, we use the `<think>...</think>` format to wrap the assistant's thinking content. When using the model, you must ensure that the historical content is passed back in its original format. Do not remove the `<think>...</think>` part; otherwise, the model's performance will be negatively affected."
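A sketch of what retaining the thinking content means in practice, assuming an OpenAI-style message list (the strings are illustrative):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(text: str) -> str:
    """What many coding agents do by default: drop the reasoning block."""
    return THINK_RE.sub("", text).strip()

# Per the quoted docs, M2's assistant turns must go back verbatim:
assistant_reply = "<think>Check the null deref first.</think>The bug is in parse()."
history = [
    {"role": "user", "content": "Fix this crash."},
    {"role": "assistant", "content": assistant_reply},  # keep <think> intact
]

assert "<think>" in history[1]["content"]                # correct for M2
assert "<think>" not in strip_thinking(assistant_reply)  # what NOT to send back
```

The second assertion shows the default agent behavior the docs warn against: stripping the reasoning block before replaying history.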
Arli_AI@reddit
Wow that sounds like it'll use a lot of the context window real quick.
nullmove@reddit
Depends on if it thinks a lot. But the bigger problem I think is that most coding agents are built to strip those (at least the one at the very beginning because interleaved thinking isn't very common).
Arli_AI@reddit
That's easily solved with a few lines of code changes, really; the issue would be the inflation of context size.
idkwhattochoo@reddit
I used openrouter instead of running it locally; I assume it's better on their official API endpoint
Mike_mi@reddit
Tried it on OpenRouter; it wasn't even able to do proper tool calling. From their API it works like a charm with CC.
Baldur-Norddahl@reddit
The quoted problem is something your coding agent would have to handle. It is not the usual way, so it is very likely doing it wrong.
Recent-Success-1520@reddit
I tried it with OpenRouter today and it fixed issues that GLM couldn't fix in even 6 tries.
power97992@reddit
From my testing, the output of MiniMax M2 with thinking looks a lot worse than Claude 4.5 Sonnet (no thinking) and DeepSeek 3.2 (no thinking), and worse than free GPT-5 thinking low. It is slightly worse than Gemini Flash with 10k-token thinking and Qwen3 VL 32B with no thinking. It is better than GLM 4.5 Air thinking, as the code actually displays something. It is about on par with GLM 4.6 thinking on this one task… It is better than Qwen3 Next 80B thinking.
_yustaguy_@reddit
oh, so it tells us pretty much nothing
power97992@reddit
Yeah, but I'm not gonna spend hours testing it against 7-8 models on various tasks. Max maybe 2 or 3 tasks against 3-4 models. Testing one task against various models already took like an hour.
lumos675@reddit
For my use case (writing a ComfyUI custom node), Sonnet 4.5 last night could not solve the issue before I finished my budget of like 20 prompts. But MiniMax solved it on the first try, so it depends on the task, I think. Sometimes a model can solve an issue, sometimes it can't, and in those times you'd better get a second opinion. So far I am happy with MiniMax M2.
celsowm@reddit
I hope they release Minimax Text 02 too, t1 was the best open one in my Brazilian Legal Benchmark
Rascazzione@reddit
It is a bit strange: the model says it is BF16, but when I looked at its disk footprint, it is the equivalent of FP8. I have set it to download, as it fits for me on 4 RTX 6000 Pros.
Has anyone else noticed this?
WonderRico@reddit
it is fp8. it was actually trained in fp8 : https://huggingface.co/MiniMaxAI/MiniMax-M2/discussions/14#68ff9a39550682ab5ea04a98
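The halved file size is consistent with simple arithmetic: a 230B-parameter checkpoint is roughly 1 byte per parameter in FP8 versus 2 in BF16 (a rough estimate, ignoring metadata and any mixed-precision layers):

```python
def checkpoint_size_gb(params_billion: float, bytes_per_param: float) -> float:
    # Rough on-disk weight size: 1e9 params * bytes/param == GB.
    # Ignores tokenizer files, metadata, and higher-precision layers.
    return params_billion * bytes_per_param

print(checkpoint_size_gb(230, 2))  # BF16 -> 460.0 GB
print(checkpoint_size_gb(230, 1))  # FP8  -> 230.0 GB
```

So a ~230 GB download for a "BF16" 230B model is the telltale sign the weights are actually stored in FP8.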
No_Conversation9561@reddit
Guys whoever will be working on this on llama.cpp, please put your tip jar in your github profile
ilintar@reddit
This looks like a very typical model, its only quirk is that it's pre-quantized in FP8. Fortunately, compilade just dropped this in llama.cpp:
https://github.com/ggml-org/llama.cpp/pull/14810
ilintar@reddit
In fact, I think in the case of this model the bigger (harder) part to implement will be its chat template, i.e. the "interleaved thinking" part.
nullmove@reddit
It seems M2 abandoned the fancy linear lightning attention, and opted for a traditional arch. Usually that's a big hurdle and indeed the reason earlier Minimax models weren't supported.
Ali007h@reddit
I don't know how A10B is this good in benchmarks🤷
SilentLennie@reddit
sadly benchmarks are just benchmarks
Guardian-Spirit@reddit
So... What CLI tool for agentic coding is supposed to be used then, if it's interleaved thinking?
RuthlessCriticismAll@reddit
https://platform.minimax.io/docs/guides/text-ai-coding-tools
TransitionSlight2860@reddit
Chinese version of Haiku
Leflakk@reddit
I hope the model is as good as in benchmark (once common support issues at new model launch are solved). Thanks guys for your amazing work!
bobeeeeeeeee8964@reddit
We finally have a model that responds in Chinese by default while I'm speaking English; that's totally what I needed. That's a big W for Chinese developers, and such impressive quality for a 230B-A10B MoE model.
bobeeeeeeeee8964@reddit
I can't wait for awq for this monster.