Tectorumiris@reddit
Anybody know how to make MiniMax M2 produce structured output, like JSON format?
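One generic fallback when an endpoint has no native JSON mode is to ask for JSON in the prompt and parse the reply defensively; a minimal sketch (the reply string is hypothetical, and this assumes the interleaved `<think>` block is stripped before parsing):

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def extract_json(reply: str) -> dict:
    """Parse the first JSON object found in a model reply.

    Removes any interleaved-thinking block first, then grabs the
    outermost {...} span and hands it to json.loads.
    """
    visible = THINK_RE.sub("", reply)
    match = re.search(r"\{.*\}", visible, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

# Hypothetical model reply, for illustration only:
reply = '<think>format as JSON</think>Sure: {"status": "ok", "files": 3}'
print(extract_json(reply))  # -> {'status': 'ok', 'files': 3}
```

Note this only strips thinking content for local parsing; per the MiniMax docs, the raw reply (including the think block) should still be what goes back into the conversation history.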
zenmagnets@reddit
MiniMax M2 is getting lots of news, but from my tests it's worse than Qwen3 Coder 30B. Maybe the free version on OpenRouter is dumbed down or something?
vandesa003@reddit
A dumb question: I wonder why they didn't include Qwen3 in their benchmarks?
Kamal965@reddit
Yeah, the openrouter one has issues, apparently. See the Minimax Engineer's post here: Link
Thin_Yoghurt_6483@reddit
The official MiniMax API is free until 11/07, and it makes a big difference in code quality and speed compared to OpenRouter; it's also more stable for long-running tasks. I did a lot of testing today and it performed better than GLM 4.6. It still doesn't compare to GPT-5 Codex high or Sonnet 4.5, but in my opinion it handily beats the other models I've tested, especially the open-source ones. I used it on several somewhat more complex debugging tasks (given the size of the codebase) and it did well, especially in tool calls.
sudochmod@reddit
I hope we get to see a reaped version :D
Dark_Fire_12@reddit (OP)
Highlights
Superior Intelligence. According to benchmarks from Artificial Analysis, MiniMax-M2 demonstrates highly competitive general intelligence across mathematics, science, instruction following, coding, and agentic tool use. Its composite score ranks #1 among open-source models globally.
Advanced Coding. Engineered for end-to-end developer workflows, MiniMax-M2 excels at multi-file edits, coding-run-fix loops, and test-validated repairs. Strong performance on Terminal-Bench and (Multi-)SWE-Bench–style tasks demonstrates practical effectiveness in terminals, IDEs, and CI across languages.
Agent Performance. MiniMax-M2 plans and executes complex, long-horizon toolchains across shell, browser, retrieval, and code runners. In BrowseComp-style evaluations, it consistently locates hard-to-surface sources, keeps evidence traceable, and gracefully recovers from flaky steps.
Efficient Design. With 10 billion activated parameters (230 billion in total), MiniMax-M2 delivers lower latency, lower cost, and higher throughput for interactive agents and batched sampling—perfectly aligned with the shift toward highly deployable models that still shine on coding and agentic tasks.
idkwhattochoo@reddit
"Its composite score ranks #1 among open-source models globally" are we that blind?
It failed on the majority of simple debugging cases for my project, and somehow I don't find it as good as its benchmark scores suggest. GLM 4.5 Air, or heck, even Qwen Coder REAP performed much better for my debugging use case.
Simple_Split5074@reddit
I found it to be rather good at bug fixing Python in Roo Code, likely better than full GLM 4.6
Apart-River475@reddit
I found it really bad in my task
this_is_a_long_nickn@reddit
Care to share more details? E.g., language, project size, task type, etc. You know the drill :-)
idkwhattochoo@reddit
Rust and Golang; I use crush cli
Finanzamt_kommt@reddit
Might be a wrong implementation by the provider?
Such_Advantage_6949@reddit
Or the model could simply be benchmaxing
Finanzamt_kommt@reddit
Might be, but on all benchmarks at once?
Educational_Sun_8813@reddit
just checked yesterday REAP for glm-4.5-air and it works pretty well
OccasionNo6699@reddit
Hi, I'm an engineer from MiniMax. May I know which endpoint you used? There's a problem with OpenRouter's endpoint for M2; we're still working with them on it.
We recommend using M2 via the Anthropic endpoint, with a tool like Claude Code. You can grab an API key from our official API endpoint and use M2 for free.
https://platform.minimax.io/docs/guides/text-ai-coding-tools
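For the Claude Code route, pointing the CLI at a third-party Anthropic-compatible endpoint is typically done via environment variables. A sketch, assuming the standard Claude Code overrides; the MiniMax base URL here is an assumption, so confirm it against the docs link above:

```shell
# ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN are standard Claude Code overrides.
# The base URL below is assumed; check MiniMax's docs for the exact value.
export ANTHROPIC_BASE_URL="https://api.minimax.io/anthropic"
export ANTHROPIC_AUTH_TOKEN="<your MiniMax API key>"
claude   # start Claude Code, now backed by the MiniMax endpoint
```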
SilentLennie@reddit
Looking at how it's working, you folks seem to have made a pretty complete system. The model and the chat system at https://agent.minimax.io/
The model is testing the script I asked for, seeing what mistakes it made, and automatically fixing them.
I think the model might be worse than some, but as part of the complete solution it is working.
nullmove@reddit
Will there be a technical report?
Worthstream@reddit
What do you mean for free? What are the limits?
idkwhattochoo@reddit
Thank you for the response, indeed I was using openrouter endpoint; I'll use official API endpoint then
pmttyji@reddit
Hi, Any small/medium models(MOE would be awesome) coming? Thanks
Baldur-Norddahl@reddit
Maybe you were having this problem?
"IMPORTANT: MiniMax-M2 is an interleaved thinking model. Therefore, when using it, it is important to retain the thinking content from the assistant's turns within the historical messages. In the model's output, we use the `<think>...</think>` format to wrap the assistant's thinking content. When using the model, you must ensure that the historical content is passed back in its original format. Do not remove the `<think>...</think>` part; otherwise, the model's performance will be negatively affected."
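A sketch of what retaining the thinking content means in practice, assuming an OpenAI-style message list (the strings are illustrative):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(text: str) -> str:
    """What many coding agents do by default: drop the reasoning block."""
    return THINK_RE.sub("", text).strip()

# Per the quoted docs, M2's assistant turns must go back verbatim:
assistant_reply = "<think>Check the null deref first.</think>The bug is in parse()."
history = [
    {"role": "user", "content": "Fix this crash."},
    {"role": "assistant", "content": assistant_reply},  # keep <think> intact
]

assert "<think>" in history[1]["content"]                # correct for M2
assert "<think>" not in strip_thinking(assistant_reply)  # what NOT to send back
```

The second assertion shows the default agent behavior the docs warn against: stripping the reasoning block before replaying history.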
Arli_AI@reddit
Wow that sounds like it'll use a lot of the context window real quick.
nullmove@reddit
Depends on if it thinks a lot. But the bigger problem I think is that most coding agents are built to strip those (at least the one at the very beginning because interleaved thinking isn't very common).
Arli_AI@reddit
That's easily solved with a few lines of code changes, really; the issue would be the inflation of context size.
idkwhattochoo@reddit
I used openrouter instead of running it locally; I assume it's better on their official API endpoint
Mike_mi@reddit
Tried it on OpenRouter; it wasn't even able to do proper tool calling. From their API it works like a charm with CC.
Baldur-Norddahl@reddit
The quoted problem is something your coding agent would have to handle. It is not the usual way, so it is very likely doing it wrong.
Recent-Success-1520@reddit
I tried it with OpenRouter today and it fixed issues that GLM couldn't fix in even 6 tries.
power97992@reddit
From my testing, the output of MiniMax M2 with thinking looks a lot worse than Claude 4.5 Sonnet (no thinking) and DeepSeek 3.2 (no thinking), and worse than free GPT-5 thinking low. It is slightly worse than Gemini Flash with 10k-token thinking and Qwen3 VL 32B with no thinking. It is better than GLM 4.5 Air thinking, as the code actually displays something. It is about on par with GLM 4.6 thinking on this one task… It is better than Qwen3 Next 80B thinking.
_yustaguy_@reddit
oh, so it tells us pretty much nothing
power97992@reddit
Yeah, but I'm not gonna spend hours testing it against 7-8 models on various tasks. Max maybe 2 or 3 tasks against 3-4 models. Testing one task against various models already took like an hour.
lumos675@reddit
For my use case (writing a ComfyUI custom node), Sonnet 4.5 last night could not solve the issue before I finished my budget of like 20 prompts. But MiniMax solved it on the first try, so it depends on the task, I think. Sometimes a model can solve an issue, sometimes it can't, and in those times you'd better get a second opinion. So far I am happy with MiniMax M2.
celsowm@reddit
I hope they release Minimax Text 02 too, t1 was the best open one in my Brazilian Legal Benchmark
Rascazzione@reddit
It is a bit strange: the model says it is BF16, but when I looked at its disk footprint, it is the equivalent of FP8. I have set it to download, as it fits for me on 4 RTX 6000 Pros.
Has anyone else noticed this?
WonderRico@reddit
it is fp8. it was actually trained in fp8 : https://huggingface.co/MiniMaxAI/MiniMax-M2/discussions/14#68ff9a39550682ab5ea04a98
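The halved file size is consistent with simple arithmetic: a 230B-parameter checkpoint is roughly 1 byte per parameter in FP8 versus 2 in BF16 (a rough estimate, ignoring metadata and any mixed-precision layers):

```python
def checkpoint_size_gb(params_billion: float, bytes_per_param: float) -> float:
    # Rough on-disk weight size: 1e9 params * bytes/param == GB.
    # Ignores tokenizer files, metadata, and higher-precision layers.
    return params_billion * bytes_per_param

print(checkpoint_size_gb(230, 2))  # BF16 -> 460.0 GB
print(checkpoint_size_gb(230, 1))  # FP8  -> 230.0 GB
```

So a ~230 GB download for a "BF16" 230B model is the telltale sign the weights are actually stored in FP8.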
No_Conversation9561@reddit
Guys whoever will be working on this on llama.cpp, please put your tip jar in your github profile
ilintar@reddit
This looks like a very typical model, its only quirk is that it's pre-quantized in FP8. Fortunately, compilade just dropped this in llama.cpp:
https://github.com/ggml-org/llama.cpp/pull/14810
ilintar@reddit
In fact, I think in the case of this model the bigger (harder) part to implement will be its chat template, i.e. the "interleaved thinking" part.
nullmove@reddit
It seems M2 abandoned the fancy linear lightning attention, and opted for a traditional arch. Usually that's a big hurdle and indeed the reason earlier Minimax models weren't supported.
Ali007h@reddit
I don't know how A10B is this good in benchmarks🤷
SilentLennie@reddit
sadly benchmarks are just benchmarks
Guardian-Spirit@reddit
So... What CLI tool for agentic coding is supposed to be used then, if it's interleaved thinking?
RuthlessCriticismAll@reddit
https://platform.minimax.io/docs/guides/text-ai-coding-tools
TransitionSlight2860@reddit
Chinese version of Haiku
Leflakk@reddit
I hope the model is as good as in benchmark (once common support issues at new model launch are solved). Thanks guys for your amazing work!
bobeeeeeeeee8964@reddit
We finally have a model that responds in Chinese by default while I'm speaking English; that's totally what I needed. That's a big W for Chinese developers, and such impressive quality for a 230B-A10B MoE model.
bobeeeeeeeee8964@reddit
I can't wait for awq for this monster.