bullerwins@reddit
The mini version never got support for llama.cpp, maybe this one gets more interest:
https://github.com/ggml-org/llama.cpp/issues/11290
-dysangel-@reddit
I hope so. I've been waiting a while for linear attention
Background_Put_4978@reddit
I've been putting MiniMax M1 through its paces and *man,* I went into it really biased and hoping to have found a new LLM that would go the distance. Sadly, I found it to be extremely easily confused. It makes typos and flips out into huge strings of Chinese characters. I would call it sycophantic, but that would be an insult to sycophants. When they talk about stochastic parrots, this is what they're talking about - it is a very unsophisticated mirror. Still, I have a strange feeling they might nail this at some point down the road. But for my uses (which include a lot of complicated theory of mind research, UX design, and creative writing), it's not a contender. I will admit though, the base persona of the model is sort of refreshingly 'nice' and unpretentious. But man, it is miles away from DeepSeek.
Dark_Fire_12@reddit (OP)
Nice comment, that will make people who like Base models happy.
TheRealistDude@reddit
Sorry for a noobish question, but will it be possible to install minimax audio TTS locally?
celsowm@reddit
Is there a way to disable reasoning like qwen3?
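For reference, the Qwen3 mechanism the question alludes to has two parts: a hard switch (the `enable_thinking` kwarg to its chat template) and a soft switch (a `/no_think` tag in the message). The sketch below only illustrates those shapes; `build_prompt` is a made-up helper, and nothing here implies MiniMax M1 supports the same flags - that's exactly the open question.

```python
# Sketch of Qwen3's two reasoning toggles, for comparison.
# build_prompt is a hypothetical helper; enable_thinking is Qwen3's
# chat-template flag (passed through to tokenizer.apply_chat_template),
# and "/no_think" is its in-message soft switch.

def build_prompt(user_msg: str, thinking: bool) -> dict:
    # Soft switch: append the tag when thinking should be off.
    content = user_msg if thinking else user_msg + " /no_think"
    return {
        "messages": [{"role": "user", "content": content}],
        # Hard switch: forwarded as a chat-template kwarg for Qwen3.
        "enable_thinking": thinking,
    }

kwargs = build_prompt("Summarize this paragraph.", thinking=False)
```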
Few_Painter_5588@reddit
Minimax and StepFun are the most slept on models. I really wish more providers offered them, especially because they're permissively licensed. Minimax is such a big jump from Llama 4 and Deepseek-v3.
AppearanceHeavy6724@reddit
Really? You sure? Go test it. Both old reasoning MiniMax-01 and non-reasoning Minimax are weaker than V3-0324 and R1.
Few_Painter_5588@reddit
Yes, Yes, and yes I did.
AppearanceHeavy6724@reddit
Bullshit. The original MiniMax is a weak model, weaker than the original V3, let alone V3-0324. Both the benchmarks (https://huggingface.co/MiniMaxAI/MiniMax-Text-01) and vibe checks confirm that. The only selling point of MiniMax-Text-01 was the large context window, though no one really tested its performance on long context.
Former-Ad-5757@reddit
The funny thing is they are honest about this and simply show the benchmarks where they are not maxing. That fact alone makes me curious whether their other claims might also be true. Most other, better models develop huge problems with larger context - they are mostly strongest in the <8k range, and after that they drop off fast.
Affectionate-Cap-600@reddit
well... one of the differences is that those models are trained on 8-16k context and then extended.
MiniMax was pretrained natively with a 1M context (probably because the hybrid attention makes it much faster to train on long text)
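The speedup being referred to comes from linear attention: instead of materializing the n x n softmax matrix (O(n²·d)), a kernel feature map lets you summarize all keys/values once and reuse that summary per query (O(n·d²)). The sketch below is the generic elu+1 formulation from the linear-transformers literature, not MiniMax's exact "lightning attention", and it omits causal masking for brevity.

```python
import numpy as np

# Generic (non-causal) linear attention sketch. Instead of
# softmax(Q K^T) V, use a positive feature map phi so that
# phi(Q) @ (phi(K)^T @ V) can be computed without the n x n matrix.

def phi(x):
    # elu(x) + 1: keeps features strictly positive, a common choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    Qf, Kf = phi(Q), phi(K)                   # (n, d) each
    kv = Kf.T @ V                             # (d, d): one summary of all keys/values
    z = Qf @ Kf.sum(axis=0, keepdims=True).T  # (n, 1): per-query normalizer
    return (Qf @ kv) / z                      # (n, d); cost is O(n d^2), not O(n^2 d)

rng = np.random.default_rng(0)
n, d = 8, 4
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
```

The point for pretraining at 1M context: the per-token cost stays flat in sequence length, where full attention grows linearly per token (quadratically per sequence).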
MLDataScientist@reddit
What is the reason MiniMax is not so popular? I guess it's the lack of GGUF support. I wish the companies that release these models also released GGUFs with llama.cpp support, similar to what the Qwen team did for the Qwen3 models.
Few_Painter_5588@reddit
For local use, it's because there are no GGUFs, and most local users use llama.cpp or Ollama. MiniMax is a hybrid model and StepFun's models are audio-text to text, and llama.cpp doesn't support that.
As for commercial usage, it's because MiniMax has ~46B activated parameters, which means serving it is slower than Llama 4 Maverick and DeepSeek V3.
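A back-of-envelope sketch of why the activated parameter count dominates decode speed: in the memory-bandwidth-bound regime, each generated token must stream the active weights through the GPU once, so the throughput ceiling is roughly bandwidth divided by active bytes. The bandwidth and precision numbers below are illustrative assumptions, not measured figures.

```python
# Rough decode-throughput ceiling per accelerator, assuming the
# bandwidth-bound regime. All constants are illustrative assumptions.

BANDWIDTH_GBS = 3350   # e.g. one H100 SXM, GB/s (assumed)
BYTES_PER_PARAM = 1    # fp8 weights (assumed)

def decode_tps(active_params_b: float) -> float:
    """Upper bound on tokens/s: bandwidth / bytes streamed per token."""
    bytes_per_token = active_params_b * 1e9 * BYTES_PER_PARAM
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

for name, active in [("DeepSeek V3 (37B active)", 37),
                     ("MiniMax M1 (~46B active)", 46),
                     ("Llama 4 Maverick (17B active)", 17)]:
    print(f"{name}: ~{decode_tps(active):.0f} tok/s ceiling")
```

The absolute numbers are crude (no KV cache traffic, no batching, no overlap), but the ratio is the point: more active parameters means proportionally fewer tokens per second at the same bandwidth.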
AppearanceHeavy6724@reddit
Because its performance is massively worse than DeepSeek's, yet it's heavier on resources, with each MoE expert about 20% bigger.
Dark_Fire_12@reddit (OP)
Both are so slept on.
I just asked Chutes, Nebiusaistudio, Novita_labs hopefully they can manage.
Chromix_@reddit
MiniMax M1 is a 456B A46B MoE model that's a bit behind in benchmarks compared to the larger DeepSeek R1-0528 (671B), which has fewer active params (37B). It's often better than or tied with the original R1, except for SimpleQA, where it's significantly behind.
The interesting thing is that it scores way better on the long-context benchmark OpenAI-MRCR, delivering better results than GPT-4.1 at 128k and similar ones at 1M context. This benchmark is just a "Needle in a Haystack" variant though - a low score means the model is bad at long context, while a high score doesn't necessarily mean it's good at making something out of the information in the long context. In the more realistic LongBench-v2 it takes 3rd place, right after the Gemini models, which also scored quite well on fiction.liveBench.
So, a nice local model for long-context handling. Yet it already eats way too much VRAM at short context for most user systems.
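To make the "Needle in a Haystack" caveat concrete, here is a minimal probe of that kind: bury one fact in filler text and check whether the model repeats it back. Retrieval like this is all the benchmark rewards; it says nothing about reasoning over the context. `ask_model` is a hypothetical stand-in for whatever inference API you use.

```python
# Minimal needle-in-a-haystack probe of the kind described above.
import random

def build_haystack(needle: str, n_filler: int, seed: int = 0) -> str:
    """Hide a single 'needle' sentence at a random spot in filler text."""
    random.seed(seed)
    filler = ["The sky was grey that day."] * n_filler
    filler.insert(random.randrange(len(filler) + 1), needle)
    return " ".join(filler)

def score(answer: str, secret: str) -> bool:
    # Pure retrieval check: did the secret make it into the answer?
    return secret in answer

haystack = build_haystack("The secret number is 7481.", n_filler=500)
prompt = haystack + "\n\nWhat is the secret number?"
# passed = score(ask_model(prompt), "7481")   # ask_model: hypothetical API call
```

A model can ace thousands of these probes while still failing to summarize, compare, or reason over the same context - which is why LongBench-v2 is the more telling number.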
AppearanceHeavy6724@reddit
The most interesting thing about the model is linear attention or so they claim.
Chromix_@reddit
Better long-context scaling for attention is a nice thing, yet mostly useless when the model accuracy breaks down in longer contexts. There aren't many models on the leaderboard that maintain a decent long-context accuracy. That's the important part. Paying less for long context is a bonus.
AppearanceHeavy6724@reddit
Sadly, no one has tested the model yet on the long-fiction benchmark, or whatever it was called. I have a hunch it's going to perform well.
fictionlive@reddit
Is there an API yet?
AppearanceHeavy6724@reddit
check minimax.io
a_beautiful_rhind@reddit
"continue with google"
No other options.
AppearanceHeavy6724@reddit
Not chat.minimax.io, but their main site. They have a link to the API.
a_beautiful_rhind@reddit
Same story there.
Su_mang@reddit
try this link, it's their platform: https://www.minimax.io/platform_overview
a_beautiful_rhind@reddit
Sadly if I go to sign up for their API it also only has continue with google. Maybe openrouter works, haven't looked yet.
Dear_Custard_2177@reddit
One of the coolest things: their free AI agent! It works pretty well for a model that's somewhat behind the new DeepSeek.
Neither-Phone-7264@reddit
!Remindme 3 days
RemindMeBot@reddit
I will be messaging you in 3 days on 2025-06-19 15:59:12 UTC to remind you of this link
Sudden-Lingonberry-8@reddit
wow, it is a really impressive model... but it needs more intelligence
AppearanceHeavy6724@reddit
Checked for creative writing and it was bad. Complete ass.
TheRealMasonMac@reddit
Their base model is pretty old. I believe the consensus when it was released was that it was primarily pretrained on STEM and then distilled from GPT-4 Turbo for instruction tuning.
Dark_Fire_12@reddit (OP)
So fast!
nullmove@reddit
I think it's up in the web UI: https://chat.minimax.io/
AppearanceHeavy6724@reddit
The Hugging Face space does not require login.
AppearanceHeavy6724@reddit
They have a free space on Hugging Face.
Wooden-Potential2226@reddit
RULER results anywhere?
BreakfastFriendly728@reddit
linear attention comes to the stage!
Dark_Fire_12@reddit (OP)
Tech Report: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M1_tech_report.pdf
nullmove@reddit
This looks pretty great. Especially for function calling (Tau-bench) and long context, this seems like SOTA for open weights.
However, a thinking budget of 40k/80k sounds scary as fuck, even if it's faster because of the hybrid attention.
Dark_Fire_12@reddit (OP)
I tried uploading the table, but skill issue. Can someone else please try?