Magistral 1.2 is incredible. Wife prefers it over Gemini 2.5 Pro.
Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | 66 comments
TL;DR - AMAZING general-use model. Y'all gotta try it.
Just wanna let y'all know that Magistral is worth trying. Currently running the UD Q3KXL quant from Unsloth on Ollama with Open WebUI.
The model is incredible. It doesn't overthink and waste tokens unnecessarily in the reasoning chain.
The responses are focused, concise and to the point. No fluff, just tells you what you need to know.
The censorship is VERY minimal. My wife has been asking it medical-adjacent questions and it always gives a solid answer. I am an ICU nurse by trade, currently studying for advanced practice, and I can vouch that the advice Magistral is giving is legit.
Before this, my wife had been using Gemini 2.5 Pro and hated the censorship and the way it talks to you like a child ("let's break this down", etc.).
The general knowledge in Magistral is already really good. Seems to know obscure stuff quite well.
Now, hook it up to a web search tool call and that is where I feel this model can hit as hard as proprietary LLMs. The model really does wake up even more when connected to the web.
The model even supports image input. I have not tried that specifically, but I loved the image processing from Mistral 3.2 2506, so I expect no issues there.
Currently using it with Open WebUI and the recommended parameters. If you do use it with OWUI, be sure to set up the reasoning tokens in the model settings so thinking is kept separate from the model response.
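If you want to sanity-check the setup outside of OWUI, here's a rough sketch of what the call looks like against Ollama's OpenAI-compatible endpoint. The sampling values are the commonly cited Mistral recommendations (temp 0.7, top-p 0.95), so verify them against the model card, and the [THINK] handling assumes the template's raw reasoning tags come through:

```python
# Rough sketch: chat with Magistral via Ollama's OpenAI-compatible API.
# Model tag matches the Unsloth Q3_K_XL pull mentioned in this thread;
# sampling values are the commonly cited Mistral recommendations
# (verify against the model card).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL",
    messages=[{"role": "user", "content": "Summarize ARDS management briefly."}],
    temperature=0.7,
    top_p=0.95,
)

raw = resp.choices[0].message.content
# Magistral wraps its reasoning in [THINK]...[/THINK]. Open WebUI can split
# this out once the tags are set in model settings; manually it looks like:
answer = raw.split("[/THINK]")[-1].strip() if "[/THINK]" in raw else raw
print(answer)
```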
NinjaK3ys@reddit
I’m impressed by your technical skills despite being a trained ICU nurse. 🙌
stonediggity@reddit
Very nice
IWearSkin@reddit
My wife's boyfriend swears by Gemini 2.5 though
perelmanych@reddit
Me: bashing GPT-5 for not being able to refactor 2k line js file into several files on the first try.
People: Are happy with Qwen3-4B 👀
Steus_au@reddit
translates better than qwen, yes
evia89@reddit
For medical stuff I prefer OG DS 3.1. It never refuses
SashaUsesReddit@reddit
What's her use case that it shines so much?
My_Unbiased_Opinion@reddit (OP)
Mostly question asking. Medical and schoolwork. (Psychology, nutrition, biology, critical care nursing management).
jazir555@reddit
OK, now I'm confused: are you using Gemini 2.5 Pro through the Gemini app or AI Studio? The Gemini app version is effectively lobotomized and extremely censored; the AI Studio version is orders of magnitude better in my experience. I have run into ~5 refusals tops in daily use since March, and I ask an exorbitant number of questions that would get turned down on other platforms (and they have!). Gemini 2.5 Pro is the most permissive frontier model of any of them as far as answering questions without refusals, so I can only assume you are using the consumer-facing version.
218-69@reddit
This is not true btw, at least in the past few months. gemini.google.com can accept and discuss totally NSFW content in images, for example, whereas in AI Studio there's a filter on top that prevents the reply to the image from coming through, and it can trigger on text as well, even in code blocks now.
My_Unbiased_Opinion@reddit (OP)
You are correct, she is using the Gemini app. Makes sense when it comes to the censorship. I can use the API, but you are limited by the free responses you can get per day. The app she is using is free for a year because of the Pixel purchase.
IrisColt@reddit
Perhaps adding "Less prose. No yapping."
jazir555@reddit
https://aistudio.google.com/
You can use the 3-dot menu to create an app icon for it. It's effectively an installed app once you do; it opens straight to the chat UI like any other AI app I have installed from the App Store/Play Store.
My_Unbiased_Opinion@reddit (OP)
Thank you. This is going to be useful for sure.
SashaUsesReddit@reddit
Those seem pretty narrow as a use case. What have you found the accuracy to be?
That's a hot-button kind of data to rely on.
My_Unbiased_Opinion@reddit (OP)
I have found the accuracy to be very good. I trust it now but verify critical stuff. It hasn't let me down yet.
SashaUsesReddit@reddit
Great to know! Thank you
djstraylight@reddit
Magistral is my favorite base model for custom applications these days. I usually have a graph that decides whether it needs to reach out with some tools or an API call to GPT-5, Claude 4, or Gemini 2.5 Pro for hard facts/reasoning, and then it hands that result to the Magistral model to present to the user.
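Not my exact graph, but as a rough sketch of the routing idea (the escalation heuristic and model names here are placeholders, not my real setup):

```python
# Rough sketch of the routing idea: escalate hard queries to a frontier
# model for facts/reasoning, then have local Magistral present the result.
# The keyword heuristic and model names are placeholders.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
frontier = OpenAI()  # real key expected in OPENAI_API_KEY

HARD_HINTS = ("prove", "cite", "latest", "regulation")

def answer(query: str) -> str:
    context = ""
    if any(h in query.lower() for h in HARD_HINTS):
        # Pull hard facts/reasoning from the big model first.
        context = frontier.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": query}],
        ).choices[0].message.content
    # Magistral writes the user-facing reply either way.
    prompt = f"Context:\n{context}\n\nQuestion: {query}" if context else query
    return local.chat.completions.create(
        model="magistral",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    ).choices[0].message.content
```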
The abliterated version of Magistral is quite spicy. Mistral models are the least censored I've found, and this takes it to a new level.
LegacyRemaster@reddit
I tested the latest release of Magistral Small 2509 in three test scenarios using LM Studio.
1) Extract text from an educational YouTube video using an MCP server
2) Extract the text and create a summary
3) Once the summary is complete, create a "clean" document from the informative text extracted from the video.
I compared it with:
-Qwen 4b no thinking
-GPT 20b thinking medium
-Magistral Small 2509 thinking
-Qwen 30b instruct
The best extraction with an MCP server was performed by Qwen 4b. Magistral Small looped even at low temperatures. GPT 20b was slower in processing the prompt, but everything was fine. Qwen 30b was slower than 4b, but the result was the same.
In the summary, Qwen 30b won out, both in formatting and in ease of reading, rewriting the video into a clean and presentable format (removing chatter, etc.). Unfortunately, Magistral was the worst, producing answers that were too concise even after reworking the prompt. For this specific assignment, Qwen 4b + 30b is the best solution for both speed and final result. Using the MCP tools (YouTube search, video text extraction, Google search) was perfect. I'm keeping Magistral only for its image-reading (OCR) capabilities. I haven't tested other LLMs because I try to beat the current best (for me) in the real-world use case.
JLeonsarmiento@reddit
Can confirm. Wife's favorite tech stack:
Qwen3-4b-Instruct: she uses it for everything. She likes that "she" (as my wife calls it) feels like it reads her mind on what she really wants when prompting.
Qwen3-8b-No_Think: same, but it lost usage time due to speed: the 4b feels the same but fast. However, the 8b is called in when things get serious and knowledge depth is important (she's in academia), and it has the same "vibe" as the 4b she's already used to.
GPT-OSS 20b: the coding pal. Used almost exclusively for coding, math, that kind of stuff. It does great work with logic explanations; its more objective, non-personal tone wins here. I think being super fast also helps in tasks that involve a lot of trial and error.
terminoid_@reddit
What do you mean, "even at low temperatures"? You didn't use the sampling parameters recommended by the model authors?
toothpastespiders@reddit
I'll always trust spouse benchmarking a million times more than any formal benchmarks.
tmvr@reddit
My wife, Morgan Fairchild, also confirms it is a great model!
-dysangel-@reddit
My wife, Palmella Handerson, says it's pretty good
ikkiyikki@reddit
I knew she was cheating on me 😤
Sea_Mouse655@reddit
Can we please make this a formal benchmark?
My_Unbiased_Opinion@reddit (OP)
Me too, me too. If the wife says it's bad, it's bad, no question lol.
Optimalutopic@reddit
Forget about LLM as a judge, here we use wife as judge
GoodGuyLafarge@reddit
You understood how a relationship works, smart man!
donotdrugs@reddit
So it's actually your biased opinion?
Silver_Jaguar_24@reddit
The boss, you mean?
JLeonsarmiento@reddit
😂 introducing WifeBench verified. This a great idea.
Optimalutopic@reddit
I see you are using LLMs connected to the web; you might want to look at what I have built. It can connect the LLM to the web, YouTube, Reddit, maps, git, local files, etc., and can create a podcast from your research, completely locally, and with proprietary models as well. https://github.com/SPThole/CoexistAI
reneil1337@reddit
Totally agree, it's an increeedible model. The best open-source vision model I've used so far, better than Qwen 2.5 Vision 72b.
AI-On-A-Dime@reddit
That Magistral Small would match Gemini Pro on general purpose sounds unbelievable, meaning I don't believe it. It would be interesting to see how it compares to similar-size models like Qwen3 30b3 and OSS 20b though.
redditisunproductive@reddit
Depends. If she is using the web or especially the phone app for Gemini, yes, I'd believe that. Every benchmark I have run shows the web version massively underperforms the API. The Pro 2.5 label is a flat-out lie.
My_Unbiased_Opinion@reddit (OP)
Yep you are correct. She is using the Gemini app. API is a ton better.
No_Information9314@reddit
Do you find this to be true of all the commercial models? API outperforms web version?
Timotheeee1@reddit
usually yes, because the web versions tend to include enormous system prompts while the API has none
Kathane37@reddit
It often is. For example: chatgpt.com nerfs GPT-5 Thinking's context window (160k/400k) and max thinking power (64/200) if you are a Plus user.
redditisunproductive@reddit
No, mainly Gemini. There is some variation of course due to system prompts etc but the Gemini gap is on a completely different level from anyone else.
redditisunproductive@reddit
Why is this not on OpenRouter yet? I guess I am lazy, but I hate the hassle of rebuilding llama.cpp just to try out a new model before I know I will like it. I've been waiting for more reviews, but this one sounds tempting for an all-rounder, perfect size. Been looking for a dense reasoner other than Qwen. Seed OSS 36b is getting large but is pretty good too.
coumineol@reddit
Sir this is a r/LocalLLaMA
AppearanceHeavy6724@reddit
OpenRouter is in limbo though (if you are using small models off of it). We can count it as local too.
simracerman@reddit
Nice to hear! I use Mistral 3.2 2506 regularly for its reliability.
How much better is this one? Trying to convince myself to download and benchmark it.
My_Unbiased_Opinion@reddit (OP)
I was previously using the 2506 regularly as well.
The best way I can describe it is it's basically 2506 on steroids. It seems to have more general knowledge as well.
I do think it's not a simple "2506 but with reasoning". It feels like the model was trained further.
simracerman@reddit
The benchmarks show it improved upon the last Magistral by 15% which is big! I’ll give it a shot.
My_Unbiased_Opinion@reddit (OP)
Yeah. I do feel Mistral models perform better IRL than benchmarks. So a 15% improvement on benchmarks might translate to something bigger in real use cases.
FluffyGoatNerder@reddit
Nice. What is the full ollama pull URL? I'm having trouble finding that exact model in the library.
My_Unbiased_Opinion@reddit (OP)
ollama run hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL
Or:
ollama run hf.co/unsloth/Magistral-Small-2509-GGUF:Q4_K_XL
The top one is Q3KXL, which is the one I use personally, but the bottom one is technically more precise.
FluffyGoatNerder@reddit
Excellent. Thanks very much!
No-Equivalent-2440@reddit
I'm trying to run Magistral (the official quant) in Ollama, using Open WebUI. When I run it in Ollama directly, reasoning works. Once I run it in OWUI, there is no reasoning, just an immediate answer. Why could this be happening?
Professional-Bear857@reddit
Did you add the custom reasoning tags in the model settings section? The tags are [THINK] and [/THINK].
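If the tags are set and you still get nothing, a quick way to rule out Open WebUI is to hit Ollama directly and see whether [THINK] shows up in the raw output at all. Rough sketch, assuming the default Ollama port and the Unsloth quant from this thread:

```python
# Quick check that bypasses Open WebUI: does the raw Ollama output
# contain [THINK]...[/THINK]? If not, the problem is the prompt or
# chat template, not the UI's reasoning-tag settings.
import requests

r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
    timeout=300,
)
content = r.json()["message"]["content"]
print("[THINK] present:", "[THINK]" in content)
```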
No-Equivalent-2440@reddit
Oh yes. The problem is no thinking at all, rather than thinking output mixed with the final answer…
Professional-Bear857@reddit
I'm using the model with Open WebUI and I get thinking; however, I'm running it through LM Studio into Open WebUI, so maybe that's why.
No-Equivalent-2440@reddit
Maybe it's my user prompt. I have things like "be concise", "to the point", "be direct"… It might be that the model sees this and does not think. But no other model, even the older Magistral, has this problem. I'll try removing my prompt and let you know.
Professional-Bear857@reddit
Possibly, I'm using the unsloth quant which has the default system prompt, you might want to try the default system prompt.
EnvironmentalToe3130@reddit
Which tools do you use to connect it to web search?
secondr2020@reddit
Could you be more specific about which parameter needs to be set up? Thank you.
No-Equivalent-2440@reddit
I think he means the reasoning tags. In the model settings, under advanced parameters, you can set both the opening and closing tags for reasoning.
Swimming_Drink_6890@reddit
What do you run it on?
My_Unbiased_Opinion@reddit (OP)
It's got a permanent residence in my 3090.
Swimming_Drink_6890@reddit
Dope, I'll test it out.