Magistral 1.2 is incredible. Wife prefers it over Gemini 2.5 Pro.
Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | 66 comments
TL;DR - AMAZING general-use model. Y'all gotta try it.
Just wanna let y'all know that Magistral is worth trying. Currently running the UD Q3KXL quant from Unsloth on Ollama with Open WebUI.
The model is incredible. It doesn't overthink and waste tokens unnecessarily in the reasoning chain.
The responses are focused, concise and to the point. No fluff, just tells you what you need to know.
The censorship is VERY minimal. My wife has been asking it medical-adjacent questions and it always gives a solid answer. I am an ICU nurse by trade, currently studying for advanced practice, and I can vouch that the advice Magistral is giving is legit.
Before this, my wife had been using Gemini 2.5 Pro and hated the censorship and the way it talks to you like a child ("let's break this down", etc.).
The general knowledge in Magistral is already really good. Seems to know obscure stuff quite well.
Now, hook it up to a web search tool call and that is where I feel this model can hit as hard as proprietary LLMs. The model really does wake up even more when connected to the web.
The model even supports image input. I have not tried that specifically, but I loved the image processing from Mistral 3.2 2506, so I expect no issues there.
Currently using it with Open WebUI and the recommended parameters. If you do use it with OWUI, be sure to set up the reasoning tokens in the model settings so thinking is kept separate from the model response.
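If you want to sanity-check the setup outside of OWUI, here's a rough sketch of what the call looks like against Ollama's OpenAI-compatible endpoint. The sampling values are the commonly cited Mistral recommendations (temp 0.7, top-p 0.95), so verify them against the model card, and the [THINK] handling assumes the template's raw reasoning tags come through:

```python
# Rough sketch: chat with Magistral via Ollama's OpenAI-compatible API.
# Model tag matches the Unsloth Q3_K_XL pull mentioned in this thread;
# sampling values are the commonly cited Mistral recommendations
# (verify against the model card).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL",
    messages=[{"role": "user", "content": "Summarize ARDS management briefly."}],
    temperature=0.7,
    top_p=0.95,
)

raw = resp.choices[0].message.content
# Magistral wraps its reasoning in [THINK]...[/THINK]. Open WebUI can split
# this out once the tags are set in model settings; manually it looks like:
answer = raw.split("[/THINK]")[-1].strip() if "[/THINK]" in raw else raw
print(answer)
```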
NinjaK3ys@reddit
I’m impressed by your technical skills despite being a trained ICU nurse. 🙌
stonediggity@reddit
Very nice
IWearSkin@reddit
My wife's boyfriend swears by Gemini 2.5 though
perelmanych@reddit
Me: bashing GPT-5 for not being able to refactor 2k line js file into several files on the first try.
People: Are happy with Qwen3-4B 👀
Steus_au@reddit
translates better than qwen, yes
evia89@reddit
For medical stuff I prefer OG DS 3.1. It never refuses
SashaUsesReddit@reddit
What's her use case that it shines so much?
My_Unbiased_Opinion@reddit (OP)
Mostly question asking. Medical and schoolwork. (Psychology, nutrition, biology, critical care nursing management).
jazir555@reddit
OK, now I'm confused: are you using Gemini 2.5 Pro through the Gemini app or AI Studio? The Gemini app version is effectively lobotomized and extremely censored; the AI Studio version is orders of magnitude better in my experience. I have run into ~5 refusals tops in daily use since March, and I ask an exorbitant number of questions that would get turned down on other platforms (and they have!). Gemini 2.5 Pro is the most permissive frontier model of any of them as far as answering questions without refusals, so I can only assume you are using the consumer-facing version.
218-69@reddit
This is not true btw, at least in the past few months. gemini.google.com can accept and discuss totally NSFW content in images, for example, whereas in AI Studio there's a filter on top that prevents the reply to the image from coming through, and it can trigger on text as well, even in code blocks now.
My_Unbiased_Opinion@reddit (OP)
You are correct, she is using the Gemini app. Makes sense when it comes to the censorship. I can use the API, but you are limited by the free responses you can get per day. The app she is using is free for a year because of the Pixel purchase.
IrisColt@reddit
Perhaps adding "Less prose. No yapping."
jazir555@reddit
https://aistudio.google.com/
You can use the 3-dot menu to create an app icon for it. It's effectively an installed app once you do; it opens straight to the chat UI like any other AI app I have installed from the App Store/Play Store.
My_Unbiased_Opinion@reddit (OP)
Thank you. This is going to be useful for sure.
SashaUsesReddit@reddit
Those seem pretty narrow as a use case. What have you found the accuracy to be?
That's a hot-button kind of data to rely on.
My_Unbiased_Opinion@reddit (OP)
I have found the accuracy to be very good. I trust it now but verify critical stuff. It hasn't let me down yet.
SashaUsesReddit@reddit
Great to know! Thank you
djstraylight@reddit
Magistral is my favorite base model for custom applications these days. I usually have a graph that decides whether it needs to reach out with some tools or an API call to GPT-5, Claude 4, or Gemini 2.5 Pro for hard facts/reasoning, and then it hands that result to the Magistral model to present to the user.
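Not my exact graph, but as a rough sketch of the routing idea (the escalation heuristic and model names here are placeholders, not my real setup):

```python
# Rough sketch of the routing idea: escalate hard queries to a frontier
# model for facts/reasoning, then have local Magistral present the result.
# The keyword heuristic and model names are placeholders.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
frontier = OpenAI()  # real key expected in OPENAI_API_KEY

HARD_HINTS = ("prove", "cite", "latest", "regulation")

def answer(query: str) -> str:
    context = ""
    if any(h in query.lower() for h in HARD_HINTS):
        # Pull hard facts/reasoning from the big model first.
        context = frontier.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": query}],
        ).choices[0].message.content
    # Magistral writes the user-facing reply either way.
    prompt = f"Context:\n{context}\n\nQuestion: {query}" if context else query
    return local.chat.completions.create(
        model="magistral",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    ).choices[0].message.content
```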
The abliterated version of Magistral is quite spicy. Mistral models are the least censored I've found, and this takes it to a new level.
LegacyRemaster@reddit
I tested the latest release of Magistral Small 2509 in three test scenarios using LM Studio.
1) Extract text from an educational YouTube video using an MCP server
2) Extract the text and create a summary
3) Once the summary is complete, create a "clean" document from the informative text extracted from the video.
I compared it with:
-Qwen 4b no thinking
-GPT 20b thinking medium
-Magistral Small 2509 thinking
-Qwen 30b instruct
The best extraction with an MCP server was performed by Qwen 4b. Magistral Small looped even at low temperatures. GPT 20b was slower in processing the prompt, but everything was fine. Qwen 30b was slower than 4b, but the result was the same.
In the summary, Qwen 30b won out, both in formatting and in ease of reading, rewriting the video into a clean and presentable format (removing chatter, etc.). Unfortunately, Magistral was the worst, producing answers that were too concise even after reworking the prompt. For this specific assignment, Qwen 4b + 30b is the best solution for both speed and final result. Using the MCP tools (YouTube search, video text extraction, Google search) was perfect. I'm keeping Magistral only for its image-reading (OCR) capabilities. I haven't tested other LLMs because I try to beat the current best (for me) in the real-world use case.
JLeonsarmiento@reddit
Can confirm. Wife's favorite tech stack:
Qwen3-4b-Instruct: she uses it for everything. She likes that "she" (as my wife calls it) feels like it reads her mind on what she really wants when prompting.
Qwen3-8b-No_Think: same, but it lost usage time due to speed: the 4b feels the same but fast. However, the 8b is called in when things get serious and knowledge depth is important (she's in academia), and it has the same "vibe" as the 4b she's already used to.
GPT-OSS 20b: the coding pal. Used almost exclusively for coding, math, that kind of stuff. It does great work with logic explanations; its more objective, non-personal tone wins here. I think being super fast also helps in tasks that involve a lot of trial and error.
terminoid_@reddit
What do you mean, "even at low temperatures"? You didn't use the sampling parameters recommended by the model authors?
toothpastespiders@reddit
I'll always trust spouse benchmarking a million times more than any formal benchmarks.
tmvr@reddit
My wife, Morgan Fairchild, also confirms it is a great model!
-dysangel-@reddit
My wife, Palmella Handerson, says it's pretty good
ikkiyikki@reddit
I knew she was cheating on me 😤
Sea_Mouse655@reddit
Can we please make this a formal benchmark?
My_Unbiased_Opinion@reddit (OP)
Me too, me too. If the wife says it's bad, it's bad, no question lol.
Optimalutopic@reddit
Forget about LLM as a judge, here we use wife as judge
GoodGuyLafarge@reddit
You understood how a relationship works, smart man!
donotdrugs@reddit
So it's actually your biased opinion?
Silver_Jaguar_24@reddit
The boss, you mean?
JLeonsarmiento@reddit
😂 introducing WifeBench verified. This a great idea.
Optimalutopic@reddit
I see you are using LLMs connected to the web; you might want to look at what I have built. It can connect the LLM to the web, YouTube, Reddit, maps, git, local files, etc., and can create a podcast from your research, completely locally, and with proprietary models as well. https://github.com/SPThole/CoexistAI
reneil1337@reddit
Totally agree, it's an increeedible model. The best open-source vision model I've used so far, better than Qwen 2.5 Vision 72b.
AI-On-A-Dime@reddit
That Magistral Small would match Gemini Pro on general purpose sounds unbelievable, meaning I don't believe it. It would be interesting to see how it compares to similar-size models like Qwen3 30b3 and OSS 20b though.
redditisunproductive@reddit
Depends. If she is using the web or especially the phone app for Gemini, yes, I'd believe that. Every benchmark I have run shows the web version massively underperforms the API. The Pro 2.5 label is a flat-out lie.
My_Unbiased_Opinion@reddit (OP)
Yep you are correct. She is using the Gemini app. API is a ton better.
No_Information9314@reddit
Do you find this to be true of all the commercial models? API outperforms web version?
Timotheeee1@reddit
usually yes, because the web versions tend to include enormous system prompts while the API has none
Kathane37@reddit
It often is. For example: chatgpt.com nerfs GPT-5 Thinking's context window (160k/400k) and max thinking power (64/200) if you are a Plus user.
redditisunproductive@reddit
No, mainly Gemini. There is some variation of course due to system prompts etc but the Gemini gap is on a completely different level from anyone else.
redditisunproductive@reddit
Why is this not on OpenRouter yet? I guess I am lazy, but I hate the hassle of rebuilding llama.cpp just to try out a new model before I know I will like it. I've been waiting for more reviews, but this one sounds tempting for an all-rounder, perfect size. Been looking for a dense reasoner other than Qwen. Seed OSS 36b is getting large but is pretty good too.
coumineol@reddit
Sir this is a r/LocalLLaMA
AppearanceHeavy6724@reddit
OpenRouter is in limbo though (if you are using small models off of it). We can count it as local too.
simracerman@reddit
Nice to hear! I use Mistral 3.2 2506 regularly for its reliability.
How much better is this one? Trying to convince myself to download and benchmark it.
My_Unbiased_Opinion@reddit (OP)
I was previously using the 2506 regularly as well.
The best way I can describe it is it's basically 2506 on steroids. It seems to have more general knowledge as well.
I do think it's not a simple "2506 but with reasoning". It feels like the model was trained further.
simracerman@reddit
The benchmarks show it improved upon the last Magistral by 15% which is big! I’ll give it a shot.
My_Unbiased_Opinion@reddit (OP)
Yeah. I do feel Mistral models perform better IRL than benchmarks. So a 15% improvement on benchmarks might translate to something bigger in real use cases.
FluffyGoatNerder@reddit
Nice. What is the full ollama pull URL? I'm having trouble finding that exact model in the library.
My_Unbiased_Opinion@reddit (OP)
ollama run hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL
Or:
ollama run hf.co/unsloth/Magistral-Small-2509-GGUF:Q4_K_XL
The top one is Q3KXL, which is the one I use personally, but the bottom one is technically more precise.
FluffyGoatNerder@reddit
Excellent. Thanks very much!
No-Equivalent-2440@reddit
I'm trying to run Magistral (the official quant) in Ollama, using Open WebUI. When I run it in Ollama directly, reasoning works. Once I run it in OWUI, there is no reasoning, just an immediate answer. Why could this be happening?
Professional-Bear857@reddit
Did you add the custom reasoning tags in the model settings section? The tags are [THINK] and [/THINK].
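If the tags are set and you still get nothing, a quick way to rule out Open WebUI is to hit Ollama directly and see whether [THINK] shows up in the raw output at all. Rough sketch, assuming the default Ollama port and the Unsloth quant from this thread:

```python
# Quick check that bypasses Open WebUI: does the raw Ollama output
# contain [THINK]...[/THINK]? If not, the problem is the prompt or
# chat template, not the UI's reasoning-tag settings.
import requests

r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/unsloth/Magistral-Small-2509-GGUF:Q3_K_XL",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
    timeout=300,
)
content = r.json()["message"]["content"]
print("[THINK] present:", "[THINK]" in content)
```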
No-Equivalent-2440@reddit
Oh yes. The problem is no thinking at all, rather than thinking output mixed with the final answer…
Professional-Bear857@reddit
I'm using the model with Open WebUI and I get thinking; however, I'm running it through LM Studio into Open WebUI, so maybe that's why.
No-Equivalent-2440@reddit
Maybe it's my user prompt. I have things like "be concise", "to the point", "be direct"… It might be that the model sees this and does not think. But no other model, even the older Magistral, has this problem. I'll try removing my prompt and let you know.
Professional-Bear857@reddit
Possibly, I'm using the unsloth quant which has the default system prompt, you might want to try the default system prompt.
EnvironmentalToe3130@reddit
Which tools do you use to connect it to web search?
secondr2020@reddit
Could you be more specific about which parameter needs to be set up? Thank you.
No-Equivalent-2440@reddit
I think he means the reasoning tags. In the model settings, under advanced parameters, you can set both the opening and closing tags for reasoning.
Swimming_Drink_6890@reddit
What do you run it on?
My_Unbiased_Opinion@reddit (OP)
It's got a permanent residence in my 3090.
Swimming_Drink_6890@reddit
Dope, I'll test it out.