Who is your favourite quant publisher and why?

[-]

QuickExpert@reddit

In my case, Bartowski was roughly \~15% faster than Unsloth. The quality differences are indistinguishable.

[-]

dampflokfreund@reddit

Gemma 4 26b and 35B MoE also take less VRAM than their Unsloth counterpart. IDK what's up with that.

[-]

VoiceApprehensive893@reddit

depends on the quant recipe i guess the unsloth iq3_xxs is smaller

[-]

marscarsrars@reddit

Doesnt the precision and accuracy loss below q4 make these quants unusable?

[-]

VoiceApprehensive893@reddit

sometimes a q3 is the smartest model you can fit

q3's are small enough to justify the damage

[-]

relmny@reddit

Unsloth (for your same reasons), Bartoswki and Ubergarm (for the biggest models with ik_llama.cpp)

[-]

Kahvana@reddit

I've been liking Unsloth models less, and Bartowski models more over the past months.

I like that bartowski's imatrix data is (mostly) public, and there is a speed difference between the quants on my weaker hardware. Bartowski also still provides Q1 quants without removing them after release.

[-]

Total_Activity_7550@reddit

bartowski never rushes release to become first finetune and reap download count. And unsloth always does that, and so often they realise "fixes" later (not telling that they let garbage out in the first place).

[-]

danielhanchen@reddit

We again want to clear up multiple misunderstandings around our GGUF updates. Some people have said we re-upload often because of our own mistakes - this is false - the majority of issues are the labs themselves or the implementation backing them.

We just like to publicize about them, so it "seems" like it's our problem but it's NOT.

1) Mistral Medium 3.5 Fix

We worked with Mistral to fix a config.json RoPE parsing bug in Mistral Medium 3.5 - EVERYONE had this issue, so everyone had to re-convert - but we were the ones to collab with Mistral to fix it. See https://www.reddit.com/r/LocalLLaMA/comments/1t1itn1/unsloth_solved_bug_in_mistral_medium_35/

2) MiniMax 2.7 NaNs

We found NaNs in 38% of Bartowski’s (10/26 quants) and 22% of ours (5/23 quants). Bartowski STILL hasn't uploaded fixed quants - so 38% are broken still - we already did ages ago.

We identified a fix and already patched ours - see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax_m27_gguf_investigation_fixes_benchmarks/ Bartowski has not patched yet, but is actively working on it.

10/26 NaNs (38%) found at https://huggingface.co/bartowski/MiniMaxAI_MiniMax-M2.7-GGUF: Chunk-32 failures (9): IQ3_XXS, IQ3_XS, IQ3_M, Q3_K_M, Q3_K_L, Q3_K_XL, Q4_K_S, Q4_1, Q5_K_S. Late failure (1): IQ1_S (crashed at chunk 311)
5/23 NaNs (21%) ours had NaNs - all fixed now at https://huggingface.co/unsloth/MiniMax-M2.7-GGUF: UD-Q4_K_S, UD-Q4_K_M, UD-Q4_K_XL, UD-Q5_K_S, MXFP4_MOE. All block 32.
AesSedai's Q4_K_M at https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF was re-provided with our Q6_K trick.

3) Gemma 4 was re-uploaded 5 times

Three were due to about 10 to 20 llama.cpp bug fixes, some of which we helped investigate and contribute a fix as well. The fourth was an official Gemma chat template improvement from Google. Every provider had to update, not just us. See llama.cpp PRs which shows \~30 PR fixes / improvements for Gemma-4

See https://www.reddit.com/r/LocalLLaMA/comments/1sqrl1l/gemma_4_26ba4b_gguf_benchmarks/ for new benchmarks

Mean KL Divergence puts nearly all Unsloth GGUFs on the Pareto frontier
KLD shows how well a quantized model matches the original BF16 output distribution, indicating retained accuracy.
This makes Unsloth the top-performing in 21 of 22 sizes. Similar trend for 99.9% KLD and others.

4) Qwen3.5 SSM issues

We shared 7TB of research artifacts showing which layers should not be quantized. The issue was not that providers’ quants were broken, but that they were not optimal - mainly around `ssm_out` and `ssm_*` tensors. We have since improved ours and now lead on KLD vs. disk space for Qwen3.5 as well.

Most if not all quant providers then take our findings then update their quants. We talked about our analysis and research at https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwen3535ba3b_unsloth_dynamic_ggufs_benchmarks/ and https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/

5) CUDA 13.2 is actually broken

This causes some low bit quants on all models to get gibberish. Some people have dismissed it as not being an issue, but NVIDIA has confirmed it's a problem and a fix is coming in CUDA 13.3. See Unsloth Issue 4849, llama.cpp issue 21255, issue 21371

As a temporary solution use CUDA 13.1. See https://github.com/ggml-org/llama.cpp/issues/21255#issuecomment-4248403175 quote from https://github.com/johnnynunez

The bug was found and fixed in cuda 13.3

[-]

Total_Activity_7550@reddit

This doesn't address the fact that you always rush without testing, after that people download tens of gigabytes of data, then having to redownload everything. I stopped doing that mistake. I am not against your work, it is great, I guess your compute resources are also great, but the strategy isn't nice. You trade being in "Trending" list on HuggingFace for reliability.

[-]

RobotRobotWhatDoUSee@reddit

Huge fan, keep it up!

[-]

Digger412@reddit

Did you happen to spot the MiMo-V2.5 (non-Pro) layer 47 `ffn_down_exps` issue at Q4/Q5? I had to quantize it to Q6_K otherwise I was getting NaN on those quants for my Q4_K_M.

[-]

BraceletGrolf@reddit

I for one love your work, so please keep it going.

[-]

danielhanchen@reddit

Thanks :)

[-]

Kahvana@reddit

I mean, it's unavoidable with how complex llms and llama.cpp are that there will be issues on day 1. It's nice for those that there is an option for those who want to run models day 1.

With the recent release cycle of Qwen3.5, Gemma4 and Mistral Medium 3.5, I've realized that stability is more appealing to me, so I now rather wait a week or two before trying the model.

[-]

voyager256@reddit

That’s quite unfair - it’s mostly not their fault as these are various teething issues - unrelated to Unsloth or any other provider You could argue they could put big warning/ disclaimer on the model page for the first couple weeks and track history of fixes . Ok but the do already a lot and for free so …

[-]

danielhanchen@reddit

There is a weird misunderstanding going on that these issues are only Unsloth related or somehow we caused them - it's because we publicize them that it looks like it's "our" fault - see https://www.reddit.com/r/LocalLLaMA/comments/1tc588v/comment/olnpx8h/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

[-]

danielhanchen@reddit

We again want to clear up multiple misunderstandings around our GGUF updates. Some people have said we re-upload often because of our own mistakes - this is false - the majority of issues are the labs themselves or the implementation backing them.

We just like to publicize about them, so it "seems" like it's our problem but it's NOT.

1) Mistral Medium 3.5 Fix

We worked with Mistral to fix a config.json RoPE parsing bug in Mistral Medium 3.5 - EVERYONE had this issue, so everyone had to re-convert - but we were the ones to collab with Mistral to fix it. See https://www.reddit.com/r/LocalLLaMA/comments/1t1itn1/unsloth_solved_bug_in_mistral_medium_35/

2) MiniMax 2.7 NaNs

We found NaNs in 38% of Bartowski’s (10/26 quants) and 22% of ours (5/23 quants). Bartowski STILL hasn't uploaded fixed quants - so 38% are broken still - we already did ages ago.

We identified a fix and already patched ours - see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax_m27_gguf_investigation_fixes_benchmarks/ Bartowski has not patched yet, but is actively working on it.

10/26 NaNs (38%) found at https://huggingface.co/bartowski/MiniMaxAI_MiniMax-M2.7-GGUF: Chunk-32 failures (9): IQ3_XXS, IQ3_XS, IQ3_M, Q3_K_M, Q3_K_L, Q3_K_XL, Q4_K_S, Q4_1, Q5_K_S. Late failure (1): IQ1_S (crashed at chunk 311)
5/23 NaNs (21%) ours had NaNs - all fixed now at https://huggingface.co/unsloth/MiniMax-M2.7-GGUF: UD-Q4_K_S, UD-Q4_K_M, UD-Q4_K_XL, UD-Q5_K_S, MXFP4_MOE. All block 32.
AesSedai's Q4_K_M at https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF was re-provided with our Q6_K trick.

3) Gemma 4 was re-uploaded 5 times

Three were due to about 10 to 20 llama.cpp bug fixes, some of which we helped investigate and contribute a fix as well. The fourth was an official Gemma chat template improvement from Google. Every provider had to update, not just us. See llama.cpp PRs which shows \~30 PR fixes / improvements for Gemma-4

See https://www.reddit.com/r/LocalLLaMA/comments/1sqrl1l/gemma_4_26ba4b_gguf_benchmarks/ for new benchmarks

Mean KL Divergence puts nearly all Unsloth GGUFs on the Pareto frontier
KLD shows how well a quantized model matches the original BF16 output distribution, indicating retained accuracy.
This makes Unsloth the top-performing in 21 of 22 sizes. Similar trend for 99.9% KLD and others.

4) Qwen3.5 SSM issues

We shared 7TB of research artifacts showing which layers should not be quantized. The issue was not that providers’ quants were broken, but that they were not optimal - mainly around `ssm_out` and `ssm_*` tensors. We have since improved ours and now lead on KLD vs. disk space for Qwen3.5 as well.

Most if not all quant providers then take our findings then update their quants. We talked about our analysis and research at https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwen3535ba3b_unsloth_dynamic_ggufs_benchmarks/ and https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/

5) CUDA 13.2 is actually broken

This causes some low bit quants on all models to get gibberish. Some people have dismissed it as not being an issue, but NVIDIA has confirmed it's a problem and a fix is coming in CUDA 13.3. See Unsloth Issue 4849, llama.cpp issue 21255, issue 21371

As a temporary solution use CUDA 13.1. See https://github.com/ggml-org/llama.cpp/issues/21255#issuecomment-4248403175 quote from https://github.com/johnnynunez:

[-]

No_Algae1753@reddit (OP)

Yes this approach has pro and cons. I still rather have not 100% working models than none and often times it's also a llama.cpp or the model publisher themselves (e.g. mistral)

[-]

danielhanchen@reddit

We don't publish Q1 quants for dense models because they're not useful and loop after a few turns - KLD and PPL are parabolic at these bits, so we decided to not provide them depending on the model - we specifically do not want folks to use useless quants - we still do it for the large ones like https://huggingface.co/unsloth/MiMo-V2.5-Pro-GGUF
We publish our imatrix final weights - for eg https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF/blob/main/imatrix_unsloth.gguf_file - so you can plug this into any quantization process.
We're the best in KLD and PPL for Qwen 3.6, Gemma 4, and many other quants - see https://www.reddit.com/r/LocalLLaMA/comments/1sqrl1l/gemma_4_26ba4b_gguf_benchmarks/ and https://www.reddit.com/r/LocalLLaMA/comments/1so5nrl/qwen36_gguf_benchmarks/
We recently helped Mistral fix a bug in Mistral Medium 3.5 - see https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/discussions/18
MiniMax 2.7 - We found NaNs in 38% of Bartowski’s (10/26 quants) Bartowski STILL hasn't uploaded fixed quants - so 38% are broken still - we already fixed ours ages ago - see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax_m27_gguf_investigation_fixes_benchmarks/
There is a weird misunderstanding going on that these issues are only Unsloth related or somehow we caused them - it's because we publicize them that it looks like it's "our" fault - see https://www.reddit.com/r/LocalLLaMA/comments/1tc588v/comment/olnpx8h/ for more details

[-]

Kahvana@reddit

Not entirely the case, remember what happened with Qwen3.5-397B-A17B? Happy to hear that policy changed though!
Fair, that is indeed true! I should've specified that I specifically meant imatrix dataset, not the final weights. A misgiving on my part
Great! But how is that relevant to my comment?
Great! But how is that relevant to my comment?
Great! But how is that relevant to my comment?
Great! How is that relevant to my comment?

Your comment is oddly defensive. I simply stated why I prefer Bartowski the past few months. Did I say something mean to you? If I did, I apologize sincerely.

Don't get me wrong, I'm happy to receive the work you've provided for free and understand it's a luxury position to be in after having made quants myself.

[-]

Public_Umpire_1099@reddit

Bartowski is awesome for keeping the imatrixes out in the open. Not only that, but he keeps his datasets for making your own imatrixes out in the open. I recently needed to quantize a fine-tune of Qwen 3.6 35B and make it MTP compatible. Between his datasets, and some of the community ones out there that support not destroying the MTP heads, I was able to produce a Q8, and an IQ4 & Q4KM with relative ease that works fantastically and is extremely high accuracy. Legitimately the best quantized MTP Qwen 3.6 35B I've used so far, not even due to bias.

[-]

Constant-Simple-1234@reddit

ByteShape for speed and quality, but they take time and publish only few selected. Then Unsloth for size, quality, docs and transparency (and all the work).

[-]

TheGlobinKing@reddit

AesSedai and Bartowski

[-]

ttkciar@reddit

Bartowski. He doesn't quant all of the models I'd like, but he quants the must-haves / best of them, and his quants are fairly reliable.

If he hasn't quantized a must-have LLM yet, it's probably because he's waiting for llama.cpp to iron out some support issues. Sometimes he has to requant because of upstream bugs, but usually not.

I especially like that Bartowski frequently (though not always) publishes a bf16 GGUF, which I like to download against the eventuality of llama.cpp regaining its native training feature. Also, once upon a time you used to be able to convert bf16 GGUFs back to safetensors, but the script for doing that broke about a year ago due to GGUF format changes, and I don't know if it's feasible to try to fix it.

If a model is too niche for Bartowski to quantize it, I go to Mradermacher.

[-]

meganoob1337@reddit

cyankiwi makes very good awq quants , and is pretty fast for new models!

if you read it , thank you !

[-]

rainbyte@reddit

Here I'm using Qwen3.6 AWQ quants by Cyankiwi, thank you so much Cyankiwi 🤗

[-]

No_War_8891@reddit

QuantTrio for vLLM stuff

[-]

meganoob1337@reddit

cyankiwi is also also pretty nice

[-]

rainbyte@reddit

Yeah, I like both :)

[-]

No_War_8891@reddit

true 😎

[-]

Hipponomics@reddit

Liking the practices of one quant publisher above another is fine, but you should be aware of the context they work in.

All these quants are built on the old quantization technology that Iwan Kawrakow made a few years ago. He forked llama.cpp and has been making new quants since then. ubergarm and a few others publish the newer IQn_K quants which are considerably better than any quants that work with mainline llama.cpp, this includes both Unsoth's and Mudler's quants.

The fact that Iwan left llama.cpp to make his own fork is one of the biggest losses to happen to local LLM inference. We would all be using his latest IQ_KS and IQ_KT quants if it weren't for that. If someone figures out a way to resolve the conflict between Georgi and him, they would be doing the community an enormous service!

[-]

RobotRobotWhatDoUSee@reddit

Unsloth and bartowski.

Wrt to Unsloth, it was amazing seeing papers about dynamic quants, thinking "someday we will get high quality low-bit quants," and then Unsloth operationalized high-quality low bit quants much earlier than I expected.

The first time I ran llama 4 scout on a laptop and it almost passed my personal code test with an unsloth UD2 quant, it felt like a magical glimpse into the future. Incredible.

[-]

Ok_Mine189@reddit

The Bloke.

[-]

roosterfareye@reddit

What happened to the Bloke? I see the name against many aging ggufs...

[-]

Equivalent-Repair488@reddit

He was the pioneer, the mradermacher+bartowski+unsloth before they even existed, the one and only quant provider, when he departed his role, many like the above attempted to fill his void.

He stopped around feb 2024 I believe, because he has his own company to run now, but back then he was the only mass quant provider for GGUFs, and other formats

[-]

o0genesis0o@reddit

I download models from The Bloke, Bartowski, and Unsloth.

But sometimes I also download the official quant from llamacpp team as well as ground truth.

[-]

mantafloppy@reddit

Anyone but Unsloth.

I don't like having to redownload my gguf 6 time.

[-]

Yorn2@reddit

lukealonso has made one of the best MiniMax M2.7 quants for my current use case. mratsim made one before that for M2.5. Aes Sedai as well. Basically any of the guys making models in the RTX6kPRO Discord are genius-level model creators.

[-]

VoiceApprehensive893@reddit

been using unsloth models though i dont really see a difference between them and smth like bartowski or mradermacher quants

[-]

grumd@reddit

Unsloth did some KLD benchmarks and APEX models were worse than other quants.

[-]

Makers7886@reddit

Lately I look for AesSedai on day 0-ish (maybe biased as I'm a fan of WOT books) then ubergarm a bit later when looking for most capability per gb situations/not vllm.

[-]

Digger412@reddit

AesSedai here - yep, I'm a fan of the books too ;)

Thanks for checking out my quants!

[-]

json12@reddit

Your quants are awesome. One suggestion I’d like to make is if you can also do lower quants (Q1s and Q2s etc.)

[-]

Digger412@reddit

Usually I point people to Ubergarm at that range since his ik_llama quants provide better quality at that bpw. But I can look at adding an IQ2-range to my lineup for people who want to stick on mainline llama.cpp, sure!

[-]

itssethc@reddit

Bartowski first, Unsloth if needed.

[-]

xaocon@reddit

It's not that hard to make your own with your own dataset, you don't need a ton of data to do a good job. If you really don't want to do that you may need to try a few to figure out the best. Unsloth and bartowski are good options to start with if you want a gguf.

[-]

Yes_but_I_think@reddit

Both are great people, do you have ulterior motive in choosing and comparing them?

[-]

Kat-@reddit

Just AesSedai. I take a bf16 abilerated version of a model I want from p-e-w or llmfan46 and quantize to GGUF using the scheme AesSedai creates for the same model.

[-]

Past-Economist7732@reddit

Ubergarm and Aesedai are my goats

[-]

Digger412@reddit

<3

[-]

a_beautiful_rhind@reddit

ubergarm, aesedai, bartowski and mradermacher

Probably misspelled some of the names.

[-]

Mordimer86@reddit

DavidAU has some decent works. Some pretty weird builds, but also way better than Unsloth quant of Qwen3.6 27B. Compared to Unsloth IQ4_XS from this side does not get stuck in loop with Opencode right at the end of a task.

I've also checked perplexity and his quants seem to be better than Unsloth too:

[-]

astope909@reddit

All of them except unsloth. unsloth are sleazy and shitty and constantly go around making advertisement posts about nothing. They are clearly trying to come up with some brand that they can sell when they have nothing. Their moe quants are ass cause they do shit like quant the shared experts which are used on every token. Always use Aes or Ubergarm over them. their regular quants arent special. Bartowskis are just as good. They claim they have "special sauce dynamic" quants. But theres only one thing they do that Aes, Bartowski, and Ubergarm dont and thats Unsloth does not tell you their quanting recipes. Even IK shit their claims about "dynamic" quants in an ik_llama help thread asking for unsloth support. they constantly feel the need to tell everyone about the "fixes" they push. Which are 99% of the time tiny chat template updates that anyone could have done. Aes has PR'd lcpp for model support like kimi 2.5's vision encoder and he just added mimo 2.5 support with vision. What the hell has unsloth done other than make some quants early and slap names on them to make them look fancy? And they use this subreddit so I half expect them to find this comment and shove a chart in my face thats probably misleading like their others. or will show flat out that Aes's quants are better

[-]

RAZA_2666R@reddit

Honestly, hard to beat Unsloth for speed and stability. Their documentation alone saves so much headache. I’ve noticed similar things with GSM8k results on other quants too; Unsloth just tends to hold up better on logic tasks.

[-]

No-Juggernaut-9832@reddit

Black sheep AI for Apple MLX

[-]

Mr_Moonsilver@reddit

Cyankiwi for his AWQ models an Quant Trio

[-]

Bulky-Priority6824@reddit

in like apex but for 3.6 it passes think tags regardless of any attempt to prevent it to frigate genai so thats a deal breaker. for 3.6 ive only tried unsloth and its been superb out of the box. 3.6 35b a3b q4 k xl

[-]

0-0x0@reddit

This guy's video made me use Qwen3.6 Q4_K_M from lmstudio, although I could run the Q8_K_XL from unsloth reliably, for this model I'm just playing around with the Q4_K_M since it's obviously faster and doesn't give up much on quality tbh. https://www.youtube.com/watch?v=ONQcX9s6_co

[-]

Hot_Turnip_3309@reddit

sometimes unsloth hits it out of the park but I swear they don't test very well, for example their 3.6 35B isn't very good. The bartowski one is good. But I am using APEX for this model.

[-]

OutrageousMinimum191@reddit

Generally I prefer to download and store original transformers models, quantizing them myself, except very big models, in this case I prefer Unsloth to download low quant ggufs.

[-]

nickm_27@reddit

From what I understand the realistic situation is that there is going to be very little if any difference in actual output behavior / quality between publishers at the same quant level. Some prioritize speed while trying to maintain as much quality as possible and others prioritize quality at a given model size.

For me personally I have been using Unsloth as they provide the recommended llama.cpp parameters which usually work well for me, and I have not had a good enough reason to try another publisher because it seems like at the end of the day it will at best be very similar.

[-]

JLeonsarmiento@reddit

This guy here:

https://huggingface.co/leonsarmiento

Solid quants for Apple.