Unsloth accused a brand new team (ByteShape) of "literally cheating." I brought the receipts, and Unsloth moved the goalposts.
Posted by TheRealSol4ra@reddit | LocalLLaMA | View on Reddit | 122 comments
TL;DR: I shared a highly efficient new model from a brand new team (ByteShape) in the Unsloth Discord. Instead of welcoming the competition or having a technical discussion, the Unsloth devs immediately accused them of "literally cheating" and manipulating data. When I reached out to ByteShape directly, they provided a highly professional, detailed technical breakdown, and even updated their graphs within 24 hours to appease Unsloth's complaints. Unsloth's response? They dismissed the technical facts, moved the goalposts to complain about the new graph, and shut down the conversation when they ran out of arguments.
Hey everyone. I’m making this post because I think the open source AI community thrives on innovation, collaboration, and healthy competition. Unsloth is undoubtedly one of the largest and most respected teams in this space right now, which is exactly why their recent conduct towards a brand new team needs to be called out.
As end users, we shouldn't have to deal with massive teams trying to bully newcomers out of the scene just because their numbers look threatening. Here is a chronological breakdown of exactly what happened over the last few days.
Part 1: The Spark
On April 10th, I posted in the Unsloth Discord about a newly released model from a team called ByteShape (Qwen3.5-35B-A3B). As an end user, I was getting around 160 t/s, which was fantastic, and the size-to-performance ratio was incredibly impressive. I didn't post it as an attack; I was just sharing an interesting new development in the quantization space.
[My initial post introducing ByteShape's new model, sharing performance graphs, and noting the impressive generation speeds and size to performance ratio.]()
Part 2: The Immediate Hostility and Accusations
Instead of taking it in stride, Unsloth developers Mike and Daniel immediately got defensive. Rather than discussing the tech, they went straight to accusations:
- "Literally Cheating": Daniel repeatedly claimed ByteShape was "literally cheating" by supposedly using Quantization-Aware Training (QAT) directly on the benchmark datasets to inflate their numbers. He had zero proof of this.
- Dismissing Metrics: Mike stated that calculating tokens per second (t/s) is a "very silly metric overall", a bizarre claim considering how vital inference speed is to local LLM users.
- The Graph "Controversy": Their main sticking point was ByteShape’s graphs. ByteShape used a 1, 2, 3, etc. numbering system to rank their models from smallest to largest. Unsloth argued that putting a "1." on a ByteShape model and a "1." on an Unsloth model was deliberately manipulating data to force a 1-bit-to-3-bit comparison, even though the x-axis, legend, and hover text clearly differentiated them.
When I asked if they had actually contacted ByteShape about these massive accusations of data manipulation, Daniel scoffed at the idea, saying, "But why? They're a company," and pointed to a random Reddit comment he made as sufficient communication.
[Mike from Unsloth reacts defensively, accusing ByteShape of misrepresenting data with their graph labels and dismissing tokens per second (t/s) as a "silly metric."]()
[Daniel jumps to conclusions, claiming the analysis is skewed and outright accusing ByteShape of "literally cheating."]()
[Daniel scoffs at the idea of formally contacting ByteShape, stating "But why? They're a company" and citing a random Reddit comment as sufficient communication.]()
[Daniel doubles down on the baseless accusations, claiming ByteShape is cheating by using Quantization-Aware Training (QAT) on benchmark data.]()
Part 3: Getting Real Answers
Since Unsloth refused to talk to the team they were accusing of fraud, I did. I sent an email to ByteShape detailing Unsloth’s specific complaints (the graph labels, the QAT cheating accusations, and dequantization overhead).
ByteShape responded immediately with incredible professionalism. They explained:
- No Cheating (PTQ, not QAT): They confirmed their approach is strictly Post Training Quantization. The dataset used for datatype allocation is fully isolated from benchmark datasets to prevent contamination.
- The Labels: They clarified the numbers were purely ordinal (ranking size within a specific family) and were never intended to be cross-method correspondences.
- Speed Explanations: They explained that because most LLM inference workloads are memory-bound, cutting data volume in half provides a massive speed benefit that heavily outweighs dequantization overhead, which is exactly why reporting end-to-end t/s matters.
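For readers unfamiliar with the memory-bound argument, here is a back-of-envelope sketch of it. The bandwidth figure and model sizes are illustrative assumptions of mine, not numbers from ByteShape's email:

```python
# Sketch of why halving a quantized model's size roughly doubles its
# decode-speed ceiling when inference is memory-bound.
# All numbers below are illustrative assumptions, not measured values.

def decode_tokens_per_sec(model_bytes: float, mem_bandwidth_gbps: float) -> float:
    """Upper bound on decode t/s: generating each token requires
    streaming (roughly) all active weights through memory once."""
    return (mem_bandwidth_gbps * 1e9) / model_bytes

bandwidth = 100.0  # GB/s, a plausible consumer-hardware figure (assumed)
q8 = decode_tokens_per_sec(8e9, bandwidth)  # ~8 GB quant
q4 = decode_tokens_per_sec(4e9, bandwidth)  # ~4 GB quant

print(f"8 GB model: ~{q8:.1f} t/s ceiling, 4 GB model: ~{q4:.1f} t/s ceiling")
# Dequantization adds compute per token, but as long as the workload
# stays memory-bound, the smaller model's ceiling is about 2x higher.
```

This is of course an upper bound; real throughput depends on compute overhead, batch size, and how much of the model is active per token (relevant for MoE models like the A3B one discussed here).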
They ended the email by thanking me for the questions, expressing excitement that people were enjoying the model, and asking for more feedback.
[ByteShape's highly professional email response clarifying their PTQ method, dataset isolation, and the reasoning behind their ordinal graph labels and speed metrics.]()
Part 4: Moving the Goalposts
I brought this detailed, highly technical response back to the Unsloth Discord.
Mike completely ignored the technical clarifications about PTQ and memory bottlenecks. He fixated entirely on the visual graph argument, stating that relying on hover text is misleading for screenshots and concluding that "until they change their misleading graphs there's not much to say honestly."
Here is where it gets ridiculous. ByteShape actually listened. Within 24 hours, they updated their graphs, changing the numeric labels to letters (A, B, C) to completely eliminate any possibility of a "1 vs 1" misunderstanding.
When I showed Unsloth the updated graph, proving that ByteShape had actively accommodated their demands, Mike immediately complained that his demands were being framed as "trivial," and stated:
[Mike dismisses ByteShape's detailed technical explanations, hyper-fixating solely on the graph's visual labeling and arguing it is still misleading.]()
[I reveal that ByteShape actually listened and updated their graph labels to letters (A, B, C) within 24 hours to accommodate Unsloth's specific complaints.]()
[Moving the goalposts: Mike ignores ByteShape's visual fix, takes offense at his demands being called "trivial," and invents a new rule that the exact quant names must be labeled.]()
The Takeaway
The moment ByteShape fixed the graph, Unsloth invented a brand new rule to stay mad. It became painfully obvious that this was never about "misleading data" or protecting the consumer. It was about finding an excuse to invalidate a competitor.
When I finally laid out this entire timeline to them, pointing out how professionally ByteShape handled the situation compared to their own defensiveness, Mike's final response was to repeat the phrase "manipulating data" and end the conversation with a curt "Anyways moving on."
If your models and methods are truly superior, you don't need to gatekeep visualization choices or accuse newcomers of cheating without proof. Unsloth has done incredible work for this community, but this behavior is entirely unacceptable for a team of their size and influence.
[The final exchange where Mike ignores the summary of events, repeats the "manipulating data" buzzword, and abruptly shuts down the conversation with "Anyways moving on."]()
We should be encouraging new teams like ByteShape who are bringing real innovation and speed to the table, not defending massive teams who throw a tantrum the second someone else posts a good benchmark.
Disclaimer: Please do not use this post as an excuse to harass the Unsloth team or send hate their way in their Discord or elsewhere. The goal of this post is simply to hold leaders in the community accountable, correct the record on a new developer's tech, and encourage better professional standards. Let's keep the discussion focused on the technology and fostering a welcoming open source environment.
Mundane_Ad8936@reddit
So OP goes onto a team's discord and then wants to debate another team's work. This is a real ahole move.. at what point does OP realize good people don't go picking fights like this?
TLDR: OP decided to promote another team's work in Unsloth's discord, ignored the feedback of people more knowledgeable, and then decided to play man-in-the-middle for a fight between the two.. brilliant move buddy, it went as follows.
Meanwhile the Unsloth team has been massively kind and generous to this community..
Take this down, you're the one who's wrong..
TheRealSol4ra@reddit (OP)
I think you are severely misunderstanding my relationship with that server. I didn’t just walk in, drop an advertisement, and leave to stir up trouble, I have over a thousand messages in that Discord. I am an active, participating member of that community who simply shared an exciting new model with fellow enthusiasts. Sharing and comparing new tech happens literally every day in AI spaces; it isn't 'taking a sh*t on the floor,' it's how open source discussion works.
Furthermore, I didn't 'ignore' their feedback, I did the exact opposite. When Unsloth raised technical concerns and threw out cheating accusations, I went directly to the developers to get the actual technical answers, because the Unsloth devs refused to do it themselves.
I have massive respect for the incredible work Unsloth has done for this space, and I've used their tools myself. But past contributions do not give a massive team a free pass to baselessly accuse a brand new competitor of 'literally cheating' without a shred of proof. Holding leaders accountable to basic professional standards isn't being a 'leech'; it's how we keep the community healthy and ensure new innovators aren't bullied out of the scene.
Mundane_Ad8936@reddit
Too busy defending yourself to see that you're missing the point entirely.. regardless of whether your process was valid or not, you disrespected them in their house.
You picked a fight that you could have easily walked away from. There was absolutely no valid reason to take it on. You did not have to insert yourself between them and a rival, you chose to do that. Then you did it on their discord right in front of their community..
Be a normal person for a minute.. would you have done that to someone at their home with all their friends around them?
I don't know if you're on the spectrum or not but this is one of those situations where you absolutely failed to read the room. You're the one at fault here.
Next time stay out of someone else's fight.. don't insert yourself as the go between messenger. This is the kind of thing teenagers do not adults.
relmny@reddit
I might read this later for "fun", because this sounds strange:
"When I reached out to ByteShape directly, they provided a highly professional, detailed technical breakdown, and even updated their graphs within 24 hours to appease Unsloth's complaints. Unsloth's response? They dismissed the technical facts, moved the goalposts to complain about the new graph, and shut down the conversation when they ran out of arguments."
So you complain that, instead of talking about the "old" graph, they talk about the new one?
That's what I'd expect anyone to do, talk about the new stuff! Or else people will say:
"see? they are still talking about the old stuff!"
That alone stopped me from reading further...
TheRealSol4ra@reddit (OP)
I totally hear you, and it’s a fair point to raise! But the reason I used the term 'moving the goalposts' is that Unsloth set a very specific condition for the graph to be 'fixed': they argued that the 1, 2, and 3 labels were misleading because they implied a bit-width comparison.
ByteShape met that exact condition by changing the labels to A, B, and C within 24 hours. Instead of acknowledging the fix, Unsloth immediately pivoted to a brand new requirement, that the bubbles must list the full quant names, something they hadn't mentioned as a dealbreaker before.
It wasn't just about 'talking about new stuff'; it was about the fact that no matter what ByteShape did to accommodate them, Unsloth simply invented a new reason to dismiss the data rather than addressing the actual technical performance.
relmny@reddit
Thanks for the effort, but I really don't care, I'm here for "fun".
I might be wrong, because I didn't really follow the drama, but what I take from this is that some people are using Unsloth's name to get publicity (are there any mentions of Bartowski, Ubergarm, AesSedai, etc.?).
This only hurts and nothing good comes out of this.
Whether you like it or not, Unsloth does a great job. Even if you don't use their GGUF, they get involved with the community and they provide great documentation for running models. Something that I haven't seen anyone else do.
I mainly use Bartowski and Unsloth, and sometimes Ubergarm, but when there's a new model I always look at Unsloth's documentation, because the info I need is almost always there.
Your post is not useful at all; actually it's the opposite of useful, it hurts.
If another team makes great quants, we will know at some point, like we got to know about Bartowski, Unsloth, Ubergarm and all the others.
But using probably the team with most downloads, to gain publicity by attacking them, that's low.
And I'm done with this thread.
I hope the Unsloth guys don't take any of this seriously and shrug it off ASAP.
yoracale@reddit
Thanks for the support, we appreciate it. We're definitely not taking this to heart; the user was known to be schizo in other Discords as well, like lmstudio's, where they were banned. We unfortunately made the wrong move of not banning them earlier.
finevelyn@reddit
I'm most confused why you took this to Unsloth in the first place - they are not a guardian of the open LLM community and you (or ByteShape) don't need their approval. You are yourself framing them as competitors to ByteShape.
danielhanchen@reddit
In the end, we actually did benchmarks and confirmed their models (red) do worse than ours (green) on the Pareto frontier at KLD 99.9%:
PiaRedDragon@reddit
I think you guys might be optimizing for the wrong things.
All I care about is how it actually does on benchmarks, because that is what I am going to be using it for: how intelligent it is.
relmny@reddit
Is a "/s" missing, or do you truly believe that what matters are benchmarks and that they actually measure how "intelligent" models are?
PiaRedDragon@reddit
Benchmarks measure intelligence a hell of a lot better than the Pareto frontier or KLD.
I know that, because the Unsloth models are complete shite, and that is what they are optimizing for.
This was a 500 Question MMLU-Pro test run I did against Unsloth models vs a SMALLER model and they got crushed on every single run. They were up to 30% WORSE performance than the RAM model.
relmny@reddit
"Own" benchmarks could measure intelligence; most public ones most likely measure how well models were trained for those benchmarks.
Benchmarks are so manipulation-prone that they mean nothing.
About your graphic: without knowing the settings and so on, it doesn't mean anything to me.
Anyway, you hate Unsloth, I like them and use some of their quants because I think they are good... let's leave it as that.
yoracale@reddit
Yep, they definitely hate us because I called them out for shilling a few days ago. Thanks for the support, you can't believe everything you see out there unfortunately! :)
PiaRedDragon@reddit
"I hate Unsloth"
Like WTF dude? Where did I say that? I am just presenting data; yes, it shows that Unsloth quants are dog shit, but that doesn't mean I hate them, it just means there are much better models out there.
Holiday_Purpose_3166@reddit
I didn't want to bark further into OP's original silliness, but help me understand the context here if we're still muddling through it.
Whilst I massively appreciate the job you've done in the OSS community - I recall someone from Unsloth touching on this in the past, that these charts aren't a great indicator of model performance, but here we are.
Based on my own usecase benchmarks, Byteshape's best IQ4_XS equiv performs better than your UD-Q5_K_XL in *my* agentic coding usecases.
I would assume fidelity would make the difference in the results, but that hasn't been the case here, and the score deviation is just slightly outside of noise. The difference was there, and it becomes an appealing choice when memory consumption is a lot smaller for the effort.
My point being: I understand social media is a tricky place, but it strikes me as contradictory to lean on the one metric that is always up for debate because of fluctuating differences.
I hope responses like that don't come out of hubris, because bashing a small tuner when you have a higher influence in this space can backfire.
Humbly, my two cents.
ProfessionalSpend589@reddit
You’re too polite. I’m not confused. It’s to manufacture a controversy.
terminoid_@reddit
here's the new graph by Daniel, enjoy. also, for the record, sol4ra on discord manages to be pretty obnoxious in both the LocalLlama server and Unsloth's server.
https://ibb.co/LX624Zm1
Xamanthas@reddit
In LM studio too.
sam_lain@reddit
Damn...
ttkciar@reddit
Violates Rule Five: Follow Reddit's Content Policy
DarePitiful5750@reddit
OP seems to work for ByteShape
TheRealSol4ra@reddit (OP)
I don’t seem to understand your logic here. Are you saying that baseless accusations are okay because proof might exist?
DarePitiful5750@reddit
I'm saying not all proof is made immediately public in all cases. Nothing to do with "might" exist.
denoflore_ai_guy@reddit
Need to verify technically, but the tone, the actions, and the lack of technical pushback read like lazy gatekeeping from an incumbent leader.
Tokens per second may be a silly metric in theory, but for GPU-poor people it's a very important one in these crazy times.
It’s a tech team pissing match. On Reddit. With brigading and fanboy mass downvoting. That actually makes it the first normal thing to happen today.
Friendly_Beginning24@reddit
Holy shit, LLM drama
PiaRedDragon@reddit
They have history: I showed this benchmarking I was working on IN A DIFFERENT SUB, a direct comparison of the Unsloth model to a SMALLER version of the model.
They were getting cooked, then I woke up to a perma-ban on r/unsloth. I didn't even post in their sub, ffs.
Lol, big babies.
PiaRedDragon@reddit
I found this comment particularly funny. Yeah right.
OP, move on, just be happy you have access to better models, let them stick with their stuff, and you get to use better models. The community will move to the best models eventually anyway, or will be left behind.
tmvr@reddit
Don't take that too seriously, that's standard corpo speak. It's like "How are you?" or "We need to get together soon!" etc.
853350@reddit
the unsloth brothers have long made me suspicious. i’ve always been surprised that folks will train a model and then let two essentially random dudes quantize it (potentially totally changing your end user experience)
yoracale@reddit
We did not want to address this, but here we are. You were a member of our Discord who repeatedly harassed community members and tried to stir up drama all the time. Many other users were aware of your behavior as well, but we let it slide because we were lenient with rule enforcement. Then, out of nowhere, you tried to bring up ByteShape trying to stir drama and accusations.
The graphs they presented were misleading. Labeling the quants as “1.” vs. “1.” suggests to the viewer that the comparison is apples to apples, but that is not what was actually shown. In reality, they compared their 3-bit quant to a 1-bit quant and labeled both as “1.” Naturally, the 1-bit quant performed much worse than the 3-bit quant. However, anyone reading the graph would reasonably assume they were comparing quants of the same size or bit-width. The standard practice in the community is to label the quant size clearly, but they chose not to do that. As a result, the graph is misleading and makes our quants appear worse than they actually are.
No-Refrigerator-1672@reddit
Sure, the formatting of the graphs could be better, that's valid. But I feel like you're focusing on the wrong point: they don't claim to be better at the same size, they claim to have faster TG at the same benchmark score, which is a different metric. Can you elaborate on your claims about their benchmarks having incorrect setups (something about QAT)?
yoracale@reddit
We're doing benchmarks right now, will address it.
No-Refrigerator-1672@reddit
I've commented this graph here. I would be grateful if you can look it over.
danielhanchen@reddit
Here are the plots:
No-Refrigerator-1672@reddit
Okay, so let's reiterate what I'm saying. ByteShape is claiming that their quants run faster than yours at equal benchmark score. I assume that translates to equal KLD. This table was published on your Discord, and I assume it contains accurate data. Let's pick a comparison pair: based on your previous graph, Unsloth IQ4_NL and ByteShape Q4_XS_4.12bpw are at equal KLD; another good pair is Unsloth IQ3_XXS and ByteShape Q3_K_S_2.89bpw, but IQ3_XXS is absent from the table. Other quants are too different in KLD to compare raw numbers.
So, their TG seems equal, but their PP is indeed faster, at equal KLD. I think they might have a point. This claim requires a graph with KLD on Y and PP/TG on X to draw a definite conclusion, though.
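For anyone lost on the KLD metric being argued over: it is the KL divergence between the full-precision model's next-token distribution and the quantized model's, averaged over tokens (lower means the quant is more faithful). A minimal toy sketch, simplified from how llama.cpp-style KLD tooling works on full logits:

```python
# Toy sketch of per-token KL divergence between a full-precision model's
# next-token distribution (p) and a quantized model's (q).
# Distributions below are made up for illustration.
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over one next-token probability distribution."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_kld(fp_dists, quant_dists):
    """Average per-token KLD across a sequence (simplified: real tools
    compute this from full logits over the whole vocabulary)."""
    klds = [kl_divergence(p, q) for p, q in zip(fp_dists, quant_dists)]
    return sum(klds) / len(klds)

p = [[0.7, 0.2, 0.1]]          # full-precision next-token probs
q_good = [[0.68, 0.21, 0.11]]  # a faithful quant: tiny shift
q_bad = [[0.3, 0.4, 0.3]]      # a lossy quant: large shift

print(mean_kld(p, q_good) < mean_kld(p, q_bad))  # True
```

Comparing two quants "at equal KLD", as the comment above does, means picking points where this fidelity score matches and then comparing their speed (PP/TG).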
RogerRamjet999@reddit
It's hard to tell about the graph, since it's so small. However, the claim from Mike that tokens per second is a "very silly metric" just sounds bizarre. I don't know all the background, but that claim just sounds like pulling a complaint out of your backside.
RogerRamjet999@reddit
Wow, attacked in numbers, seems like this struck a nerve in someone. So, since my observation is so terrible as to deserve to be down-voted into oblivion, please explain exactly how "tokens per second is a very silly metric". Either come back with something good, or everyone will know you're just sock-puppet down-voting anyone who questions you ever so slightly.
GCoderDCoder@reddit
I didn't downvote you, but I will say you responded to a message from local LLM royalty (Unsloth) explaining that someone pushing PR for "competitors", for lack of a better term, was comparing speeds of non-comparable quants.
The real issue is there are tons of quants out there. Some people prefer one source or another for different reasons. Pushing Unsloth to be like another group seems like the worst way to go.
I actually think it was nice they went back and forth with you at all. I work in open source software and I don't speak to competitor products; I just point out what we do, and if I see a difference I'd point it out like they did, but I'm not going to deep-dive on competitors, because 1) there's normally no time to learn every difference and 2) most things have pros and cons, so you need to decide the value to you.
I don't use Discord for reasons, but I imagine they planned on Discord being a path to help people use their product, not research competitors.
Just some thoughts...
RogerRamjet999@reddit
Thanks for the reasonable reply. Some here seem to be incorrectly assuming that I have some relationship with the OP; I don't. I just read over the post and honestly couldn't tell what the chart was saying, but the comment about tokens per second being a "very silly metric" is hard to comprehend; it sounds like an F1 team saying the top speed of their car is a "very silly metric". It does seem the OP has been a bit of an irritant in their discord chat, and bringing it here isn't very cool, but whatever... Anyway, I don't really care that much about any of this, I just read LocalLlama to take a short break from my coding.
Cheers
GCoderDCoder@reddit
Yeah, I hear ya. I often get confused by downvotes too when I don't expect them. To me, a downvote is someone being mean; someone I disagree with, I just respond to. I will say, when I started writing I remembered you aren't OP, and by the end I had conflated you two. I was in bed just waking up at the time, but I'm taking from this that people often aren't looking for details and nuance on here since it's informal. I waste more of my life responding on Reddit, so I'm trying to get better. I know the feeling, so that's why I responded lol. Good luck on the next one lolol
RogerRamjet999@reddit
Yeah, it's all good. I care nothing about karma or whatever weird scoring Reddit has, but I knew the original post was just muckraking and I posted a response anyway. Probably should have just skipped it, but sometimes I can't help myself when something sounds wrong. Anyway, you sound cool, thanks.
Cheers!
Educational_Rent1059@reddit
Stop acting like a victim. "Sounds like pulling a complaint out of your backside" maybe has something to do with the downvotes you receive. Btw, the only thing weird here is all these fake accounts with no history at all suddenly showing up here.
AvidCyclist250@reddit
that's exactly the impression most readers are going to get by reading this ai slop submission anyway.
TheRealSol4ra@reddit (OP)
I’ll be the first to admit I’ve engaged in standard Discord banter and haven't always perfectly filtered myself in chat. If you want to use that to paint me as a "drama stirrer" so you can avoid addressing the actual issue, go ahead.
But my chat history doesn't change the screenshots above.
It doesn't change the fact that your team baselessly accused a brand new competitor of "literally cheating" and faking their data without a shred of proof. It doesn't change the fact that when they reached out to explain their PTQ methodology, you completely ignored it. And it certainly doesn't change the fact that you are still complaining in this very thread about a graph that ByteShape already fixed to accommodate your exact complaints.
You can attack my character to deflect, but the community can see the screenshots for themselves. You still haven't addressed why your team is falsely accusing competitors of cheating with little to no actual knowledge on the models.
yoracale@reddit
I agree that our communication was not handled in the best way. However, you have been extremely disruptive in our Discord server and you were harassing community members, so we responded more sharply in an attempt to dissuade you from escalating any more drama. Clearly, that approach did not work.
KickLassChewGum@reddit
just outlining the shape of this answer out of a scientific curiosity, nothing to see here
-dysangel-@reddit
I know you think you're doing the right thing here, but you're not. Take a step back, relax, and listen to the feedback you're getting..
853350@reddit
this is condescending. his delivery is weird but he has a point.
-dysangel-@reddit
I don't think I'm being condescending. I used to be in a similar headspace to this guy when I was in my 20s. It's not good for you. I don't think he really has a useful point here - he's just stirring up drama and ostracising himself in ways that he doesn't need to. I know we're all autistic, but we can do better than that if we try.
Read the post from yoracale below to see that it's a pattern and not a one off.
KickLassChewGum@reddit
You're finding one pattern and missing another.
-dysangel-@reddit
Unsloth didn't publicly accuse anyone of anything. They just made reasonable comments on something a guy posted in their discord. A guy who is apparently not stirring up trouble for the first time, which is not surprising to hear after reading the main post.
If that's how you think yourself and others should act, feel free to be a whiny brat I guess, but I think it's embarrassing.
KickLassChewGum@reddit
Hahaha. You sure "snapped out" of it, indeed. Real mature now you are.
-dysangel-@reddit
By snapped out of it I mean that I don't go posting up public posts to try to shame people. I'm definitely not perfect by any means. You seem like a bit of a dick tbh.
KickLassChewGum@reddit
My rule is that I'll never throw offense first. If you act like a prick to me, I see no reason to coddle you. If you can't take it like you dole it out - and have been since the very first comment in this thread with your equally-pretentious-as-thinly-veiled life coach LARP - then at least save yourself the dignity and don't moan about it.
-dysangel-@reddit
lol
Kagemand@reddit
There wouldn’t really be any drama if the receiving end had met him with curiosity and not aggression. But the problem is people see this as a competition and get defensive.
It’s fair to be skeptical of untested results. But the description in here of what happened with the graph does not warrant an allegation of cheating.
853350@reddit
it’s his life and his choice, the condescending part of it is you thinking that he doesn’t know what he’s doing. i wouldn’t be doing it, but not everyone needs a sage
-dysangel-@reddit
(I think his behaviour is really embarrassing for him, but he probably doesn't even realise that, and just feels like he's crusading for justice)
-dysangel-@reddit
I'm happy with that. I had someone send me a similar message in my 20s and it really helped snap me out of it.
TheRealSol4ra@reddit (OP)
I appreciate the life advice, but I think you have the roles confused here. I'm not playing the victim, I'm just the messenger who posted screenshots of a public conversation. ByteShape was the team that was baselessly accused of cheating without proof, not me.
I am always open to constructive feedback on the tech, but accepting ad hominem attacks in order to ignore documented evidence of a massive team bullying a new developer isn't "doing the right thing."
I'm perfectly relaxed. The receipts are all right there in the post, so I'm more than happy to step back and let the screenshots speak for themselves.
Uhlo@reddit
Hm? Reads to me like the unsloth guys have a point! Comparing 1-bit vs 3-bit quants is not really fair!
TheRealSol4ra@reddit (OP)
Hey there! I completely understand why it might look that way at first glance, but that was actually the core of the misunderstanding!
ByteShape wasn't trying to do a direct 1-to-1 comparison of a 3-bit quant to a 1-bit quant. They were just plotting their entire family of models (from smallest to largest) alongside Unsloth's entire family so users could see the full spectrum.
The reason the comparison is actually fair and super useful for us is that the graph plots the models by their actual disk size and VRAM footprint. So even though ByteShape's model is a 3-bit quant, in the graph the Unsloth team was referring to, it took up a similar amount of space as the lower-bit models, but runs faster in t/s and holds higher accuracy. For end users like us, comparing models by how much VRAM they eat up rather than just their bit width is usually the most practical metric!
ByteShape actually ended up changing their labels from numbers to letters shortly after to make sure no one else got confused by it. Thanks for taking the time to read the post!
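The bits-per-weight vs. on-disk-size relationship behind that argument is easy to sketch. This is an approximation that ignores GGUF metadata overhead; the 35B figure comes from the model name in the post, and the bpw values are taken from the quant names mentioned elsewhere in this thread:

```python
# Sketch: why average bits per weight (bpw), not the nominal "3-bit" label,
# determines a quant's disk/VRAM footprint. Approximate: ignores metadata.

def quant_size_gb(n_params_billions: float, bpw: float) -> float:
    """Approximate quantized model size in GB from parameter count
    (in billions) and average bits per weight."""
    return n_params_billions * 1e9 * bpw / 8 / 1e9

# A 35B-param model (per the model name in the post) at bpw values
# pulled from quant names cited in this thread:
for bpw in (2.89, 4.12):
    print(f"{bpw:.2f} bpw -> ~{quant_size_gb(35, bpw):.1f} GB")
```

So two quants with different nominal bit labels can land at similar footprints once their real average bpw is accounted for, which is why plotting by size rather than by label is a defensible choice.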
yoracale@reddit
Yes, as I explained earlier: if their intention is to plot their entire family of models (from smallest to largest) alongside Unsloth's, does that mean that if they uploaded their smallest quant as 8-bit and our smallest as 1-bit, it would be okay to label the legends as "1." vs "1."? Because that is once again very misleading.
When a viewer sees that, they naturally think it's comparing apples to apples.
doomslice@reddit
I have no clue what Unsloth is so consider me a neutral observer. It’s pretty clear to me that they are just labels and not directly comparable. Otherwise what would the equivalent of 10+ be? Nothing to compare to?
I think the graph frames it pretty well too… the size of the model is represented by the size of the shape. And TPS is absolutely what I care about in the "real world". So plotting that against accuracy makes total sense.
yoracale@reddit
'The size of the model is represented by the size of the shape' - a proper graph would just label the exact quant size, or the name of the quant with its bit width. As it stands, it is very confusing even with the size of the shape. Maybe if you were a graph expert you would understand it, but even then it is extremely confusing. This is how you're supposed to label quantization graphs.
CheatCodesOfLife@reddit
I see you tested ubergarm's quants there. Does that mean you guys test with ik_llama.cpp?
Or did those 2 happen to be llama.cpp compatible?
TheRealSol4ra@reddit (OP)
You are arguing a hypothetical scenario to justify your reaction to a graph that already got fixed.
Whether you liked their original ordinal numbering system or not, the x-axis was clearly VRAM footprint—a completely fair apples-to-apples metric for local users. More importantly, when you raised this exact complaint, ByteShape listened and changed the labels to A, B, and C within 24 hours to accommodate you.
Yet you are still hyper-focusing on the formatting of an outdated graph. Why? To distract from the actual, severe issue here: your team publicly accused a new developer of 'literally cheating' and faking their benchmark data without a shred of proof, and then completely ignored them when they explained their PTQ methodology.
You can keep debating the bubble labels of a deleted graph all you want to deflect from your team's behavior. The screenshots are there for anyone who cares to read them. I've said my piece, so I'm going to leave it at that.
danielhanchen@reddit
I did KLD benchmarks - they are worse on 99.9% in the Pareto frontier:
ketosoy@reddit
That was my first takeaway too. “What’s the performance of the smallest,” “whats the performance of the matched bpw” and “what bpw gives the matched performance” are all important but different.
ByteShape appears to have done legitimate frontier work on variable bit-width compression, and muddled the communication.
Nobody really looks like an asshole here, to me. This looks like a complex set of misunderstandings at the frontier of a legitimately complicated topic muddled by a bad choice of primary graph and omission of a contextually very important secondary graph.
I’d be a lot harder on both sides in this if I hadn’t been on both sides of situations like this a thousand times in my life.
danielhanchen@reddit
Look at your 2nd screenshot in your own post:
I said "ByteShape's analysis is fully skewed - their x axis is tokens per second not disk space" then "They're literally cheating"
TheRealSol4ra@reddit (OP)
What is this then if I may ask?
tarruda@reddit
Unsloth used to be my first choice for GGUF until I found about Ubergarm and AesSedai, which generally provide better quants.
Then I started to notice something shady about them: whenever I mentioned any other non unsloth quants on huggingface, they would hide my comment as "off topic", which makes no sense because huggingface is the place to discuss this.
One example: in a Hugging Face discussion, Unsloth was saying that 256 GB machines could run their 4-bit quant. I mentioned that Ubergarm had released a 2-bit quant that worked well on 128 GB machines, and they hid my comment. It is almost as if they reject any competition in creating quants, which is completely ridiculous.
TheRealSol4ra@reddit (OP)
Wow, thank you for sharing that. It is honestly incredibly disappointing to hear that this kind of gatekeeping extends all the way to actively hiding helpful comments on Hugging Face. You were literally just trying to help users with 128GB machines find a viable alternative, that is exactly what those discussion boards are for!
Speaking of AesSedai, I actually looked at their KLD retention charts recently and was seriously impressed by the quality of their work. They are doing fantastic things in the quantization space.
The fact that Unsloth would actively try to suppress mentions of excellent work by Ubergarm and AesSedai just to monopolize the conversation is wild. The whole point of open source AI is that we all benefit from having different options, sharing methods, and pushing each other to be better. Trying to silence any and all competition is just a bad look for the community as a whole.
CheatCodesOfLife@reddit
slop
TheRealSol4ra@reddit (OP)
What?
yoracale@reddit
In our GGUF benchmark charts, we included every community quant uploader and even highlighted cases where AesSedai and Ubergarm outperformed our own quants at certain bit levels. If we were really trying to shut out competition, why would we openly promote examples where their quants performed better than ours?
KickLassChewGum@reddit
It's incredibly irritating to see subtly disingenuous responses like this all over the thread.
For instance, here you're conflating community quants - i.e. independent talent that's of a far smaller commercial threat to Unsloth, or might even be a boon if they discover something promising - with one worked on by a company. These are not equivalent things, yet you're presenting them as such.
(and, no, no one is expecting you to share benchmarks where you get beaten by competitors. i'm just expecting you to stop implying that you do.)
tarruda@reddit
And why were my comments mentioning other quants being hidden?
mana_hoarder@reddit
"Part 1. The Spark."
AI slop post meant to stir drama. Downvote and move on.
TheRealSol4ra@reddit (OP)
It took me an hour or two to write the post and gather all the evidence: scrolling back through logs, getting the emails. I had Claude format it for me; I assumed that would be easier and more preferable to read than a literal wall of text. I gain nothing from stirring drama. Duly noted, though.
dnaleromj@reddit
Can't read that wall of text. Don't care what it says.
No-Refrigerator-1672@reddit
I will pick one point of the discussion so I can concentrate better. Tok/s, isolated, is indeed a silly metric for a quant. It does not matter how fast you run if you produce garbage outputs. What you should really focus on is some kind of benchmark score per quant size, so we can understand whether the quant retains the original intelligence; ideally KLD. Or, if you insist on taking speed into the picture, then it should be tok/s per benchmark score at the very least. Measuring speed only makes sense between two quants that produce equally intelligent outputs. Also, token generation speed should never be discussed alone; prompt processing speed is even more significant, and it always gets overlooked.
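(For readers unfamiliar with the KLD metric being argued over here: a minimal sketch of what a per-token KL-divergence measurement looks like, assuming you already have next-token probability distributions from the full-precision and quantized models. The distributions below are invented for illustration, not anyone's real benchmark data.)

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) in nats between two discrete distributions,
    e.g. next-token probabilities from the base vs. quantized model.
    Lower is better: 0 means the quant reproduces the base exactly."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical next-token distributions over a tiny 3-token vocab:
base  = [0.70, 0.20, 0.10]   # full-precision model
quant = [0.60, 0.25, 0.15]   # quantized model, slightly "blurred"
print(f"KLD: {kl_divergence(base, quant):.4f} nats")
```

In practice this is averaged over many tokens of a test corpus, which is roughly what tools like llama.cpp's KL-divergence mode report against a base-model logits file.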
danielhanchen@reddit
I did KLD 99.9%:
TheRealSol4ra@reddit (OP)
Hey! Thanks for the thoughtful comment. You make a fantastic point, and I actually agree with you 100% on the core premise: t/s in a vacuum means absolutely nothing if the model has been degraded to the point of outputting garbage.
However, the reason I defended the use of t/s here is that ByteShape didn't isolate it! If you look at the graphs they provided, the Y axis is "Average Accuracy" and the X axis is "Average TPS", with the bubble sizes representing the footprint.
So they were actually doing exactly what you suggested: plotting speed against retained intelligence. Their core claim was that they can maintain the same (or better) accuracy as standard quants, but run significantly faster due to the reduced memory traffic. They weren't just saying 'we are faster,' they were saying 'we are faster at the exact same intelligence level.'
Also, you are totally spot on about prompt processing speed. It definitely gets swept under the rug way too often in these benchmarks. I'd actually love to see both teams publish their Time To First Token (TTFT) and prompt eval speeds for a more complete picture. Thanks for engaging with the post!
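(The "faster at the same intelligence level" claim being debated here is really a Pareto-frontier argument, which is also the framing Daniel uses elsewhere in the thread. A hedged sketch of how you'd check which quants are Pareto-optimal on speed and accuracy — every number below is invented for illustration.)

```python
def pareto_frontier(points):
    """Filter (name, tps, accuracy) points down to the Pareto-optimal ones:
    a quant is kept only if no other quant is at least as fast AND at least
    as accurate, with a strict win on one of the two axes."""
    frontier = []
    for name, tps, acc in points:
        dominated = any(
            t >= tps and a >= acc and (t > tps or a > acc)
            for n, t, a in points
            if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Purely invented numbers -- not anyone's real benchmarks:
quants = [
    ("quant-A", 160, 0.71),  # fastest
    ("quant-B", 120, 0.73),  # most accurate
    ("quant-C", 100, 0.70),  # beaten on both axes by quant-A
]
print(pareto_frontier(quants))  # → ['quant-A', 'quant-B']
```

A quant can sit on this frontier (fast for its accuracy) while still losing on a different frontier like accuracy-per-gigabyte, which is exactly why the two sides keep talking past each other.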
No-Refrigerator-1672@reddit
I'm sorry to see your comment downvoted, as there's nothing wrong with it. You're right; I found their blog post, went over it, and yep, the graphs indeed display performance versus intelligence. I would also very much like to see intelligence versus size, but hey, at least their data seems reasonable. Also, I believe you should add a link to the ByteShape blog post to your post, so it would be easier to reference. I'll now proceed to reading other comments.
TheRealSol4ra@reddit (OP)
Thank you so much for taking the time to actually read through their blog post and look at the data objectively! The downvotes are what they are at this point (the brigade hit pretty hard), but I really appreciate the level headed response.
Their primary goal with those specific graphs was to highlight that speed to intelligence retention, which is a super practical metric for those of us running these on consumer hardware. But I completely agree with you, adding a traditional 'intelligence vs. disk size' or KLD graph would give a much more complete picture and help bridge the gap for people used to the standard formatting in this space. I'd love to see them release that data soon.
And that is a fantastic suggestion about linking the blog post. I'm going to edit the main post right now to include the link so people can easily find their methodology and read the raw data for themselves without having to dig.
CheatCodesOfLife@reddit
Your framing of Unsloth being "unprofessional" compared with ByteShape is unfair. You went into Unsloth's discord server (a casual chat platform) with "Look at the discrepancy for the lower quants, jesus". Then you contacted ByteShape via email, likely in a friendly / professional manner.
danielhanchen@reddit
Hello I checked and ran KLD 50%, 99.9% and PPL - they definitely are worse on KLD 99.9%: I checked all quants from all our repos:
Our Discord discussion has more plots on 50% KLD (even worse for them), TG128, PP512 etc
AvidCyclist250@reddit
Holy shitpost, this saga is sus as fuck. Are you baiting investors? Seems so.
tmvr@reddit
Everyone, please go ahead and help yourselves:
doomslice@reddit
Something IS pretty suspicious here though. OP's reply got -15 in 10 minutes. For a small sub like this that almost certainly means it's being brigaded (probably someone linked it in the discord?)
Randomdotmath@reddit
'small' sub bro
doomslice@reddit
My own reply has 50 views in 43 minutes. There is no way that 15 downvotes (to a reply 2 levels down) in 10 minutes is organic.
TheRealSol4ra@reddit (OP)
You hit the nail on the head. I was actually sitting at around +30 upvotes, and then it plummeted into the negatives in under a minute.
Mike (one of the Unsloth devs) is actively commenting in this thread, so it's practically guaranteed this was linked somewhere for a brigade.
Instead of actually addressing the screenshots where they falsely accused a new team of cheating, Mike's response here was just to slander me. He deliberately phrased his comment to say I "was" a member of their Discord, trying to paint me as some toxic, banned user out for revenge. Meanwhile, I am literally still in their server right now having a normal conversation with Daniel (his brother and the other Unsloth dev).
If you look through the chat logs, Mike has been the hot-tempered and abrasive one driving this hostility from the start. It’s a lot easier for them to just rally their massive community to mass-downvote the post and attack my character than to actually take accountability for how they treated a new developer.
Kahvana@reddit
Eyyy thanks for the batch! It’s been a good day so far for drama. First a “systemically broken attention” for gemma 4, now this.
e7615fbf@reddit
Oh don't worry. We won't. You are clearly the one harassing them, so if you could please stop with this delusional crap, we'd all appreciate it.
ridablellama@reddit
dude don’t go into people's discords stirring up shit. weirdo behavior. no one is reading that nonsense
Amblyopius@reddit
At first sight, in those graphs the actual t/s at similar accuracy seems to be at best never more than a 20% improvement. In some cases even the entire span regardless of accuracy is hardly 20%. That's cool but not exactly earth shattering.
If inference speed is what keeps you awake there's far more interesting things to discuss/chase (e.g. DFlash).
That said, the higher bit ranges do seem to provide a minor improvement in accuracy plus a minor improvement in speed. If a bit of independent testing supports that, then obviously there's good reason to use the models, as even minor upgrades are great if they are free. Of course, that only holds as long as you're using one of the few models they have released (which I would see as the core difference between them and Unsloth for now).
ArtArtArt123456@reddit
it's not really moving the goalposts if he found issues and they actually fixed those issues, right? at least when it comes to the visualization/presentation issues.
probably less so with the QAT/PTQ issue, but i can't say i understand it enough to judge.
TheRealSol4ra@reddit (OP)
I totally get why it might seem that way, but the "moving the goalposts" actually happened after the fix!
Here is the exact timeline of why the graph issue was so frustrating:
That is the exact definition of moving the goalposts, changing the requirements for what constitutes an acceptable graph the moment your original demand is met.
As for the QAT vs. PTQ issue, here is the simple, non-jargon version!
Unsloth threw out a massive accusation of fraud with zero proof, and then completely ignored the developers when they brought the technical receipts to prove they were doing it the fair way!
AurumDaemonHD@reddit
TheRealSol4ra@reddit (OP)
https://i.redd.it/nkarj6pkvxug1.gif
korino11@reddit
Dirty game from Unsloth, LOL. The whole comparison is VERY clear - https://byteshape.com/blogs/Qwen3.5-35B-A3B/ - you can even put the cursor on every data point and get a whole explanation plus parameters. What idiots at Unsloth. And of course the downvotes will now come from users who cannot think for themselves.
bonobomaster@reddit
I recently tried their Qwen3.5-35B-A3B Q3_K_S version and I'm very impressed.
I can fit it easily in my 16 GB 5070 TI and therefore it's blazing fast (120-130 tk/s) and it really doesn't feel like Q3.
Need to use it more but right now this is by far my favorite quant.
TheRealSol4ra@reddit (OP)
That is exactly what I was experiencing too. I've been hitting around 160 t/s on my end, and the intelligence retention for the size is genuinely impressive. I cannot wait to see what they do with their 27b quant when it drops.
And honestly, despite all the heat in this thread, I am absolutely going to keep using Unsloth's quants as well. I still love their models and have a massive amount of respect for what their team has done for the local AI space.
My entire goal with this post was never to 'cancel' anyone or tear Unsloth down, it was just to bring some technical clarity to a brand new team's work and ask for a bit of professional accountability when baseless 'cheating' accusations get thrown around. There is more than enough room in this community for both of these teams to push the envelope, and as end users, we all win when they do!
zilled@reddit
OP, you don't understand how things work in Open-source.
If you're not happy with what is being done, you fork or pick another solution.
This is even more effective as the field is currently blowing up, i.e. there is no way for one actor to have a monopoly and spoil the place.
In your post, many things are stated as facts but are purely speculative allegations:
* "They dismissed the technical facts", there are arguments, not facts.
* "their recent conduct towards a brand new team needs to be called out", the need to be called out is for us to assess, not you.
* "are bringing real innovation", like Unsloth doesn't ?
* etc.
The only mistake of Unsloth is to not have applied their Code of Conduct more firmly on you.
Velocita84@reddit
Byteshape? You mean the company trying to peddle a proprietary gguf recipe algorithm to bait investors? Yeah get the fuck outta here
Advanced-Picture5016@reddit
hm i don't think i can really bring myself to care
unsloth -> who? oh yeah i think i downloaded a gguf from there on huggingface
byteshape -> who? no idea lmao
simple as
Odd-Ordinary-5922@reddit
still cared enough to comment
Advanced-Picture5016@reddit
trite reddit tier answer
quantum_splicer@reddit
Pretty standard expectation in science and in technical comparisons: in order to draw proper conclusions, the two things being compared have to be nearly identical except for one variable. If multiple variables differ between the two things, it introduces noise.
If two models are being compared on performance and they are different quants, that isn't really a fair comparison. If the change made to the models is itself the thing we are evaluating - i.e. whether our methodology increases model performance - then that is fine, because that is what we are measuring:
(1) Whether applied methodology X increases model performance more than methodology Y
(2) When all other factors are equal.
In respect of the Unsloth stuff -
I don't know, I can't really comment. I know they output a lot of good quality work and have been consistent in that respect.
I think they are correct to point out methodological issues and inaccuracies where there is a chance a reasonable person could be misled, even inadvertently.
I think there is a greater need for vigilance in examining others' work, given the use of AI tools and the pressure groups are under to keep outputting work to stay relevant.
I think each group working on model refinement should dedicate space to discussing other groups' models and the different methodologies they apply. It can serve as a source of inspiration, and when ideas are allowed to percolate, it gives individuals opportunities to tweak, experiment, improve, diverge, or learn from others' work.
audioen@reddit
I looked at these charts and I have literally no idea what is being compared against what. The minuscule pixel size doesn't help, but from what I can tell the y axis is "accuracy" and the x axis is "tokens per second", likely generation speed.
A more normal metric would be e.g. disk size vs. K-L divergence, which at least usually shows the familiar frontier where Unsloth quants typically represent the top choice, and which is explicitly the target they optimize for. I don't think ByteShape is bringing anything to the table but misleading charts unless there is better data than this. Tokens per second against a fairly saturated accuracy doesn't make for very useful charts.
TheRealSol4ra@reddit (OP)
Hey, totally fair criticism on the image quality and readability, the compression on those Discord screenshots definitely doesn't help.
You bring up a great point about Disk Size vs. KL divergence. That is absolutely the gold standard for measuring pure quantization degradation, and Unsloth has undeniably mastered that frontier.
However, we have to keep in mind that ByteShape is a brand new team that literally just launched. The reason they plotted t/s vs. accuracy is because they are highlighting a system level benefit: since most LLM inference is memory-bound, their specific PTQ method heavily reduces memory traffic, allowing for incredibly fast generation speeds (I was hitting 160 t/s) while maintaining solid accuracy. For a lot of end-users on consumer hardware, real-world speed and a low VRAM footprint are incredibly practical metrics to look at alongside standard KLD.
Are their charts the industry standard right now? Maybe not. But for a team that just arrived on the scene, they’ve already proven they are highly receptive to feedback (updating their graph labels in less than 24 hours when asked). I'd love to see them publish KLD graphs in the future too, but I don't think we should completely write off their underlying tech just because their first batch of charts was formatted differently!
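(The "memory-bound" point has a simple back-of-envelope form: for batch-1 decoding, token generation speed is roughly capped by memory bandwidth divided by the bytes read per token, which is why lower bits-per-weight translates almost directly into higher t/s. A rough sketch — all hardware numbers below are illustrative assumptions, not measurements of any real quant.)

```python
def decode_tps_ceiling(active_params_b, bits_per_weight, bandwidth_gbps):
    """Rough upper bound on batch-1 decode speed for a memory-bound model:
    every generated token must stream the active weights from VRAM once.
    Ignores KV-cache reads, activations, and compute, so real t/s is lower."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token

# Illustrative: a ~3B-active-parameter MoE at 3.5 bits/weight on a GPU
# with ~450 GB/s of usable bandwidth (assumed numbers):
ceiling = decode_tps_ceiling(active_params_b=3.0, bits_per_weight=3.5,
                             bandwidth_gbps=450)
print(f"Theoretical ceiling: ~{ceiling:.0f} tokens/s")
```

Halving the bits per weight doubles this ceiling, which is the mechanism behind "faster due to reduced memory traffic"; the open question in this thread is only how much accuracy is paid for it.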
Few_Painter_5588@reddit
OP, this is pathetic.
Holiday_Purpose_3166@reddit
First, going on a business social space to post about other business is a terrible move.
They are entitled to their opinion in their space. I'd be more concerned they would go out bashing other tuners proactively.
Secondly, you went defensive mode about Unsloth's response by engaging with Byteshape back and forth. There was no need for it.
I like both teams and also use their quants.
Your engagement was worse than Unsloth's reply, with all due respect, and I wouldn't trust someone taking screenshots of a convo they sparked themselves in order to farm bait.
Let results speak for themselves and leave the monkeys in their circus.
TheRealSol4ra@reddit (OP)
I appreciate the feedback and can definitely see where you're coming from. Looking back, I get how posting another team's model in their Discord could ruffle some feathers, even though my only intention was sharing a cool new development with fellow local LLM enthusiasts; people share alternative tools in there pretty often.
The main reason I engaged further and reached out to ByteShape was because the reaction from Unsloth wasn't just "we don't like how this is graphed." It quickly escalated to them publicly accusing a new team of "literally cheating" and faking data. Because they are such a massive and respected voice in the space, those kinds of accusations carry a lot of weight. Since they refused to contact the developers to verify those claims, I felt it was only fair to ask ByteShape directly so the community had the actual technical facts.
I definitely didn't want to start a circus or a bait farm, and I agree with you 100% that results should speak for themselves! That's ultimately why I wanted to share ByteShape's technical response and the timeline, so people could see the actual methodology and make up their own minds rather than relying on unverified accusations. I will definitely keep your perspective in mind about how I navigate these spaces going forward, though. Thanks.
Ambitious-Profit855@reddit
Unsloth's communication seems unfriendly, if not hostile; I agree. But the usual metric is accuracy vs. space, not TPS. That, plus no information on the underlying inference engine, context length, etc., makes it a claim, not verifiable proof.
Exact_Law_6489@reddit
I have to say, its genuinely impressive that a brand new team responded with a professional, detailed technical email and updated their graphs within 24 hours just to address complaints from a competitor who refused to contact them directly. Meanwhile, the established team with the massive following is saying, "But why would we talk to them? They're just a company" and accusing them of cheating with zero proof. I guess real confidence shows itself differently.
ravage382@reddit
Maybe a summary up top. No one wants to read such a long AI generated post.
Luke2642@reddit
Upvoting for your effort, OP. Too early for me to judge anything else, but shining a light on it is good work.