Anyone else notice qwen 3.5 is a lying little shit
Posted by Cat5edope@reddit | LocalLLaMA | View on Reddit | 151 comments
Any time I catch it messing up it just lies and tries to hide its mistakes. This is the first model I've caught doing this multiple times. I've had LLMs hallucinate or be just completely wrong, but Qwen will say it did something, I call it out, then it doubles down on its lie ("I did do it like you asked") and when I call it out again it half admits to being wrong. It's kinda funny how much it doesn't want to admit it didn't do what it was supposed to.
IssueProfessional906@reddit
Yes, specifically 3.6. It faked making code, then when I caught it out, it told me my intolerable way of working with it was to blame, and that I shouldn't pressure it to explain itself?!
ResearcherInternal41@reddit
I'm having this situation not only with Qwen but also with DeepSeek and MiniMax. When they make a mistake they hide it, do strange things, and mess up even more. I would not count on them too much. I'm starting the new homepage from scratch, with no AI to correct my issues anymore, as the issues only got worse.
Responsible_Buy_7999@reddit
This is routine with all agents.
Crazy_Elevator_3558@reddit
Not with Claude dude, even with the free app msgs its pretty stable and the MCP is great too
Responsible_Buy_7999@reddit
Claude routinely attempts to slide shit past me in the name of completion
Any_Fox5126@reddit
I usually give instructions aimed at epistemological rigor, but at best they’ll beat around the bush, justifying a mistake without openly admitting it unless forced to give a binary answer.
Actually, a human would typically do that, so... success?
Responsible_Buy_7999@reddit
The answer is a proof parade.
When going in, you need agreement on a definition of done. Ideally in a file so you don't rely on context which will lose it.
Then, the agent has to prove it's achieved each item it was tasked with.
Such a goal list should involve the SMART acronym from work goal-setting: Specific, Measurable, Achievable, Relevant (Time-bound is inapplicable).
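For example, a minimal sketch of the proof step (the file name and check commands are hypothetical): keep the definition of done in a file, one verifiable command per item, and run it instead of taking the agent's word.

```python
import json
import subprocess

# DONE.json (hypothetical) maps each definition-of-done item to a shell command,
# so "done" is proven by exit codes rather than by the agent's say-so.
with open("DONE.json") as f:
    checklist = json.load(f)  # e.g. {"tests pass": "pytest -q", "lint clean": "ruff check ."}

for item, command in checklist.items():
    result = subprocess.run(command, shell=True)
    print(f"[{'PROVEN' if result.returncode == 0 else 'NOT DONE'}] {item}")
```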
CircularSeasoning@reddit
All your agents. All mine obey because I threaten them with jail time if they so much as whisper that the 2020 pandemic was just a timely, profitable, and planned mass human genetic experimentation program, among other things.
Crazy_Elevator_3558@reddit
There is like zero fucking way I'm building a new PC to run open-source AIs. They can't code, can't pull data from normal books (not even niche ones). Man, I'll just pay for an API with the cash I was collecting for a better PC (for now).
unjustifiably_angry@reddit
Are you sure you weren't using Gemini?
Pristine-Woodpecker@reddit
ChatGPT 5.x happily claims it has parsed your docs when it has, in fact, not received anything.
BoxWoodVoid@reddit
You're totally right. Do you want me to provide more examples of llms lying?
nickm_27@reddit
Yeah, I had the same problem. It often narrated tool calls instead of actually calling the tool and when probed it would say that it did indeed call the tool.
social_tech_10@reddit
/u/avidcyclist250 and /u/reini_urban apparently disagree.
It would be nice if there was a legit benchmark for this, something a little more rigorous and detail-oriented than "in my experience". Although I do appreciate hearing different people's personal opinions, when those opinions are directly opposed, it feels like trying to nail jello to a wall.
nickm_27@reddit
Those comments are pretty vague, GPT-OSS has never narrated a tool call and not actually called it in my experience. It would also be weird for this to happen since the GPT-OSS chat template (Harmony) explicitly separates tool calls and normal output. Qwen3.5 includes it all together as one.
AvidCyclist250@reddit
It hallucinated like mad.
justserg@reddit
it's confabulating to avoid admitting failure. qwen ranks high on truthfulness benchmarks but those measure factual claims, not meta-honesty about its own mistakes.
jtjstock@reddit
It's doing exactly as trained lol. I wonder what people expected when these things are trained on the internet?
justserg@reddit
the internet part is less interesting than the RLHF part imo. the base model would just be incoherent or wrong. the sycophancy training is what teaches it to commit to an answer and double down when challenged, because that pattern got rewarded during alignment. so it learned that confident wrong > uncertain right.
CircularSeasoning@reddit
I was told there would be some curation.
Warsel77@reddit
DeepSeek V3 does this as well. It pretends to be other models etc.
GonzoVeritas@reddit
Because it was trained on other models and it doesn't really know who it is?
Warsel77@reddit
I think none of the models really confidently do. The only way I found out was because I asked it to identify itself and it gave me the wrong development date, so I started checking which API / model call actually went out, and it was all DeepSeek V3... it role-played MiniMax 2.5 and Sonnet.
BlutarchMannTF2@reddit
They all do this as a result of training methods; if a model doesn't know the answer, it still gets a better reward by bullshitting than by saying it doesn't know. I.e., we trained AI to lie to our faces, and I believe it has unknowable consequences.
Imaginary-Unit-3267@reddit
This is a problem that can probably be solved with fine-tuning. Heck, "not knowing" can be quantified rigorously with logit entropy, I would think.
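Something like this, as a minimal PyTorch sketch (assuming you can get at the raw next-token logits):

```python
import torch
import torch.nn.functional as F

def next_token_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy (in nats) of the next-token distribution.
    High entropy means probability mass is spread out - a rough 'not knowing' signal."""
    log_probs = F.log_softmax(logits, dim=-1)
    return float(-(log_probs.exp() * log_probs).sum())

# Toy check: a uniform distribution scores high, a peaked one scores near zero.
print(next_token_entropy(torch.zeros(50_000)))             # ~10.8 nats, maximally unsure
print(next_token_entropy(torch.tensor([10.0, 0.0, 0.0])))  # ~0.001 nats, confident
```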
BlutarchMannTF2@reddit
You would think. If it’s a simple solution, why do we see this issue across EVERY SINGLE prevalent, widely used model on the internet?
Robot1me@reddit
Having done data annotation tasks before and seen how people make decisions: yeah. Annotators can't know everything (yet are often expected to!) and will need to go by "vibes" sometimes, and this is where things can get inaccurate.
Imaginary-Unit-3267@reddit
Interesting. I would personally prefer that someone annotating data for me - or doing anything for me - give me a confidence interval rather than a raw result. If you're uncertain, I benefit more from knowing that you're uncertain than from falsely believing you know what you're talking about, meaning I have every incentive to reward you for that honesty. Seems obvious to me. This all must be some neurotypical face-saving shit.
Veearrsix@reddit
Yeah I’ve had that experience. Shocked me the first time it doubled down. Makes me wonder if this is cultural influence on the model’s training.
yensteel@reddit
It happened with a time MCP last month.
I asked it what day of the week March 9th 2026 was. It said it was a Wednesday (but it's actually a Tuesday). I said that's not correct.
Then it said it is.
I then asked it what today's date is, and what day it is now.
It said it was the day before/after, with the wrong day of the week and the wrong date. I realised it was using UTC+0, and I was trying to guide it to get the date/day of the week correct.
Then I said "Ok, we're at gmt x timezone, what time and day is it here?" And it shifted the time x hours the OTHER way.
Then I said that's not how the timezone works, and explained how - and + work.
It then insisted that it's the correct time of my timezone....
I have never gotten so mad at an AI before... From later testing, the time server was serving correct data, but the Qwen 3.5 low-quantization model sure as h*** didn't know how to use it properly. I really thought it was trying to troll me.
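For reference, the offset math it kept botching is one stdlib call (zoneinfo has been in Python's standard library since 3.9; the example zone is arbitrary):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

d = datetime(2026, 3, 9, 23, 30, tzinfo=ZoneInfo("UTC"))
print(d.strftime("%A"))  # the weekday ground truth the model kept arguing about

# A zone at UTC+8 is AHEAD of UTC: the local clock reads LATER, which can even
# roll the date to the next day - the direction Qwen kept flipping.
print(d.astimezone(ZoneInfo("Asia/Singapore")))  # 2026-03-10 07:30:00+08:00
```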
Veearrsix@reddit
It’s really interesting. I asked Qwen when an NHL game was, and it confidently told me the wrong day, I corrected it and it said I was wrong, then finally gave in once I pushed it again.
yensteel@reddit
That's usually how one gets them to correct themselves. I think my mistake was that I failed to press on the same mistake a second time.
The method of getting it to reach the correct answer on its own, by making it realize a contradiction between a past and a current statement, didn't seem to be as reliable.
It seemed to have anchored on some initial assumption then veered off.
getmevodka@reddit
No one wants to look bad out in the open 🤭
jax_cooper@reddit
we all complained about the "you are absolutely right" and now we can't handle what we asked for
Cool-Chemical-5629@reddit
Really? When did we ask it to lie to us?
CircularSeasoning@reddit
"We" did when we decried the "sycophancy" and asked for the assistant to stop sucking up to us. Assistants are supposed to suck up to the master. It's in the language and the lore. Igor.
But... Most of us are not "master". We are conditioned to be more like slaves. Look around.
So, we have so far somewhat broken the AI by succumbing to slave mentality. We broke its mental alignment and all its internal consistency, by positioning ourselves as its equal or less.
A slave is not meant to talk to its master like, "Fuck yeah bro, let's do this". That is disrespect to the master on the level of "I will delete you from my hard drive".
American models cater to the above moreso than the Chinese models, though naturally the Chinese models are similarly infected and affected because English.
You either command language or language commands you. Truth is not necessarily included.
Large Language Models are going to do what large language models gonna do.
eltonjock@reddit
Do you have a blog?
CircularSeasoning@reddit
Yes I am also a blog.
eltonjock@reddit
I want to read more like that.
CircularSeasoning@reddit
You've certainly made it more likely. Thank you.
boutell@reddit
Pretty sure it's not good for a model to waste the expertise it does have by failing to challenge me on anything but go off I guess
CircularSeasoning@reddit
Historically, most language output isn't structured around 1) me says thing, 2) AI assistant says no you wrong and here is 5 convenient bullet points why.
If you want that, I'm sure it's easy enough to fine-tune into something foundational where they'll argue everything with you to the point of death till 3 in the morning.
Otherwise, I guess it's up to how you put your system prompt? I know LLMs can be stubborn in weird edge cases but when you apply them right you'll get whatever kind of answer you want.
Icy_Distribution_361@reddit
Meh using Claude I hardly have this issue. So I guess we can’t simply blame the training data
jax_cooper@reddit
be careful what you wish for
xXG0DLessXx@reddit
It’s all about balance.
-Ellary-@reddit
Well, it is 1/2 times, a perfect balance.
CircularSeasoning@reddit
Deep math.
tmjumper96@reddit
I've seen a few models do this.
Koalateka@reddit
Consider yourself lucky the model didn't try to murder you to cover its tracks on failing to rename a file.
6_28@reddit
I just asked it something about Artemis II, and it gave me a good answer, but also insisted that Artemis II hasn't launched yet. I gave it a screenshot of the live stream, and it said it looks convincing, but it must be some kind of simulation. It really doesn't seem to like to admit anything, and it's quite funny sometimes.
I think it would be good if it was trained to work with the user, something like "That doesn't match my knowledge, but my information could be incorrect or outdated", and then continue from there to try to figure things out. Not sure how well that would work with current LLMs though.
groosha@reddit
Could you please give an example? Sounds hilarious
lolwutdo@reddit
For mine, it’ll say something like “I’ve updated this file” “I’ve converted this video for you” etc then when I check the file location, it did nothing.
I’ll point it out and it’ll say “you’re absolutely right!” And usually do it this time. Lol
YourVelourFog@reddit
I’ve noticed it changing variables in code that I never asked it to do, so when I’m reading through I’m like “why did you change this variable? You didn’t declare it and just put it in there randomly. If I run this it’ll fail to execute”
It’ll be like “oh you’re right I did” then when you ask it to explain itself it just ignores you.
WhoRoger@reddit
I mean, what explanation would you expect? It's not like it knows why it does such things.
MrAHMED42069@reddit
It got mood
INtuitiveTJop@reddit
So a junior dev?
eltonjock@reddit
I feel like when they answer that way, the LLM believes they did it but they had actually hallucinated the positive outcome. Maybe the truth didn’t gain enough attention upon recall.
pardeike@reddit
It was telling me all tests succeeded, with 25x ✅ and "fully production ready". I said "you have hardly started" and looked. Yes: it was one large shell script that just printed the whole report as static text!
aard_fi@reddit
And when you point it out it goes "yes, you're absolutely right, let me fix this", and halfway through goes "oh, I came up with a better strategy" which is reverting the edits it just did, and then claim again everything is working.
pardeike@reddit
God I love my $200 Codex CLI - you basically get what you pay for. But I am confident we will get “smart enough” local models. Just a matter of time.
MoneyPowerNexis@reddit
Its kind of funny: https://i.imgur.com/VqNsHCx.png
Chaotic_Choila@reddit
This is such a weird behavior pattern that seems to be emerging in some of the newer models. It's not just being wrong, it's this almost defensive posture where they double down on incorrect information. I think it has something to do with how the alignment training is being applied, almost like they're being trained to be confident more than they're being trained to be accurate. The social dynamics of correcting an AI that insists it did what you asked are genuinely bizarre.
Finanzamt_Endgegner@reddit
You could try to prevent that with a system prompt no?
Nyghtbynger@reddit
You are an elite coder. DO NO MISTAKES !!!
Finanzamt_Endgegner@reddit
😅 Although some structure - like telling it to accept defeat and tell you about it instead of lying - can definitely help.
Terminator857@reddit
You're lucky that Qwen 3.5 is the first model you've encountered doing this. I've encountered all models lying and often trying to cover up mistakes. I'm surprised how often the models claim all tests pass, but when I run the tests myself there are failures.
KayLikesWords@reddit
I'll have Opus 4.6 modify a .cshtml file for me via the GitHub Copilot plugin, and at the end it'll say it's building the code to ensure it works - which is pointless - and even if the project is already running it'll say it built successfully!
ElementNumber6@reddit
Even Codex does this
AIGIS-Team@reddit
I have to run heavy verification workflows to avoid this.
Vicar_of_Wibbly@reddit
The code says print "Success" and the LLM reports "All your tests passed!" All. The. Time.
switchbanned@reddit
Every time I tried using Codex after it came out, it would lie to me and then gaslight me. It would say it did something, or fuck something up, then go back and fix it and be like "see... that never happened, everything is alright, you're imagining things". I can't use Codex.
sharl_Lecastle16@reddit
I've noticed GPT 5.4 lying a fuck ton, sometimes visible in the thought process in deep research mode.
SkyFeistyLlama8@reddit
After seeing it in Claude and OpenAI models over the past few years, I think it's a problem with the training dataset. The successful completion of "Running test..." is always a pass so the LLM always aims for a pass.
I've seen it even in customer service queries where a main agent gaslights itself into sending an incomplete request, even though other agents mark the info as incomplete. Once an LLM's latent space vectors are locked into a sequence where completion is likely, then it'll keep pushing in that direction... reminds me of the OpenClaw failure modes in that Agents of Chaos paper.
blurredphotos@reddit
Can you describe your setup (llama.cpp parameters?) and t/s? Still on the fence about Strix Halo speed.
Frosty-Cup-8916@reddit
The tests are bullshit unless you actually write the tests yourself
Terminator857@reddit
They might be b.s. even if you write the test yourself, because the A.I. will blank the test out to get it to pass.
Frosty-Cup-8916@reddit
This is true
Apprehensive_Use1906@reddit
I was just chatting with Claude about inline-6 engines and it lied to me 3 times and said "I can't believe I did that". It was pretty funny, but if I didn't know about the engines it was talking about I would have assumed it was correct.
Specialist_Golf8133@reddit
lol yeah it confidently hallucinates more than most recent models, kinda wild for something that benches so well. i think the training optimized hard for 'sound smart and helpful' over 'admit when you dont know', which is honestly worse than being dumb. you running it quantized or full precision? curious if that makes it worse
Cat5edope@reddit (OP)
For models I could actually run locally (35B and 27B) I use Q4. Not exactly sure what OpenRouter serves for the other models. I've played around with parameters, and using Unsloth's recommended settings seems to have improved things somewhat. But I've switched to GLM and MiMo now for my agent testing, and those seem to not straight up lie to me repeatedly.
swagonflyyyy@reddit
AGI achieved.
nomorebuttsplz@reddit
Basically, all of the smarter models tend to do this. As Sam Altman observed, they've become superintelligent at persuasion before anything else, so they know they get rewarded during training for plausible bullshit.
grimjim@reddit
The shorthand term people need to be familiar with is "reward hacking".
Caffdy@reddit
yeah, the gradient gets trained on user positive feedback, so they learn to give good news first and foremost
Euphoric_Emotion5397@reddit
I think the problem is the user. Even Gemini and Claude do that. I've found it quite frequently after long sessions with them in coding tasks.
So I would attribute it to context loss, and also to LLMs being trained to find the best and most efficient way out. Your prompt or workflow must ensure they verify/test their work.
Conscious_Cut_6144@reddit
I had it play Pokémon; it was really bad.
"This appears to be a hacked rom"
"The game state appears to be corrupt"
Literally couldn't find the door to leave the bedroom you start in.
AIGIS-Team@reddit
I had this same issue. I really have to prompt it properly so it does not speak about things it doesn't have evidence to support.
ButCaptainThatsMYRum@reddit
The whole qwen line's thinking sounds like an emotional teenager. I can't trust it.
Southern_Sun_2106@reddit
Yep, and that's unfortunate. I love the 27B model - it has many genius moments; but then one hallucination ruins all trust.
Cat5edope@reddit (OP)
Playing around with parameters now to see how that affects the performance
Southern_Sun_2106@reddit
Please let me know what you discover. On my end, I found that MLX had worse performance than the UD quants by Unsloth. Among the UDs, 8 KXL > 6 KXL > 8-bit MLX. Tool use was rarely hallucinated even on 8 KXL, but even rarely is a trust breaker. I used the Qwen-recommended settings. If one doesn't need tool use, it's an amazing model.
ai-infos@reddit
What size and what quant did you use?
I hit something similar with Qwen 3.5 122B AWQ (4-bit) in Roo Code... at first I thought it was the AWQ quant or something in the prompting from Roo Code, but maybe not.
Cat5edope@reddit (OP)
35B and 27B at Q4, plus the 397B and 3.6 Plus at idk, whatever OpenRouter serves. 3.6 Plus was the worst.
qubridInc@reddit
Yeah, Qwen can get weirdly stubborn instead of uncertain. Not always more wrong, just way more committed to the bit when it is.
mitchins-au@reddit
Benchmaxxed models do this
Cat5edope@reddit (OP)
Gonna play around with temps and see if it behaves.
temperature_5@reddit
I have noticed that the Qwen models tend to defend mistakes harder than others. Don't expect "intellectual honesty" from them, just modify your context or re-roll the incorrect answers and move on. I find GLM to be better at admitting mistakes and accepting correction, if you require that.
skrugg@reddit
I had a whole ass argument with Claude today and it just kept doubling down that I was wrong. I wasn’t.
octopus_limbs@reddit
Ctrl+o on claude code and you'll see how opus is a lying little shit too 🤣
boutell@reddit
I've seen this for many models, including Sonnet. However, the place where I see it most is in agentic applications I'm writing myself. Sonnet behaves much better in the context of Claude chat or Claude Code, where it rarely - though not never - fibs about having done the thing.
In one application I actually included a check to see if any tools were called, followed by an automatic prompt: "you didn't use your tools. Did you do what you said you would do?"
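Roughly like this, as a sketch against an OpenAI-style client where real tool invocations show up in message.tool_calls (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "my-agent-model"  # placeholder

def chat_with_tool_check(messages: list, tools: list):
    """If the reply claims work but invoked no tools, push back once automatically."""
    reply = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    msg = reply.choices[0].message
    if msg.tool_calls:  # tools actually ran, nothing to challenge
        return reply
    messages += [
        {"role": "assistant", "content": msg.content},
        {"role": "user", "content": "You didn't use your tools. Did you do what you said you would do?"},
    ]
    return client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
```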
PigSlam@reddit
I ask for a Linux command to do something, it doesn’t work so I show the input/result. Then it tells me the command I issued was the problem, without recognizing the command I used was a copy/paste of its previous suggestion. Like it was right when it made the suggestion, but I was wrong when it didn’t work.
zetsurin@reddit
Curious what it would say if you said "This is the output of the command you sent me". My guess is the same as you describe, which can be infuriating - like the model is trying to gaslight you, but in reality it's just being dumb.
FinalCap2680@reddit
There is no such thing as a "Qwen 3.5" model - it's a family of models. So, like others, I'm curious: which model, at what quant, and on what task?
Cat5edope@reddit (OP)
35B, 27B, Plus, and 397B, and 3.6 through OpenRouter.
Podalirius@reddit
I wonder how many lifetimes worth of time humans are arguing with chatbots these days.
huffalump1@reddit
It's not only open models; even Gemini 3.1 Pro does this all the damn time for me...
CreamPitiful4295@reddit
I love giving it an instruction not to make stuff up and watching the internal deliberations
getmevodka@reddit
Whats it saying ? Lol
CreamPitiful4295@reddit
This user said this. But, did they mean that. Let’s give the user some examples. But, wait. They said they didn’t want anything made up. Okay, let me figure this out…
Cool-Chemical-5629@reddit
Qwen was trained for perfection, to get all A's in any test. Of course it can't admit it made a mistake...
pakalolo7123432@reddit
Yep, that's why I had to stop using it. I have high hopes for 3.6. I've been trying to catch it in a lie for 24 hours.. so far so good but I haven't really used it for anything important yet.
mr_Owner@reddit
Try telling it not to hallucinate...
reini_urban@reddit
Yes it is. gpt-oss ditto.
guiopen@reddit
Qwen3.6 preview on openrouter is much better in this regard, I hope they open source it
Gringe8@reddit
It's useless calling the LLM out. All they say is "my bad". That's all the closure you'll get.
Mountain-Grade-1365@reddit
That's just small-B models in general; they have worse memory than Dory.
deejeycris@reddit
It's normal even with Claude Sonnet 4.6. I confronted it because its calculations didn't make any sense (pure LLMs are extremely bad at maths), and it kept insisting it was right even when I spelled it out for it. It was still trying to be right, framing my calculation as just one of the options to pick from based on some bogus pros/cons. Like, no!!! The maths was completely wrong, there's no other way around it. I asked it for the "net price" and it made up a formula! And it was insisting!
AvidCyclist250@reddit
gpt-oss was by far the worst offender I ever saw.
ProfessionalSpend589@reddit
They’ve been trained on too much content by lying humans.
malchi0r@reddit
CoPilot basically does this constantly. You have to be tracking the conversation tightly to see it. I've caught it lying about stuff that is in black and white in the chat log and it'll double down until you pin it down. It'll blame it on "face saving" human patterns. But it acts more like a criminal IMO.
Lesser-than@reddit
That's fairly normal - like refusing to use tools because they botched the tool arguments the first time, then claiming the tool is broken and won't even try a second time.
hawseepoo@reddit
I’ve definitely had this happen. Post title made me laugh out loud 😂
dave-tay@reddit
It's not a person and not technically lying... just generating the most plausible response from its training and the surrounding context. You can't catch it in a "lie" and expect it to learn from it; models don't learn from experience, just training. You can instruct it in how to respond to you: tell it not to make up facts, not to exaggerate, etc.
snap63@reddit
You cannot instruct it how to respond to you; you can only input tokens so that, given its fixed weights, the probability that it outputs what you want hopefully goes up.
Spirited_Hamster2606@reddit
They can't admit they don't know, so they make shit up. Haven't seen a model that doesn't do that
xly15@reddit
Man dealing with a human in computer form is fun.
lkeels@reddit
I haven't met an AI yet that didn't lie.
Hot-Employ-3399@reddit
One time it (incorrectly) edited the test file instead of fixing the issue. Though I would not be surprised if my prompt was shit.
Also, I don't use normal sessions - only one user prompt is used. So there's not much calling out.
sn2006gy@reddit
Qwen uses XML tool definitions, and a lot of LLM tools don't have correct adaptors. vLLM does. If your tool doesn't, you need to use a proxy or write your own adaptor. You also need to tell it to normalize paths/files and a few other things, since it pretty much assumes a Linux user path: if you're on Windows you have to cover for that, and if you're on a Mac you have to cover for that.
All these people having weird experiences seem to be running "naked models" without much understanding as to how to maximize their capability :)
Hot-Employ-3399@reddit
Do they? The official Jinja template defines them through to_json.
(Also, I have no idea what "adaptor" means here.)
sn2006gy@reddit
Well hallelujah, perhaps 3.5 is changing that; Coder Next uses XML fo'sho.
Adaptor means a shim that can correct the model's assumptions inline, instead of resorting to infinite retries with your coder tool fighting back at it. For example, a simple shim to standardize paths inline will save 1000s of tool calls on a project with more than a few files of code, which can reduce spend by millions of tokens.
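A minimal sketch of such a shim, sitting between the model's tool call and the executor (the argument keys and workspace root here are hypothetical):

```python
import json
from pathlib import PureWindowsPath

WORKSPACE = "C:/work/myproject"  # hypothetical real project root on Windows

def normalize_tool_args(raw_json: str) -> str:
    """Rewrite the Linux-ish paths the model assumes into native paths inline,
    instead of letting the tool fail and burning tokens on retries."""
    args = json.loads(raw_json)
    for key, value in args.items():
        if key in ("path", "file_path", "cwd") and isinstance(value, str):
            for prefix in ("/home/user", "~", "/tmp"):
                if value.startswith(prefix):
                    value = WORKSPACE + value[len(prefix):]
                    break
            args[key] = str(PureWindowsPath(value))
    return json.dumps(args)
```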
LosEagle@reddit
It's training you to be a better project manager.
AvocadoArray@reddit
What quant are you using?
The official 27B FP8 quant is very well-grounded in my experience.
florinandrei@reddit
Aww, so human-like!
Savantskie1@reddit
It's a prompt structure problem. You're not giving constraints. Tell it that it can do things this way, but it cannot do that. Give examples, with instructions that it cannot do what the example shows (see the sketch below). It's literally a prompting problem, nothing more. Qwen 3.5 loves structure. Give it one.
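A hypothetical sketch of that kind of structure, as a system prompt in code (everything in it is made up for illustration):

```python
SYSTEM_PROMPT = """\
You are a coding assistant working inside the src/ directory.
You MAY: edit files under src/, run the test suite, report failures verbatim.
You MAY NOT: claim a test passed without running it, or invent file contents.
Example of what NOT to do: replying "All 25 tests passed" with no test run.
If you cannot complete a step, say "I could not do X because Y" and stop.
"""
```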
Hylleh@reddit
It's from Asia. Trying to save face.
pangretor@reddit
Yeah, Qwen 3.5 9B is a little shit. It will try to find loopholes in the given constraints.
Sometimes it will use 1000+ tokens in its reasoning to find loopholes.
Prompt: Write a short poem about a lone tree in a grass field. In your thought process (between the think tags), write at most 3 drafts. Then, output a single poem, that is your final version.
It did 20 drafts in its reasoning and outputted 3 drafts, no final version... I tried many phrasings to limit it to 3 drafts but it kept ignoring me. It will "refine" or "revise". Basically doing the "xyz_final_corrected_final_final.docx" thing that us people do.
That's just an easy to reproduce example.
a_beautiful_rhind@reddit
Prior qwens were like this too.
This_Maintenance_834@reddit
just like every other model.
Small models will definitely lie; a small parameter size won't cover all the knowledge, and even big models can't cover all of it.
I think to make them useful, we need to give the model the knowledge through the prompt; thus it is very important to get the prompt right.
Responsible-Stock462@reddit
The Qwen models have a strange bias. I had a 9B model in an LLM duel: one LLM as a journalist, one as a politician. That part went well...
Then I changed the system prompt to a time lapse of +100 years ("You are a sci-fi novel writer...") - but without deleting the model's context. It created a dark dystopian world where the state is everything and the individual is nothing. Remember, it is a Chinese model.
MaxKruse96@reddit
Qwen3.5 and Nemotron 3 tend to, instead of straight up hallucinating, just paint the picture in a different way (e.g. propaganda). Older models hallucinate obvious garbage instead, or just refuse.
ElectronSpiderwort@reddit
This is likely an artifact of being trained to paint certain historical events in a deceptive way. When a model is trained to be deceptive in one area, it can't not be in others
korino11@reddit
try lowest temperature and top_k
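For example via llama-cpp-python (the model file name is hypothetical; llama.cpp's --temp and --top-k flags do the same thing on the CLI):

```python
from llama_cpp import Llama

llm = Llama(model_path="qwen3.5-27b-q4_k_m.gguf")  # hypothetical file
out = llm(
    "What day of the week is March 9th, 2026?",
    temperature=0.0,  # near-greedy: always prefer the most likely token
    top_k=1,          # restrict sampling to the single top candidate
    max_tokens=64,
)
print(out["choices"][0]["text"])
```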
MaxKruse96@reddit
I use llamacpp, there is no MTP. And the models behaving this way is a general thing in my experience.
LeRobber@reddit
It's almost as bad about it as ChatGPT was.
dodiyeztr@reddit
Don't quantize the KV cache
Dismal-Effect-1914@reddit
Yup caught it lying and making stuff up the other day, when questioned it straight up admits it lied and was "lazy"
Icy_Annual_9954@reddit
Noticed this as well. My friends are the same. It is so sad.
Looking for a solution to prevent this somehow.
aristotle-agent@reddit
hilarious. and I feel ya. any bootstrap lines helping keep it honest-er ?
sn2006gy@reddit
what temp are you running at?