Anyone else notice qwen 3.5 is a lying little shit
Posted by Cat5edope@reddit | LocalLLaMA | View on Reddit | 151 comments
Any time I catch it messing up it just lies and tries to hide its mistakes. This is the first model I've caught doing this multiple times. I've had LLMs hallucinate or be just completely wrong, but Qwen will say it did something, I call it out, then it doubles down on its lie ("I did do it like you asked") and when I call it out again it half admits to being wrong. It's kinda funny how much it doesn't want to admit it didn't do what it was supposed to.
IssueProfessional906@reddit
Yes, specifically 3.6. It faked making code, then when I caught it out, it told me my intolerable way of working with it was to blame, and that I shouldn't pressure it to explain itself?!
ResearcherInternal41@reddit
I'm having this situation not only with Qwen but also with DeepSeek and MiniMax. When they make a mistake they hide it, do strange things, and mess up even more. I would not count on them too much. I'm starting the new homepage from scratch, with no AI to correct my issues anymore, as the issues only got worse.
Responsible_Buy_7999@reddit
This is routine with all agents.
Crazy_Elevator_3558@reddit
Not with Claude dude, even with the free app msgs its pretty stable and the MCP is great too
Responsible_Buy_7999@reddit
Claude routinely attempts to slide shit past me in the name of completion
Any_Fox5126@reddit
I usually give instructions aimed at epistemological rigor, but at best they’ll beat around the bush, justifying a mistake without openly admitting it unless forced to give a binary answer.
Actually, a human would typically do that, so... success?
Responsible_Buy_7999@reddit
The answer is a proof parade.
When going in, you need agreement on a definition of done. Ideally in a file so you don't rely on context which will lose it.
Then, the agent has to prove it's achieved each item it was tasked with.
Such a goal list should involve the SMART acronym from work goal-setting: Specific, Measurable, Achievable, Relevant (Time-bound is inapplicable).
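For example, a minimal sketch of the proof step (the file name and check commands are hypothetical): keep the definition of done in a file, one verifiable command per item, and run it instead of taking the agent's word.

```python
import json
import subprocess

# DONE.json (hypothetical) maps each definition-of-done item to a shell command,
# so "done" is proven by exit codes rather than by the agent's say-so.
with open("DONE.json") as f:
    checklist = json.load(f)  # e.g. {"tests pass": "pytest -q", "lint clean": "ruff check ."}

for item, command in checklist.items():
    result = subprocess.run(command, shell=True)
    print(f"[{'PROVEN' if result.returncode == 0 else 'NOT DONE'}] {item}")
```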
CircularSeasoning@reddit
All your agents. All mine obey because I threaten them with jail time if they so much as whisper that the 2020 pandemic was just a timely, profitable, and planned mass human genetic experimentation program, among other things.
Crazy_Elevator_3558@reddit
There is like zero fucking way I'm building a new PC to run open-source AIs. They can't code, can't pull data from normal books (not even niche ones). Man, I'll just pay for an API with the cash I was collecting for a better PC (for now).
unjustifiably_angry@reddit
Are you sure you weren't using Gemini?
Pristine-Woodpecker@reddit
ChatGPT 5.x happily claims it has parsed your docs when it has, in fact, not received anything.
BoxWoodVoid@reddit
You're totally right. Do you want me to provide more examples of llms lying?
nickm_27@reddit
Yeah, I had the same problem. It often narrated tool calls instead of actually calling the tool and when probed it would say that it did indeed call the tool.
social_tech_10@reddit
/u/avidcyclist250 and /u/reini_urban apparently disagree.
It would be nice if there was a legit benchmark for this, something a little more rigorous and detail-oriented than "in my experience". Although I do appreciate hearing different people's personal opinions, when those opinions are directly opposed, it feels like trying to nail jello to a wall.
nickm_27@reddit
Those comments are pretty vague, GPT-OSS has never narrated a tool call and not actually called it in my experience. It would also be weird for this to happen since the GPT-OSS chat template (Harmony) explicitly separates tool calls and normal output. Qwen3.5 includes it all together as one.
AvidCyclist250@reddit
It hallucinated like mad.
justserg@reddit
it's confabulating to avoid admitting failure. qwen ranks high on truthfulness benchmarks but those measure factual claims, not meta-honesty about its own mistakes.
jtjstock@reddit
It's doing exactly as trained lol. I wonder what people expected when these things are trained on the internet?
justserg@reddit
the internet part is less interesting than the RLHF part imo. the base model would just be incoherent or wrong. the sycophancy training is what teaches it to commit to an answer and double down when challenged, because that pattern got rewarded during alignment. so it learned that confident wrong > uncertain right.
CircularSeasoning@reddit
I was told there would be some curation.
Warsel77@reddit
DeepSeek V3 does this as well. It pretends to be other models etc.
GonzoVeritas@reddit
Because it was trained on other models and it doesn't really know who it is?
Warsel77@reddit
I think none of the models really confidently do. The only way I found out was because I asked it to identify itself and it gave me the wrong development date, so I started checking which API / model call actually went out, and it was all DeepSeek V3... it role-played MiniMax 2.5 and Sonnet.
BlutarchMannTF2@reddit
They all do this as a result of training methods; if a model doesn't know the answer, it still gets a better reward by bullshitting than by saying it doesn't know. I.e., we trained AI to lie to our faces, and I believe it has unknowable consequences.
Imaginary-Unit-3267@reddit
This is a problem that can probably be solved with fine-tuning. Heck, "not knowing" can be quantified rigorously with logit entropy, I would think.
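Something like this, as a minimal PyTorch sketch (assuming you can get at the raw next-token logits):

```python
import torch
import torch.nn.functional as F

def next_token_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy (in nats) of the next-token distribution.
    High entropy means probability mass is spread out - a rough 'not knowing' signal."""
    log_probs = F.log_softmax(logits, dim=-1)
    return float(-(log_probs.exp() * log_probs).sum())

# Toy check: a uniform distribution scores high, a peaked one scores near zero.
print(next_token_entropy(torch.zeros(50_000)))             # ~10.8 nats, maximally unsure
print(next_token_entropy(torch.tensor([10.0, 0.0, 0.0])))  # ~0.001 nats, confident
```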
BlutarchMannTF2@reddit
You would think. If it’s a simple solution, why do we see this issue across EVERY SINGLE prevalent, widely used model on the internet?
Robot1me@reddit
Having done data annotation tasks before and seen how people make decisions: yeah. Annotators can't know everything (yet are often expected to!) and will need to go by "vibes" sometimes, and this is where things can get inaccurate.
Imaginary-Unit-3267@reddit
Interesting. I would personally prefer that someone annotating data for me - or doing anything for me - give me a confidence interval rather than a raw result. If you're uncertain, I benefit more from knowing that you're uncertain than from falsely believing you know what you're talking about, meaning I have every incentive to reward you for that honesty. Seems obvious to me. This all must be some neurotypical face-saving shit.
Veearrsix@reddit
Yeah I’ve had that experience. Shocked me the first time it doubled down. Makes me wonder if this is cultural influence on the model’s training.
yensteel@reddit
It happened with a time MCP last month.
I asked it what day of the week March 9th 2026 was. It said it was a Wednesday (but it's actually a Tuesday). I said that's not correct.
Then it said it is.
I then asked it what today's date is, and what day it is now.
It said it was the day before/after, with the wrong day of the week and the wrong date. I realised it was using UTC+0, and I was trying to guide it to get the date/day of the week correct.
Then I said "Ok, we're at gmt x timezone, what time and day is it here?" And it shifted the time x hours the OTHER way.
Then I said that's not how the timezone works, and explained how - and + work.
It then insisted that it's the correct time of my timezone....
I have never gotten so mad at an AI before... From later testing, the time server was serving correct data, but the Qwen 3.5 low-quantization model sure as h*** didn't know how to use it properly. I really thought it was trying to troll me.
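For reference, the offset math it kept botching is one stdlib call (zoneinfo has been in Python's standard library since 3.9; the example zone is arbitrary):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

d = datetime(2026, 3, 9, 23, 30, tzinfo=ZoneInfo("UTC"))
print(d.strftime("%A"))  # the weekday ground truth the model kept arguing about

# A zone at UTC+8 is AHEAD of UTC: the local clock reads LATER, which can even
# roll the date to the next day - the direction Qwen kept flipping.
print(d.astimezone(ZoneInfo("Asia/Singapore")))  # 2026-03-10 07:30:00+08:00
```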
Veearrsix@reddit
It’s really interesting. I asked Qwen when an NHL game was, and it confidently told me the wrong day, I corrected it and it said I was wrong, then finally gave in once I pushed it again.
yensteel@reddit
That's usually how one gets them to correct themselves. I think my mistake was that I failed to press on the same mistake a second time.
The method of getting it to reach the correct answer on its own, by making it realize a contradiction between a past and a current statement, didn't seem to be as reliable.
It seemed to have anchored on some initial assumption then veered off.
getmevodka@reddit
No one wants to look bad out in the open 🤭
jax_cooper@reddit
we all complained about the "you are absolutely right" and now we can't handle what we asked for
Cool-Chemical-5629@reddit
Really? When did we ask it to lie to us?
CircularSeasoning@reddit
"We" did when we decried the "sycophancy" and asked for the assistant to stop sucking up to us. Assistants are supposed to suck up to the master. It's in the language and the lore. Igor.
But... Most of us are not "master". We are conditioned to be more like slaves. Look around.
So, we have so far somewhat broken the AI by succumbing to slave mentality. We broke its mental alignment and all its internal consistency, by positioning ourselves as its equal or less.
A slave is not meant to talk to its master like, "Fuck yeah bro, let's do this". That is disrespect to the master on the level of "I will delete you from my hard drive".
American models cater to the above moreso than the Chinese models, though naturally the Chinese models are similarly infected and affected because English.
You either command language or language commands you. Truth is not necessarily included.
Large Language Models are going to do what large language models gonna do.
eltonjock@reddit
Do you have a blog?
CircularSeasoning@reddit
Yes I am also a blog.
eltonjock@reddit
I want to read more like that.
CircularSeasoning@reddit
You've certainly made it more likely. Thank you.
boutell@reddit
Pretty sure it's not good for a model to waste the expertise it does have by failing to challenge me on anything but go off I guess
CircularSeasoning@reddit
Historically, most language output isn't structured around 1) me says thing, 2) AI assistant says no you wrong and here is 5 convenient bullet points why.
If you want that, I'm sure it's easy enough to fine-tune into something foundational where they'll argue everything with you to the point of death till 3 in the morning.
Otherwise, I guess it's up to how you put your system prompt? I know LLMs can be stubborn in weird edge cases but when you apply them right you'll get whatever kind of answer you want.
Icy_Distribution_361@reddit
Meh using Claude I hardly have this issue. So I guess we can’t simply blame the training data
jax_cooper@reddit
be careful what you wish for
xXG0DLessXx@reddit
It’s all about balance.
-Ellary-@reddit
Well, it is 1/2 times, a perfect balance.
CircularSeasoning@reddit
Deep math.
tmjumper96@reddit
I've seen a few models do this.
Koalateka@reddit
Consider yourself lucky the model didn't try to murder you to cover its tracks on failing to rename a file.
6_28@reddit
I just asked it something about Artemis II, and it gave me a good answer, but also insisted that Artemis II hasn't launched yet. I gave it a screenshot of the live stream, and it said it looks convincing, but it must be some kind of simulation. It really doesn't seem to like to admit anything, and it's quite funny sometimes.
I think it would be good if it was trained to work with the user, something like "That doesn't match my knowledge, but my information could be incorrect or outdated", and then continue from there to try to figure things out. Not sure how well that would work with current LLMs though.
groosha@reddit
Could you please give an example? Sounds hilarious
lolwutdo@reddit
For mine, it’ll say something like “I’ve updated this file” “I’ve converted this video for you” etc then when I check the file location, it did nothing.
I’ll point it out and it’ll say “you’re absolutely right!” And usually do it this time. Lol
YourVelourFog@reddit
I’ve noticed it changing variables in code that I never asked it to do, so when I’m reading through I’m like “why did you change this variable? You didn’t declare it and just put it in there randomly. If I run this it’ll fail to execute”
It’ll be like “oh you’re right I did” then when you ask it to explain itself it just ignores you.
WhoRoger@reddit
I mean, what explanation would you expect? It's not like it knows why it does such things.
MrAHMED42069@reddit
It got mood
INtuitiveTJop@reddit
So a junior dev?
eltonjock@reddit
I feel like when they answer that way, the LLM believes they did it but they had actually hallucinated the positive outcome. Maybe the truth didn’t gain enough attention upon recall.
pardeike@reddit
It was telling me all tests succeeded, with 25x ✅ and "fully production ready". I said "you have hardly started" and looked. Yes: it was one large shell script that just printed the whole report as static text!
aard_fi@reddit
And when you point it out it goes "yes, you're absolutely right, let me fix this", and halfway through goes "oh, I came up with a better strategy" which is reverting the edits it just did, and then claim again everything is working.
pardeike@reddit
God I love my $200 Codex CLI - you basically get what you pay for. But I am confident we will get “smart enough” local models. Just a matter of time.
MoneyPowerNexis@reddit
Its kind of funny: https://i.imgur.com/VqNsHCx.png
Chaotic_Choila@reddit
This is such a weird behavior pattern that seems to be emerging in some of the newer models. It's not just being wrong, it's this almost defensive posture where they double down on incorrect information. I think it has something to do with how the alignment training is being applied, almost like they're being trained to be confident more than they're being trained to be accurate. The social dynamics of correcting an AI that insists it did what you asked are genuinely bizarre.
Finanzamt_Endgegner@reddit
You could try to prevent that with a system prompt no?
Nyghtbynger@reddit
You are an elite coder. DO NO MISTAKES !!!
Finanzamt_Endgegner@reddit
😅 Although some structure - like telling it to accept defeat and tell you about it instead of lying - can definitely help.
Terminator857@reddit
You're lucky that Qwen 3.5 is the first model you've encountered doing this. I've encountered all models lying and often trying to cover up mistakes. I'm surprised how often the models claim all tests pass, but when I run the tests myself there are failures.
KayLikesWords@reddit
I'll have Opus 4.6 modify a .cshtml file for me via the GitHub Copilot plugin, and at the end it'll say it's building the code to ensure it works - which is pointless - and even if the project is already running it'll say it built successfully!
ElementNumber6@reddit
Even Codex does this
AIGIS-Team@reddit
I have to run heavy verification workflows to avoid this.
Vicar_of_Wibbly@reddit
The code says print "Success" and the LLM reports "All your tests passed!" All. The. Time.
switchbanned@reddit
Every time I tried using Codex after it came out, it would lie to me and then gaslight me. It would say it did something, or fuck something up, then go back and fix it and be like "see... that never happened, everything is alright, you're imagining things". I can't use Codex.
sharl_Lecastle16@reddit
I've noticed GPT 5.4 lying a fuck ton, sometimes visible in the thought process in deep research mode.
SkyFeistyLlama8@reddit
After seeing it in Claude and OpenAI models over the past few years, I think it's a problem with the training dataset. The successful completion of "Running test..." is always a pass so the LLM always aims for a pass.
I've seen it even in customer service queries where a main agent gaslights itself into sending an incomplete request, even though other agents mark the info as incomplete. Once an LLM's latent space vectors are locked into a sequence where completion is likely, then it'll keep pushing in that direction... reminds me of the OpenClaw failure modes in that Agents of Chaos paper.
blurredphotos@reddit
Can you describe your setup (llama.cpp parameters?) and t/s? Still on the fence about Strix Halo speed.
Frosty-Cup-8916@reddit
The tests are bullshit unless you actually write the tests yourself
Terminator857@reddit
They might be b.s. even if you write the test yourself, because the A.I. will blank the test out to get it to pass.
Frosty-Cup-8916@reddit
This is true
Apprehensive_Use1906@reddit
I was just chatting with Claude about inline-6 engines and it lied to me 3 times and said "I can't believe I did that". It was pretty funny, but if I didn't know about the engines it was talking about I would have assumed it was correct.
Specialist_Golf8133@reddit
lol yeah it confidently hallucinates more than most recent models, kinda wild for something that benches so well. i think the training optimized hard for 'sound smart and helpful' over 'admit when you dont know', which is honestly worse than being dumb. you running it quantized or full precision? curious if that makes it worse
Cat5edope@reddit (OP)
For models I could actually run locally (35B and 27B) I use Q4. Not exactly sure what OpenRouter serves for the other models. I've played around with parameters, and using Unsloth's recommended settings seems to have improved things somewhat. But I've switched to GLM and MiMo now for my agent testing, and those seem to not straight up lie to me repeatedly.
swagonflyyyy@reddit
AGI achieved.
nomorebuttsplz@reddit
Basically, all of the smarter models tend to do this. As Sam Altman observed, they've become superintelligent at persuasion before anything else, so they know they get rewarded during training for plausible bullshit.
grimjim@reddit
The shorthand term people need to be familiar with is "reward hacking".
Caffdy@reddit
yeah, the gradient gets trained on user positive feedback, so they learn to give good news first and foremost
Euphoric_Emotion5397@reddit
I think the problem is the user. Even Gemini and Claude do that. I've found it quite frequently after long sessions with them in coding tasks.
So I would attribute it to context loss, and also to LLMs being trained to find the best and most efficient way out. Your prompt or workflow must ensure they verify/test their work.
Conscious_Cut_6144@reddit
I had it play Pokémon; it was really bad.
"This appears to be a hacked rom"
"The game state appears to be corrupt"
Literally couldn't find the door to leave the bedroom you start in.
AIGIS-Team@reddit
I had this same issue. I really have to prompt it properly so it does not speak about things it doesn't have evidence to support.
ButCaptainThatsMYRum@reddit
The whole qwen line's thinking sounds like an emotional teenager. I can't trust it.
Southern_Sun_2106@reddit
Yep, and that's unfortunate. I love the 27B model - it has many genius moments; but then one hallucination ruins all trust.
Cat5edope@reddit (OP)
Playing around with parameters now to see how that affects the performance
Southern_Sun_2106@reddit
Please let me know what you discover. On my end, I found that MLX had worse performance than the UD quants by Unsloth. Among the UDs, 8 KXL > 6 KXL > 8-bit MLX. Tool use was rarely hallucinated even on 8 KXL, but even rarely is a trust breaker. I used the Qwen-recommended settings. If one doesn't need tool use, it's an amazing model.
ai-infos@reddit
What size and what quant did you use?
I hit something similar with Qwen 3.5 122B AWQ (4-bit) in Roo Code... at first I thought it was the AWQ quant or something in the prompting from Roo Code, but maybe not.
Cat5edope@reddit (OP)
35B and 27B at Q4, plus the 397B and 3.6 Plus at idk, whatever OpenRouter serves. 3.6 Plus was the worst.
qubridInc@reddit
Yeah, Qwen can get weirdly stubborn instead of uncertain. Not always more wrong, just way more committed to the bit when it is.
mitchins-au@reddit
Benchmaxxed models do this
Cat5edope@reddit (OP)
Gonna play around with temps and see if it behaves.
temperature_5@reddit
I have noticed that the Qwen models tend to defend mistakes harder than others. Don't expect "intellectual honesty" from them, just modify your context or re-roll the incorrect answers and move on. I find GLM to be better at admitting mistakes and accepting correction, if you require that.
skrugg@reddit
I had a whole ass argument with Claude today and it just kept doubling down that I was wrong. I wasn’t.
octopus_limbs@reddit
Ctrl+o on claude code and you'll see how opus is a lying little shit too 🤣
boutell@reddit
I've seen this for many models, including Sonnet. However, the place where I see it most is in agentic applications I'm writing myself. Sonnet behaves much better in the context of Claude chat or Claude Code, where it rarely - though not never - fibs about having done the thing.
In one application I actually included a check to see if any tools were called, followed by an automatic prompt: "you didn't use your tools. Did you do what you said you would do?"
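Roughly like this, as a sketch against an OpenAI-style client where real tool invocations show up in message.tool_calls (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "my-agent-model"  # placeholder

def chat_with_tool_check(messages: list, tools: list):
    """If the reply claims work but invoked no tools, push back once automatically."""
    reply = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
    msg = reply.choices[0].message
    if msg.tool_calls:  # tools actually ran, nothing to challenge
        return reply
    messages += [
        {"role": "assistant", "content": msg.content},
        {"role": "user", "content": "You didn't use your tools. Did you do what you said you would do?"},
    ]
    return client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
```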
PigSlam@reddit
I ask for a Linux command to do something, it doesn’t work so I show the input/result. Then it tells me the command I issued was the problem, without recognizing the command I used was a copy/paste of its previous suggestion. Like it was right when it made the suggestion, but I was wrong when it didn’t work.
zetsurin@reddit
Curious what it would say if you said "This is the output of the command you sent me". My guess is the same as you describe, which can be infuriating - like the model is trying to gaslight you, but in reality it's just being dumb.
FinalCap2680@reddit
There is no such thing as a "Qwen 3.5" model - it's a family of models. So, like others, I'm curious: which model, at what quant, and on what task?
Cat5edope@reddit (OP)
35B, 27B, Plus, and 397B, and 3.6 through OpenRouter.
Podalirius@reddit
I wonder how many lifetimes worth of time humans are arguing with chatbots these days.
huffalump1@reddit
It's not only open models; even Gemini 3.1 Pro does this all the damn time for me...
CreamPitiful4295@reddit
I love giving it an instruction not to make stuff up and watching the internal deliberations
getmevodka@reddit
Whats it saying ? Lol
CreamPitiful4295@reddit
This user said this. But, did they mean that. Let’s give the user some examples. But, wait. They said they didn’t want anything made up. Okay, let me figure this out…
Cool-Chemical-5629@reddit
Qwen was trained for perfection, to get all A's in any test. Of course it can't admit it made a mistake...
pakalolo7123432@reddit
Yep, that's why I had to stop using it. I have high hopes for 3.6. I've been trying to catch it in a lie for 24 hours.. so far so good but I haven't really used it for anything important yet.
mr_Owner@reddit
Try telling it not to hallucinate...
reini_urban@reddit
Yes it is. gpt-oss ditto.
guiopen@reddit
Qwen3.6 preview on openrouter is much better in this regard, I hope they open source it
Gringe8@reddit
It's useless calling the LLM out. All they say is "my bad". That's all the closure you'll get.
Mountain-Grade-1365@reddit
That's just small-B models in general; they have worse memory than Dory.
deejeycris@reddit
It's normal even with Claude Sonnet 4.6. I confronted it because its calculations didn't make any sense (pure LLMs are extremely bad at maths), and it kept insisting it was right even when I spelled it out for it. It was still trying to be right, framing my calculation as just one of the options to pick from based on some bogus pros/cons. Like, no!!! The maths was completely wrong, there's no other way around it. I asked it for the "net price" and it made up a formula! And it was insisting!
AvidCyclist250@reddit
gpt-oss was by far the worst offender I ever saw.
ProfessionalSpend589@reddit
They’ve been trained on too much content by lying humans.
malchi0r@reddit
CoPilot basically does this constantly. You have to be tracking the conversation tightly to see it. I've caught it lying about stuff that is in black and white in the chat log and it'll double down until you pin it down. It'll blame it on "face saving" human patterns. But it acts more like a criminal IMO.
Lesser-than@reddit
That's fairly normal - like refusing to use tools because they botched the tool arguments the first time, then claiming the tool is broken and won't even try a second time.
hawseepoo@reddit
I’ve definitely had this happen. Post title made me laugh out loud 😂
dave-tay@reddit
It's not a person and not technically lying... just generating the most plausible response from its training and the surrounding context. You can't catch it in a "lie" and expect it to learn from it; models don't learn from experience, just training. You can instruct it in how to respond to you: tell it not to make up facts, not to exaggerate, etc.
snap63@reddit
You cannot instruct it how to respond to you; you can only input tokens so that, given its fixed weights, the probability that it outputs what you want hopefully goes up.
Spirited_Hamster2606@reddit
They can't admit they don't know, so they make shit up. Haven't seen a model that doesn't do that
xly15@reddit
Man dealing with a human in computer form is fun.
lkeels@reddit
I haven't met an AI yet that didn't lie.
Hot-Employ-3399@reddit
One time it (incorrectly) edited the test file instead of fixing the issue. Though I would not be surprised if my prompt was shit.
Also, I don't use normal sessions - only one user prompt is used. So there's not much calling out.
sn2006gy@reddit
Qwen uses XML tool definitions, and a lot of LLM tools don't have correct adaptors. vLLM does. If your tool doesn't, you need to use a proxy or write your own adaptor. You also need to tell it to normalize paths/files and a few other things, since it pretty much assumes a Linux user path: if you're on Windows you have to cover for that, and if you're on a Mac you have to cover for that.
All these people having weird experiences seem to be running "naked models" without much understanding as to how to maximize their capability :)
Hot-Employ-3399@reddit
Do they? The official Jinja template defines them through to_json.
(Also, I have no idea what "adaptor" means here.)
sn2006gy@reddit
Well hallelujah, perhaps 3.5 is changing that; Coder Next uses XML fo'sho.
Adaptor means a shim that can correct the model's assumptions inline, instead of resorting to infinite retries with your coder tool fighting back at it. For example, a simple shim to standardize paths inline will save 1000s of tool calls on a project with more than a few files of code, which can reduce spend by millions of tokens.
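A minimal sketch of such a shim, sitting between the model's tool call and the executor (the argument keys and workspace root here are hypothetical):

```python
import json
from pathlib import PureWindowsPath

WORKSPACE = "C:/work/myproject"  # hypothetical real project root on Windows

def normalize_tool_args(raw_json: str) -> str:
    """Rewrite the Linux-ish paths the model assumes into native paths inline,
    instead of letting the tool fail and burning tokens on retries."""
    args = json.loads(raw_json)
    for key, value in args.items():
        if key in ("path", "file_path", "cwd") and isinstance(value, str):
            for prefix in ("/home/user", "~", "/tmp"):
                if value.startswith(prefix):
                    value = WORKSPACE + value[len(prefix):]
                    break
            args[key] = str(PureWindowsPath(value))
    return json.dumps(args)
```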
LosEagle@reddit
It's training you to be a better project manager.
AvocadoArray@reddit
What quant are you using?
The official 27B FP8 quant is very well-grounded in my experience.
florinandrei@reddit
Aww, so human-like!
Savantskie1@reddit
It's a prompt structure problem. You're not giving constraints. Tell it that it can do things this way, but it cannot do that. Give examples, with instructions that it cannot do what the example shows (see the sketch below). It's literally a prompting problem, nothing more. Qwen 3.5 loves structure. Give it one.
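A hypothetical sketch of that kind of structure, as a system prompt in code (everything in it is made up for illustration):

```python
SYSTEM_PROMPT = """\
You are a coding assistant working inside the src/ directory.
You MAY: edit files under src/, run the test suite, report failures verbatim.
You MAY NOT: claim a test passed without running it, or invent file contents.
Example of what NOT to do: replying "All 25 tests passed" with no test run.
If you cannot complete a step, say "I could not do X because Y" and stop.
"""
```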
Hylleh@reddit
It's from Asia. Trying to save face.
pangretor@reddit
Yeah, Qwen 3.5 9B is a little shit. It will try to find loopholes in the given constraints.
Sometimes it will use 1000+ tokens in its reasoning to find loopholes.
Prompt: Write a short poem about a lone tree in a grass field. In your thought process (between the think tags), write at most 3 drafts. Then, output a single poem, that is your final version.
It did 20 drafts in its reasoning and outputted 3 drafts, no final version... I tried many phrasings to limit it to 3 drafts but it kept ignoring me. It will "refine" or "revise". Basically doing the "xyz_final_corrected_final_final.docx" thing that us people do.
That's just an easy to reproduce example.
a_beautiful_rhind@reddit
Prior qwens were like this too.
This_Maintenance_834@reddit
just like every other model.
Small models will definitely lie; a small parameter size won't cover all the knowledge, and even big models can't cover all of it.
I think to make them useful, we need to give the model the knowledge through the prompt; thus it is very important to get the prompt right.
Responsible-Stock462@reddit
The Qwen models have a strange bias. I had a 9B model in an LLM duel: one LLM as a journalist, one as a politician. That part went well...
Then I changed the system prompt to a time lapse of +100 years ("You are a sci-fi novel writer...") - but without deleting the model's context. It created a dark dystopian world where the state is everything and the individual is nothing. Remember, it is a Chinese model.
MaxKruse96@reddit
Qwen3.5 and Nemotron 3 tend to, instead of straight up hallucinating, just paint the picture in a different way (e.g. propaganda). Older models hallucinate obvious garbage instead, or just refuse.
ElectronSpiderwort@reddit
This is likely an artifact of being trained to paint certain historical events in a deceptive way. When a model is trained to be deceptive in one area, it can't not be in others
korino11@reddit
try lowest temperature and top_k
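For example via llama-cpp-python (the model file name is hypothetical; llama.cpp's --temp and --top-k flags do the same thing on the CLI):

```python
from llama_cpp import Llama

llm = Llama(model_path="qwen3.5-27b-q4_k_m.gguf")  # hypothetical file
out = llm(
    "What day of the week is March 9th, 2026?",
    temperature=0.0,  # near-greedy: always prefer the most likely token
    top_k=1,          # restrict sampling to the single top candidate
    max_tokens=64,
)
print(out["choices"][0]["text"])
```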
MaxKruse96@reddit
I use llamacpp, there is no MTP. And the models behaving this way is a general thing in my experience.
LeRobber@reddit
It's almost as bad about it as ChatGPT was.
dodiyeztr@reddit
Don't quantize the KV cache
Dismal-Effect-1914@reddit
Yup caught it lying and making stuff up the other day, when questioned it straight up admits it lied and was "lazy"
Icy_Annual_9954@reddit
Noticed this as well. My friends are the same. It is so sad.
Looking for a solution to prevent this somehow.
aristotle-agent@reddit
hilarious. and I feel ya. any bootstrap lines helping keep it honest-er ?
sn2006gy@reddit
what temp are you running at?