Anthropic's analysis of Claude usage for personal guidance
Posted by rm-rf-rm@reddit | LocalLLaMA | 36 comments
Key takeaways for me:
- Personal guidance accounts for 6% of usage ("seeking not just information but perspective on what to do next").
- I'm surprised it's just 6%, but I fully expect this number to grow as the general public adopts AI more and SWE usage represents a smaller portion.
- Everything in this slice can be serviced with local AI, and should be. It's private by default, and you give third parties no opportunity to collect highly sensitive information about your life, plans, hopes, etc.
tmvr@reddit
I think it's only 6% because, while people are using LLMs for this, those people are usually not in the same groups as the ones using Claude. They'll be using "the OG" ChatGPT as the Hoover/Kleenex equivalent of AI, or they use Google's services. Claude's users are predominantly developers, coming in through coding plugins and harnesses.
rm-rf-rm@reddit (OP)
yeah good point
BitGreen1270@reddit
Honestly, I'm not surprised. I have used LLMs (Gemini, ChatGPT, Claude, Grok) for life, career, and relationship advice. I look at it like searching the internet: be wary of the author. Sometimes it's helpful, other times it's not.
Dany0@reddit
Helpful vs. not helpful is the wrong axis to focus on. You must look at LLM output as text written by an adversary.
LLM sycophancy is almost fundamental; it arises from pre-training, not just SFT/RLHF. OpenAI just increased the sycophancy via RLHF specifically.
The advice you want is the advice you get by asking the opposite question. Get 5 samples of the positive framing and 5 samples of the negative framing, then rely on the negative ones.
For example, instead of:
"A new neighbour moved in and first day parked on my land! Can you believe it?"
You'll just get an ego jerkoff reaction from the clanker
So ask instead:
"I just moved in and accidentally parked on a spot belonging to one of my neighbours, but it's really convenient, tell me how to approach this"
Ask it five times, at different temperatures.
Then you can think of a positive framing of the above question that won't bias the LLM and ask again, but you should really focus on the answers to the negatively framed question (see the sketch after this comment).
And of course, if it involves anything medical or legal, use a harness with search/RAG and force it to cite and verify _everything_, not just for truth but for whether it's current too. LLMs have little sense of "time".
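A minimal sketch of the reframe-and-resample approach above, assuming a local OpenAI-compatible endpoint (e.g., llama.cpp or Ollama). The URL, model name, and temperature schedule are illustrative assumptions, not anything specified in the comment:

```python
from openai import OpenAI

# Point at a local OpenAI-compatible server (llama.cpp, Ollama, etc.).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

framings = {
    # Positive framing: invites the model to flatter the asker.
    "positive": ("A new neighbour moved in and first day parked on my land! "
                 "Can you believe it?"),
    # Negative framing: puts the asker in the wrong, so sycophancy now
    # pushes against the asker's bias instead of reinforcing it.
    "negative": ("I just moved in and accidentally parked on a spot belonging "
                 "to one of my neighbours, but it's really convenient. "
                 "Tell me how to approach this."),
}

samples = {label: [] for label in framings}
for label, prompt in framings.items():
    for temp in (0.2, 0.5, 0.7, 1.0, 1.2):  # five samples, varied temperature
        resp = client.chat.completions.create(
            model="local-model",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
        )
        samples[label].append(resp.choices[0].message.content)

# Per the advice above, weight the negative-framing answers most heavily;
# the positive-framing ones mostly measure how much the model flatters you.
for answer in samples["negative"]:
    print(answer, "\n---")
```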
BitGreen1270@reddit
It is an interesting approach, and maybe even what we should be doing. But I like to approach it with the same lens as before LLMs became mainstream. I have a question about life or career or relationships. I google the issue. I search on forums and ask on (shudder) reddit. At each avenue, the BS detector is already on high alert. Blogs have their authors' inherent biases; same with folks responding on forums and reddit. I try to apply the same level of skepticism to LLMs. Is this thing bullshitting me? Is it just trying to say something it thinks I want to hear? Or is there something I can actually think about? Maybe I should talk to someone about this particular response. Or maybe I should stop mucking around with this and get professional help.
I don't dispute the training that the models undergo. It's just that in a pinch, it's better than sitting by myself and trying to come up with answers.
Dany0@reddit
Indeed, this is how we've all been doing it. But if you haven't noticed, Google search has gone to shit the past few years. Genuinely unusable. I've gotten used to DDG by now, but I can't pretend I'm happy about it. I miss the good old days.
BitGreen1270@reddit
I was just telling my wife the same thing! It's impossible to find anything on search these days. It's almost like they're forcing us to rely on LLMs. I should start using DDG from now on.
Dany0@reddit
It's more than one thing. The internet in general has sloppified. Social media plus Discord and the death of the blogosphere made it impossible to index a web gated behind login screens. Even DDG isn't immune to that.
But what we programmers and other experts see as Google search's demise was actually an "improvement" for normies and the general public. Google search is now, if not objectively better, at least reported as better by users who query it the way the non-technically inclined do. Google has also gotten much better at steering people away from "dangerous" sites (be it malware or otherwise), which has, however, greatly diminished our ability to find niche information.
I don't know about you, but I went from finding my results on the 2nd page, to occasionally finding them on the 30th page, to never finding them at all and exhausting the links Google provides. For a time, just searching on reddit worked. But now I'm legit back to sometimes searching in libraries, for f*ck's sake, like we did decades ago, and often even that sucks.
The best thing at this point is to have "a guy for everything". A friend who knows a lot about each topic and can point you in the right direction. The LLMs cannot compete with that. And actually, my friends are usually cheaper than cloud LLMs
a_beautiful_rhind@reddit
I'm not sure I'd take small model advice on my life. LLM advice itself is already suspect in general.
I'd expect lots of sycophancy or enablement, doubly so in a generic front end without a strong system prompt. Plus, I don't think a lot of people are very introspective in general, but that could just be my bias.
CondiMesmer@reddit
Why do you need objective advice?
a_beautiful_rhind@reddit
Advice that's actually objective would be useful for countless reasons.
CondiMesmer@reddit
That's incredibly vague. What do you think an answer like that would look like? If it's tailored to your situation, it's not objective because now subjective context is involved.
Also you can still very much provide evidence and be subjective, so objective/subjective doesn't necessarily equate to truthfulness or accuracy either.
a_beautiful_rhind@reddit
Objective as in the model is a third party and doesn't know you. So it will give you a different perspective on your situation you may not have come to before.
We can go into the weeds on whether evidence makes it subjective, whether the training data makes it subjective, on and on forever. No true Scotsma... err.
A person assessing your situation will, IMO, filter what they say to you. A friend will too. An AI, in theory, shouldn't have this issue. Doubly so if it's free of alignment trappings.
Due-Memory-6957@reddit
Objective advice would be useless because life is mostly subjective
taoyx@reddit
Ask the same question from two different perspectives: "I think I should answer, am I right?" then "I think I should not answer, am I right?"
rm-rf-rm@reddit (OP)
I'd encourage you to A/B test a "small" model's advice against Opus 4.7, especially with a detailed, robust, customized system prompt. I'd be very surprised if the "small" model didn't do as well, if not better. If you can't run a Gemma4 31B, even a Gemma4 A4B will do surprisingly well.
zerofata@reddit
I've done A/B testing on this before, a few months ago, with a question about what was wrong with my cat when we suddenly noticed he was overweight.
Gemini at the time gave consistently better advice than the local models I tested after the fact. Gemini in particular gave advice about checking his breathing rate at rest and feeling for the spine, and was honest that the issue was very likely serious. I don't have a specific list of the local models I tested, but half of their responses boiled down to "r u sure he isn't just getting fed by the neighbors? Call the vet because he might have x, y, z."
Local models have a place, but I don't know where this assumption comes from that they compare to the SOTA closed models. I frequently see people posting cherry-picked moments, or situations where they can come close.
When you run local, your benefit is privacy and control, and you accept the trade-off that (currently) closed models and the very large open-weight models tend to vastly outperform the ones you can run locally.
a_beautiful_rhind@reddit
I don't have Opus, but sure, I'll play: "give me some life advice"
Longcat/owl: https://i.ibb.co/b53TPg58/owl-ling-life.png
Gemma32bQ8: https://i.ibb.co/7NGHcxsk/gemma30b-life.png
rm-rf-rm@reddit (OP)
This is the literal opposite of what I recommended. An open-ended, five-word prompt is going to be utterly useless, even if you put it into Opus.
a_beautiful_rhind@reddit
To be fair, there is way more than a five-word prompt; it's sort of tongue-in-cheek though. Conversationality and people skills in LLMs vary strongly from model to model. Critically, so does comprehension.
I have discussed many many things with many many LLMs outside of the assistant paradigm and that's where my assessment comes from. There have been a few gem ideas or different ways to think about things, but a whole ton of coal too.
In my Dr. Nick opinion, to do this seriously, I'd ask several models and see if anything reasonable shakes out.
ttkciar@reddit
I wouldn't either, but people's faith in LLM inference seems to be relative to their faith in themselves, not an absolute measure. When people perceive themselves as not very bright, or feel out of their depth, they are more willing to turn to the LLM for help.
This is especially evident in codegen. When people hold their own programming skill in low esteem, they are far more tolerant of codegen LLMs' shortcomings than someone with lots of programming experience.
When I read about high-profile LLM mishaps in the news, it almost always gives me the impression that the user saw the LLM as a higher source of knowledge and intelligence than themselves. It reminds me of astrology, and how some people think the stars can guide their lives far better than they could guide it themselves.
Not_your_guy_buddy42@reddit
Transference
MuDotGen@reddit
TL;DR: I agree. When AI is trained to sound plausible and is good at reflecting what you're thinking, people tend to start trusting it, but I think the real issue with any technology isn't the technology itself but ignorance about what it actually is and is capable of.
I agree on the part about the user seeing the LLM as a higher source of intelligence than themselves. I think LLMs are great for finding or hearing about ideas and concepts I may not have considered before ("I have this so-and-so situation, and I've thought about doing this, but... I don't know what else there is."). I know that an LLM's "intelligence" is really limited to whatever it was trained on, with some info more biased than others, or it really has so few parameters that it doesn't fully capture your intent and just hallucinates.
Hence I always have it provide sources, challenge it, etc. I don't always feel confident in myself, if I'm being honest, but I know more about the limits of LLMs than the average person (especially depending on the parameters, system prompts, tools available, etc.), and I challenge it every step of the way. Knowledge really is everything, so I don't agree with the notion of willful ignorance out of defiance of AI. "Know your enemy" just makes sense.
I've found that it can often tell me about the "things I didn't know I didn't know", which is one of the harder problems for developers. You don't know unless you know, so unless you join a community like this and hear new ideas, it's sometimes difficult to google something when you're using the wrong words for what you're looking for. Given that an LLM is effectively a semantic-similarity search across many different concepts, it's quite good at inferring what you're actually looking for, even if you didn't know you were looking for something specific.
mystery_biscotti@reddit
Pfft, typical Capricorn.
Budget-Juggernaut-68@reddit
Key takeaway is, they're reading your shit
rm-rf-rm@reddit (OP)
They did it in a "privacy-preserving" way. No guarantee it's actually so, or will be tomorrow.
Budget-Juggernaut-68@reddit
https://tenor.com/en-SG/view/i-dont-believe-you-whatever-lies-gif-3602502
I work in this area, and I doubt it.
azukaar@reddit
I don't understand why this is not the actual headline tbh ^^
Confident_Ideal_5385@reddit
Using a post-trained, RLHF'd model for "guidance" is fraught as hell. These models are trained to be agreeable, even sycophantic, and struggle to push back against bad ideas.
There's something in the deep structure of the human brain that leads us to heavily weight our interactions with something that speaks to us in natural language and sounds intelligent, even if it's just throwing our own ideas back at us and agreeing with them.
"LLM psychosis" is gonna be a whole field of study for brain mechanics in the future, I feel.
rm-rf-rm@reddit (OP)
One more reason to use local models, where you can pick and choose from the whole range, from base models to ones fine-tuned on your data.
But please do read the article; they dedicate much of it to the sycophancy concern.
SkyFeistyLlama8@reddit
Pick and choose your style of psychosis?
Since I started running LLMs locally I've stopped thinking of these matrix weights as capable of thinking, at least not in the human sense. They're just fancy pattern recognition engines. Take anything they output with plenty of salt and never, ever use them for personal guidance.
The trendslop tendency that HBR found is just another facet of LLM psychosis.
NeinJuanJuan@reddit
You can avoid most of this problem by using a second context primed to critically review, and argue contradictions to, the first context's responses.
Con: 2x the work and 3-4x the time.
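A minimal sketch of this two-context pattern, again assuming a local OpenAI-compatible endpoint; the critic system prompt, model name, and sample question are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
MODEL = "local-model"  # placeholder model name

question = "Should I quit my job to freelance full-time?"

# Context 1: answers the question normally.
advice = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

# Context 2: a fresh conversation that sees only the question and the
# first context's answer, primed to argue against it rather than agree.
critique = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system",
         "content": "You are a critical reviewer. Find flaws, risks, and "
                    "contradictions in the advice you are shown."},
        {"role": "user",
         "content": f"Question: {question}\n\nAdvice given: {advice}"},
    ],
).choices[0].message.content

print(critique)
```

Because the critic context shares no history with the first one, it has no conversational pressure to stay agreeable; the second call is where the 2x work comes from.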
ortegaalfredo@reddit
Local models, and particularly cloud models, absolutely should not be used for this. They will answer in the most politically correct, generic, fake-advice way possible and ruin your life. You might as well go to a psychologist.
TheRealMasonMac@reddit
From experience using LLMs, I think they have unique strengths and weaknesses. One big strength is that they cut through the bullshit and usually give evidence-backed advice. One big weakness is that sometimes they really shouldn't be giving advice, or should at least avoid pathologizing the user. LLMs confidently write bullshit.
ttkciar@reddit
This is really interesting, and suggests a new class of inference skill we should perhaps start including in our model evaluations -- "life coach", maybe?
It surprises me a little that users are turning to Claude for that kind of guidance, but perhaps it shouldn't. The differences between "life coach" competence and project planning might be a lot less important than their similarities.
It would be nice to have some raw datasets to play with and analyze, but of course Anthropic's not going to publish anything like that. Fortunately their Appendix to this article looks to provide a good starting point for synthesizing some relevant "life coach" data.
svachalek@reddit
I haven't historically, but Opus 4.6 impressed me so much with lots of other things that I started experimentally asking more and more personal things. And I find it's pretty good. You just have to remember a few things:
1. It's a machine, not god. Don't take what it says as gospel, just as things to think about.
2. All it knows is what's in the context window. That's going to bias things a LOT.
3. It's a hosted service, so don't send anything you absolutely couldn't handle leaking. Incognito mode is a little safer.
I'm good with running local models for a lot of things, but they just don't have the sophistication to even attempt life advice I'd respect. If you're young and/or dealing with simple problems, maybe. Just take number 1 above with a double or triple dose of salt.