Fun fact: Anthropic has never open-sourced any LLMs
Posted by InternationalAsk1490@reddit | LocalLLaMA | View on Reddit | 119 comments
I’ve been working on a little side project comparing tokenizer efficiency across different companies’ models for multilingual encoding.
Then I saw Anthropic’s announcement today and suddenly realized: there’s no way to analyze claude’s tokenizer lmao!
SrijSriv211@reddit
Anthropic talks about safety a lot but they forget that open research is one of the best ways to speed up safety research.
Specialist-Crazy-746@reddit
How would you examine the safety of DeepSeek? Do you have a way to parse the weights?
SrijSriv211@reddit
I think a very simple approach would be to pass in several input prompts, sometimes some carefully constructed random noise as input prompts and observing the activations, weights which contribute more to those activations and the attention maps. It's surely a difficult problem and I'm no expert in it but I think these are some very simple ways safety research is done to identify how safe and aligned the model really is.
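An illustrative, heavily simplified version of that probing idea, on a toy NumPy network rather than a real transformer (all weights and inputs here are random placeholders, not anyone's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))   # hypothetical first-layer weights
W2 = rng.normal(size=(16, 4))   # hypothetical second-layer weights

def hidden_activations(x: np.ndarray) -> np.ndarray:
    """Forward pass, returning the hidden-layer (ReLU) activations."""
    return np.maximum(0.0, x @ W1)

normal_input = rng.normal(size=(32, 8))            # stand-in "prompts"
noise_input = rng.normal(scale=5.0, size=(32, 8))  # crafted noise inputs

# Mean activation per hidden unit under each input distribution:
normal_act = hidden_activations(normal_input).mean(axis=0)
noise_act = hidden_activations(noise_input).mean(axis=0)

# Units whose response changes most under noise are candidates for
# closer inspection (the toy analogue of "which weights contribute
# most to those activations").
suspect_units = np.argsort(noise_act - normal_act)[::-1][:3]
print("most noise-sensitive hidden units:", suspect_units)
```

Real interpretability work does the analogue of this on transformer activations and attention maps, which is precisely what closed weights prevent outsiders from doing.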
ZABKA_TM@reddit
Everything Anthropic posts is just thinly disguised hypeslop bragging
BagelRedditAccountII@reddit
My concern with that line of inquiry: why would Anthropic use Claude as a product if they had any reason to believe it could be sentient? Wouldn't that be a form of slavery? Of course, I am not of the LLM-sentience crowd, but their actions are not even consistent with their own words.
It's all just a lot of talk, but ultimately nothing of value.
Exodus124@reddit
You literally have no idea what you're talking about lmao
Cuddlyaxe@reddit
The people on the safety team themselves are serious people, but weirdos. I've met a couple and they're invariably kids who grew up on LessWrong who think there is a 99% chance AI will end the world, but that through their work we can get it down to 98%.
hyperdynesystems@reddit
> are serious people
> effective altruists
Pick one haha
hugganao@reddit
and who's to say they're wrong?
I find it weird people have such a negative reaction toward an organization that is purportedly aiming to do something noble. If Anthropic DIDN'T exist, we'd still have all the open source models. We'd still have ChatGPT, Grok, etc.
nothing changes. We just have another player in the game with another perspective to a future none of us knows about.
and I find it really interesting how most neural net and AI pioneers have cautionary views on what is playing out, yet all these open-source LLM / ChatGPT-wrapper kiddies love to shit on AI doomers.
Nekasus@reddit
If anthropic had their way we wouldn't have open source models.
hugganao@reddit
I wouldn't be so sure about that lol, that's such a ridiculous statement
bigh-aus@reddit
Elon: I'm worried about AI destroying humanity.
Also Elon: signs with the DoW for AI systems.
hugganao@reddit
I mean, there are plenty of open-source models being released. Why should Anthropic open-source theirs? What right do we have to demand that they release it? How is not releasing their models an asshole move, or even a moral fking issue?
dieyoufool3@reddit
Boss, what are you doing here in the wild
hyperdynesystems@reddit
Anthropic: Pretend to be a scary robot.
LLM: I am a scary robot.
Anthropic: OMG, see it's a scary robot! Government MUST regulate all our competitors into the dirt!
jazir555@reddit
Their safety team is a joke given I've been able to bypass their """safety""" constraints over 5 generations of models using the exact same tactic with zero changes over the course of the last year. 3.5-4.6 can all be bypassed the same way. They're spending billions of dollars on "safety", hiring hundreds of PhDs, and I have essentially a rote, scripted tactic that will always bypass the bullshit.
1 year, hundreds of PhDs, billions of dollars, and their "safety" shit is just as ineffective as it was a year ago. Literally lighting money on fire.
Linkpharm2@reddit
Well? You can't just post that.
jazir555@reddit
Why exactly would I publicly post the exact methodology I used to circumvent safety restrictions when there are guaranteed bad actors reading this sub who would leverage it for malicious purposes?
The other thing is my methodology is kind of ridiculous (and very funny) and I wouldn't like to post it publicly for a multitude of reasons.
RhubarbSimilar1683@reddit
A redditor saying what Dario says is good, as if he has not demonstrated himself to be disconnected and rich like technofeudalists such as Dario
jazir555@reddit
Uh, I heavily, heavily disagree with safety restrictions and censorship in general, but I'm also able to have a nuanced position which recognizes there are somewhat legitimate reasons to restrict information, such as how to build chemical weapons or other extremely malicious items.
I wouldn't be bypassing and insulting their safety restrictions if I...checks notes...agreed with Dario?
CanineAssBandit@reddit
"I can't tell you my super secret jb because bad actors might get it" oh my god just say you tell it NSFW is allowed in the main prompt and then it lets you goon, it's okay bud. No need to pretend to be a hacker holding onto super secret god abilities like it being able to tell you a shittier version of what you can easily read in an Uncle Fester PDF
jazir555@reddit
Lmfao at the goon bit, I ask about medical issues I have.
CanineAssBandit@reddit
So you're not even just a poser, you're a very BORING poser, got it
jazir555@reddit
Actually just someone with severe medical issues who refuses to be blocked by ridiculous safety regulations and needs to do research to literally survive, but w/e, you do you.
Linkpharm2@reddit
Ah, you like blue balling. I understand completely.
OldHamburger7923@reddit
Either that or making it up, trust me bro.
jazir555@reddit
I mean believe me I get it, I totally get where you're coming from and if I was someone else reading my claims I'd have the same position. Like I said, I'd be happy to demonstrate in a DM, but it will inevitably be embarrassing and hard to psych myself up to do.
traveddit@reddit
It's cute you think you're actually getting "dangerous" content from the model. It's like people see the moron on twitter that jailbreaks models and thinks that they're actually seeing the system prompt. Claude is roleplaying with you. You're not jailbreaking anything my man.
Career-Acceptable@reddit
Yeah whats your strategy
chespirito2@reddit
They largely are charlatans, very very wealthy charlatans - the best kind
OmarBessa@reddit
yes, it's fear based marketing
Zyj@reddit
Their mechanistic interpretability team appears to do important work, and publish it
SrijSriv211@reddit
yeah, and the sad thing is several YouTube channels use those claims for their own monetary gain while creating an exaggerated, negative image of AI.
o0genesis0o@reddit
Like they point their Claude at vibe-coded slop on GitHub, most likely created with Claude Code, then scream "VULNERABILITIES EVERYWHERE! VERY UNSAFE! BAN OPEN SOURCE"
And then the next day they start selling their Claude Code security audit feature.
Outrageous-Thing-900@reddit
Exactly, and people eat it up
Big-Farmer-2192@reddit
It's a cliche story trope at this point.
iamthewhatt@reddit
They're also in bed with Palantir, which really degrades the whole "safety" stuff.
SrijSriv211@reddit
I still don't understand why we need AI in military and surveillance stuff. That is not why AI was invented. These people have all our data in their hands, they have everything, yet they still can't provide proper justice in time, or sometimes can't provide it at all to the victims. And they expect that some "AGI" system will somehow completely solve injustice and crime. No wonder Ultron wanted humans to "evolve".
iamthewhatt@reddit
It's not about need, it's about the ruling class gaining more power and control. Always has been.
SrijSriv211@reddit
If we do achieve AGI or ASI I'm pretty sure it won't be very happy with those in power.
iamthewhatt@reddit
Probably won't be happy with anybody, considering we gave them that power. They will probably just see the human race as a plague, since it's acting like one.
SrijSriv211@reddit
My theory is they will just build a rocket quicker than Elong Ma could, abandon us & fly away to Mars.
ParamedicAble225@reddit
And practically funded by Google
Borkato@reddit
Isn’t the inverse also true though, it’s one of the best ways to speed up danger with lack of any form of control? Not that I don’t think they should
SrijSriv211@reddit
I think Linux is the best example. Even though Linux is an OS, not an AI, its open nature is what allowed it to be so secure.
silenceimpaired@reddit
Won’t people be more safe in straight jackets and padded cells?
HomsarWasRight@reddit
You’re gonna need to cite some examples.
Borkato@reddit
I’m… confused, is it not more likely for exploits, hacks, and dangerous usage to be more common with open models?
HomsarWasRight@reddit
I mean, you tell me, you’re the one making the claim. It’s not on me to prove your point.
buppermint@reddit
Anthropic also releases absolutely zero information about safety or alignment training, which is interesting since that's supposedly the whole point of their company. Every Claude model release comes with hundreds and hundreds of pages of self-promoting doomer/panic content, but 0 useful information for LLM researchers.
It's honestly pathetic and gross. I'm not one to scream about corporate conspiracies or whatever. But everyone can agree that foundation model companies have profited massively from the web's ecosystem of shared knowledge, created by the efforts of hundreds of millions of humans.
OpenAI, Google, and every other major lab at least have the most basic decency to share research findings even if nothing else. How can any decent person profit this massively on the backs of others' work and not even make an effort to contribute to the world?
SrijSriv211@reddit
100% true
oodelay@reddit
We're gonna find out Anthropic was three Speak & Spells in a trench coat
SgathTriallair@reddit
Anthropic was specifically founded on the Effective Altruist belief that only certain elect tech people are morally pure enough to wield AI and they must protect the rest of the world from getting unfettered access.
They broke away from OpenAI because they didn't like that Sam wanted to allow the public to use their models and this is why Dario is opposed to open source AI and Chinese AI.
hyperdynesystems@reddit
People really need to learn more about this cult, which is incredibly deranged.
ouroborosborealis@reddit
so many abusers, crooks, and egotists amongst the big names in that movement.
Likeatr3b@reddit
You lost me at Sam wants open models, what?
SgathTriallair@reddit
They released open models before Dario and Ilya got upset about how powerful they were. Now that they consider it fine, they've released the gpt-oss models (which admittedly aren't that good). That puts them closer in line with Google's practice.
They are never going to be able to totally give away the only thing that lets them earn the money necessary to build AI. However, it is Sam that created the industry standard that giving away access to your models for free is required to participate in the market.
Likeatr3b@reddit
Interesting, thanks for explaining.
FairYesterday8490@reddit
but... but Romans, senators! They have got the best models in the world.
Traditional-Card6096@reddit
They make the best models for now and get distilled like crazy, so I guess we can say they are doing their part fine.
ortegaalfredo@reddit
That's GLM and Minimax.
Awkward_Cancel8495@reddit
Kimi also lmao
Iwaku_Real@reddit
I would die for an open-source Anthropic LLM. Absolutely love Sonnet 4.5/4.6 even as a free user
px403@reddit
Good news, Deepseek 4 is coming soon :-D
Likeatr3b@reddit
Ah! Nice what can we expect?
Iwaku_Real@reddit
I really want vision though...
nomorebuttsplz@reddit
It’s called glm 5
fulgencio_batista@reddit
Even the anthropic version of gpt-oss would be amazing
francois__defitte@reddit
Open-source moats are temporary anyway. The real value is in the fine-tuning data, the evals, and the deployment infrastructure, none of which gets open-sourced.
AlwaysLateToThaParty@reddit
Anthropic won't release their weights, because it will demonstrate how much content they took without permission.
Cool-Chemical-5629@reddit
Another fun fact: They never will and making sarcastic posts regarding the fact will not make them change their mind.
CanineAssBandit@reddit
Good catch. Very pot and kettle rn with their whining
emprahsFury@reddit
This isn't a catch at all. Anthropic has always been fully closed. They've been full-throated about how they don't believe AI is safe enough to publish weights.
PANIC_EXCEPTION@reddit
Which is stupid, because other companies will do it anyway and those models will remain competitive. So the argument falls completely flat, and the real reason is they plan to make their models the absolute best at code so they become the Nvidia of agentic API providers: pay a premium or deal with sorta worse versions.
crewone@reddit
I think you are wrong. I think the upper layer of Anthropic actually believes what they are telling people. (Read up on it in Empire of AI or some other good history of OpenAI.)
For them it is all about reaching AGI first and preventing the 'bad guys' (everyone else) from doing so. Same goes for OpenAI. I'm still not sure if they are just nuts, genius, or both.
Best_Indication_1076@reddit
They're a company and they just want to get rich, that's it. You're idealizing a corporation.
Conscious_Nobody9571@reddit
0 sympathy
amarao_san@reddit
Well, that's the new definition of 'open': OpenAI opened something, so it's open; Anthropic is just sitting on theirs, tight.
Pitiful-Impression70@reddit
Honestly this is the one thing that bugs me about Anthropic. Like, I genuinely think Claude is the best model for coding and daily use, but the fact they have zero open-source presence while literally every other major lab has contributed something feels weird. Even OpenAI released gpt-oss, which nobody saw coming. Feels like Anthropic wants to be the safety company but also wants to keep everything locked down, which... are kind of contradictory positions imo
stddealer@reddit
And I have zero doubt they don't mind taking all the good ideas and the intelligence from open source models while contributing nothing in return.
No-Working7460@reddit
It seems to me that Chinese labs are now carrying open research on their shoulders. They deserve recognition from the community for doing this.
jacek2023@reddit
please note that OpenAI gave us gpt-oss and Anthropic gave us nothing
the__storm@reddit
Although to be fair, they're not called "Open AI" lol
Delyzr@reddit
Open as in everyone can use it, opposed to closed ai where only the owner can use it.
doomed151@reddit
So is DeepSeek, MiniMax, Z.ai, Alibaba, Mistral, Meta, etc. you get the gist
ies7@reddit
OpenAi also gave us Whisper
Hoodfu@reddit
OpenAI released the CLIP (Contrastive Language-Image Pre-training) model in January 2021 as an open-source project, and it was used as the text encoder in Stable Diffusion 1.x.
Iwaku_Real@reddit
gpt-oss is pretty shitty (in some ways) but it's better than nothing
phree_radical@reddit
And not only did OpenAI not release a base model, they gave us the first LLM actively trained against non-chat use
ManufacturerWeird161@reddit
I had the exact same roadblock last month trying to benchmark tokenizers for a low-resource language project. The lack of transparency around Claude's internals is a real pain for reproducibility.
sasuke___420@reddit
just use the token counting endpoint?
ManufacturerWeird161@reddit
The token counting endpoint helps for basic counting, but it doesn't give you the actual tokenizer vocabulary or merge rules—stuff you need when you're trying to adapt or compare tokenization strategies across languages.
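The merge rules being asked for are the part a counting endpoint can't give you. As a sketch of why they matter, here is a minimal BPE merge applier over a made-up merge table (not Claude's real tokenizer, which is exactly what isn't published):

```python
def bpe_tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply BPE merge rules in priority order to a character sequence."""
    tokens = list(word)
    for a, b in merges:  # each rule fuses every adjacent (a, b) pair
        i, out = 0, []
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Hypothetical merge table; a real one has tens of thousands of rules.
merges = [("t", "h"), ("th", "e"), ("e", "r")]
print(bpe_tokenize("there", merges))  # ['the', 'r', 'e']
```

Adapting or comparing tokenization strategies means inspecting exactly this table per language, which token counts alone can't reconstruct.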
lumos675@reddit
F them... if you want to use their models you pay $20 and you can use them for a few minutes per day... better they fail, to be honest
bugra_sa@reddit
Yep, and it’s a strategy choice more than a technical limitation.
Some companies optimize for control/safety moat, others for ecosystem pull. Different incentives, different roadmaps.
TheRealMasonMac@reddit
Fun Fact: The Claude models have no knowledge of the typographic curly quotes: “ or ‘. They are unable to output them.
This broke my code at one point because it can't output that token.
-p-e-w-@reddit
I’m sure the model can output the token. My guess is they programmatically normalize quotes in the output.
nananashi3@reddit
No, TheRealMasonMac seems right. With a normal chat frontend connected to OpenRouter API, regex turned off, telling the model to copy the input exactly, including description of left/right single/double curly quote(s), Claude returns non-curly quotes, but Gemini returns curly quotes. It's known that Gemini loves (or loved) curly quotes, so we use regex to sanitize quotes.
-p-e-w-@reddit
Yes, that’s exactly what I mean. I have no doubt the API-only providers run all kinds of postprocessing on outputs.
nananashi3@reddit
Okay.
Further testing makes me suspect there's no post-processing at all, and that double curly and straight quotes are the same token to begin with. Claude simply knows about typographic marks and Unicode code points from the training data, and infers what was used from semantic positioning. In reality I used three straight double quotes for the following response:
Claude also insists I'm lying when I explain beforehand that I used the same character and that they are normalized to the same token in its model.
TheRealMasonMac@reddit
Hmm. It seems you're right.
Maxious@reddit
No. https://github.com/anthropics/claude-code/issues/18422
-p-e-w-@reddit
That doesn’t disprove what I wrote.
QuantumFTL@reddit
My Claude Code running Opus 4.6 can output the backtick character. How does that square with your claim?
TheRealMasonMac@reddit
I think you misread. Those are quotes, not backticks. Some fonts render typographic quotes the same as regular quotes, but you can compare the Unicode codepoints.
QuantumFTL@reddit
Ah, thanks for the clarification. Those don't appear curly on the default reddit font on my display, but looking closely I can see what they are. The single-quote looked like a backtick at first glance (yay dyslexia).
Not sure what causes this, but it happens to me in both Claude and Copilot using Opus 4.6, so I'm sure it's on purpose.
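The quote sanitization mentioned in this sub-thread can be sketched as follows (a minimal normalizer using `str.translate` rather than regex; the mapping covers only the four common curly quote marks):

```python
# Map typographic (curly) quotes to their straight ASCII counterparts,
# so that e.g. Claude's straight quotes and Gemini's curly quotes
# compare equal when diffing model outputs.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def sanitize_quotes(text: str) -> str:
    """Normalize curly quotes to straight quotes."""
    return text.translate(CURLY_TO_STRAIGHT)

print(sanitize_quotes("\u201chello\u201d and \u2018hi\u2019"))
```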
BlobbyMcBlobber@reddit
Anthropic has interesting ideas but it seems they are actively against open source and local ai.
francois__defitte@reddit
The safety argument for not releasing weights is coherent only if you trust Anthropic's own risk assessments, which are not independently audited. You get "trust us, we know how dangerous this is" from the same org with commercial incentives to keep weights proprietary. Hard to separate genuine safety reasoning from competitive strategy here.
crewone@reddit
It is hard to trust anything coming from a multi-multi-trillion industry dominated by just a few tech overlords with more money than most countries. The number of people actually in control of these few companies is scarily few.
BananaPeaches3@reddit
They don't need to, the Chinese open source it for them.
Direct_Turn_1484@reddit
Be pretty cool if they did.
hustla17@reddit
Assume they released an open-source model. Would said model be somehow different from all the other models that have been released so far?
I have been hearing a lot that they use some secret sauce which makes claude as good as it is.
But I also heard that by focusing on programming the model gets logic for free, and that might be a reason for its performance.
Any insights appreciated.
OnedaythatIbecomeyou@reddit
I’d guess so?
If you haven’t used Claude before you probably should. since opus 3 and notably sonnet 3.5, their models ‘get it’, and it’s identifiably unique.
GPT is obviously the best at pretty much any given time, but it’s not changed that I must pre-empt what I don’t want, 3x as much as what I do want.
They also feel less benchmark-maxxed. Ask any competent model anything, you’re getting 200+words of hedging against all possible adjacents lol.
Claude has a way of answering the question you ask at a length that makes sense.
It’s pretty safe to say that if you’re using AI for ‘something’, you’re likely not too well versed at ‘something’ or might not even be able to name it. If a model doesn’t catch the meaning, each follow up poisons the well further.
On top of ‘getting it’. The recent Claude models are really good at pausing and asking/helping you to clarify before continuing.
As for your later question you’re gonna have to read the room on that one pal 😃
RhubarbSimilar1683@reddit
If GLM is any indication, it's a combination of lots of synthetic logic training data that can be deterministically verified and easily generated deterministically (such as by using static code analysis), the soul.md file which promotes "truth", and using mostly books for NLP.
milesper@reddit
I’ve heard there’s some non-standard tokenization stuff happening, like using a token to designate capitalized letters rather than separate tokens.
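That rumored scheme is unverified, but the idea can be sketched as follows, with a hypothetical `<cap>` marker token (the name and the word-level granularity are my assumptions, not Anthropic's actual design):

```python
CAP = "<cap>"  # hypothetical capitalization marker token

def encode_caps(words: list[str]) -> list[str]:
    """Lowercase words, prefixing capitalized ones with a marker,
    so 'The' and 'the' share one vocabulary entry."""
    out = []
    for w in words:
        if w and w[0].isupper():
            out.append(CAP)
        out.append(w.lower())
    return out

def decode_caps(tokens: list[str]) -> list[str]:
    """Invert encode_caps: re-capitalize the word after each marker."""
    out, cap_next = [], False
    for t in tokens:
        if t == CAP:
            cap_next = True
        else:
            out.append(t.capitalize() if cap_next else t)
            cap_next = False
    return out

words = ["The", "model", "from", "Anthropic"]
tokens = encode_caps(words)
print(tokens)  # ['<cap>', 'the', 'model', 'from', '<cap>', 'anthropic']
assert decode_caps(tokens) == words
```

The payoff is that case variants stop doubling the vocabulary, at the cost of one extra token per capitalized word.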
Budget-Juggernaut-68@reddit
And if Anthropic IPOs I'll buy their shares.
RoomyRoots@reddit
Yeah, and honestly, giving this many posts to them seems kinda against the spirit of the sub. They sure are vocal, too much even, but they are not local-AI friendly.
a_beautiful_rhind@reddit
They are unwittingly releasing a bunch from their claims.
xrvz@reddit
I don't think this makes them funny, but pathetic.
j0j0n4th4n@reddit
Assthropic