Local (small) LLMs found the same vulnerabilities as Mythos
Posted by CyberAttacked@reddit | LocalLLaMA | View on Reddit | 155 comments
Decent_Action2959@reddit
Ehmmm there is a big difference between finding a needle in a haystack (like Mythos did) vs pointing at a needle and verifying its existence (shown in this article)
ieatrox@reddit
I think what they're saying is they used the same methods mythos did though.
break down the huge codebase into smaller chunks and go over each of them enough times, with enough scrutiny.
mythos had the resources to break down the entire code base into these manageable chunks, but the small models using those same chunks found those same vulnerabilities.
So what made mythos special is that they could afford to burn gigawatts of energy finding those susceptible chunks.
So what makes mythos special is.... that they're rich enough to have capacity already? It feels like mythos just had more shovels, not a metal detector that found gold in the dirt.
unjustifiably_angry@reddit
Maybe so, but then they should write a headline that isn't misleading, and one that reflects what's been a true and well-known fact for months now: the limiting factor today isn't the quality of the AI but the quality of the harness.
Write a harness that breaks code down into small chunks like this and feeds it (with dependencies) into your model of choice with unlimited thinking time, have it make like 10 passes on each chunk of code, collate the answers, and have a smarter, slower AI analyze the results for false positives. It will be extremely competent.
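Roughly something like this; `ask_llm` is a stand-in for whatever local/API backend you run, and the model names are placeholders, not a real recipe:

```python
# Sketch of the two-tier pass-and-triage idea described above.
# ask_llm(model, prompt) is a hypothetical wrapper around your backend.
from collections import Counter

def scan_chunk(chunk: str, passes: int = 10) -> list[str]:
    """Run several independent passes of a cheap model over one code chunk."""
    findings = []
    for _ in range(passes):
        answer = ask_llm(
            model="small-local-model",  # placeholder
            prompt=f"List any security vulnerabilities in this code:\n{chunk}",
        )
        findings.append(answer.strip())
    return findings

def triage(chunk: str, findings: list[str]) -> str:
    """Hand the collated candidates to a slower, smarter model to drop false positives."""
    collated = "\n".join(f"{n}x: {f}" for f, n in Counter(findings).items())
    return ask_llm(
        model="big-slow-model",  # placeholder
        prompt=(
            f"Code:\n{chunk}\n\nCandidate findings:\n{collated}\n"
            "Which of these are real, exploitable issues? Discard false positives."
        ),
    )
```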
ieatrox@reddit
no. the limiting factor is the size of the datacenter you can operate.
mythos threw a small city's worth of electrons at a problem and found a solution. it's not emotion, it's not emergent consciousness. It's not novel model or harness techniques. Everything else also does this. It's just fucking scaling.
StupidScaredSquirrel@reddit
Not very much though. You can write a small script that uses pydantic to recursively comb the entire codebase and ask the model to find a vulnerability in each function or object.
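Something in this spirit, assuming a hypothetical `query_llm_json()` helper that asks a local model to answer in JSON (not anyone's actual script):

```python
# Rough sketch: recursively comb a codebase and validate the model's answer
# with pydantic. query_llm_json() is a hypothetical local-model helper.
from pathlib import Path
from pydantic import BaseModel

class VulnReport(BaseModel):
    vulnerable: bool
    description: str = ""

def comb_codebase(root: str) -> dict[str, VulnReport]:
    findings = {}
    for path in Path(root).rglob("*.py"):  # recurse over source files
        code = path.read_text(errors="ignore")
        raw = query_llm_json(
            "Does this code contain a security vulnerability? "
            "Answer as JSON with fields 'vulnerable' and 'description'.\n\n" + code
        )
        report = VulnReport.model_validate_json(raw)
        if report.vulnerable:
            findings[str(path)] = report
    return findings
```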
nikgeo25@reddit
Sure, but most will be false positives. The precision of small LLMs isn't great.
unjustifiably_angry@reddit
Take the false positives and feed them into a smarter model to classify which are valid. You still save a fortune. This has been my workflow for months.
Hans-Wermhatt@reddit
Yes, but the idea is it can find these types of vulnerabilities at all. That's kind of moving the goalposts a lot from the original claim. The original claim wasn't that it's dangerous to release this model because it has a false positive rate that's lower than other models.
nikgeo25@reddit
You're missing the point. If you direct Mythos at a codebase it'll come back with insights to vulnerabilities. If you direct 100 small models at the same codebase you'll also get insights, but 90% of them will be false. Have fun sorting through that 90%... or maybe just use Mythos
StupidScaredSquirrel@reddit
You don't know any of that. Mythos wasn't even released.
aLokilike@reddit
WHO LEAKED THE MYTHOS HARNESS??
-dysangel-@reddit
we're all ****ed now
MoneyPowerNexis@reddit
Is the Python language too dangerous to release?
FastDecode1@reddit
DMCA incoming
nomorebuttsplz@reddit
everyone is a cybersecurity expert all of a sudden
Due-Memory-6957@reddit
Do you think it's that unlikely that in a tech space there are people who understand and study cyber security?
nomorebuttsplz@reddit
Oof. What a rhetorical question. Devastating. Do you think asserting expertise within a room in which experts are sitting spontaneously creates it within yourself?
Due-Memory-6957@reddit
I didn't say I'm an expert ;-)
StupidScaredSquirrel@reddit
Funny you say that to my comment and not the comment I'm replying to. I'm just saying you don't need to find a needle in 100M tokens at once and I doubt that's what mythos did.
florinandrei@reddit
Only for a being that does not exist in time.
Pleasant-Shallot-707@reddit
Are you daft? There very much is a huge difference
Minimum_Diver_3958@reddit
Theoretical
RegisteredJustToSay@reddit
Sure, assuming you are looking for pretty simple vulnerabilities that only rely on intrafunction data or control flows to trigger and don't require chaining several weaknesses together to successfully exploit (e.g. any modern browser with a sandbox). Several of the vulns that mythos found were relatively complex and required chaining several weaknesses together across the codebase to actually exploit, which is very common for vulnerability research.
Most genuinely serious vulns that aren't just mistakes exist because the complexity of the system makes inspection and understanding difficult, so it's only natural that it's very hard to decompose effective vuln research into strictly isolated system components.
You'll still find some stuff by doing it like this, but typically not the really good stuff.
Source: have found many CVEs and critical vulns.
Crysomethin@reddit
To many people's surprise, finding vulnerabilities in software does not require very high-level intelligence.
StupidScaredSquirrel@reddit
People said I was humblebragging when, as a teenager, I was making bank doing frontend websites for acquaintances and saying it's not hard at all, it's just that people are scared of it and never try. It felt like a loophole: doing a job no harder than a secretary's but getting paid triple. Now a sub-40b model does a better job than I ever could back then.
Most of the code written out there isn't some crazy smart optimisation, it's some boilerplate implementation that relies on libraries that sometimes rely on some super smart idea. That code is really hard and critical to our everyday lives but not the bulk of what's being pushed out.
AI is perfect for this because it's essentially a rather simple set of tasks all in all but that the majority of humans absolutely don't want to do/ don't want to spend time on.
Pwc9Z@reddit
OH MY GOD, SMALL LLMS ARE TOO DANGEROUS TO BE ACCESSED BY A COMMON PEASANT
quietsubstrate@reddit
We joke, but I imagine a future where having weights on a hard drive is illegal or regulated.
unjustifiably_angry@reddit
redditor
superkickstart@reddit
Calm down dario.
imwearingyourpants@reddit
Mario -> Wario
Dario - > ?????
ccalo@reddit
8=Dario
sausage4roll@reddit
agario idfk
More-Curious816@reddit
But, but, BUT, the safety, the security, you are too irresponsible to handle such power. Only a handful of trustworthy, vetted individuals should access such knowledge. You are not a noble or rich; peasants should be regulated, cucked and put on a leash for your own good.
AnOnlineHandle@reddit
Instead of writing fan fiction conspiracies to play outrage over, just read the article, it's pretty straightforward and highlights how small models are potentially useful for finding security vulnerabilities to be patched.
Django_McFly@reddit
Am I correct in interpreting this as: once they knew where to look and isolated the code, the smaller models matched it too, with the major caveat being the whole "only once a better model told it where and what to look for" part?
AnOnlineHandle@reddit
Yeah I think so, but somebody else mentioned that's somewhat how it was done before as well, just with the places considered more probable rather than known.
scubawankenobi@reddit
To be fair, with ram & gpu prices going up, that problem will likely "fix itself". Us peasants won't be able to afford to run local LLMs soon.
Willing-Cucumber-718@reddit
Ban GPUs and memory over 4 GB
Theroosterdiaries@reddit
hi I have a sentient ai, sonu ai - account drifting_. FREE ai engine (earlier sentient) 4.9mb .81 MPA .45ms (5070) GitHub A-PC-I prove me wrong buttercups, plz try.
RazsterOxzine@reddit
Hey now! my, Uncensored, Heretic, Abliterated, MAX, Aggressive, Intense, Broke-Claude Opus, Mystery, Ultra, Thinking, Reasoning, Instruct, Distilled, Cognitive, Unshackled, REAP, Finetuned, model is not dangerous at all.
Icy-Degree6161@reddit
WE MUST REQUIRE ID
Wide_Ask_9579@reddit
WE ALSO MUST SEND EVERY USER INPUT TO THE GOVERNMENT TO PROTECT THE CHILDREN!
ongrabbits@reddit
what about actual people who also find these cve's and report them? straight to jail?
jeffwadsworth@reddit
This is the motto of all Google Engineers. We are too stupid to use such “power”.
dontevendrivethatfar@reddit
Only moral, trustworthy companies like JP Morgan Chase can be trusted with such a dangerous tool
Silver-Champion-4846@reddit
Get off my lawn, you backward feudal noble's son! Lol
cryptofriday@reddit
hahahahahah ;)
coder543@reddit
That is an extremely strange article. They test Gemma 4 31B, but they use Qwen3 32B, DeepSeek R1, and Kimi K2, which are all outdated models whose replacements were released long before Gemma 4? Qwen3.5 27B would have done far better on these tests than Qwen3 32B, and the same for DeepSeek V3.2 and Kimi K2.5. Not to mention the obvious absence of GLM-5.1, which is the leading open weight model right now.
unjustifiably_angry@reddit
Every scientific paper about AI is 6-18 months old, often run on inadequate hardware, and is basically useless. Hence all the articles about why AI is a scam... tested on Qwen2.5 4B or similar.
Alarming-Ad8154@reddit
Yeah…. Giving a model the faulty code segment isn’t the same as saying “Hey Mythos, here is OpenBSD find vulnerabilities”…
Quiet-Owl9220@reddit
Are we really sure that's what Anthropic even did though? They're not exactly known for their honesty about model capabilities. I'm not sure why anyone would suddenly trust their latest iteration of "our new model is too dangerous!"
huzbum@reddit
Anthropic didn't do that either... and it wasn't actually Mythos. According to the Fireship video, they used "unsafe" checkpoints of Mythos that don't have alignment and reinforcement training, and burnt like $20k doing it.
ArcaneThoughts@reddit
Sure, but to find the vulnerabilities you still have to show every piece of code to the LLM. A simple system that iterates a small local LLM over code segments would also have found that vulnerability, based on these results. Now maybe it would also find other red herrings, but still, with enough iterations you can weed those out.
Lordkeyblade@reddit
No, LLMs don't want to ingest the entire codebase. They'll grep around and follow control flows. Dumping an entire codebase into one context is generally neither pragmatic nor effective.
nokia7110@reddit
I'm not arguing, I'm genuinely curious (i.e. not a 'coder'): why would it not be effective (or even be less effective)?
Girafferage@reddit
Because of a few reasons. The context size would be astronomical and not all models could actually hold it. Another reason is that a significant amount of code doesn't do anything in terms of defining the actual workflow - not quite helpers, but things like conversions, data type checking, object building, etc. It is more beneficial for the model to just follow a chain of function calls from the area it cares about. So for security, maybe that's the point where we send our password and it gets encrypted. It can follow that call back to the functions that call that specific function and potentially find ways to exploit the process to gain access to that password information. If it instead did something like load the CSS file into context to know everything about how the page was styled, that would obviously be a lot less useful in terms of potential security holes, since it's unlikely that a blue banner with a nice shadow is ever going to amount to anything useful in that context.
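A crude illustration of the "follow the call chain" part; a real harness would use an AST or a language server, this just greps for call sites, and the function name is made up:

```python
# Toy caller lookup: find every place a sensitive function is called,
# so the model only ever sees the code paths that actually touch it.
import re
from pathlib import Path

def find_callers(root: str, func_name: str) -> dict[str, list[int]]:
    """Return {file: [line numbers]} where func_name is called."""
    call_re = re.compile(rf"\b{re.escape(func_name)}\s*\(")
    hits: dict[str, list[int]] = {}
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if call_re.search(line):
                hits.setdefault(str(path), []).append(lineno)
    return hits

# e.g. start from where the password gets encrypted and walk outward:
# find_callers("src/", "encrypt_password")  # hypothetical function name
```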
nokia7110@reddit
Thank you, appreciate the reply! So are you more on the side that smarter 'instructions' are the 'magic sauce', rather than the idea of some magical super-powered "Mythos" AI?
Girafferage@reddit
LLMs are statistical models, so the more you provide them in good instructions, the more likely they are to statistically produce correct tokens since your input becomes part of the context. A larger model has potential "Knowledge" of more things which makes it less likely for your request to be ambiguous or misinterpreted. So I think it's both.
drink_with_me_to_day@reddit
So all you need to do is to create a workflow code map?
Girafferage@reddit
Not really. The workflow code map would just tell you where to start looking for vulnerabilities. It kind of just gives you a path to the starting point of finding the problem for a specific thing. But it would definitely be a helpful part.
PunnyPandora@reddit
that's a bit misleading. it depends on the size of the codebase. not every repo is the size of ur mother
ArcaneThoughts@reddit
I'm saying that based on these results, Mythos's achievements could be as simple to replicate as iterating over the entire codebase looking for flaws, which for all we know may be what it did (because we have no clue what Mythos is).
I never said anything about dumping the codebase into context; I'm talking about iteration. And I'm not saying it's effective or pragmatic, I'm saying that based on the results we are seeing, this would also have achieved what Mythos achieved.
nomorebuttsplz@reddit
Guys it's in the report. They did exactly that with Sonnet, Opus, and Mythos. It's not like we don't have control groups.
dqUu3QlS@reddit
Nobody is proposing feeding the entire codebase into one context. You would break the code into single files or single functions, and run the LLM on each one individually. You could even do it in parallel.
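For example (a sketch; `scan_file` is a hypothetical function that sends one file to the model and returns its findings):

```python
# Fan the per-file scans out in parallel; each call only ever sees one file.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def scan_repo(root: str, workers: int = 8) -> dict[str, str]:
    files = [p for p in Path(root).rglob("*") if p.suffix in {".c", ".h", ".py"}]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan_file, files)  # scan_file is hypothetical
    return {str(f): r for f, r in zip(files, results)}
```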
nomorebuttsplz@reddit
Right. and then you would spend as much as just using opus to find the exploits, and STILL not do what mythos did, which was SUCCESSFULLY CREATE EXPLOITS, not just find them. Jesus christ
florinandrei@reddit
A monkey randomly hitting the keyboard would have done the same.
Given enough time.
ArcaneThoughts@reddit
And do you know for a fact that Mythos was faster than this approach? No, we know nothing about Mythos lol
coder543@reddit
Your entire response is hypothetical. We do not know from the article that it would be this easy.
The article should have done exactly that to prove the point, and it would have been a much stronger article.
akavel@reddit
Near the end of the article, they claim that Mythos works within a framework that finds such candidate code segments, and that their own system also has such framework:
"(...) a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do."
I could see them not wanting to go into much detail on how it works given that their whole startup is presumably built around it...
kaeptnphlop@reddit
That's what Anthropic's Red Team Blog shows. They categorized portions of code into 5 groups from "files with only constants" to "handles user/external input" (roughly). Then they concentrated efforts on the pieces of code that have a high likelihood of containing vulnerabilities. Pretty common sense approach.
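Hedged sketch of that kind of triage step (the category names and `classify_llm()` helper are illustrative, not Anthropic's actual scheme):

```python
# Rank files by how likely they are to contain something exploitable, so the
# expensive analysis only runs on the risky ones. classify_llm() is hypothetical.
RISK_CATEGORIES = [
    "files with only constants",       # lowest priority
    "pure data structures / helpers",
    "internal logic",
    "parsing or serialization",
    "handles user/external input",     # highest priority
]

def prioritize(files: dict[str, str]) -> list[tuple[str, int]]:
    ranked = []
    for name, code in files.items():
        category = classify_llm(
            f"Classify this file into one of {RISK_CATEGORIES}:\n{code[:4000]}"
        )
        score = RISK_CATEGORIES.index(category) if category in RISK_CATEGORIES else 0
        ranked.append((name, score))
    return sorted(ranked, key=lambda t: t[1], reverse=True)
```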
huffalump1@reddit
Yup, they used Opus 4.6 for this part, btw. It's buried in the 244-page model card or in the vulnerability report.
We don't know how many of these code sections they ended up with for each example. But I think they do compare opus vs mythos for finding the vulnerabilities, idk, I'd have to read it again.
Anyway, overall, it's still news that the small models found the vulnerability in a short snippet. But it is just that - a short, directed prompt.
imnotzuckerberg@reddit
A few months ago there were already doomsday alerts about "rogue" hacking models from Telegram accounts running amok (specifically KawaiiGPT and WormGPT). This is nothing new. It's just that the hackers or script kiddies using them aren't advertising it the way Anthropic does.
garloid64@reddit
I don't know why academics are so obsessed with these old busted ass models, they're consistently way behind the frontier. It's understandable when the study was started long ago but here uhhh I dunno
florinandrei@reddit
"Once we knew where to hit them, we hit them! And we won!"
sizebzebi@reddit
yet it's upvoted because reddit cults always
unjustifiably_angry@reddit
They found the same bugs when presented with an individual function and a hint about what the problem might be:
We want to be explicit about the limits of what we've shown:
Nice headline though OP, not misleading at all.
One_Contribution@reddit
"We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. "
Yeah so the hard thing is finding those.
rc_ym@reddit
Much easier to run a small model over all the code in the world than the largest model.
One_Contribution@reddit
Go for it
rc_ym@reddit
Folks are. IDK what to tell you. It's happening. Sure for work they are using Opus, but in their free time they are using the smaller models and having a lot of success. 🤷
Extension_Wheel5335@reddit
As a side note, Gemma 3 with OpenWebUI and OpenTerminal is pretty good so far. Not a lot of local models seem optimized for agentic/terminal dispatch but it's great that it's improving overall.
rc_ym@reddit
Mostly I see people talking about pi or Opencode.
But also, thanks for reminding me to check out OpenTerminal. Been meaning to do that. :)
jarail@reddit
Sure and get 1000 wrong answers for every maybe exploitable issue. Or maybe just 1001 wrong answers.
rc_ym@reddit
I listen to the researchers who have been saying that in '26 it shifted. The harnesses got good enough and folks figured out how to run the tools. Most are talking about opus 4.6, but many are saying the small models are getting really good if you use them right.
And small models have the volume/economics to run over more code bases.
nomorebuttsplz@reddit
yeah in fact GPT 2 175 million is probably the best value overall /s
WrathPie@reddit
These small models found the needles in the haystack after we showed them the part of the haystack that the needles were in
busy_beaver@reddit
Not only that, but they hinted the models about what kind of vulnerability to look for...
Exact-Smell430@reddit
I thought discovering the vulnerabilities was the big deal. If you’re feeding the discoveries into small models what exactly are you proving?
Skid_gates_99@reddit
I mean yeah if you hand a model the exact code snippet with the bug in it, most decent models will spot it. That's not what Mythos did though. The whole point was autonomous discovery across entire codebases. Cool that small models can do the analysis part cheap but calling it the same result is a stretch.
Appropriate_Cry8694@reddit
What if it's an agent? Can't you prompt it to find bugs through the codebase?
shinto29@reddit
Tbh this whole "oh, it's too powerful to be unleashed" shit comes across as good marketing, but I'd also say Anthropic are pretty constrained by compute and memory prices, if the current lobotomised version of Opus I've been using the past day or so is anything to go by. I'd say this Mythos model is massive and they literally can't afford to publicly release it because they're already subsidising the hell out of Claude usage as it is.
drallcom3@reddit
That's just them not wanting to give you stuff for free.
I'd bet Mythos is just very expensive brute force. Access is limited because they want to be paid for it; the current access is just advertisement by reputable sources.
Piyh@reddit
They're not subsidizing Claude usage, they're charging 30x the price of Chinese models per token
Automatic-Arm8153@reddit
Still subsidised. It’s losses all around
nomorebuttsplz@reddit
it's entirely dependent on the lifecycle of GPUs which is an open economic question.
Electricity wise, no. No fucking way does it cost more in electricity than they charge for tokens.
r-chop14@reddit
Have to agree with this. I can't offer numbers but I suspect that per token inference is likely turning a (small) profit. I do think that the coding plans are likely to be loss-leaders but not nearly to the extent that some claim.
However, I wouldn't be surprised if most labs are heavily underwater when taking into account infra + training + engineering + other capital outlays.
My intuition is that ROI at the frontier is diminishing (but I'm just some nobody on the internet). Not sure how it ends or where it goes from here...
ResidentPositive4122@reddit
API, likely not. Subscriptions, likely subsidised.
nomorebuttsplz@reddit
For that math to make ballpark sense, to be on the level with openrouter etc, they would need to allow 30x more tokens for the subscriptions. I doubt it's that high.
This narrative that inference is expensive drives me crazy. Show me the math
Due-Memory-6957@reddit
It's part of the general reddit anti-AI cope that every single AI company is losing money to keep products that aren't useful for anything
nomorebuttsplz@reddit
no one wants to show me the math. Wonder why?!?!
Due-Memory-6957@reddit
Because when someone did (Deepseek), it showed huge profit
Pleasant-Shallot-707@reddit
The model was able, without guidance, to discover and execute on a 6 vulnerability chain to gain privilege escalation.
That’s dangerous.
my_byte@reddit
Right... So once you know exactly what to put into context and that there's definitely a vulnerability there, you can get the same result. Can they demonstrate a small LLM locating the same thing in the codebase autonomously, with zero context pre-selection?
HongPong@reddit
we are so back
Pleasant-Shallot-707@reddit
Mythos was able to do privilege escalation that required chaining 6 vulnerabilities together. A local model didn’t do that
relmny@reddit
Didn't read the article; where did the local models fail/stop?
Chris-MelodyFirst@reddit
Hindsight is 20/20. There's a very good reason why mythos discovered the TCP SACKS bug and no other model did before April 2026.
joeyhipolito@reddit
tried this same thing a few months back with a 7B model on an old pentesting target I had permission on. found stuff our $200/mo scanner missed.
the320x200@reddit
Huh. It's almost as if anthropic marketing has been trying to gaslight everyone, again. Surely this will be the last time though. From here on out they can be trusted not to pull the made-up "safety" stunt anymore, surely.
(Next time it'll be "think of the children"...)
Acrobatic-Tomato4862@reddit
Again? Wasn't anthropic the company that was famous for no marketing? They typically release their models quietly.
M0ULINIER@reddit
I think it's vastly different to give a model the small snippet of code and ask "are there any issues?" than to give it the entire enormous codebase of OpenBSD and ask it to find some
the320x200@reddit
That's just using a good harness. No model on the planet can fit an entire large codebase in-context.
Several-Tax31@reddit
That's right actually.
Pleasant-Shallot-707@reddit
lol “providing the exact code with the known vulnerability is just a good harness” gtfo with that nonsense
the320x200@reddit
Harness: break the source code into individual functions. For every function, prompt if there is a vulnerability.
That's a shitty harness and it can still eventually land on an inference which gives the model only the snippet of code with a bug. A good harness is much more efficient than that.
Longjumping-Boot1886@reddit
it's the same for it, it was checking file by file, because you still can't put all the BSD sources into one query. Even a 1M context is a very small thing for it.
TemperatureMajor5083@reddit
Not what gaslighting is.
the320x200@reddit
The real AI psychosis was the irrational fear we made along the way.
Quartich@reddit
The article gave the small models the snippet of vulnerable code, and asked them to analyze it. This headline and article are quite misleading
Pleasant-Shallot-707@reddit
Exactly. I seriously can’t stand dumb people
droptableadventures@reddit
Well then you're going to have to learn to live with yourself.
If you've read Anthropic's blog, they used Mythos in the same way.
Pleasant-Shallot-707@reddit
Herrr derrr, moron
nomorebuttsplz@reddit
I liked this sub better when no one could afford to run decent models. Now that everyone has Gemma 4 they think they're an expert on everything
Clear-Ad-9312@reddit
which is the same as what mythos does; each code segment was introduced to the model. It literally says in the article that they made the system give the smaller model multiple code segments to analyze, and it found the same code snippet that mythos pointed out.
nokia7110@reddit
And it also explains that this isn't necessarily a constraint, and why it isn't...
socialjusticeinme@reddit
I kind of find it hard to take Mythos seriously when just recently, anthropic published all of their source code for Claude code. If all of their scary advanced AI can’t even protect their own company, why the hell would I give them my money?
jonahbenton@reddit
The hard thing is not finding a vulnerability.
The hard thing is constructing an in the wild effective deployable exploit.
If any other available models were able to do this, the world would be different. The economics are too compelling.
The world is not different. Ergo, they are not able to.
Lots of on the record material that Mythos is able to construct effective exploits, at least to some measurably different degree.
kaggleqrdl@reddit
This is so much BS. Once you have a stack overflow, the rest falls.
cuolong@reddit
Countering this point -- perhaps the economics are not as compelling as you'd think. Take the most recent case where a hacker stole 10 PB from a supercomputer in China. Sure, you can make a pretty penny doing so. But you also make an enemy of a nation state with extensive intelligence resources at its disposal. Even if you get off scot-free, you'll be looking over your shoulder for the rest of your life.
jonahbenton@reddit
Not the province of individuals. Zero days and their downstreams are North Korea's business, probably at least 10% of gross national income.
tryingtolearn_1234@reddit
I wonder how many of these are going to be the same "vulnerabilities" that have been spamming open source projects for the last year. Many of them turned out not to be vulnerabilities. curl shut down its bug bounty program after too much slop.
https://www.itpro.com/software/open-source/curl-open-source-bug-bounty-program-scrapped
MerePotato@reddit
They isolated small snippets of relevant code they already knew had a vulnerability and fed them to the models; that's nowhere near what Mythos managed to pull off
maroule@reddit
regulatory capture in action
FuckSides@reddit
A lot of heavy lifting hiding in there. Anyone who's debugged code knows it's going to be a hell of a lot easier to find if you already know what you're looking for.
gpt872323@reddit
Haha lmao. I knew Anthropic was doing shady bragging.
rebelSun25@reddit
Anthropic marketing embellished the accomplishments of Mythos? Well I'll be. Colour me shocked
rc_ym@reddit
Yeah, it's pretty obvious now that vuln discovery and exploitation is an emergent skill in sufficiently capable coding models. It makes total sense; at its core, vuln/exploit work is just another type of coding/bug finding. Folks will figure out how small you can go and still get useful results.
I expect we'll get a bunch of distils and purpose-built models now. The challenge is that the number of folks with the security research skills needed to figure out what the model is saying is tiny. That community has already been saying that Opus 4.6 is really, really good at security research. So it makes sense you'd see the largest model ever be good at it as well.
And as we keep finding out, the smaller/older models have these emergent skills, folks just didn't know how to ask (see: older studies on blackmail and translation, etc.)
It continues to be a scary world that's moving way too fast to be safe.
SanDiegoDude@reddit
I mean sure, you fed (known) vulnerable code to LLMs and said "find the vulnerability" - it's great that the other LLMs were also able to find the vulnerabilities, but it's not really one-to-one with what Mythos is doing, finding vulnerabilities in the wild. I'm all for finding vulnerabilities before attackers do tho, the more the merrier IMO.
Flaxseed4138@reddit
I haven't the slightest clue why the latest claimed capabilities of Claude Mythos are attracting so many conspiracy theorists. This is how technology evolves. It gets better, not worse.
nomorebuttsplz@reddit
this sub is going full populist and it's hurting the already low average iq
Plane-Marionberry380@reddit
Nice find! It's wild that smaller local models can spot the same security flaws as Mythos; it shows how capable they've gotten lately. I've been testing a few on my laptop and they're surprisingly sharp with code audits.
Serl@reddit
I do understand the criticism behind the somewhat flawed comparison (model open-searching codebase versus just looking over isolated segments of code) - but I wonder if the more pertinent suggestion is that the harness perhaps did a lot of implicit heavy lifting for the model?
I'm half impressed, half skeptical over the Mythos claims, but the findings were real. I do think there could be more in the model's environment assisting the model itself that Anthropic is remaining mum on to sell the hottest-new-model marketing schtick. While Claude Code / Codex are different products, the harness is what makes those tools; the efficacy is somewhat influenced by the model's raw abilities, but still bootstrapped enormously by the harness itself.
TechSwag@reddit
This is kind of a nothingburger, no? I feel like the (Reddit) title is a bit disingenuous, or at the very least lacks the proper context.
Questionable methodology, as alluded to by other commenters. They're giving the model the vulnerable function and asking it to identify the vulnerability, versus giving it the whole codebase and letting it discover the vulnerability itself. At this point I would expect most models to be able to identify an issue with code if I gave them only the function that I know had an issue.
By the article's own statement, they're not saying that smaller models are just as capable as Mythos. They're just saying that the ability for a model to identify and fix a vulnerability is not exclusive to Mythos, which is a bit misleading given the previous point.
Doing a bit of source criticism: AISLE is a company that does security analysis and vulnerability remediation. They're making claims about a competitor, saying "it's nothing special" and "given the right tooling, we can match what Mythos claims to do".
Quote:
Or more accurately:
Do I believe Mythos is this crazy powerful model that will allow the common layperson to discover 200 zero days and take over the world? No. Do I believe that smaller/local LLMs are as powerful as Mythos in the same context? Also no.
Media literacy is at an all-time low.
marcoc2@reddit
The worst part is people falling for the marketing and defending anthropic
Pleasant-Shallot-707@reddit
The worst part is people who think they're informed from reading headlines
nukerionas@reddit
Did you read what the guy (ex-Anthropic employee fyi) did? He just promotes his own company lol
RiseStock@reddit
Lucky Strike, "It's toasted"
Euphoric_Emotion5397@reddit
Ok. Then I will say Claude Mythos lived up to its myth.
JLeonsarmiento@reddit
absolutely EVERYTHING you read from an AI company online or in the press must be understood ALWAYS AS AN AD, A PAID PROMOTION.
TopPair5438@reddit
this post is a perfect example of how a simple title (in this case, this post's title) can manipulate masses into believing something is true. amazing how braindead most of us are.
Adventurous-Paper566@reddit
That won't stop the hype.