Curl lead developer Daniel Stenberg provides insightful feedbacks from Mythos analysis results

Systemerror7A69@reddit

This was really interesting to read, especially in combination with the Firefox report recently.

Initially I was under the impression that Mythos was a lot of marketing hype, but the Firefox report changed that a bit. It definitely seemed capable and more than "just" hype, but it seems like the better your previous security scans (with and without AI), the less Mythos will find.

danielcw189@reddit

If we compare CURL and Firefox, CURL is a lot smaller and more limited in scope (Firefox needs to be able to do everything that CURL can do, right?).

So it is not surprising that there are fewer bugs and security vulnerabilities to be found.
But it still found something, so it apparently was a helpful tool.

Mythos was over-hyped, especially by non-IT people, but it is good, or at least a useful tool.

Too bad we don't have realistic pricing for Mythos and most modern AI stuff yet, so we can't tell if it would be worth it.

fntd@reddit

"Firefox needs to be able to do everything that CURL can do, right?"

No. curl supports all kinds of protocols that Firefox doesn't. For example, FTP support was removed from Firefox a while ago.

psychometrixo@reddit

Technically correct, the best kind of correct

But Firefox is a far, FAR larger project than curl. You know what OP meant.

"But Firefox doesn't have FTP" is just not the point.

Mythos may be pure useless hype, but that doesn't change the fact that curl is smaller than Firefox.

fntd@reddit

Dude I just answered a question.

psychometrixo@reddit

You answered a hypothetical question with a technically correct but clearly irrelevant answer

Or did you mean to imply that curl is actually bigger than Firefox?

danielcw189@reddit

thanks for the correction

Systemerror7A69@reddit

Well, my thought was less about "Mythos is not useful" but more "how much BETTER than already existing models is it?"

Especially because it seems like the model matters much less in this stuff than the harness you build and the tools you give it. How big is the difference from Opus to Mythos when

a) simply given code and prompted

vs

b) given proper tools, ability to verify findings, run tests & more?

For example, is there a big gap in the first case and only a small one, or none at all, in the second?

danielcw189@reddit

Just in case, and to be clear: it seems to be the nature of Reddit that replies are often seen as disagreement. That wasn't my intention. I simply used your thoughts as a starting point for my own.

A part of your comment was about "hype", and I think this article does not dispel the hype. But that is talking about hype from a sane point of view with an IT background, and not the hyperbolic statements from marketing and media aimed at a general audience.

Acrobatic-Watch-8037@reddit

Nobody is claiming Mythos is not better; they're pointing out that it's nowhere near as big an improvement as has been claimed by the astroturfing social media posts.

Systemerror7A69@reddit

Yes, that was what I intended to say. It's hard to judge just how much or how little better it is without any access (on top of LLMs already being a nightmare to properly benchmark and test due to their non-deterministic nature).

My opinion fluctuated a bit with these articles

FastHotEmu@reddit

In this context, feedback is an uncountable noun. It should be:

“provides insightful feedback”

Acrobatic-Watch-8037@reddit

To put it into context: you do not have 1 feedback and 2 feedbacks; you have 1 piece of feedback and 2 pieces of feedback.

danielcw189@reddit

I guess "feedback" has become a loanword in his language (Swedish?), and there it is acceptable to use a plural form, turning it into a false friend when he is writing in English.

Before your comment I was not aware that "feedbacks" is not allowed in English.

knome@reddit

you could manage a "feedbacks", but you would need to be discussing multiple sources of feedback while specifically using the language to transform it into a countable noun. "of the various feedbacks I have received", where "various" does the job of splitting it into something that is countable instead of an amorphous wad.

like how one pours applesauce from a jar, but can have a variety of applesauces at home. or how coffee and water don't indicate amounts, but saying coffees or waters indicates they are batched into countable units (cups of coffee/water or the different seas, etc).

ShinyHappyREM@reddit

curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts. This is only natural of course since the first tools we ran had many more and easier bugs to find. As we have fixed issues along the way, finding new ones are slowly becoming harder.

To properly compare the AI tools (not saying that the author should've done this), this tool and the previous ones should perhaps all be tested on an older version of the code base that contains a lot more bugs and security issues.

sharlos@reddit

It makes sense, but then you run into issues where you can't be sure the model wasn't trained on those older bug reports.