by ben_w 31 days ago
From what I've heard, every LLM before Mythos (which you can't get, they'll call you if you're big enough) will have far too many false positives to be helpful, so I guess the best option would be to use an agent to help you (not lights-off vibe coding!*) take advantage of all the older tools like valgrind and closing all the compiler warnings?

* I presume I'm not the only one to find the agents tasked with adding unit tests will sometimes try to sneak through "open source code and apply regex to confirm presence or absence of specific string literal".

They can speed you up significantly, but you absolutely do need to pay attention to what they produce.
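To make the footnote concrete, here is a minimal sketch of the difference between a regex-on-source "test" and one that actually exercises the code. Everything in it (`parse_port`, the `SOURCE` string, the test names) is hypothetical and made up for illustration, not taken from any real project.

```python
import re


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Stand-in for "open the source file"; a real agent-written test would read it from disk.
SOURCE = '''
def parse_port(value):
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return port
'''


def test_source_mentions_error_message():
    # Anti-pattern: passes whenever the literal appears in the file,
    # even if the function is broken or never raises at all.
    assert re.search(r"port out of range", SOURCE)


def test_rejects_out_of_range_port():
    # Behavioural test: calls the function and checks what it does.
    assert parse_port("8080") == 8080
    for bad in ("0", "65536", "-1"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"{bad} should have been rejected")
```

Both tests go green, but only the second one would fail if the range check were deleted, which is the thing worth looking for when reviewing agent-written tests.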


With all respect to the Anthropic folks, that's just marketing. (If they're reading this: let us into the program so I can be proven wrong here.)

I'm sure what they have is awesome, but it's clear that there are people out there with some decent prompts that are getting results out of widely available models as well.

The big thing we're sharing is: bulk scanning by random people in random geographies got a _lot_ better around January, it's widely distributed, and it's going to get a lot better regardless of whether that specific version of Mythos becomes widely available or not.

> prompts that are getting results out of widely available models as well.

Absolutely, and the "false-positive" issue people keep citing as why Mythos is so good is easily solved in the harness. The simplest solution is to start a fresh context with another prompt that evaluates whether the finding is a false positive or not; just adding that drastically cuts down the rate.
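A rough sketch of that second-pass idea, assuming a harness where each model call starts with a fresh context. The `LLM` callable, the `Finding` type, the prompt wording and the `REAL` / `FALSE_POSITIVE` verdict format are all assumptions for illustration, not any vendor's API or anyone's actual harness.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a function that takes a prompt and returns the reply.
# Each call is assumed to run in a fresh context with no memory of earlier calls.
LLM = Callable[[str], str]


@dataclass
class Finding:
    location: str      # e.g. "parser.c:120"
    description: str   # the claim made by the first-pass scan


def verify_finding(finding: Finding, code: str, llm: LLM) -> bool:
    """Ask a fresh context to judge the finding. True means keep it."""
    prompt = (
        "You are reviewing a bug report written by someone else.\n"
        f"Location: {finding.location}\n"
        f"Claim: {finding.description}\n"
        f"Code:\n{code}\n"
        "Answer with exactly REAL or FALSE_POSITIVE on the first line."
    )
    reply = llm(prompt).strip()
    first_line = reply.splitlines()[0].strip().upper() if reply else ""
    # Anything other than an explicit REAL verdict is dropped.
    return first_line == "REAL"


def triage(findings: List[Finding], code: str, llm: LLM) -> List[Finding]:
    """Keep only the findings that survive the independent second pass."""
    return [f for f in findings if verify_finding(f, code, llm)]
```

The point of the fresh context is that the verifier never sees the reasoning that produced the finding, so it is less likely to just agree with it. Treating an empty or unparseable reply as a rejection is a design choice here; a real harness might retry instead.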

That is false. A year ago every LLM generated report was slop - more likely a false positive than correct. However in the past few months nearly every LLM generated report is real.

If your assertion of falsehood were true, the current top story on HN wouldn't be Turso shutting down their bug bounty due to overwhelming slop.

This is not good reasoning. You're offloading your thinking to "Turso" for some reason.

You're also assuming that they haven't made the alternative judgement that instead of triaging the haystack of slop that they get in order to potentially pay out to someone, they should instead be spending that cash and effort on tokens to find bugs in their own codebase.

You should read the mentioned article. They have hired some of the people they paid out to, and some of those people were LLM-assisted.

The claim I'm rebutting is "in the past few months nearly every LLM generated report is real." If that were true, there would be no need to close the bounty. The bounty is to address approaches that they themselves may not have considered, so would still hold value if the claim held true, as outside individuals may still hold unique LLM-assisted approaches and perspectives.

Everyone is going to see different results. Overall, the general trend is that AI is getting better. Although that might partly be because people are shutting down the bounty programs that give the incentive to generate slop.