by magnoliakobus 796 days ago
They tested it on websites, containers, and Python packages. So three things with source available and a total sample size of 15. This isn't anything new really: if you give it a piece of insecure Python code or a terribly misconfigured Dockerfile and ask "is there a problem with this?", GPT-4 will obviously spot the vulnerability much of the time. LLM agents won't be spinning up Ghidra and locating a use-after-free vuln or something any time in the foreseeable future, let alone one they don't already have a blueprint for.

Edit: An LLM agent could also presumably navigate to the links within the CVE that contain the exact commit which patches a given vulnerability. Some also contain links to PoC exploit code. I forget if this is touched upon in the paper.