by belter
375 days ago
"...A more serious bug is that the code that generates token IDs is not sound: it generates biased output. This is a classic bug when people naively try to generate random strings, and the LLM spat it out in the very first commit as far as I can see. I don’t think it’s exploitable: it reduces the entropy of the tokens, but not far enough to be brute-forceable. But it somewhat gives the lie to the idea that experienced security professionals reviewed every line of AI-generated code...."

In the GitHub repo, Cloudflare says: "...Claude's output was thoroughly reviewed by Cloudflare engineers with careful attention paid to security and compliance with standards..."

My conclusion is that, as a development team, they have learned little since 2017:
https://news.ycombinator.com/item?id=13718752
I’m very confident I would have noticed this bias in a first pass of reviewing the code. The very first thing you do in a security review is look, very carefully, at where you use `crypto`, what its inputs are, and what you do with its outputs. On seeing that `%`, I would have checked `characters.length` and found it to be 62, which is not a factor of 256. Since 256 = 4 × 62 + 8, reducing a random byte modulo 62 picks the first 8 characters of the alphabet 5 times out of 256 and the remaining 54 only 4 times out of 256. So you need to mess around with base conversion, or change the alphabet, or use some other such trick.
This bothers me and makes me lose confidence in the review performed.
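
To make the bias concrete, here is a minimal sketch. It is not the code from the repository; `ALPHABET`, `moduloCounts`, and `unbiasedToken` are hypothetical names, the character order of the alphabet is an assumption, and it assumes a runtime that exposes the Web Crypto global `crypto` (Workers, browsers, recent Node). The first function counts how the 256 byte values land on a 62-character alphabet under `%`; the second shows one of the "other tricks", rejection sampling, which throws away bytes at or above 248 so that every character is equally likely.

```typescript
// Assumed 62-character alphabet (26 + 26 + 10); the real ordering may differ.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Count how often each alphabet index is produced when every possible
// byte value (0..255) is reduced with `% alphabetLength`.
function moduloCounts(alphabetLength: number): number[] {
  const counts: number[] = [];
  for (let i = 0; i < alphabetLength; i++) {
    counts.push(0);
  }
  for (let b = 0; b < 256; b++) {
    counts[b % alphabetLength]++;
  }
  return counts;
}

// Rejection sampling: only accept bytes below the largest multiple of the
// alphabet length that fits in a byte (248 for 62), then reduce with `%`.
function unbiasedToken(length: number): string {
  const limit = 256 - (256 % ALPHABET.length); // 248 when the length is 62
  const buf = new Uint8Array(Math.max(1, length * 2));
  let out = "";
  while (out.length < length) {
    crypto.getRandomValues(buf); // refill with fresh random bytes
    for (let i = 0; i < buf.length && out.length < length; i++) {
      if (buf[i] < limit) {
        out += ALPHABET[buf[i] % ALPHABET.length];
      }
    }
  }
  return out;
}

const counts = moduloCounts(ALPHABET.length);
// Indices 0..7 are hit 5 times each, indices 8..61 are hit 4 times each.
console.log(counts.slice(0, 10));
console.log(unbiasedToken(32));
```

The rejection loop discards about 3% of bytes (8 out of 256), which is cheap. Picking a 64-character alphabet would avoid the problem without any rejection, since 64 divides 256.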