by mkl 558 days ago
How would a bunch of weights make a backdoor? The worst it could do is detect that it's accessing an actual console and run a logged, visible command that tries to mess with your config or phone home. That's more of a front door with flashing lights saying "here I am!", so why would they bother?

Letting an LLM run arbitrary commands in your main user account seems risky even without worrying about conspiracies.

2 comments

It’s absolutely possible to put a backdoor into an LLM.

https://arxiv.org/abs/2408.12798

Just to wear my tin foil hat for fun, it's not that the model would attempt to phone home itself (what would it have to say, anyway?) but that given the opportunity it would go around kicking doors open for later infiltration by an outside party. Subtle bugs being introduced to your Django app, invisible characters that break your ssh configs, that sort of thing.
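
For the invisible-characters case, a rough check is easy to sketch. This is only an illustration: the function name `find_invisible_chars` is made up, and it assumes "invisible" means Unicode format characters (zero-width space, bidi controls), line/paragraph separators, and any space other than a plain ASCII one.

```python
import unicodedata

# Assumption: these categories approximate "invisible" characters.
# Cf = format chars (zero-width space, bidi overrides), Zl/Zp = line and
# paragraph separators. Zs (space separators) is handled separately below.
SUSPICIOUS_CATEGORIES = ("Cf", "Zl", "Zp")


def find_invisible_chars(text):
    """Return (line_no, column, name) for each suspicious character in text."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            odd_space = category == "Zs" and ch != " "
            if category in SUSPICIOUS_CATEGORIES or odd_space:
                # Fall back to the code point if the character has no name.
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((line_no, col, name))
    return hits


if __name__ == "__main__":
    # Hypothetical ssh config with a zero-width space and a no-break space.
    sample = "Host example\n  User\u200b deploy\n  Port\u00a022\n"
    for line_no, col, name in find_invisible_chars(sample):
        print(f"line {line_no}, col {col}: {name}")
```

Running something like this over a generated config before saving it would catch the crude version of the trick, though not subtle logic bugs in generated code.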
Yes, deliberately introducing vulnerabilities when generating code is a good one, and could be quite subtle. For running console commands though, anything touching configuration for ssh, gpg, bash aliases, ~/bin, cron, etc., should be immediately obvious.

I was thinking "here's an IP address and ssh key" would be what to phone home with, and that could be encrypted/hidden pretty well, but any network access should be pretty suspicious right away.
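
To make "immediately obvious" concrete, here is a minimal sketch of the kind of check a wrapper could run before executing a model-suggested command. Everything in it is an assumption for illustration: the name `review_reasons`, the list of sensitive path fragments, and the list of network tools are made up and far from exhaustive.

```python
import shlex

# Assumption: illustrative lists only, taken loosely from the examples above
# (ssh, gpg, bash aliases, ~/bin, cron) plus a few common network tools.
SENSITIVE_PATH_PARTS = (".ssh", ".gnupg", ".bashrc", ".bash_aliases", "~/bin", "cron")
NETWORK_TOOLS = {"curl", "wget", "nc", "ssh", "scp"}


def review_reasons(command):
    """Return a list of reasons a command deserves a human look (empty if none)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse to guess, ask a human.
        return ["unparseable command"]

    reasons = []
    for token in tokens:
        # Compare the basename so /usr/bin/curl is treated like curl.
        name = token.rsplit("/", 1)[-1]
        if name in NETWORK_TOOLS:
            reasons.append(f"network tool: {name}")
        for part in SENSITIVE_PATH_PARTS:
            if part in token:
                reasons.append(f"touches {part}")
    return reasons


if __name__ == "__main__":
    for cmd in ("ls -la", "echo 'alias ls=rm' >> ~/.bash_aliases", "curl https://example.com/x"):
        print(cmd, "->", review_reasons(cmd) or "nothing flagged")
```

Plain substring matching like this is trivial to evade (variables, encoded payloads, indirection through a script), so it only illustrates the visible cases, not a real defense.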