by godelski 945 days ago
> Whats the endgame here?

I don't mean to be rude, but at least to me the sentiment of this comment comes off as asking what the end game is for any hacker demonstrating vulnerabilities in ordinary software. There's always a cat and mouse game. I think we should all understand that given the name of this site... The point is to perform such checks on LLMs as we would with any software. There definitely is the ability to debug ML models, it's just harder and different than standard code. There's a large research domain dedicated to this pursuit (safety, alignment, mech interp, etc).

Maybe I'm misinterpreting your meaning? I must be, right? Because why would we not want to understand how vulnerable our tools are? Isn't that like the first rule of tools? Understanding what they're good at and what they're bad at. So I assume I've misinterpreted.

4 comments

Is there not some categorical difference between a purposefully-built system, which given enough time and effort and expertise and constraints, we can engineer to be effectively secure, and a stochastically-trained black box?
Yes? Kinda? Hard to say tbh. I think the distance between these categories is probably smaller than you're implying (or at least I'm interpreting), or rather the distinction between these categories is certainly not always clear or discernible (let alone meaningfully so).

Go is a game with no statistical elements, yet there are so many possible move sequences that it might as well have them. I think we have a lower bound on the longest possible legal game of around 10^48 moves and an upper bound of around 10^170. At 10^31 moves per second (10 quettahertz) it'd still take you billions of years to play the lower-bound longest possible game (for scale, the highest-frequency gamma ray we've ever detected is ~4 RHz, or 4x10^27 Hz). It's pretty reasonable to believe we can never build a computer that can play the longest legal game, even with insane amounts of parallelism and absurdly beautiful algorithms, let alone find a deterministic solution, i.e. "solve" Go. And Go is just a board with 19x19 locations and 3 possible states per location (nothing, white, black) (legal moves obviously reducing that 10^170 bound).
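A quick sanity check of that arithmetic, as a rough sketch: the inputs are just the ballpark figures quoted above (10^48 moves, 10^31 moves per second), not authoritative values, and the variable names are made up for illustration. It also counts the naive 3-states-per-point board colorings, which ignores legality entirely.

```python
# Back-of-the-envelope check; all inputs are the rough figures from the
# comment above, not authoritative values.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, about 3.156e7 seconds

longest_game_lower_bound = 10**48  # moves (lower bound quoted above)
moves_per_second = 10**31          # hypothetical 10 quettahertz machine

seconds_needed = longest_game_lower_bound // moves_per_second  # 10**17 s
years_needed = seconds_needed / SECONDS_PER_YEAR

# Naive count of board colorings: 3 states on each of 19 * 19 = 361 points.
# This ignores legality, so it overcounts real positions.
naive_positions = 3 ** (19 * 19)

print(f"{years_needed:.2e} years to play the lower-bound game")
print(f"3^361 has {len(str(naive_positions))} decimal digits")
```

That works out to roughly 3 billion years for a single game at the lower bound, which is where the "billions of years" figure comes from.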

That might seem like a non sequitur, but what I'm getting at is that there are a lot of permutations in software too, and I think there are plenty of reasonably sized programs whose correctness would be impossible to validate within a reasonable amount of time. Pretty sure there are classes of programs we know can't be validated in finite time or with finite resources. A different perspective on statistics is actually not viewing states as having randomness but viewing them as having levels of uncertainty. So there's a lot of statistics that is done in frameworks which do not have any element of true randomness (random like noise, not random like np.random.randn()). Conceptually there's no difference between uncertainty and randomness, but I think it's easier to grasp the idea that there are many purposefully-built finite systems that have non-zero amounts of uncertainty, so those are no different than random systems.

More here on Go: https://senseis.xmp.net/?NumberOfPossibleGoGames And if someone knows more about go and wants to add more information or correct me I'd love to hear it. I definitely don't know enough about the game let alone the math, just using it as an example.

> the sentiment of this comment comes off as asking what the end game is for any hacker demonstrating vulnerabilities

GP isn't asking about the "endgame" as in "for what purpose did this author do this thing?". It was "endgame" as in "how is the story of LLMs going to end up?".

It could be "just" more cat and mouse, like you both mentioned. But a sibling comment talks about the possibility for architectural changes, and I'm reminded of a comment [1] from the other week by inawarminister ...

[1]: https://news.ycombinator.com/item?id=38123310

I think it would be very interesting to see something that works like an LLM but where instead of consuming and producing natural language, it operates on something like Clojure/EDN.

Okay yeah that makes more sense.

To respond more appropriately to that, I think truthfully we don't really know the answer to that right now (as implied by my previous comment). There are definitely people asking the question, and it definitely is a good and important question, but there's just a lot we don't know at this point about what we can and can't do. Maybe some take that as an unsatisfying answer, but I think you could also take it as a more exciting answer, as in there's this great mystery to be solved that's important, and solving puzzles is fun. If you like puzzles haha. There are definitely a lot of interesting ideas out there, such as those you mentioned, and it'll be interesting to see what actually works and if those methods can actually maintain effectiveness as the systems evolve.

Debugging looking for what though? It's interesting trying to think even what the "bug" could look like. I mean, it might be easy to measure the arithmetic ability of the LLM. Sure. But if the policy the owner wants to enforce is "don't produce porn", that becomes hard to check in general, and harder to check against arbitrary input from the customer or user.

People mention "source data exfiltration/leaking" and that's still another very different one.

No, the other comments that talk about possible architectural evolutions of LLMs are more in line with the intent of my question