by anuramat
8 days ago
> tweak badness enough
assuming you get to do gradient descent AND the context is fixed+known AND you have unlimited compute? sure; is it a realistic setup?
> the only way to fix ...
the exact same argument applies to any (sufficiently complex) piece of software, with exactly the same conclusion
also technically I'd argue that we do know the input/output space (set of all token strings of length <= N/token), and know the mapping (the model is a ~pure function in terms of the api, which is about as good of a representation as it gets for a non-invertible mapping); at least it's much closer than with something like linux
> assuming you get to do gradient descent AND the context is fixed+known AND you have unlimited compute? sure
so... it's possible to attack these models with the formulation i described, just with some particular assumptions.
the AI safety/security problem is about trying to make this sort of thing very difficult to do, so much so that an attacker wouldn't try. that's not fixing the problem, that's mitigating the problem. two very different things. as the article we're commenting under shows, it's really not difficult to do nasty prompt injection attacks right now.
> technically I'd argue that we do know the input/output space (set of all token strings of length <= N/token), and know the mapping (the model is a ~pure function in terms of the api, which is about as good of a representation as it gets for a non-invertible mapping)
machine learning models are approximation functions, not pure functions. they are non-deterministic and non-ideal.
when i say "input space" i mean all possible combinations of valid tokens as inputs. when i say "output space" i mean all possible combinations of valid tokens as outputs that are valid continuations of the input sequence. that's a massive combinatorial space.
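to put a rough number on that, here's a minimal sketch that counts every token string up to a given length. the vocabulary size and length below are made-up illustrative values, not taken from any particular model, and the count covers all strings, not only the plausible continuations.

```python
def count_sequences(vocab_size: int, max_len: int) -> int:
    """Count token strings of length 1..max_len over vocab_size tokens."""
    # There are vocab_size**k distinct strings of exactly k tokens.
    return sum(vocab_size ** k for k in range(1, max_len + 1))


# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 50_000
MAX_LEN = 8

total = count_sequences(VOCAB_SIZE, MAX_LEN)
print(f"strings of up to {MAX_LEN} tokens: about 10^{len(str(total)) - 1}")
```

even at 8 tokens the count is far beyond anything you could enumerate, and real contexts are thousands of tokens long.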
also, there's no api? the most likely next output text is provided, conditioned on being a continuation of the input text. it's probabilistic inference. there is no api.
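a toy sketch of what "probabilistic inference" means at the decoding step, assuming the usual softmax-then-sample setup. the logits are made-up numbers for a 3-token vocabulary and the function names are hypothetical, not any library's interface. it shows where the randomness enters: the scores-to-probabilities step is repeatable, the draw is not unless the random state is fixed.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, rng, temperature=1.0):
    """Draw one token index at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_next_token(logits):
    """Always pick the highest-scoring token index (no randomness)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up scores for a 3-token vocabulary, for illustration only.
toy_logits = [2.0, 1.0, 0.1]

rng = random.Random()  # unseeded, so draws can differ between runs
draws = [sample_next_token(toy_logits, rng) for _ in range(10)]
print("sampled:", draws)
print("greedy: ", greedy_next_token(toy_logits))
```

this only covers sampling randomness in a toy setting. it doesn't address other sources of run-to-run variation in a deployed model.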
----
you're using a lot of software terms to try and explain yourself. don't do that. seriously. as someone who tried doing that in my PhD instead of actually learning the fundamentals -- learn the fundamentals of machine learning if you'd like to engage in these kinds of discussions.
it'll help you.