|
|
|
|
|
by ben_w
250 days ago
|
|
I don't think so. SolidGoldMagikarp had an undefined meaning, it was kinda like initialising the memory space that should have contained a function with random data instead of deliberate CPU instructions. Not literally like that, but kinda behaved like that: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldm... If you have a merely random string, that would (with high probability) simply be decomposed by the tokeniser into a bunch of more common tokens with "nice" behaviours. SolidGoldMagikarp etc. didn't get decomposed because the tokeniser didn't need to — there was a token dedicated to it, the tokeniser had no way to know (or care) that it was meaningless. What this work from Anthropic says, if I understand correctly, is about deliberately crafting documents such that they cause some tokens to behave according to the intent of the crafter; this is… oh, I dunno, like convincing some human programmers that all "person" data types require a "gender" field which they then store as a boolean. Or could be, at least, the actual example in the blog post is much bolder. |
|