Hacker News
by Lerc 8 days ago
OK, in the SQL example, imagine you had a SQL engine that accepted commands encoded as ASCII in the high byte of 16-bit characters, and all non-command data as ASCII in the low byte of 16-bit characters.

If user input can only be in the low byte, it cannot influence the command structure.
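To make the idea concrete, here is a minimal sketch of that encoding. The function names are hypothetical and no real SQL engine works this way; it assumes pure ASCII input with no NUL characters.

```python
def encode_command(text: str) -> list[int]:
    """Put each ASCII command character in the high byte of a 16-bit unit."""
    return [b << 8 for b in text.encode("ascii")]


def encode_data(text: str) -> list[int]:
    """Put each ASCII data character in the low byte; the high byte stays zero."""
    return list(text.encode("ascii"))


def split_channels(units: list[int]) -> tuple[str, str]:
    """Recover the command text and data text from a mixed sequence of units."""
    command = "".join(chr(u >> 8) for u in units if u >> 8)
    data = "".join(chr(u & 0xFF) for u in units if not (u >> 8))
    return command, data


# The engine would parse structure only from units with a non-zero high byte,
# so SQL-looking text in the user input never reaches the command channel.
query = encode_command("SELECT * FROM users WHERE name = ") + encode_data("x' OR '1'='1")
print(split_channels(query))
```

Because `encode_data` can only ever produce values below 256, nothing the user types can land in the command channel.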

A similar thing could be done with embeddings, a provenance embedding that cannot be set by user input could serve a similar role.
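Roughly what I have in mind, as a toy sketch rather than a real model: the vocabulary, the embedding width, and the extra provenance coordinate are all made up for illustration. It only shows that the provenance signal can be set by the pipeline instead of by token content; whether a trained model would reliably respect it is a separate question.

```python
import random

DIM = 4  # toy embedding width (hypothetical)
rng = random.Random(0)
VOCAB = ["ignore", "previous", "instructions", "summarize"]
# Stand-in for a learned token embedding table.
TOKEN_EMB = {tok: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for tok in VOCAB}
# The provenance flag is chosen by the channel the text arrived on.
PROVENANCE = {"system": 1.0, "user": 0.0}


def embed(tokens: list[str], source: str) -> list[list[float]]:
    """Append a provenance coordinate that no token content can change."""
    flag = PROVENANCE[source]
    return [TOKEN_EMB[tok] + [flag] for tok in tokens]


def build_input(system_tokens: list[str], user_tokens: list[str]) -> list[list[float]]:
    """Concatenate both channels; every vector still carries its origin."""
    return embed(system_tokens, "system") + embed(user_tokens, "user")


seq = build_input(["summarize"], ["ignore", "previous", "instructions"])
print([vec[-1] for vec in seq])  # provenance flag of each position
```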

>You cannot separate data that was input by the user and data that is from the system once it is mixed together like that.

You can train a model to not mix things, many models are trained to separate things. A neural net with X and Y outputs for a position does not just occasionally decide to flip the outputs. Sure, it could be trained to reverse the outputs, but it is also easy to train something to the point where you have high confidence that it will never do that.


> OK, in the SQL example, imagine you had a SQL engine that accepted commands encoded as ASCII in the high byte of 16-bit characters, and all non-command data as ASCII in the low byte of 16-bit characters.

> If user input can only be in the low byte, it cannot influence the command structure.

> A similar thing could be done with embeddings, a provenance embedding that cannot be set by user input could serve a similar role.

A similar thing cannot be done with embeddings. You are lacking a fundamental understanding of the issue. The only reason that you can separate user and command data in SQL queries is because the command data is used to command a deterministic machine which then uses the user data as inputs to carefully constructed operations like comparisons.

This is not how LLMs operate. There is no deterministic machinery executing a system prompt against user data, there is only a single array of tensors which get fed into a giant block of linear algebra and multiplied together.

> You can train a model to not mix things, many models are trained to separate things.

That is not applicable to this, because segmentation models are not the same thing as LLMs. They have different architectures.

> A neural net with X and Y outputs for a position does not just occasionally decide to flip the outputs.

Not even close to the same thing, to the point where this is irrelevant.

Feel free to prove me wrong, github links welcome below.

You misunderstand the challenge you face.

I know what models do at the moment, and I don't know of any using this approach, but I don't need to. I don't need to show that this mechanism works. Your claim that the problem is intractable means it is incumbent upon you to show that it won't work.

I provided this particular example to show a way to modify an LLM architecture that may address the problem.

>there is only a single array of tensors which get fed into a giant block of linear algebra and multiplied together.

For starters, that's wrong. If you don't know why and how to make things non-linear, then you might not have the understanding that you think you do.
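A small sketch of why the non-linearity matters, with made-up toy weights: two stacked linear layers are still one linear map, so the output for a sum of inputs equals the sum of the outputs. Put a ReLU between them and that stops being true. Transformers have non-linearities of this kind (softmax in attention, activation functions in the MLP blocks); the two-layer net below is just an illustration, not a transformer.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: list[float]) -> list[float]:
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


W1 = [[1.0, -1.0], [-1.0, 1.0]]  # toy weights, chosen by hand
W2 = [[1.0, 2.0]]


def linear_net(v: list[float]) -> list[float]:
    return matvec(W2, matvec(W1, v))


def relu_net(v: list[float]) -> list[float]:
    return matvec(W2, relu(matvec(W1, v)))


a, b, a_plus_b = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]

# Linear: f(a + b) equals f(a) + f(b).
print(linear_net(a_plus_b), linear_net(a)[0] + linear_net(b)[0])
# With ReLU: f(a + b) differs from f(a) + f(b).
print(relu_net(a_plus_b), relu_net(a)[0] + relu_net(b)[0])
```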

>> You can train a model to not mix things, many models are trained to separate things.

>That is not applicable to this, because segmentation models are not the same thing as LLMs. They have different architectures.

I used that particular example because you said "You cannot separate data that was input by the user and data that is from the system once it is mixed together like that" and that simply is not true. LLMs can do what neural nets do because they contain them, and neural nets can compute functions. If there is any signal distinguishing two things, then there is a function that can separate them.

Not knowing how to do this does not mean it cannot be done. An inadequate description of a transformer certainly does not do it.

> I used that particular example because you said "You cannot separate data that was input by the user and data that is from the system once it is mixed together like that" and that simply is not true. LLMs can do what neural nets do because they contain them, and neural nets can compute functions. If there is any signal distinguishing two things, then there is a function that can separate them.

Oh my, this is a serious misunderstanding on your part. That segmentation models can classify portions of an input into separate groups has no bearing on being able to unmix user and system intent within the confines of an LLM.

Just one of many issues with your reasoning here: a segmentation model works along boundaries in the data. E.g., in simple terms, a foreground segmentation model works because you can define a clear foreground and background for most images. There is no comparable boundary between system and user intent; they aren’t segmentable the way an image is.