|
|
|
|
|
by Lerc
8 days ago
|
|
I think you might have just discovered why Neural Nets need a non-linear element. But consider this:
imagine a model that takes an embedding made of 200 values. the first 100 encodes numbers the second encodes letters. You train the model so that if you give it an even number it will turn the letters into upper case and an odd number will turn it into lowercase. The numbers represent the prompt. The letters represent the non-prompt data. T What letter would you give it to make it think the number is odd. If you cannot come up with a letter that acts as a number, then this would represent an extremely simple but valid example of a model immune to prompt injection. |
|
The model you describe is not an LLM - you describe a model with a fixed context length and positional attenuation. Congratulations, the network as described no longer has a functioning attention mechanism which is one of the hallmarks of an LLM.