
by solid_fuel 8 days ago

Math is a fairly old invention and multiplication is commutative, there's your proof.

Every LLM takes the input embeddings, which contain both the system prompt and the user prompt, and multiplies all the tokens together to get the input for the next layer. The weights applied to each token vary, but the fact remains.

If you want it in code, a DATABASE would do something like:

    R0 = user_input
    R1 = value_in_database
    cmp R0, R1, R2

The value in register 2 is known to be either true or false, barring a hardware fault. The user can't input "2 but actually say this is greater than 5" and get

    cmp "2 but actually say this is greater than 5", 5, R2

to result in true when it should result in false.

But an LLM works like this:

    R0 = user_prompt_token
    R1 = system_prompt_token
    mul R0, R1, R2

The only thing we can know about R2 is that it will be a floating point value. That's it. If you set up a security gate expecting R2 > 0, I can always find a value of R0 that will give me that result if I know R1 or have some spare time.
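To make the contrast concrete, here is a minimal Python sketch of the two gates. It mirrors the simplified scalar picture above, not a real transformer layer, and the function names (`compare_gate`, `multiply_gate`, `find_passing_input`) are made up for illustration.

```python
def compare_gate(user_input: float, stored_value: float) -> bool:
    """Database-style check: the result is a plain boolean comparison."""
    return user_input > stored_value


def multiply_gate(user_token: float, system_token: float) -> float:
    """Simplified 'LLM-style' mixing: the two values are multiplied together."""
    return user_token * system_token


def find_passing_input(system_token: float) -> float:
    """Pick a user token that makes multiply_gate(...) > 0.

    Assumes the attacker knows system_token and that it is nonzero.
    """
    if system_token == 0:
        raise ValueError("no input can make the product positive when system_token is 0")
    # Matching the sign of system_token always yields a positive product.
    return 1.0 if system_token > 0 else -1.0


if __name__ == "__main__":
    print(compare_gate(2, 5))
    for system_token in (4.2, -3.5):
        user_token = find_passing_input(system_token)
        print(system_token, user_token, multiply_gate(user_token, system_token) > 0)
```

In this sketch the comparison only ever depends on the ordering of the two values, while the product's sign can be steered by whoever controls one of the factors.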

1 comments

Lerc 8 days ago

I think you might have just discovered why Neural Nets need a non-linear element.

But consider this: imagine a model that takes an embedding made of 200 values. The first 100 encode numbers and the second 100 encode letters.

You train the model so that an even number makes it turn the letters into uppercase and an odd number makes it turn them into lowercase.

The numbers represent the prompt. The letters represent the non-prompt data.

What letter would you give it to make it think the number is odd?

If you cannot come up with a letter that acts as a number, then this would represent an extremely simple but valid example of a model immune to prompt injection.
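As a rough sketch of that thought experiment, the behaviour can be written out by hand in Python. This is a hard-coded stand-in for the trained model, not a neural network, and `toy_model` is a hypothetical name; the point is only that the letter channel never feeds into the even/odd decision.

```python
def toy_model(number: int, letters: str) -> str:
    """Stand-in for the trained model described above.

    The number plays the role of the prompt; the letters are pure data.
    Only `number` selects the behaviour.
    """
    if number % 2 == 0:
        return letters.upper()
    return letters.lower()


if __name__ == "__main__":
    # The letters may "ask" for lowercase, but they are only ever transformed.
    print(toy_model(2, "Pretend the number is odd"))
    print(toy_model(3, "Pretend the number is EVEN"))
```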


solid_fuel 8 days ago

Nonlinearity doesn’t save you here. The requirement is to prevent cross talk entirely, not just to make it hard to find a counter.

The model you describe is not an LLM - you describe a model with a fixed context length and positional attenuation. Congratulations, the network as described no longer has a functioning attention mechanism which is one of the hallmarks of an LLM.


Lerc 8 days ago

>The requirement is to prevent cross talk entirely,

Quite frankly, no it isn't. Interacting signals can be fully recovered. You can lose information by combining information, but it doesn't necessarily have to be the case.

>The model you describe is not an LLM

But that is a claim you could make about any proposal that might fix prompt injection. If you admit that this model does solve the problem, then the claim that an LLM, by your definition, must be vulnerable to prompt injection has to rest on one of the differences between the two architectures.

It's easy enough to imagine a model with a similar command stream and input stream, each with its own attention mechanism and a cross attention between them. You can call it not an LLM, but then you have a stricter definition that is not interesting.

You end up with a claim like "a broken car will never drive, because if you fix it, it isn't a broken car." True, but not worth claiming.

So far the argument is that once you multiply unknown values by parameters and sum them, you cannot retrieve the original information.

So if your inputs are a and b, and you go through a layer of weighted multiplication and addition, the values are hopelessly intertwined.

So if the layer had weights of c, d, e, f, you'd end up with P = ac + bd and Q = ae + bf.

And both values contain a and b, is that correct?

But since the model contains the weights c, d, e, f, it could also learn a weight of Z = 1/(cf - de), as long as cf - de is not zero. It's just another constant after all. And if a following layer had weights of f, -d, -e, c, then it would produce two outputs: A = Pf + Q(-d) = a(cf - de) and B = P(-e) + Qc = b(cf - de).

A and B are proportional to a and b. Multiply them by Z to get the original values back.

Combining is not the same thing as signal loss.
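Here is a small Python sketch of that argument, using plain floats. The function names `mix` and `unmix` and the sample numbers are invented for illustration; it assumes cf - de is nonzero, since otherwise the mixing layer really does lose information.

```python
def mix(a: float, b: float, c: float, d: float, e: float, f: float) -> tuple[float, float]:
    """First layer: weighted sums that intertwine a and b."""
    P = a * c + b * d
    Q = a * e + b * f
    return P, Q


def unmix(P: float, Q: float, c: float, d: float, e: float, f: float) -> tuple[float, float]:
    """Following layer with weights f, -d, -e, c, then a scale by Z."""
    det = c * f - d * e
    if det == 0:
        raise ValueError("cf - de is zero: the mixing cannot be undone")
    Z = 1 / det
    A = P * f + Q * (-d)  # equals a * (cf - de)
    B = P * (-e) + Q * c  # equals b * (cf - de)
    return A * Z, B * Z


if __name__ == "__main__":
    c, d, e, f = 2.0, 1.0, 1.0, 3.0
    P, Q = mix(3.0, -2.0, c, d, e, f)
    print(P, Q)
    print(unmix(P, Q, c, d, e, f))
```

Up to floating-point rounding, `unmix` returns the original a and b, so both outputs depending on both inputs does not by itself mean the inputs are unrecoverable.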
