Is that true? There are many attention/mlp layers stacked on top of each other. Higher level layers aren't performing attention on input tokens, but instead on the output of the previous layer.
Well, if you are referring to «The issue of transparency in current LLMs», I have not read an essay that explains satisfactorily the inner process and world modelling inside LLMs. Some pieces say (guess?) that the engine has no idea what the whole concept in the reply would be before outputting all the tokens, others swear it seems impossible it has no such idea before formulation...
Well, if you are referring to «The issue of transparency in current LLMs», I have not read an essay that explains satisfactorily the inner process and world modelling inside LLMs. Some pieces say (guess?) that the engine has no idea what the whole concept in the reply would be before outputting all the tokens, others swear it seems impossible it has no such idea before formulation...