by attemptone
1205 days ago
Not OP, but I also see a problem with 'every possible entity'. If you formulate it like that, the prompt is decoupled from the LLM's capabilities and can be anything. And if you restrict the prompt to cover only what the LLM understands, the sentence becomes trivial. Train an LLM on ASCII and try to get it to simulate anything outside of that (ancient Sumerian script, for example).
If you only input ASCII, it can generate every possible output in ASCII, most with very low probability, but still. After writing this, I'm not even sure what 'simulating' means in this context.
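To make the "very low probability but still" point concrete, here is a rough sketch. It assumes a toy stand-in for a model (`toy_logits` is made up for illustration and is not how any real LLM scores text) with a 128-character ASCII vocabulary and a softmax output. Softmax never assigns exactly zero, so every ASCII string gets a finite log-probability, while anything outside the vocabulary has no slot at all.

```python
import math

# Assumption: the "model" only knows the 128 ASCII characters.
VOCAB = [chr(c) for c in range(128)]


def softmax(logits):
    """Turn raw scores into probabilities; every entry stays strictly positive."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(context):
    """Hypothetical stand-in for a trained model: it strongly prefers
    repeating the last character and is indifferent otherwise."""
    logits = [0.0] * len(VOCAB)
    if context:
        logits[ord(context[-1])] = 10.0
    return logits


def log_prob(text):
    """Log-probability of an ASCII string under the toy model."""
    total = 0.0
    for i, ch in enumerate(text):
        probs = softmax(toy_logits(text[:i]))
        total += math.log(probs[ord(ch)])
    return total


def in_vocab(ch):
    """True if the character has an output slot at all."""
    return ord(ch) < len(VOCAB)
```

Here `log_prob("abcd")` is much lower than `log_prob("aaaa")` but still finite, whereas `in_vocab` is false for a cuneiform sign, so no prompt can make this toy model emit one.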
|
For example, the string "1010101010"... could be the output of a function that simply alternates between 1 and 0 (the _alternating version). It could also be the output of a function that emits random bits (the _random version).
Even if it's not explicitly running those two functions, a model that is very good at predicting the next character of this input string might have, embedded within it, analogues of both of them. The longer the output continues to follow the "101010" pattern, the higher confidence it should place on the _alternating version. On the other hand, if it encounters a "...110001..." sequence, it should switch to placing much more confidence on the _random version.
The LLM of course does not contain an infinite list of generative functions and weigh their outputs. But to the extent that it works well and compactly approximates Bayesian reasoning, it should approximate a program that does.
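A minimal sketch of that weighting, assuming just two hypotheses. The names (`likelihood_alternating`, `likelihood_random`) and the `noise` parameter are made up for illustration. The small `noise` rate is my assumption: it lets the alternating source occasionally repeat a character, so one break in the pattern lowers its probability without setting it to exactly zero.

```python
def likelihood_alternating(bits, noise=0.01):
    """P(bits | alternating source). Assumption: each character flips
    relative to the previous one with probability 1 - noise."""
    if not bits:
        return 1.0
    p = 0.5  # either starting character is equally likely
    for prev, cur in zip(bits, bits[1:]):
        p *= (1.0 - noise) if cur != prev else noise
    return p


def likelihood_random(bits):
    """P(bits | random source): every character is a fair coin flip."""
    return 0.5 ** len(bits)


def posterior_alternating(bits, prior=0.5, noise=0.01):
    """Bayes' rule over the two hypotheses; returns P(alternating | bits)."""
    a = prior * likelihood_alternating(bits, noise)
    r = (1.0 - prior) * likelihood_random(bits)
    return a / (a + r)


def prob_next_is_one(bits, prior=0.5, noise=0.01):
    """Predictive probability that the next character is '1',
    mixing both hypotheses by their posterior weights."""
    w = posterior_alternating(bits, prior, noise)
    if bits:
        p_alt = (1.0 - noise) if bits[-1] == "0" else noise
    else:
        p_alt = 0.5
    return w * p_alt + (1.0 - w) * 0.5


if __name__ == "__main__":
    for s in ["10", "1010101010", "1010110001"]:
        print(s, posterior_alternating(s), prob_next_is_one(s))
```

The posterior for the alternating hypothesis grows as the "1010..." prefix gets longer and drops sharply once a run like "110001" shows up, at which point the next-character prediction falls back towards a coin flip.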