| HN Mirror


by navjack27 1242 days ago

The creators know exactly how it works. I believe we have a really good understanding of how large language models work. As you feed the model more information (training), its ability to predict (inference) what should come next after the tokens (words or prompt) it was fed becomes more "accurate". And it's only more accurate relative to the things it was previously trained on. Bad data in, bad data out. And that's just putting it very simply.

You could even steer a language model to give more customized results without retraining it. If it's a good enough model, like GPT-J, you could picture it like this...

You could have a text input field where the user types their prompt. When the user presses submit, what happens in the background is that you concatenate a starter prompt onto the beginning of whatever the user entered. Then, when you get your results back, you filter out that engineered part of the prompt, format the user's input so it stands out as the original prompt, and format the result so it stands out as what was just computed. Doing that, you can basically prime a language model with a whole bunch of background information before the user's text ever reaches it.
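To make that concrete, here's a rough sketch in Python of the prepend-then-filter flow. Everything named here is made up for illustration: `STARTER_PROMPT`, `build_prompt`, `strip_starter`, `run`, and especially `fake_generate`, which just stands in for a real model call. It assumes the model hands back the full text (prompt plus continuation), which is how plain text-completion models are commonly used, but a real setup may differ.

```python
from typing import Callable

# Hypothetical background text used to prime the model (an assumption).
STARTER_PROMPT = "The following is a conversation with a helpful cooking assistant.\n"


def build_prompt(user_input: str, starter: str = STARTER_PROMPT) -> str:
    """Concatenate the engineered starter onto the front of the user's text."""
    return starter + user_input


def strip_starter(full_text: str, starter: str = STARTER_PROMPT) -> str:
    """Filter the engineered part back out of the returned text."""
    if full_text.startswith(starter):
        return full_text[len(starter):]
    return full_text


def run(user_input: str, generate: Callable[[str], str],
        starter: str = STARTER_PROMPT) -> str:
    """Send the primed prompt, then show only the user's prompt and the result."""
    full_text = generate(build_prompt(user_input, starter))
    visible = strip_starter(full_text, starter)
    # Separate what the user typed from what the model added.
    if visible.startswith(user_input):
        completion = visible[len(user_input):]
    else:
        completion = visible
    return f"PROMPT: {user_input}\nRESULT: {completion.strip()}"


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model: echoes the prompt plus a canned continuation."""
    return prompt + " Simmer the sauce for ten minutes."


if __name__ == "__main__":
    print(run("How do I make marinara?", fake_generate))
```

Swap `fake_generate` for whatever actually calls your model and the rest stays the same. The user never sees the starter text, only their own prompt and the completion.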