Hacker News
by imtringued 1229 days ago
It is a transformer model, which means it has layers for encoding and decoding information.

This means you can ask it to translate from one representation to another. You can write a sentence and turn it into an equivalent SQL query or a poem, for instance.

But this also means that whenever you ask ChatGPT to do something for you, it basically tries to decode your question or command and then encode a representation of its answer.

When people ask it to write a program or command, it can turn the request into its help-text representation, which then looks like a believable command that can be executed. If you ask it to execute the code, it will try to find a representation that mirrors the program's output.

At least that is how I imagine it works.

2 comments

That's not what a transformer model is. A transformer model is just one that uses self-attention blocks in its layers to encode contextual information about the input. A non-transformer model can equally translate from one representation to another: before transformers, for example, RNNs were a commonly used architecture for seq2seq models.
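To make "self-attention" a bit more concrete, here is a rough single-head sketch in plain Python. It is only an illustration under a big simplifying assumption: the queries, keys, and values are the raw token vectors themselves, whereas a real transformer first applies learned projection matrices (and uses many heads and layers). The names `softmax` and `self_attention` are made up for this example and are not from any library.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys, and values are the token vectors themselves;
    a real transformer would use learned projections of them.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # The output is the attention-weighted average of all token vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)
```

Each row of `mixed` is a blend of all the input vectors, weighted by how similar they are to that row's token. That blending is the "contextual information" part: every position gets to look at every other position.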
Lol, lots of people spouting off about how they imagine AI works these days. This is not an accurate description of the GPT-2/3 model architectures.
It would be a lot more helpful if you could explain the difference. What’s wrong with that description? It seems pretty close to the descriptions of how it works that I’ve seen so far.