Hacker News
by mrfusion 2059 days ago
Every time I research transformers, the explanations seem so hand-wavy. Is there a simple description, maybe with a bit of pseudocode?

Or, at the other extreme, the explanations dump me into formula land without saying what all the letters in the formulas represent.

2 comments

This is quite a good explanation of transformers that gets shared a lot. [link](http://jalammar.github.io/illustrated-transformer/)

And here's a super simple implementation of GPT by Andrej Karpathy. [link](https://github.com/karpathy/minGPT/blob/master/mingpt/model....)

Transformers are kinda similar to state vectors: they track the current state of the world. In a GPT-style model, each output token is appended to the input, and that longer sequence becomes the input to the next iteration. The transformer keeps transforming input into output until a stop token is reached.
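
Since the question asked for pseudocode, here is a rough sketch of that feedback loop in Python. It is not a real transformer: `next_token_logits` is a hypothetical stand-in for the model (a real one would run attention layers over the tokens), and `VOCAB_SIZE`, `STOP_TOKEN`, and the greedy "pick the highest score" rule are assumptions made up for illustration.

```python
VOCAB_SIZE = 5   # toy vocabulary: token ids 0..4 (assumption)
STOP_TOKEN = 0   # hypothetical id of the stop token


def next_token_logits(tokens):
    """Stand-in for the transformer: return one score per vocabulary entry.

    This toy version simply favors (last token + 1), wrapping around,
    so the loop below has something deterministic to work with.
    """
    logits = [0.0] * VOCAB_SIZE
    logits[(tokens[-1] + 1) % VOCAB_SIZE] = 1.0
    return logits


def generate(prompt, max_new_tokens=20):
    """Feed the output back in as input until a stop token appears."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)           # model sees everything so far
        next_token = max(range(VOCAB_SIZE), key=lambda i: logits[i])  # greedy pick
        tokens.append(next_token)                    # output becomes part of the input
        if next_token == STOP_TOKEN:
            break
    return tokens


print(generate([2]))  # prints [2, 3, 4, 0]
```

The `max_new_tokens` cap is there because a real model is not guaranteed to ever emit the stop token. Real implementations also usually sample from the scores instead of always taking the top one.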