Hacker News
by wojciem 872 days ago
Is it just me, or does this article, with its many high-level, vague phrases and anecdotes, and its skipping of the actual essence of the many smart tricks that make transformers computationally efficient, actually make it harder to grasp how transformers “really work”?

I recommend Andrej Karpathy's videos on this topic. They are well delivered, clearly explain the main techniques, and provide a Python implementation.

4 comments

There's also the type of article where the first half is easily understandable by a layman, but then it suddenly drops a lot of jargon and math formulas and you get completely lost.
A friend once described this kind of description by analogy with a recipe that went:

Recipe for buns: First you need flour. This is a white, fine-grained powder produced from ground wheat, which can be acquired in exchange for money (a standardised convention for storing value) at a store that contains many such products. When mixed with the raising agent and other ingredients you should remove the buns from the oven when golden brown.

For this situation, if it feels worth it, I have been using ChatGPT Q&A on the jargon to bridge the gap. I haven’t read this article through yet, so I can’t recommend it, but in many cases ChatGPT is a super useful contextual jargon clearer.
Agreed, I have made my own Shakespeare babbler following Karpathy's videos. I have a decent understanding of the structure and process, but I don't really grasp how these models work.

It's obvious how the error reduces, but I feel like there's something semantically going on that isn't directly expressed in the code.
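
For anyone who hasn't built one, here is a rough sketch of the "error reduces" part in the simplest setting I can think of: a character-level bigram babbler trained by plain gradient descent. To be clear, this is not Karpathy's code and not a transformer. The names `train_bigram` and `babble`, the toy training text, and the step count are all made up for illustration.

```python
import math
import random


def train_bigram(text, steps=50, lr=1.0, seed=0):
    """Fit next-character logits with full-batch gradient descent (illustrative only)."""
    chars = sorted(set(text))
    idx = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    rng = random.Random(seed)
    # logits[i][j] is the score for character j following character i.
    logits = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]
    pairs = [(idx[a], idx[b]) for a, b in zip(text, text[1:])]
    losses = []
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        loss = 0.0
        for i, j in pairs:
            row = logits[i]
            m = max(row)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in row]
            z = sum(exps)
            probs = [e / z for e in exps]
            loss -= math.log(probs[j])
            # Gradient of cross-entropy w.r.t. the logits: softmax minus one-hot.
            for k in range(n):
                grad[i][k] += probs[k] - (1.0 if k == j else 0.0)
        for i in range(n):
            for k in range(n):
                logits[i][k] -= lr * grad[i][k] / len(pairs)
        losses.append(loss / len(pairs))  # loss measured before this step's update
    return chars, logits, losses


def babble(chars, logits, start, length, seed=0):
    """Sample characters one at a time from the learned next-character distribution."""
    rng = random.Random(seed)
    idx = {c: i for i, c in enumerate(chars)}
    out = [start]
    for _ in range(length):
        row = logits[idx[out[-1]]]
        m = max(row)
        weights = [math.exp(v - m) for v in row]
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)


text = "to be or not to be that is the question "
chars, logits, losses = train_bigram(text)
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(babble(chars, logits, "t", 40))
```

The loss curve here only says the model got better at guessing the next character. Nothing in the code names the structure it picked up along the way, which I think is the gap being described.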

I'm saving the latter half for tomorrow, but so far it's making sense. People have different learning styles, and I think this is lacking in the visual department. Parts like the vectors all being displayed next to a word like "cat" could have been better annotated to show visually where those numbers come from.
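
On where those numbers come from: as far as I understand it, they are just a row looked up in a big table, one row per token, which starts out random and gets adjusted during training. A minimal sketch of the lookup, with a made-up three-word vocabulary and hypothetical names (`build_embedding_table`, `embed`), and no training at all:

```python
import random


def build_embedding_table(vocab, dim=4, seed=0):
    """Create one random vector per word; real models learn these values in training."""
    rng = random.Random(seed)
    word_to_id = {w: i for i, w in enumerate(vocab)}
    table = [[round(rng.gauss(0.0, 1.0), 3) for _ in range(dim)] for _ in vocab]
    return word_to_id, table


def embed(word, word_to_id, table):
    """Look up the vector for a word: map the word to its id, then take that row."""
    return table[word_to_id[word]]


word_to_id, table = build_embedding_table(["the", "cat", "sat"])
cat_vector = embed("cat", word_to_id, table)
print("cat ->", cat_vector)
```

So the vector shown next to "cat" is nothing more than row `word_to_id["cat"]` of the table. In a trained model the values have been nudged so that they are useful for prediction, rather than being random as they are here.
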
Super Data Science had a nice episode on this recently.