| HN Mirror



	by wojciem 872 days ago
	Is it only me, or is it actually harder to grasp how transformers “really work” after reading this article? It is full of high-level, vague phrases and anecdotes, and it skips the essence of the many smart tricks that make transformers computationally efficient. I recommend Andrej Karpathy's videos on this topic: they are well delivered, clearly explain the main techniques, and provide a Python implementation.

4 comments

kgeist 872 days ago

There's also the type of article where the first half is easily understandable by a layperson, but then it suddenly drops a lot of jargon and math formulas and you get completely lost.


Lerc 872 days ago

A friend once described these kinds of descriptions by analogy with a recipe that went:

Recipe for buns: First you need flour. This is a white, fine-grained powder produced from ground wheat, which can be acquired in exchange for money (a standardised convention for storing value) at a store that stocks many such products. When mixed with the raising agent and other ingredients, you should remove the buns from the oven when golden brown.


jeremiahbuckley 871 days ago

In this situation, when it feels worth it, I have been using ChatGPT Q&A on the jargon to bridge the gap. I haven’t read this article through yet, so I can’t recommend it, but in many cases ChatGPT is a super useful way to clear up jargon in context.


Lerc 872 days ago

Agreed. I have made my own Shakespeare babbler following Karpathy's videos. I have a decent understanding of the structure and process, but I don't really grasp how they work.

It's obvious how the error reduces, but I feel like there's something semantically going on that isn't directly expressed in the code.


Geisterde 871 days ago

I'm saving the latter half for tomorrow, but so far it's making sense. People have different learning styles, and I think this is lacking in the visual department. Parts such as the vectors displayed next to a word like "cat" could have been better annotated to show visually where those numbers come from.


3abiton 871 days ago

Super Data Science had a nice episode on this recently.
