by wojciem
872 days ago
Is it just me, or does this article, full of high-level, vague phrases and anecdotes, and skipping the actual substance of the many smart tricks that make transformers computationally efficient, actually make it harder to grasp how transformers “really work”? I recommend Andrej Karpathy's videos on this topic. They are well delivered, clearly explain the main techniques, and provide a Python implementation.