Hacker News
by chronolitus, 2137 days ago
and here's a breakdown of the architecture:
http://dugas.ch/artificial_curiosity/GPT_architecture.html
1 comment
odnes, 2137 days ago
These 4 videos (~45 mins) do an excellent job of explaining attention, multi-headed attention, and transformers:
https://www.youtube.com/watch?v=yGTUuEx3GkA