by sakesun 489 days ago
I haven't found any of the explanations of how LLMs are built satisfying, especially considering how impressive their applications are.
3 comments

Karpathy's recent video[1] is quite good.

1. Deep Dive into LLMs like ChatGPT (https://youtube.com/watch?v=7xTGNNLPyMI)

It's a good introduction at the pop-sci level, but most technically-inclined HN'ers are probably going to get more benefit from his earlier Zero to Hero series.
Many thanks for this link.
To get a rough idea of how things work, you can watch Karpathy's series on YouTube. To actually understand how things work, you will have to read through the papers. You probably want both. Finally, to really understand how it works, there's no better way than implementing an inference engine yourself. I also found all the other material superficial and unsatisfying ... too much information at the hand-waving level.
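For a sense of what "implementing an inference engine" boils down to, here is a minimal sketch of the autoregressive decoding loop with greedy sampling. Everything in it is an assumption for illustration: `VOCAB`, `LOGITS`, and `generate` are hypothetical names, and the hand-written logits table stands in for a real transformer forward pass, which is where nearly all of the actual work in a real engine goes.

```python
import math

# Hypothetical toy "model": a hand-written table of next-token logits,
# standing in for a real transformer forward pass.
VOCAB = ["<eos>", "the", "cat", "sat"]
LOGITS = {
    "the": [0.1, 0.0, 2.0, 0.5],
    "cat": [0.2, 0.1, 0.0, 2.5],
    "sat": [3.0, 0.3, 0.1, 0.0],
}

def softmax(logits):
    """Turn raw logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=8):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # A real engine would run the whole context through the network here;
        # this toy only looks at the last token.
        probs = softmax(LOGITS[tokens[-1]])
        next_id = max(range(len(probs)), key=probs.__getitem__)
        next_token = VOCAB[next_id]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["the"]))
```

Swapping the table lookup for a real forward pass (tokenizer, attention, KV cache) is the part that teaches you how the thing works; the loop around it stays about this simple.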
Curious, which of your questions are really unanswered?
I'm still amazed by how intelligent the outcome is after all this number crunching. I really cannot relate its ability to generalize information to the theory behind it.