HN Mirror

by LatencyKills 53 days ago

I have a BS in CS (and have been in the field for 25 years). I couldn't understand the transformer architecture until I built a few myself. Here are the books I worked through. I now feel I have a very good understanding of modern LLMs.

https://www.amazon.com/Build-Large-Language-Model-Scratch/dp...

https://www.amazon.com/Build-DeepSeek-Scratch-Abhijit-Dandek...

tinktank 52 days ago

Has it given you enough of an understanding that you can pick up and follow research papers, or did you have to do more to achieve that?

LatencyKills 52 days ago

I went this route because I had difficulty visualizing the content of the "Attention Is All You Need" paper. After going through both books, I can now understand every part of that paper.

I'm currently working on a robotics project that uses Nvidia's GR00T N1 model, and I was able to understand its research paper. [0]

[0]: https://arxiv.org/abs/2503.14734

tinktank 51 days ago

Thank you for the information.