I have a BS in CS (and have been in the field for 25 years). I couldn't understand the transformer architecture until I built a few myself. Here are the books I worked through. I now feel I have a very good understanding of modern LLMs.
I went this route because I had difficulty visualizing the content of the "Attention Is All You Need" paper. After working through both books, I can now understand every part of that paper.
I'm currently working on a robotics project that uses Nvidia's GR00T N1 model, and I was able to understand its research paper. [0]