| While I appreciate the pictures, at the end of the day all you have is a glossary and some slightly more detailed, arbitrary hand-waving. What specific architecture is used to build a basic model? Why is that specific combination of basic building blocks used? Why does it work when other similar ones don’t? I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful. Here’s my challenge: take that article and write an LLM. No? How about an article on raytracing? Anyone can do a raytracer in a weekend. Why is building an LLM always miles of explanation of concepts and nothing concrete you can actually build? Where’s my “LLM in a weekend” that covers the theory and how to actually implement one? The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark. My hot take is, if you haven’t built one, you don’t actually understand how they work; you just have a vague, kind-of-heard-of-it understanding, which is not the same thing. …maybe that’s harsh, and unfair. I’ll take it, maybe it is; but I’ve seen a lot of LLM explanations that conveniently stop before they get to the hard part of “and how do you actually do it?”, and another one? Eh. |