by noodletheworld 475 days ago
While I appreciate the pictures, really at the end of the day all you have is a glossary and slightly more detailed arbitrary hand waving.

What specific architecture is used to build a basic model?

Why is that specific combination of basic building blocks used?

Why does it work when other similar ones don’t?

I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful.

Here’s my challenge: take that article and write an LLM.

No?

How about an article on raytracing?

Anyone can do a raytracer in a weekend.

Why is building an LLM miles of explanation of concepts and nothing concrete you can actually build?

Where’s my “LLM in a weekend” that covers the theory and how to actually implement one?

The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark.

My hot take is, if you haven’t built one, you don’t actually understand how they work; you just have a vague, kind-of-heard-of-it understanding, which is not the same thing.

…maybe that’s harsh and unfair. I’ll take it, maybe it is; but I’ve seen a lot of LLM explanations that conveniently stop before they get to the hard part of “and how do you actually do it?”, and another one? Eh.