by godelski 1081 days ago
> The goal at the end is to have a deep understanding of the LLM space and its adjacency.

This is kinda a hard thing to quantify. How are we defining deep? Like you want to understand how they work? The Karpathy videos are good for that. But I wouldn't call this "deep".

If you want to get down into the weeds and into the mud, you need a hell of a lot more than 13hrs of education. You're also going to have a hard time doing this because most people are coming at it from an engineering perspective of "enough to work with it" rather than "I fundamentally want to understand all inner workings". If you are the former, then the fastai course and others are great for you. If you want to really get deep though, you're going to need a lot more than programming. You're going to need some pretty advanced maths too: high dimensional statistics, metric theory, and optimization theory are some. (Most researchers aren't doing this btw.) If you do go down this path, you'll also be able to understand the full spectrum of generative models and have a clearer picture. I should also say that there is still a black box element to these models, as they are so large that they are near impossible to analyze. It is definitely achievable, though, to learn a 2 layer autoregressive transformer network and fully understand its inner workings. Programming skills alone just won't get you there.


Thanks for the helpful advice. What would you recommend to someone who is interested in learning about diffusion models? I have a CS degree but I have 0 knowledge about AI. Things like Stable Diffusion have blown my mind and I’m really interested in learning about this field. Lots of courses out there but I lack the expertise to discern which one is good.

Yeah no problem, this is even closer to my area of focus! What do you know about physics and thermodynamics?

I'd say a good intro for low background is from Tomczak[0]. He has a book, but the blog posts are nearly identical. He did a post doc with Max Welling (someone you should learn about if you want to get deep, like I was suggesting before). So I'd switch things up slightly. I'd go Intro -> Autoregressive -> Flow -> VAE -> Hierarchical VAE -> Energy Based Models -> Diffusion. It is worth learning about GANs btw, but this progression should be natural and build up.

Continuing from there, you're going to want to learn about things like Langevin Dynamics, Score Matching, and so on. Start with Yang Song's blogs[1]. Your goal should be to understand this paper[2]. Once you get there, you should be able to understand the famous DDPM paper[3]. The reason we went through Tomczak wasn't just to get a good understanding of diffusion at a deeper level, but because you need these tools to understand Stable Diffusion, which really is just Latent Diffusion[4]. This should connect back with Tomczak's 2 Improving VAE papers and you should also be able to understand NVAE.
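
For a rough feel of what Langevin Dynamics does before you dig into the blogs, here is a minimal sketch (not taken from any of the linked material). It assumes a 1D Gaussian target so the score, the gradient of the log density, is known in closed form; in score matching a network would be trained to approximate that function instead. The names `gaussian_score` and `langevin_sample`, plus the step size and step count, are made up for illustration.

```python
import math
import random


def gaussian_score(x, mu=2.0, sigma=0.5):
    """Score d/dx log p(x) of a 1D Gaussian N(mu, sigma^2) (assumed toy target)."""
    return -(x - mu) / sigma**2


def langevin_sample(score, x0, step=0.01, n_steps=500, rng=random):
    """Unadjusted Langevin dynamics: drift along the score, plus Gaussian noise."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step * score(x) + math.sqrt(step) * noise
    return x


rng = random.Random(0)  # fixed seed so runs are repeatable
# Start each chain somewhere arbitrary; the dynamics pull it toward the target.
samples = [
    langevin_sample(gaussian_score, x0=rng.uniform(-5.0, 5.0), rng=rng)
    for _ in range(500)
]

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"sample mean ~ {mean:.2f}, sample std ~ {std:.2f}")
```

The sample mean and std should land near the target's 2.0 and 0.5, with a small bias from the finite step size. The point is that you only ever needed the score, never the normalized density, which is why score-based models can sample this way.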

This is probably the quickest way to get you to a good understanding, but if you want to dig deeper, which I highly encourage (because there are major issues that people aren't discussing), then you'll need more time. But you'll probably have the tools to do so if you go this route. Other people I suggest looking into: Diederik Kingma, Ruiqi Gao, Stefano Ermon, Jonathan Ho, Ricky T. Q. Chen, and Arash Vahdat.

[0] https://jmtomczak.github.io/

[1] https://yang-song.net/

[2] Deep Unsupervised Learning using Nonequilibrium Thermodynamics https://arxiv.org/abs/1503.03585

[3] Denoising Diffusion Probabilistic Models https://arxiv.org/abs/2006.11239

[4] High-Resolution Image Synthesis with Latent Diffusion Models https://arxiv.org/abs/2112.10752

The quality of these recommendations reflects favourably on OP.