Hacker News new | ask | show | jobs
by gchadwick 234 days ago
Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material from short articles like this, longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

2 comments

I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.

What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite use of "delve"). Everyone should have some notions on what LLMs can or cannot do, in order to use them successfully and not be misguided by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in the last paragraph is very relevant, since the scenario is plausible, so it is something we definitely should be discussing.

> What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works

The question here is whether the details are important for the major issues, or whether they can be abstracted away with a vague understanding. To what extent abstracting away is okay depends greatly on the individual case. Abstractions can work over a large area or for a long time, but then suddenly collapse and fail.

The calculator, which has always delivered sufficiently accurate results, can produce nonsense when one approaches the limits of its numerical representation or combines numbers with very different levels of precision. This can be seen, for example, when one rearranges commutative operations; due to rounding problems, it suddenly delivers completely different results.

The 2008 financial crisis was based, among other things, on models that treated certain market risks as independent of one another. Risk could then be spread by splitting and recombining portfolios. However, this only worked as long as the interdependence of the different portfolios was actually quite small. An entire industry, with the exception of a few astute individuals, had abstracted away this interdependence, acted on this basis, and ultimately failed.

As individuals, however, we are completely dependent on these abstractions. Our entire lives are permeated by things whose functioning we simply have to rely on without truly understanding them. Ultimately, it is the nature of modern, specialized societies that this process continues and becomes even more differentiated.

But somewhere there should be people who work at the limits of detailed abstractions and are concerned with researching and evaluating the real complexity hidden behind them, and thus correcting the abstraction if necessary, sending this new knowledge upstream.

The role of an expert is to operate with less abstraction and more detail in her oder his field of expertise than a non-expert -- and the more so, the better an expert she or he is.

Because if you don't understand how a tool works you can't use the tool to it's full potential.

Imagine if you were using single layer perceptrons without understanding seperability and going "just a few more tweaks and it will approximate XOR!"

If you want a good idea of how well LLMs will work for your use case then use them. Use them in different ways, for different things.

Knowledge of backprop no matter how precise, and any convoluted 'theories' will not make you utilize LLMs any better. You'll be worse off if anything.

Yeah, that's what I'm trying to explain (maybe unsuccessfully). I do know backprop, I studied and used it back in the early 00s when it was very much not cool. But I don't think that knowledge is especially useful to use LLMs.

We don't even have a complete explanation of how we go from backprop to the emerging abilities we use and love, so who cares (for that purpose) how backprop works? It's not like we're actually using it to explain anything.

As I say in another comment, I often give talks to laypeople about LLMs and the mental model I present is something like supercharged Markov chain + massive training data + continuous vocabulary space + instruction tuning/RLHF. I think that provides the right abstraction level to reason about what LLMs can do and what their limitations are. It's irrelevant how the supercharged Markov chain works, in fact it's plausible that in the future one could replace backprop with some other learning algorithm and LLMs could still work in essentially the same way.

In the line of your first paragraph, probably many teens who had a lot of time in their hands when Bing Chat was released, and some critical spirit to not get misled by the VS, have better intuition about what an LLM can do than many ML experts.

I disagree in the case of LLMs, because they really are an accidental side effect of another tool. Not understanding the inner workings will make users attribute false properties to them. Once you understand how they work (how they generate plausible text), you get a far deeper grasp on their capabilities and how to tweak and prompt them.

And in fact this is true of any tool, you don’t have to know exactly how to build them but any craftsman has a good understanding how the tool works internally. LLMs are not a screw or a pen, they are more akin to an engine, you have to know their subtleties if you build a car. And even screws have to be understood structurally in advanced usage. Not understanding the tool is maybe true only for hobbyists.

Could you provide an example of an advanced prompt technique or approach that one would be much more likely to employ if they had knowledge of X internal working?
You hit the nail on the head, in my opinion.

There are things that you just can’t expect from current LLMs that people routinely expect from them.

They start out projects with those expectations. And that’s fine. But they don’t always learn from the outcomes of those projects.

I don't think that's a good analogy, becuase if you're trying to train a single layer perceptron to approximate XOR you're not the end user.
None of this is about an end user in the sense of the user of an LLM. This is aimed at the prospective user of a training framework which implements backpropagation at a high level of abstraction. As such it draws attention to training problems which arise inside the black box in order to motivate learning what is inside that box. There aren't any ML engineers who shouldn't know all about single layer perceptrons I think, and that makes for a nice analogy to real life issues in using SGD and backpropagation for ML training.
The post I was replying to was about "colleagues, who are extremely invested in capabilities of LLMs" and then mentions how they are uninterested in how they work and just interested in what they can do and societal implications.

It sounds to me very much like end users, not people who are training LLMs.

The analogy is if you don't understand the limitations of the tool you may try and make it do something it is bad at and never understand why it will never do the thing you want despite looking like it potentially coild
I think there are a lot of people who just don't care about stuff like nanochat because it's exclusively pedagogical, and a lot of people want to learn by building something cool, not taking a ride on a kiddie bike with training wheels.
That's fine as far as it goes, but there is a middle ground ...

Feynman was right that "If you can't build it, you don't understand it", but of course not everyone needs or wants to fully understand how an LLM works. However, regarding an LLM as a magic black box seems a bit extreme if you are a technologist and hope to understand where the technology is heading.

I guess we are in an era of vibe-coded disposable "fast tech" (cf fast fashion), so maybe it only matters what can it do today, if playing with or applying towards this end it is all you care about, but this seems a rather blinkered view.

The problem is that not even the ones building them can understand it. Otherwise they wouldn't be breaking their "I promise AGI by the end of next year" promises for the Nth time.

That or they are flat out lying. My money's on the latter.

If everyone had to understand how carburettors, engines and break systems work; to be able to drive a car - rather than just learn to drive and get from A to B - I'm guessing there would be a lot less cars on the road.

(Thinking about it, would that necessarily be a bad thing...)

The problem is that we have a huge swathe of "mechanics" that basically don't know much more than how to open a paintcan and paint a pig despite promising to deliver finely tuned supercars with their magic car making machine.
I'm personally very interested in how LLMs work under the hood, but I don't think everyone who uses them as tools needs that. I don't know the wiring inside my drill, but I know how to put a hole in my wall and not my hand regardless.
The thing is that if you actually learn how they work, they lose all of their magic. This has happened to anyone I know who bothered studying them and is not selling them. So I'd rather people learned. Knowing how a drill works doesn't make you any less likely to use a drill.
Not everybody who drives a car (even as a professional driver) knows how to make one.

If you live in a world of horse carriages, you can be thinking about what the world of cars is going to be like, even if you don't fully understand what fuel mix is the most efficient or what material one should use for a piston in a four-stroke.

Do you go deep into molecular biology to see how it works , it is much more interesting and important
But the question is if you have a better understanding of LLMs from a user's perspective, or they.
Obviously they are more focused on making something that works
Wow. Definitely NOT management material then.
Which is terrible. That's the root of all the BS around LLMs. People lacking understanding of what they are and ascribing capabilities which LLMs just don't have, by design. Even HN discussions are full of that. Even though this page literally has "hacker" in its name.
I see your point but on the other hand a lot of conversations go: A: what will we do when AI do all the jobs, B: that's silly LLMs can't do the jobs. The thing is A didn't say LLM, they said AI as in whatever that will be a short while into the future. Which is changing rapidly because thousands of bright people are being paid to change it.
> a short while into the future

And what gives you that confidence? A few AI nerds already claimed that in the 80s.

We're currently exploring what LLMs can do. There is no indication that any further fundamental breakthrough is around the corner. Everybody is currently squeezing the same stone.

The trouble is that "AI" is also very much a leaky abstraction, which makes it tempting to see all the "AI" advances of recent years, then correctly predict that these "AI" advances will continue, but then jump to all sorts of wrong conclusions about what those advances will be.

For example, things like "AI" image and video generation are amazing, as are things like AlphaGo and AlphaFold, but none of these have anything to do with LLMs, and the only technology they share with LLMs is machine learning and neural nets. If you lump these together with LLMs, calling them all "AI", then you'll come to the wrong conclusion that all of these non-LLM advances indicate that "AI" is rapidly advancing and therefore LLMs (also being "AI") will do too ...

Even if you leave aside things like AlphaGo, and just focus on LLMs, and other future technology that may take all our jobs, then using terms like "AI" and "AGI" are still confusing and misleading. It's easy to fall into the mindset that "AGI" is just better "AI", and that since LLMs are "AI", AGI is just better LLMs, and is around the corner because "AI" is advancing rapidly ...

In reality LLMs are, like AlphaFold, something highly specific - they are auto-regressive next-word predictor language models (just as a statement of fact, and how they are trained, not a put-down), based on the Transformer architecture.

The technology that could replace humans for most jobs in the future isn't going to be a better language model - a better auto-regressive next-word predictor - but will need to be something much more brain like. The architecture itself doesn't have to be brain-like, but in order to deliver brain-like functionality it will probably need to include another half-dozen "Transformer-level" architectural/algorithmic breakthroughs including things like continual learning, which will likely turn the whole current LLM training and deployment paradigm on it's head.

Again, just focusing on LLMs, and LLM-based agents, regarding them as a black-box technology, it's easy to be misled into thinking that advances in capability are broadly advancing, and will rise all ships, when in reality progress is much more narrow. Headlines about LLMs achievement in math and competitive programming, touted as evidence of reasoning, do NOT imply that LLM reasoning is broadly advancing, but you need to get under the hood and understand RL training goals to realize why that is not necessarily the case. The correctness of most business and real-world reasoning is not as easy to check as is marking a math problem as correct or not, yet that capability is what RL training depends on.

I could go on .. LLM-based agents are also blurring the lines of what "AI" can do, and again if treated as a black box will also misinform as to what is actually progressing and what is not. Thousands of bright people are indeed working on improving LLM-adjacent low-hanging fruit like this, but it'd be illogical to conclude that this is somehow helping to create next-generation brain-like architectures that will take away our jobs.

I'll give you algorithmic breakthroughs have been quite slow to come about - I think backpropagation in 1986 and then transformers in 2017. Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like. I recently read how LLMs work and was surprised how text focused it is, making word vectors and not physical understanding.
> Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like

That's because you, as you admit in the next sentence, have almost no understanding of how they work.

Your reasoning is on the same level as someone in the 1950s thinking ubiquitous flying cars are just a few years away. Or fusion power, for that matter.

In your defense, that seems to be about the average level of engagement with this technology, even on this website.

> Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like.

That's like saying, well, given how fast bicycles make us, so much closer to horse speed, I wonder if we can tweak this a little to move faster than any animal can run. But cars needed more technological breakthroughs, even though some aspects of them used insights gained from tweaking bicycles.

Yes, it's a bit shocking to realize that all LLMs are doing is predicting next word (token) from samples in the training data, but the Transformer is powerful enough to do a fantastic job of prediction (which you can think of as selecting which training sample(s) to copy from), which is why the LLM - just a dumb function - appears as smart as the human training data it is copying.

The Math Olympiad results are impressive, but at the end of the day is just this same next word prediction, but in this case fine tuned by additional LLM training on solutions to math problems, teaching the LLM which next word predictions (i.e. output) will add up to solution steps that lead to correct problem solutions in the training data. Due to the logical nature of math, the reasoning/solution steps that worked for training data problems will often work for new problems it is then tested on (Math Olympiad), but most reasoning outside of logical domains like math and programming isn't so clear cut, so this approach of training on reasoning examples isn't necessarily going to help LLMs get better at reasoning on more useful real-world problems.

I’m trying not to be disappointed by people, I’d rather understand what’s going on in their minds, and how to navigate that.
And to all the LLM heads here, this is his work process:

> Yesterday I was browsing for a Deep Q Learning implementation in TensorFlow (to see how others deal with computing the numpy equivalent of Q[:, a], where a is an integer vector — turns out this trivial operation is not supported in TF). Anyway, I searched “dqn tensorflow”, clicked the first link, and found the core code. Here is an excerpt:

Notice how it's "browse" and "search" not just "I asked chatgpt". Notice how it made him notice a bug

First of all, this is not a competition between “are LLMs better than search”.

Secondly, the article is from 2016, ChatGPT didn’t exist back then

I doubt he's letting LLM creep in to his decision-making in 2025, aside from fun side projects (vibes). We don't ever come across Karpathy going to an LLM or expressing that an LLM helped in any of his Youtube videos about building LLMs.

He's just test driving LLMs, nothing more.

Nobody's asking this core question in podcasts. "How much and how exactly are you using LLMs in your daily flow?"

I'm guessing it's like actors not wanting to watch their own movies.

Karpathy talking for 2 hours about how he uses LLMs:

https://www.youtube.com/watch?v=EWvNQjAaOHw

Vibing, not firing at his ML problems.

He's doing a capability check in this video (for the general audience, which is good of course), not attacking a hard problem in ML domain.

Despite this tweet: https://x.com/karpathy/status/1964020416139448359 , I've never seen him citing an LLM helped him out in ML work.

You're free to believe whatever fantasy you wish, but as someone who frequently consults an LLM alongside other resources when thinking about complex and abstract problems, there is no way in hell that Karpathy intentionally limits his options by excluding LLMs when seeking knowledge or understanding.

If he did not believe in the capability of these models, he would be doing something else with his time.

> Continuing the journey of optimal LLM-assisted coding experience. In particular, I find that instead of narrowing in on a perfect one thing my usage is increasingly diversifying

https://x.com/karpathy/status/1959703967694545296

what you did here is called confirmation bias.

> I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.

https://x.com/karpathy/status/1964020416139448359

Yes, embedding .py code inside of a speedrun.sh to "simplify the [sic] bash scripts."

Eureka runs LLM101n, which is teaching software for pedagogic symbiosis.

[1]:https://eurekalabs.ai/