Hacker News new | ask | show | jobs
by simonw 532 days ago
Andrej Karpathy: https://twitter.com/karpathy/status/1835024197506187617

> It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

> They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".

1 comments

But I need enormous amounts of learning data and enormous amount of computing to learn new models, right? So it's kind of useless advice for most people who can't just parse github repositories and teach their new model using AST tokens. They have to use existing opensourced models or API and those happened to use text.