Hacker News
by bodyfour 1179 days ago
> There is no way around the bitter lesson.

Isn't there? I'm certainly not sure, based on the results published over the last weeks and months.

The giant GPT-{3.5,4} models show that if you make the model big enough and throw enough data at it you can produce an AI capable of conversing on basically any topic, in dozens of languages. There are plenty of different takes on how near-human its abilities are on specific tasks, but it's worth stepping back and appreciating how super-human the breadth of this knowledge is.

But it's also not clear if a mega-model is anything close to the most efficient way of storing knowledge. After all, you don't need to memorize every fact in Wikipedia if you know how to effectively search it.

And we're currently seeing a daily explosion in these capabilities. Today's flavor is interfacing with Wolfram, but we've also seen web searches, Python coding, etc. That, I think, is the real superpower that comes out of this: you or I can answer a question by "doing a web search" or "querying a database" or "using Wolfram" or "developing a Python program that finds the answer." However, an AI could do tasks like this just by "thinking" about it. Maybe it would be as natural as we find blinking.
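To make the "tools instead of memorized facts" idea concrete, here is a minimal Python sketch. Everything in it is a made-up stand-in: `calculator`, `lookup`, `choose_tool` and the tiny `FACTS` table are hypothetical, and the hard-coded routing rule only marks the spot where a real model would decide which API to call.

```python
import operator
from typing import Callable, Dict, Tuple

# Hypothetical stand-ins only: no real search engine, database, or Wolfram API is called.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def calculator(expression: str) -> str:
    """Evaluate a tiny expression of the form 'a op b', e.g. '6 * 7'."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))

# A toy "external knowledge base" that the model does not have to memorize.
FACTS = {"capital of france": "Paris"}

def lookup(query: str) -> str:
    """Look the query up in the toy knowledge base."""
    return FACTS.get(query.lower().strip(), "unknown")

TOOLS: Dict[str, Callable[[str], str]] = {"calc": calculator, "lookup": lookup}

def choose_tool(question: str) -> Tuple[str, str]:
    """Placeholder for the model's decision: returns (tool name, tool input)."""
    text = question.strip().rstrip("?")
    if text.lower().startswith("what is ") and any(sym in text for sym in OPS):
        return "calc", text[len("what is "):]
    return "lookup", text

def answer(question: str) -> str:
    """Route the question to a tool and return the tool's output."""
    tool, arg = choose_tool(question)
    return TOOLS[tool](arg)
```

The point of the sketch is just that the routing step is small compared to the knowledge sitting behind the tools.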

That to me is the real breakthrough in stuff like Alpaca -- start with a mega-model and prompt it with something like: "After this paragraph, you are going to be speaking to an AI model similar to yourself but much more primitive. Its task will involve interfacing with English speakers, so converse with it only in that language. It has access to the same {X,Y,Z} APIs you have so any time it has trouble answering a question, prefer to give hints about how it could find the answer using those APIs rather than providing the answer directly yourself. Only give an answer directly if it repeatedly fails to be able to answer it by using an API. I've provided a large set of standardized tests used by humans at this URL -- start by asking it questions intended for a preschool-aged child. Each time it is able to answer new questions at a given level correctly 99% of the time increase the material's level until it is able to achieve that score on a test designed for a Computer Science PhD candidate."
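The control loop implied by that prompt can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `LEVELS` list, `run_curriculum` and the `ToyStudent` class are all hypothetical, and the "student" here is a counter that improves by a fixed amount per training step, not a model.

```python
from typing import Callable, List

# Hypothetical difficulty ladder; the real one would come from the standardized tests.
LEVELS: List[str] = ["preschool", "elementary", "high school", "undergraduate", "CS PhD"]

def run_curriculum(
    accuracy: Callable[[int], float],
    train_step: Callable[[int], None],
    pass_rate: float = 0.99,
    max_rounds: int = 1000,
) -> int:
    """Raise the level each time the student reaches pass_rate on the current one.

    Returns the index of the highest level passed, or -1 if none was passed.
    """
    level = 0
    for _ in range(max_rounds):
        if level == len(LEVELS):
            break  # every level passed
        if accuracy(level) >= pass_rate:
            level += 1  # teacher moves on to harder material
        else:
            train_step(level)  # teacher gives hints, student updates
    return level - 1

class ToyStudent:
    """Stand-in student: each training step adds 0.25 accuracy at that level."""

    def __init__(self) -> None:
        self.skill = [0.0] * len(LEVELS)
        self.steps = 0

    def accuracy(self, level: int) -> float:
        return min(1.0, self.skill[level])

    def train(self, level: int) -> None:
        self.skill[level] += 0.25
        self.steps += 1
```

With the toy student, `run_curriculum(student.accuracy, student.train)` climbs all five levels; with a real model the interesting question is how many training steps each level costs.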

How large would the "student" model have to be to succeed at this deep but narrower task? I think the answer right now is "we have no idea". However if the model has the advantage that it can rely on external knowledge and tools from the start (and is rewarded by the "teacher" for doing just that) I bet it'll be a lot smaller than these mega-models. Sure, you wouldn't be able to disconnect the "student-AI" from its APIs and expect it to converse with you in Hungarian about the history of yacht design, but that might not be a capability it needs to have.

My personal hunch is that we're going to find these "AI-taught specialist AI, with API access" models will be a lot smaller than most people are expecting. That's the moment when things REALLY change: instead of pairing a human with a mega-model AI, if specialized models are cheap someone can say "spin up 100K expert-programmer AIs and have them supervised by 5K expert-manager AIs and have them build XYZ"

Or if you need it to work on an existing task you'd specialize further -- you'd go to your AI vendor and say "I'd like to license the weights for your expert-programmer model, but first have it read these 200 books I consider important to my problem domain and then show it every commit ever made by a human to my git repo and every design document I have"

2 comments

> you don't need to memorize every fact in Wikipedia if you know how to effectively search it.

yeah you're onto something. models good enough to sustain a conversation where I bring my own data as a primer are probably more useful than models that have a frozen knowledge of everything. the killer feature of gpt-4 is the 32k token context size, which allows an unprecedented amount of input to be fed into the model and queried.

Very good analysis. I disagree with a fundamental point though: If you don't consider compute cost and just want the best possible AGI, then there's nothing stopping you from supercharging the mega-models with the same capabilities as the smaller models - and if the current scaling shows anything, the mega models will just become even better.

Sometimes you do need to consider compute cost, say if you want a small but high quality model that can run on a smart phone to perform a task. For example, with camera input, identify a plant or animal, while in a remote area with no cell signal, so it has to yield an answer without communicating with a server. What's the smallest, most efficient model that can do that effectively? Build that.

> If you don't consider compute cost [...]

Yes, but what if you do? Imagine your hyper-specialized API-heavy model takes 10x fewer resources to answer a question (or at least a question relevant to the task at hand). Won't it be more powerful to have a model that can run 10 times as fast (or run 10 instances in parallel)?

What if the ratio turns out to be 100x or 1000x?

So I agree that the cutting edge of "best possible AGI" might mean building the largest models we can train on massive clusters of computers and then run on high-end hardware. My hunch, though, is that models that can be run on cheap hardware and then "swarmed" on a problem space will be even more powerful in what they can perform in aggregate.

Again, it's just my hunch but right now I think everybody's predictions are hunches.

I'll actually go one bit further: even for a linear task that can't be "swarmed" in the same way, cheaper-per-token models might still do better. Existing models already have the ability to use randomness to give more "creative", if less reliable, answers. This is inherently parallelizable though -- in fact Bard seems to be exposing this in its UI in the form of multiple "drafts". So what if you just ran 100 copies of your cheap-AI against a problem and then had one cheap-AI (or maybe a medium-AI) judge the results?
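A rough Python sketch of that "many cheap drafts, one judge" loop, under heavy assumptions: `cheap_model` is a hypothetical stand-in that just guesses an integer at random, `judge` is a hypothetical scorer that checks how close the guess squared is to the target, and the toy task (find x with x*x equal to the target) is only there so the judge has something it can verify.

```python
import random
from typing import List, Tuple

def cheap_model(target: int, rng: random.Random) -> int:
    """Hypothetical unreliable solver: a random guess between 0 and 100."""
    return rng.randint(0, 100)

def judge(target: int, draft: int) -> int:
    """Hypothetical judge: higher is better, 0 means draft * draft == target."""
    return -abs(draft * draft - target)

def best_of_n(target: int, n: int, seed: int = 0) -> Tuple[int, List[int]]:
    """Sample n independent drafts and return (draft the judge likes best, all drafts)."""
    rng = random.Random(seed)
    # Each draft is independent of the others, so this loop could run in parallel.
    drafts = [cheap_model(target, rng) for _ in range(n)]
    best = max(drafts, key=lambda d: judge(target, d))
    return best, drafts
```

By construction the selected draft is never worse than any single draft under the judge's score, so the open question is how good a cheap judge can be, not whether picking helps.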

Or at the risk of getting too anthropomorphic about it: imagine you as a human are writing a program and you get stuck on a tricky bit -- you know that the problem should be solvable but you've never done anything similar and don't know what algorithm to start with. Suppose then you could tell your brain "Temporarily fork off 100 copies of yourself. 10 of them go do a literature review of every CS paper you can find related to this topic. 10 of you search for open source programs that might have a similar need and try to determine how their code does it. The other 80 of you just stare off into the middle distance and try to think of a creative solution. In two human-seconds write a summary of your best idea and exit. I'll then read them all and see if I/we are closer to understanding what to do next."

For us, this type of mental process is so alien we can't even imagine what it would feel like to be able to do it. It might come completely naturally to an AI, though.