| > There is no way around the bitter lesson. Isn't there? I'm certainly not sure, based on the results published over the last weeks and months. The giant GPT-{3.5,4} models show that if you make the model big enough and throw enough data at it you can produce an AI capable of conversing on basically any topic, in dozens of languages. There are plenty of different takes on how near-human its abilities are on specific tasks, but it's worth stepping back and appreciating how super-human the breadth of this knowledge is. But it's also not clear if a mega-model is anything close to the most efficient way of storing knowledge. After all, you don't need to memorize every fact in Wikipedia if you know how to effectively search it. And we're currently seeing a daily explosion in exactly these tool-use capabilities. Today's flavor is interfacing with Wolfram, but we've also seen web searches, Python coding, etc. That, I think, is the real superpower that comes out of this: you or I can answer a question by "doing a web search" or "querying a database" or "using Wolfram" or "developing a Python program that finds the answer". However, an AI could do tasks like this just by "thinking" about it. Maybe it would be as natural as we find blinking. That to me is the real breakthrough in stuff like Alpaca -- start with a mega-model and prompt it with something like: "After this paragraph, you are going to be speaking to an AI model similar to yourself but much more primitive. Its task will involve interfacing with English speakers, so converse with it only in that language. It has access to the same {X,Y,Z} APIs you have, so any time it has trouble answering a question, prefer to give hints about how it could find the answer using those APIs rather than providing the answer directly yourself. Only give an answer directly if it repeatedly fails to answer it by using an API. 
I've provided a large set of standardized tests used by humans at this URL -- start by asking it questions intended for a preschool-aged child. Each time it is able to answer new questions at a given level correctly 99% of the time, increase the material's level until it is able to achieve that score on a test designed for a Computer Science PhD candidate." How large would the "student" model have to be to succeed at this deep but narrower task? I think the answer right now is "we have no idea". However, if the model has the advantage that it can rely on external knowledge and tools from the start (and is rewarded by the "teacher" for doing just that) I bet it'll be a lot smaller than these mega-models. Sure, you wouldn't be able to disconnect the "student-AI" from its APIs and expect it to converse with you in Hungarian about the history of yacht design, but that might not be a capability it needs to have. My personal hunch is that we're going to find that these "AI-taught specialist AI, with API access" models will be a lot smaller than most people are expecting. That's the moment when things REALLY change: instead of pairing a human with a mega-model AI, if specialized models are cheap someone can say "spin up 100K expert-programmer AIs and have them supervised by 5K expert-manager AIs and have them build XYZ". Or if you need it to work on an existing task you'd specialize further -- you'd go to your AI vendor and say "I'd like to license the weights for your expert-programmer model, but first have it read these 200 books I consider important to my problem domain and then show it every commit ever made by a human to my git repo and every design document I have". |
yeah you're onto something. models good enough to sustain a conversation where I bring my own data as a primer are probably more useful than models that have a frozen knowledge of everything. the killer feature of gpt-4 is the 32k token size, which allows an unprecedented amount of input to be fed into the knowledge graph and queried.
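
for anyone who wants to poke at the teacher/student idea from the quoted comment, here's a toy sketch of the curriculum loop it describes: quiz the student at one level, hint at which API to use when it gets something wrong, and only move up once it clears the 99% bar. everything here is made up for illustration (`ToyStudent`, `run_curriculum`, the two fake "APIs", the tiny question sets); a real setup would have an actual model on each side, and the teacher would give softer hints than just naming the tool.

```python
from typing import Callable, Dict, List, Tuple

# (prompt, expected answer, tool the hypothetical teacher would hint at)
Question = Tuple[str, str, str]

# Two stand-in "APIs" the student can call; both are illustrative only.
FACTS = {"capital of France": "Paris"}
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda p: str(sum(int(x) for x in p.split("+"))),
    "lookup": lambda p: FACTS.get(p, ""),
}


class ToyStudent:
    """Hypothetical student: stores no answers, only which tool to use."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]) -> None:
        self.tools = tools
        self.routing: Dict[str, str] = {}  # prompt -> tool name from hints

    def answer(self, prompt: str) -> str:
        tool = self.routing.get(prompt)
        return self.tools[tool](prompt) if tool else ""

    def take_hint(self, prompt: str, tool_name: str) -> None:
        self.routing[prompt] = tool_name


def run_curriculum(student: ToyStudent, levels: List[List[Question]],
                   threshold: float = 0.99, max_rounds: int = 10) -> int:
    """Return how many levels the student passed before getting stuck."""
    passed = 0
    for questions in levels:
        for _ in range(max_rounds):
            correct = 0
            for prompt, expected, hint in questions:
                if student.answer(prompt) == expected:
                    correct += 1
                else:
                    # Teacher hints at the API instead of giving the answer.
                    student.take_hint(prompt, hint)
            if correct / len(questions) >= threshold:
                passed += 1
                break  # promote to the next level
        else:
            return passed  # never reached the threshold at this level
    return passed


LEVELS: List[List[Question]] = [
    [("1+1", "2", "calculator"), ("2+3", "5", "calculator")],
    [("capital of France", "Paris", "lookup"), ("10+20", "30", "calculator")],
]
levels_passed = run_curriculum(ToyStudent(TOOLS), LEVELS)
```

the point of the toy is just that the student's state is a routing table, not a pile of memorized facts, which is the "smaller model plus APIs" bet in miniature.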