by ekunazanu 6 days ago
> This was openai’s entire breakthrough. Making this particular model architecture larger leads to emergent capabilities

Basically, the bitter lesson: https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...

3 comments

This interview https://youtu.be/oWOz2htozfI?si=qdQ0uZRoZOYeThOn from 2 days ago with a top researcher from OpenAI directly addresses the bitter lesson argument and the importance of scaling for the history of their models.
So the take-away here is that we (as humans) try to model these AIs like humans, but eventually these AIs get better. To me that seems like a logical conclusion if they can do "things" (like "learning" or pattern matching) much faster than we can (the compute). Then language becomes a bottleneck in LLMs: the AI is constrained by the language, so if we want to scale further we could let the AI create its own language (we would then have to translate whatever it creates back into a language we understand). It is the same if, for instance, we look at the language of the Inuit (people who live in the north and make temporary shelters like igloos in the snow): they have multiple words/verbs to describe snow, while in English we only have one (?): snow. In English we don't need more words (we can describe the state of snow using multiple words), but for the Inuit language it makes sense to create these new terms (it would also make communication easier and faster). So in some sense, all languages are "newspeak" compared to whatever general language researchers or AI might come up with. If this sounds dumb let me know, but if you know of some research in this general-language direction (I'd assume general AI research), I would love to see it!
I think one hypothesis along these lines is that, if allowed, due to the limitations of human language you described, LLMs will gravitate towards "inventing" their own language (which, due to training pressures, may even resemble English from the outside, but contain deeper, "true" meaning within). The argument goes that we should do our best to prevent this, even if it bottlenecks reasoning capabilities, since it would cut off our ability to read the model's "true" thoughts and detect misalignment.

See: https://openai.com/index/chain-of-thought-monitoring/

Quote below:

  Chain-of-thought (CoT) reasoning models “think” in natural language understandable by humans. Monitoring their “thinking” has allowed us to detect misbehavior such as subverting tests in coding tasks, deceiving users, or giving up when a problem is too hard.

  We believe that CoT monitoring may be one of few tools we will have to oversee superhuman models of the future.

  We have further found that directly optimizing the CoT to adhere to specific criteria (e.g. to not think about reward hacking) may boost performance in the short run; however, it does not eliminate all misbehavior and can cause a model to hide its intent. We hope future research will find ways to directly optimize CoTs without this drawback, but until then

  We recommend against applying strong optimization pressure directly to the CoTs of frontier reasoning models, leaving CoTs unrestricted for monitoring.

  We understand that leaving CoTs unrestricted may make them unfit to be shown to end-users, as they might violate some misuse policies. Still, if one wanted to show policy-compliant CoTs directly to users while avoiding putting strong supervision on them, one could use a separate model, such as a CoT summarizer or sanitizer, to accomplish that.
My understanding is that the Inuit snow claim is a bit of a myth. Beyond English words like slush, sleet, powder, hardpack, flurry, blizzard, etc., you can also say "fluffy snow", "wet snow", etc. The Inuit language is just smooshing the adjective onto the noun, so you get something like "wetsnow" as one word.
Isn't the bitter lesson basically the same as "The Unreasonable Effectiveness of Data" from 2009?
not exactly, bitter lesson is one meta-level up from "scale eats everything". this is a common misunderstanding of bitter lesson that rich sutton has been fighting ever since the thing was written. in rich's own words[1], the modern summary is

> Don’t be distracted by human knowledge, as AI has been historically.

> Instead focus on methods for creating knowledge that scale with computation, like search and learning.

so the lesson is to choose methods that scale with computation, not just that blindly scaling up anything (data, params, people, whatever) works. it is that choosing the right x axis and the right scaling laws consistently wins out in the long run, despite short-term wins from other methods.

1: https://x.com/RichardSSutton/status/2056419165502935198