|
|
|
|
|
by nologic01
1158 days ago
|
|
Brute force approaches always hit some wall. ML will be no different. In the decades to come it us quite likely that algorithms will develop in directions orthogonal to current approaches. The idea that you improve performance by throwing gazillions of data into gargantuan models might be even come to be seen as laughable. Keep in mind (pun) that the only real intelligence here is us, and we are pretty good at figuring out when a tool has exhausted its utility. |
|
Somewhat counterintuitively, scaling datasets is the lazy and economical approach. If you have the compute already, might as well dig an OOM more text tokens.
But there are other sources of data, and slightly different ways to utilize it. Multimodality, in very large training runs, will almost inevitably increase sample efficiency (for obvious reasons of context richness), synthetic data is already very effective [1], and there are and will be discovered other ways to do more in the condition of diminishing raw text resources. But a thorough abandonment of the scaling strategy is very unlikely.
Sutton's Bitter Lesson [2] points at a very powerful rule of thumb: we shouldn't turn AI engineering into a contest of smartness, we should allow complex smartness to emerge from generic low-level algorithms. What will be seen as laughable in decades to come is not the scaling strategy, but the Godlike conceit of people who thought they can devise generally applicable rules of reasoning from first principles.
1: https://arxiv.org/abs/2304.08466 2: http://www.incompleteideas.net/IncIdeas/BitterLesson.html