by charleshn
287 days ago
> We cannot add more compute to a given compute budget C without increasing data D to maintain the relationship.
> We must either (1) discover new architectures with different scaling laws, and/or (2) compute new synthetic data that can contribute to learning (akin to dreams).

Of course we can, this is a non-issue. See e.g. AlphaZero [0] that's 8 years old at this point, and any modern RL training using synthetic data, e.g. DeepSeek-R1-Zero [1].

[0] https://en.m.wikipedia.org/wiki/AlphaZero

[1] https://arxiv.org/abs/2501.12948
Yes, distillation is a thing but that is more about compression and filtering. Distillation does not produce new data in the same way that chess games produce new positions.