by TheCoreh
411 days ago
This is perhaps why it took us this long to get to LLMs. The underlying math and ideas were (mostly) there, and even if the Transformer as an architecture wasn't ready yet, it wouldn't surprise me if throwing sufficient data/compute at a worse architecture would also have produced comparable emergent behavior. There needed to be someone willing to try going big at an organization with sufficient idle compute/data just sitting there, so it's not a surprise it first happened at Google.

Every bet makes perfect sense once you consider how promising the previous one looked and how much cheaper the compute was getting. Imagine being tasked to train an LLM in 1995: all the architectural knowledge we have today and a state-level mandate would not have gotten you all that far. Just the amount of fast memory that we bring to bear wouldn't have been viable until relatively recently.