by recursivecaveat 848 days ago
This is kind of an odd statement, because the transformer is not the most generic neural net. It's the result of many rounds of architectural improvement over older designs. The bitter lesson is that methods which scale well with compute win (alpha-beta search beats heuristics alone, neural networks beat alpha-beta), not that the most obvious and generic approach eventually wins. Given the context-length problems with transformers, I think it's fair to say they have scaling problems.
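To put a rough number on the context-length point: in standard dense self-attention every token attends to every other token, so the number of attention scores grows with the square of the sequence length. A back-of-the-envelope sketch (the function name and the head and layer counts are made up for illustration, not taken from any particular model):

```python
def attention_score_entries(seq_len: int, n_heads: int = 16, n_layers: int = 24) -> int:
    """Count attention-score entries for one forward pass of dense self-attention.

    n_heads and n_layers are illustrative assumptions, not a real model's config.
    Each head in each layer computes a seq_len x seq_len score matrix.
    """
    return seq_len * seq_len * n_heads * n_layers


if __name__ == "__main__":
    # Each doubling of the context length multiplies the count by four.
    for n in (1_024, 2_048, 4_096):
        print(n, attention_score_entries(n))
```

Doubling the context quadruples the score count, which is the sense in which vanilla attention scales poorly with context length. Sparse or linear attention variants change this picture, so treat it as a statement about the plain architecture only.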