by iamatoool
687 days ago
The SGD / pre-training / deep learning / transformer local maximum is profitable. Trying new things is not, so you are relying on researchers to make a breakthrough, and even then, to make a blip you need a few billion to move the promising model into production. The tide of money means we are probably locked into transformers for some time. Transformer ASICs, for example, will be built in droves. It will be hard to compete with the status quo. Transformer architecture == x86 of AI.