by quantadev
595 days ago
One reason to believe there's still low-hanging fruit (the kind that doesn't even require new math) is how simple the "attention head" structure of the Transformer architecture really is. It's not advanced at all. It was just a great idea that panned out, one that pretty much any creative AI researcher could've thought up after smokin' a joint. lol. I mean, someone could run trivial experiments with different perceptron network structures and end up revolutionizing the world. I think things are gonna get interesting real quick once LLMs themselves start "self-experimenting" by writing code for different architectures.
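To make the "simple" part concrete, here's a rough sketch of a single attention head in plain Python. It's a toy under stated assumptions: no masking, no multi-head concatenation, no output projection, and the names (`attention_head`, `w_q`, `w_k`, `w_v`) are just mine for illustration, not from any library.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_head(x, w_q, w_k, w_v):
    """One scaled dot-product attention head (illustrative sketch only).

    x is (tokens x d_model); w_q, w_k, w_v are (d_model x d_k) projections.
    Returns the attended values and the attention weights.
    """
    q = matmul(x, w_q)  # what each token is looking for
    k = matmul(x, w_k)  # what each token offers
    v = matmul(x, w_v)  # what each token passes along
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # token-to-token similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Two toy tokens with identity projections, just to show the call shape.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention_head(tokens, identity, identity, identity)
```

That's basically it: three matrix multiplies, a softmax, and a weighted sum. Swapping the scoring step or the normalization for something else is exactly the kind of cheap experiment I mean.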