by adamnemecek
1232 days ago
It turns out that transformers have a learning mechanism similar to autodiff, but better, since it happens mostly within individual layers rather than over the whole graph. I recently wrote a paper on this: https://arxiv.org/abs/2302.01834v1. The math is crazy.

"Bartender! A half-pint of your finest Combinatorial Hopf, if you please!"