by whimsicalism 701 days ago
It's intrinsic to transformers that their inner workings are largely inscrutable. This is no different, but that does not mean these models cannot be built upon.

Gradient descent works on these models just as it does on the prior ones.
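
To make that concrete, here is a rough sketch of my own (a toy, not anything taken from a real transformer): the optimizer below only ever calls the loss function and nudges the parameters, so it never needs to understand what happens inside the model. The names `loss`, `numerical_grad`, and `gradient_descent` are made up for illustration, and the "model" is just a line `w * x + b`.

```python
def loss(params, data):
    # Mean squared error of a toy linear model; a stand-in for any opaque model.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def numerical_grad(f, params, eps=1e-6):
    # Central-difference gradient: treats f as a black box.
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def gradient_descent(f, params, lr=0.05, steps=2000):
    # Plain gradient descent: step against the gradient, repeat.
    params = list(params)
    for _ in range(steps):
        grads = numerical_grad(f, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Synthetic data generated from y = 3x + 1 (an assumption for this sketch).
data = [(x, 3.0 * x + 1.0) for x in range(-3, 4)]
fitted = gradient_descent(lambda p: loss(p, data), [0.0, 0.0])
```

Real training uses backpropagation rather than finite differences, but the point carries over: the update rule only needs a loss and its gradient, not an explanation of the internals.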