Hacker News
by whimsicalism, 701 days ago
It's intrinsic to transformers that the inner workings are largely inscrutable. This is no different, but it does not mean they cannot be built upon.
Gradient descent works on these models just as it did on the prior ones.