by dvt
832 days ago
I beg to differ. Transformers are purely an optimization. It’s not exactly right to call everything “compute scaling,” but at the end of the day we are still fitting polynomials. And frankly, that’s probably not what our brains are doing.