Hacker News
by fdsjgfklsfd 304 days ago
Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture? The diffusion-based LLMs?