Hacker News
by fdsjgfklsfd 304 days ago
Do you mean "all variants of the same stacked transformer architecture converge in performance"? Or do you know of tests against some other architecture, such as the diffusion-based LLMs?