Hacker News
by billconan 7 days ago
I do not understand.

How is this different from building smaller transformer layers, where each layer just denoises less?