by h_tbob 612 days ago
I wish they didn’t use SwiGLU and pre-RMSNorm so we could have a better comparison.

Then we would know how much this transformer innovation helps by itself.