No one (publicly) had really pushed any of these techniques far, especially not for such a big run.
The transformer was an entirely new architecture, a very different kind of step change from this.
e: and Alibaba