by lostmsu 159 days ago
So no comparison?

Comparisons will be run once the quality of generation is on par with other available models. Performance is useless if the quality is not at least on par.

The paper does run a benchmark (the code and results are in the paper) comparing performance against a causal-attention GPT-2 model (nanoGPT): 20% faster at inference, and equivalent at training once T and D are larger than a threshold.
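
For anyone who wants to run that kind of latency comparison themselves, here is a minimal timing sketch. It is not the paper's benchmark code: `median_latency` and `relative_speedup` are hypothetical helper names, the models are stood in for by plain callables, and reading "20% faster" as "20% less wall-clock time" is an assumption.

```python
import time
from statistics import median


def median_latency(fn, *args, warmup=3, repeats=20):
    """Median wall-clock seconds of fn(*args) over `repeats` timed runs."""
    # Warm-up runs are discarded so one-time costs (caches, lazy init)
    # do not skew the timed samples.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return median(samples)


def relative_speedup(baseline_s, candidate_s):
    """Fraction of time saved versus the baseline (0.20 means 20% less time)."""
    # Assumption: "faster" is measured as a reduction in latency,
    # not as an increase in throughput.
    return (baseline_s - candidate_s) / baseline_s
```

Usage would look like `relative_speedup(median_latency(baseline_generate, prompt), median_latency(candidate_generate, prompt))`, where both `*_generate` callables are placeholders for whatever inference functions are being compared. On a GPU the kernels run asynchronously, so each callable should synchronize (for PyTorch, `torch.cuda.synchronize()`) before returning, otherwise the timings only measure launch overhead.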