

by in-silico 46 days ago

I wonder how different their method actually is from other sub-quadratic sparse attention methods like Reformer [1] and Routing Transformer [2].