by gbasin 507 days ago
the main bottleneck will be model depth: with N layers you only get N sequential steps of computation per forward pass, and recurrence has proven to be way less efficient (for now)