Hacker News new | ask | show | jobs
by TuringTest 1201 days ago
Fortunately, there still are some possibilities to improve training efficiency and reducing model size by doing more guided attentional learning.

This will make feasible to train models at least as good as the current batch (though probably the big players will use those same optimizations to create much better large models).