Hacker News
Training a 3x larger model on the same GPU cards (github.com)
2 points by xxr3376 1848 days ago

MegEngine (a deep learning framework) now implements DTR (Dynamic Tensor Rematerialization). You can train a model up to 3x larger on the same GPU by trading a little speed for a large saving in GPU memory.
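For anyone wondering where the speed/memory trade comes from, here is a toy Python simulation of the DTR idea. It is not MegEngine's implementation or API. `DTRSim`, `train_step`, the linear-chain model and the unit sizes/costs are all hypothetical, and the eviction score is a simplified form of the DTR heuristic (recompute cost / (size * staleness)). When an allocation would exceed the memory budget, the tensor with the lowest score is evicted. If an evicted tensor is needed later, it is recomputed from its parent, which costs extra compute.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    idx: int
    size: int            # memory units the tensor occupies
    cost: float          # compute cost to produce it from its parent
    resident: bool = False
    last_access: int = 0


class DTRSim:
    """Hypothetical toy DTR-style simulator for a linear chain t0 -> t1 -> ... -> tN."""

    def __init__(self, sizes, costs, budget):
        self.tensors = [Tensor(i, s, c) for i, (s, c) in enumerate(zip(sizes, costs))]
        self.tensors[0].resident = True  # the input is always kept and never evicted
        self.budget = budget
        self.clock = 0
        self.compute = 0.0   # total compute spent, including recomputation
        self.evictions = 0
        self.peak = self.used()

    def used(self):
        return sum(t.size for t in self.tensors if t.resident)

    def score(self, t):
        # Simplified DTR heuristic: cheap-to-recompute, large, stale tensors are evicted first.
        staleness = self.clock - t.last_access + 1
        return t.cost / (t.size * staleness)

    def evict_until(self, needed, pinned):
        while self.used() + needed > self.budget:
            candidates = [t for t in self.tensors
                          if t.resident and t.idx != 0 and t.idx not in pinned]
            if not candidates:
                raise MemoryError("budget too small for this step")
            victim = min(candidates, key=self.score)
            victim.resident = False
            self.evictions += 1

    def get(self, i):
        """Access tensor i, rematerializing it (and any evicted ancestors) if needed."""
        t = self.tensors[i]
        if not t.resident:
            self.get(i - 1)                      # recursively ensure the parent exists
            self.evict_until(t.size, {i - 1})    # the parent must stay while t is built
            t.resident = True
            self.compute += t.cost
            self.peak = max(self.peak, self.used())
        self.clock += 1
        t.last_access = self.clock
        return t

    def release(self, i):
        # The backward pass frees an activation once it is no longer needed.
        self.tensors[i].resident = False


def train_step(n, budget):
    sim = DTRSim(sizes=[1] * n, costs=[1.0] * n, budget=budget)
    for i in range(1, n):            # forward pass
        sim.get(i)
    for i in reversed(range(1, n)):  # backward pass reads activations in reverse
        sim.get(i)
        sim.release(i)
    return sim
```

With a generous budget, `train_step(9, 100)` computes each of the 8 activations once and peaks at 9 memory units. With `train_step(9, 4)`, peak memory stays within 4 units, but evicted activations are recomputed during the backward pass, so total compute rises. That is the same trade the post describes, just at toy scale.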