Hacker News
by ranman 742 days ago
Someone mentioned that this took almost as much compute to train as the original model.
1 comment

source please!