Hacker News
by make3 1976 days ago
Maybe so, but the largest one, the 1.5B-parameter model, will very likely take months to train on a single GPU. I've tried fine-tuning it on a 256 slice of TPUv2, which is huge, and even that took a few days.