Hacker News
by make3, 1976 days ago
Maybe so, but the largest one, with 1.5B parameters, will very likely take months to train on a single GPU. I've tried to fine-tune it on a 256-core TPUv2 slice, which is huge, and it took a few days.