Hacker News
by neom 523 days ago
I think the problem is that people are going to start playing: everyone is going to train in their own things, businesses are going to want to train different architectures for different business functions, etc. I had my first real adventure with training last night, and $3,200 and a lot of fun later (whoops), I can say the tooling has become very easy to use, and I presume it will only get easier. If I want to train in even, say, 10ish gigs, wouldn't I want to use a data center, even with a powerful laptop or DiLoCo? Seems unlikely that DiLoCo is enough?

(edit: I may also not be accounting enough for using a pre-trained general model next to a fine-tuned specialized model?)