Hacker News
by ipsum2 834 days ago
Fun fact: I can also train a 24-trillion-parameter model on my laptop! I just need to offload the weights to the cloud at every layer.

...

It's meaningless to say something can train a model with 24 trillion parameters without specifying the dataset size and how long training takes.

I dare say this thing will be many times faster than thrashing your 24T parameters to the cloud.
Yeah, but it'll be slower than the equivalent Nvidia GPU cluster.