Hacker News
by mdaniel 1178 days ago
To add to this excellent reply, I'll also point out that the reason folks want the weights is that they are the result of a massive search operation, akin to finding the right temperature to bake a cake from all possible floats. It takes a lot of wall-clock time, a lot of GPU energy, and a lot of input examples and counter-examples to find the "right" numbers. Thus, it really is better -- all things being equal -- to publish the results of that search to keep everyone else from having to repeat the search for themselves.

> a massive search operation, akin to finding the right temperature to bake a cake from all possible floats

...for each of 13 billion (for a model with that many parameters) different cakes, except that they aren’t like cakes because the “best” temperature for each depends on the actual temperatures chosen for the others.
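
A toy way to see that coupling, assuming a made-up two-parameter "loss" (the function and the candidate grid are hypothetical, not taken from any real model): the best value for one parameter moves when the other one changes, so they can't be tuned one at a time.

```python
# Hypothetical toy loss: two "temperatures" a and b are only good together,
# namely when their product is 6. Nothing here comes from a real model.
def loss(a, b):
    return (a * b - 6.0) ** 2

def best_a_given(b, candidates):
    # Brute-force search over candidate values of a, holding b fixed.
    return min(candidates, key=lambda a: loss(a, b))

candidates = [x / 10 for x in range(1, 101)]  # 0.1, 0.2, ..., 10.0

print(best_a_given(2.0, candidates))  # best a when b is 2.0
print(best_a_given(3.0, candidates))  # best a when b is 3.0 (a different answer)
```

With 13 billion parameters instead of two, every "best" setting is conditional on all the others in the same way, which is why the search has to be done jointly.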

It's 2^(16*13,000,000,000) different cakes (assuming 16 bits per weight).
Way better than paperclips.
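
For a sense of scale on that 2^(16*13,000,000,000) figure, here is a rough back-of-the-envelope sketch. It assumes 16 bits per parameter, which is what the exponent implies; the variable names are just for illustration. The number itself is far too big to print, so this only counts how many decimal digits it would have.

```python
import math

PARAMS = 13_000_000_000   # parameter count from the comment above
BITS_PER_PARAM = 16       # assumption: 16-bit weights, as the exponent implies

total_bits = PARAMS * BITS_PER_PARAM

# 2**total_bits is far too large to write out, so count its decimal digits:
# digits(2**n) = floor(n * log10(2)) + 1
decimal_digits = math.floor(total_bits * math.log10(2)) + 1

print(f"exponent: {total_bits:,} bits")
print(f"2**{total_bits:,} has roughly {decimal_digits:.2e} decimal digits")
```

That is tens of billions of digits just to write the count down, so nobody is enumerating the space; training is a guided search through it.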