Hacker News new | ask | show | jobs
by idiliv 807 days ago
Oh, and to answer your actual question: Assuming that the model is released with 16 bits per parameter, then it as 281GB / 16 bit = 140.5 parameters.