by mebassett
189 days ago
Large language models are large, and the weights have to sit in memory, both for training and for inference, if we want them to stay fast. Older models like GPT-3 have around 175 billion parameters. At float32 (4 bytes per parameter) that comes out to something like 700GB of memory just for the weights. Newer models are even larger, and OpenAI wants to run them as consumer web services.
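To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It only counts the weights (no activations, optimizer state, or caches), uses decimal gigabytes, and the helper name `weights_memory_gb` is made up for illustration. The float16 and int8 rows are not from the comment above; they are there only to show how the number scales with precision.

```python
def weights_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Approximate GPT-3 parameter count quoted in the comment.
GPT3_PARAMS = 175e9

# Bytes per parameter for a few common numeric formats.
# Only float32 is mentioned in the comment; the others are for comparison.
PRECISIONS = [("float32", 4), ("float16", 2), ("int8", 1)]

for name, size in PRECISIONS:
    print(f"{name}: {weights_memory_gb(GPT3_PARAMS, size):.0f} GB")
```

The float32 row reproduces the roughly 700GB figure; halving the bytes per parameter halves the footprint.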
For one, if this was about inference, wouldn't the bottleneck be the GPU computation part?