by emikulic
1115 days ago
Last time I looked into this, the answer was "because huggingface transformers and torch.load are written to do it this way." You could absolutely do something streaming, or mmap the weights instead of loading them into system RAM. The default interfaces just don't.
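To make the mmap idea concrete, here is a rough sketch using only the Python standard library. It is not how transformers or torch.load work internally. The flat little-endian float32 file layout and the helper names `write_weights` and `read_slice_mmap` are made up for illustration. The point is only that a memory-mapped file lets you read a slice of the weights while the OS pages in just the parts you touch, rather than copying the whole file into RAM first.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per float32


def write_weights(path, values):
    """Write values as raw little-endian float32 (a hypothetical flat format)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_slice_mmap(path, start, count):
    """Return `count` float32 values starting at element index `start`.

    The file is memory-mapped read-only, so only the pages backing the
    requested byte range need to be brought into memory by the OS.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = start * FLOAT_SIZE
            return list(struct.unpack_from(f"<{count}f", mm, offset))


if __name__ == "__main__":
    # Build a small stand-in "checkpoint" and read three values out of the middle.
    path = os.path.join(tempfile.mkdtemp(), "weights.bin")
    write_weights(path, [i * 0.5 for i in range(1000)])
    print(read_slice_mmap(path, 10, 3))
```

A real checkpoint format also needs a header describing tensor names, shapes, and dtypes so you know which byte range to map. That part is left out here.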