|
|
|
|
|
by fxtentacle
943 days ago
|
|
You'd be surprised how far that takes you. I mean I was truly astonished when I saw that a GptNeoX LLM quantized down to 1.5 bits per value at 90% sparsity was still producing acceptable predictions. But the size went from multiple GBs to less than 1 MB of (compressed) parameters. |
|