by nostrowski
818 days ago
Two things I'm curious about:

1. How many tokens can 'traditional' models (e.g. Mistral's 8x7B) fit on a single 80GB GPU?
2. How does quantization affect the single transformer layer in the stack? What performance/accuracy trade-offs arise when so little of the stack depends on this bottleneck?
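For scale, question 1 mostly comes down to KV-cache arithmetic. Below is a rough sketch, assuming Mixtral 8x7B's published shape (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and a 16-bit cache. The function names are hypothetical, and activations, allocator fragmentation and framework overhead are ignored. Since the model has roughly 47B total parameters, 16-bit weights alone would not fit in 80 GB, so the weight footprint here is an assumed ~4-bit quantization.

```python
# Back-of-envelope KV-cache sizing (a sketch; ignores activations,
# allocator fragmentation, and framework overhead).

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes of KV cache one token occupies across the whole stack.

    The factor of 2 accounts for storing both a key and a value vector
    per KV head in every attention layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(gpu_bytes: int, weight_bytes: int, per_token: int) -> int:
    """Tokens of KV cache that fit in the memory left over after the weights."""
    free = gpu_bytes - weight_bytes
    return max(free, 0) // per_token


GIB = 2**30

# Assumed Mixtral-8x7B-like shape: 32 layers, 8 KV heads (GQA),
# head_dim 128, fp16/bf16 cache (2 bytes per element).
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128,
                               bytes_per_elem=2)

# Hypothetical weight footprint: ~47e9 params at ~4 bits (0.5 bytes) each.
weights = int(47e9 * 0.5)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"Tokens that fit on 80 GiB: {tokens_that_fit(80 * GIB, weights, per_token):,}")
```

Under these assumptions each token costs 128 KiB of cache, so the answer scales linearly with whatever memory the (quantized) weights leave free.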
P.S. Dell Inspiron 7415 upgraded to 64 GB of RAM here.