by brucethemoose2
929 days ago
It would be very tight. In 24GB, 8x7B (currently) has more overhead than a 70B. It's theoretically doable with quantization from the recent 2-bit quant paper and a custom implementation (in exllamav2?). EDIT: Actually, the download is much smaller than 8x7B would suggest. Not sure how, but it's sized more like a 30B, perfect for a 3090. Very interesting.
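For a rough sense of the arithmetic behind "tight" vs. "perfect for a 3090", here is a back-of-the-envelope sketch. It only counts weight storage (parameters × bits per weight), and the 2 GiB allowance for KV cache and runtime overhead is my own placeholder assumption, not a measured number. The function names are hypothetical, and real quantized files also carry scales and metadata that this ignores.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM.

    overhead_gib is a guess standing in for KV cache, activations,
    and framework buffers; tune it for your own setup.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Compare a 70B-class and a 30B-class model at a few bit widths.
    for params in (70, 30):
        for bits in (2, 4):
            size = weight_gib(params, bits)
            print(f"{params}B @ {bits}-bit: {size:.1f} GiB weights, "
                  f"fits in 24 GiB: {fits(params, bits)}")
```

Under these assumptions a 70B-class model only squeezes into 24GB at around 2 bits per weight, while a 30B-class model fits at 4 bits with room to spare, which is roughly the gap the comment is pointing at.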