by int_19h
1088 days ago

llama-30B (which is actually 33B) and its derivatives generally run fine with 4-bit quantization on a single RTX 3090 or 4090, although depending on the group size used for quantization, you may need to dial down the context size slightly.
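
A back-of-the-envelope estimate shows why group size and context length compete for the same 24 GB. This is a rough sketch, not how any particular loader actually budgets memory. It assumes the LLaMA paper's 33B shape (about 32.5B parameters, 60 layers, hidden size 6656), GPTQ-style groups that each store one fp16 scale and one 4-bit zero point, an fp16 KV cache, and an arbitrary 2 GiB reserve for the CUDA context, activations and unquantized layers. The function names are hypothetical.

```python
# Rough VRAM budget for a 4-bit LLaMA-33B on one 24 GB card (RTX 3090/4090).
# All constants are assumptions for illustration, not measured values.

GIB = 1024 ** 3

# Assumed LLaMA-33B shape (LLaMA paper): ~32.5B params, 60 layers, hidden size 6656.
N_PARAMS = 32.5e9
N_LAYERS = 60
HIDDEN = 6656

VRAM_BYTES = 24 * GIB  # 24 GB card


def weight_bytes(group_size=None, bits=4):
    """Approximate size of the quantized weights in bytes.

    Assumes GPTQ-style grouping: each group of `group_size` weights carries one
    fp16 scale (16 bits) plus one packed 4-bit zero point. group_size=None means
    one scale per row, whose overhead is treated as negligible here.
    """
    bits_per_weight = bits
    if group_size is not None:
        bits_per_weight += (16 + 4) / group_size
    return N_PARAMS * bits_per_weight / 8


def kv_cache_bytes_per_token(bytes_per_value=2):
    """fp16 K and V vectors for every layer: 2 * layers * hidden * 2 bytes."""
    return 2 * N_LAYERS * HIDDEN * bytes_per_value


def max_context(group_size=None, reserve_gib=2.0):
    """Tokens of KV cache that fit after the weights and a fixed reserve.

    reserve_gib is a guess; real frameworks allocate their own extra buffers.
    """
    free = VRAM_BYTES - weight_bytes(group_size) - reserve_gib * GIB
    return max(0, int(free // kv_cache_bytes_per_token()))


if __name__ == "__main__":
    # Unquantized fp16 weights would not fit on a 24 GB card at all.
    print(f"fp16 weights alone: {N_PARAMS * 2 / GIB:.1f} GiB")
    for g in (None, 128, 32):
        label = "no groups" if g is None else f"group size {g}"
        print(f"{label:>14}: weights {weight_bytes(g) / GIB:.1f} GiB, "
              f"~{max_context(g)} tokens of KV cache")
```

Smaller groups store more scales and zero points, so the weights grow and fewer tokens of KV cache fit in what is left. Real loaders need more headroom than this sketch assumes, so the absolute token counts are optimistic; the direction of the effect is the point.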