It goes into detail about llama-server args, quants to try, and layer/KV cache splits. I plan to try the techniques there.