by Glemllksdf
56 days ago
I have the same setup but tried Paperclip AI with it, and it seems that either I'm unable to set it up properly or multiple agents struggle with this setup. In particular, Paperclip AI and opencode (used for the connection) seem to blow the context up to 20-30k tokens. Any tips on running this with your setup? I use LM Studio with default settings and prioritization instead of split.
My command for llama-server:
llama-server -m /models/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf -ngl 99 -sm layer -ts 10,12 --jinja --flash-attn on --cont-batching -np 1 -c 262144 -b 4096 -ub 512 -ctk q8_0 -ctv q8_0 --host 0.0.0.0 --port 8080 --timeout 18000
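In case it helps anyone reading the command, here is a small Python sketch that rebuilds the same invocation from an annotated list. The flags and values are copied from the command above; the one-line notes are my own reading of the llama-server help text, so treat them as assumptions and check `llama-server --help` for your build. `ARGS` and `build_command` are names made up for this sketch.

```python
import shlex

# (flag, value, note). Flags and values are copied from the command above.
# The notes are my interpretation, not official documentation.
ARGS = [
    ("-m", "/models/gemma-4-26B-A4B-it-UD-Q8_K_XL.gguf", "model file"),
    ("-ngl", "99", "number of layers to offload to the GPU(s)"),
    ("-sm", "layer", "split mode across GPUs: by layer"),
    ("-ts", "10,12", "tensor split proportions between the GPUs"),
    ("--jinja", None, "use the Jinja chat template"),
    ("--flash-attn", "on", "enable flash attention"),
    ("--cont-batching", None, "continuous batching"),
    ("-np", "1", "number of parallel slots"),
    ("-c", "262144", "context size in tokens"),
    ("-b", "4096", "logical batch size"),
    ("-ub", "512", "physical (micro) batch size"),
    ("-ctk", "q8_0", "KV cache type for K"),
    ("-ctv", "q8_0", "KV cache type for V"),
    ("--host", "0.0.0.0", "listen on all interfaces"),
    ("--port", "8080", "HTTP port"),
    ("--timeout", "18000", "server read/write timeout in seconds"),
]


def build_command(args):
    """Flatten the annotated list into an argv list for llama-server."""
    cmd = ["llama-server"]
    for flag, value, _note in args:
        cmd.append(flag)
        if value is not None:  # boolean switches carry no value
            cmd.append(value)
    return cmd


if __name__ == "__main__":
    for flag, value, note in ARGS:
        print(f"{flag:14} {value or '':45} # {note}")
    # Single-line form, suitable for pasting into a shell.
    print(shlex.join(build_command(ARGS)))
```

To launch it from Python instead of printing, the same list can be passed to `subprocess.run(build_command(ARGS))`.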