Also, you really want to wait until flash attention is merged before running very long contexts with llama.cpp. An 8-bit KV cache would be ideal too, since it takes roughly half the memory of an fp16 cache.
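To put rough numbers on why the KV cache precision matters at long context, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128) and the 131072-token context are illustrative assumptions, not figures from any particular model or from llama.cpp itself, and the 8-bit case assumes exactly 1 byte per element, ignoring the small per-block scale overhead that real quantized formats add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    """Estimate KV cache size: one K and one V vector per layer, per KV head, per token."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = keys + values
    return elems_per_token * n_ctx * bytes_per_elem


# Hypothetical model shape and context length, chosen only for illustration.
N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX = 32, 8, 128, 131072

GIB = 1024 ** 3
fp16_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, 2) / GIB
int8_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, 1) / GIB

print(f"fp16 KV cache:  {fp16_gib:.1f} GiB")
print(f"8-bit KV cache: {int8_gib:.1f} GiB")
```

Under these assumptions the cache grows linearly with context length, so halving the bytes per element is what makes the difference between fitting and not fitting on a given card.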