|
|
|
|
|
by anonym29
140 days ago
|
|
This is a breeze to do with llama.cpp, which has had Anthropic responses API support for over a month now. On your inference machine: you@yourbox:~/Downloads/llama.cpp/bin$ ./llama-server -m <path/to/your/model.gguf> --alias <your-alias> --jinja --ctx-size 32768 --host 0.0.0.0 --port 8080 -fa on
Obviously, feel free to change your port, context size, flash attention, other params, etc.Then, on the system you're running Claude Code on: export ANTHROPIC_BASE_URL=http://<ip-of-your-inference-system>:<port>
export ANTHROPIC_AUTH_TOKEN="whatever"
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
claude --model <your-alias> [optionally: --system "your system prompt here"]
Note that the auth token can be whatever value you want, but it does need to be set, otherwise a fresh CC install will still prompt you to login / auth with Anthropic or Vertex/Azure/whatever. |
|