by jamesponddotco
23 days ago
This seems pretty awesome; being able to use an 8B model for tool calling would be perfect. I’m interested in using this for Home Assistant, with a Mac Mini as my server. Does it run on macOS? How is the latency when using the proxy? I’m using Claude Haiku 4.5 for my voice assistant right now and it’s pretty fast, but if I could keep the LLM local, it’d be even better.
Latency depends on whether the guardrails fire. If nothing fires, it's effectively a passthrough with very little overhead. But if a retry nudge fires, that's another LLM call.
As a consumer building a home assistant, I'd catch a retry nudge firing and have my voice model output a pre-baked "one sec, trying again" sort of filler message.
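For what it's worth, here's a rough sketch of that consumer-side idea. Everything in it is an assumption for illustration: the `(kind, text)` event tuples, the `"retry"` and `"final"` kinds, and the `handle_turn` and `speak` names are hypothetical and not the proxy's actual API. It just shows the shape: speak the filler once on the first retry, then speak the real answer when it arrives.

```python
from typing import Callable, Iterable, Tuple

# Pre-baked filler line; the wording is just an example.
FILLER = "One sec, trying again."


def handle_turn(
    events: Iterable[Tuple[str, str]],
    speak: Callable[[str], None],
) -> str:
    """Consume hypothetical (kind, text) events for one assistant turn.

    kind == "retry": a retry nudge fired, so another LLM call is in flight.
    kind == "final": the usable response is ready.
    """
    filler_spoken = False
    for kind, text in events:
        if kind == "retry":
            # Only say the filler once, even if several retries fire.
            if not filler_spoken:
                speak(FILLER)
                filler_spoken = True
        elif kind == "final":
            speak(text)
            return text
    raise RuntimeError("event stream ended without a final response")


if __name__ == "__main__":
    # Simulated turn: one retry nudge, then the final answer.
    handle_turn([("retry", ""), ("final", "Lights are on.")], print)
```

In the passthrough case no `"retry"` event shows up, so the filler never plays and the user only hears the answer.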