Hacker News new | ask | show | jobs
by zengid 62 days ago
any tips for running it locally within an agent harness? maybe using pi or opencode?
1 comments

It pretty much just works. Run the unsloth quant in llama.cpp and hook it up to pi. A bunch of minor annoyances like not having support for thinking effort. It also defaults to "interleaved thinking" (thinking blocks get stripped from context), set `"chat_template_kwargs": {"preserve_thinking": True},` if you interrupt the model often and don't want it to forget what it was thinking.