by sanchitmonga22
95 days ago
Fair criticism. The action executed on the LLM side but didn't translate to the correct macOS action: the model hallucinated success instead of routing to the open_url tool. This is a known limitation of small LLMs (0.6B-1.2B) doing tool calling; they sometimes confuse "I know what you want" with "I did it." Upgrading to a larger model improves tool-calling accuracy significantly. We're also working on verification: having the pipeline confirm the action actually succeeded before reporting back. That's a fair expectation and we should meet it.
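For illustration only, the verification idea above could look roughly like the sketch below. It is not the project's actual pipeline: `run_with_verification`, `ToolResult`, and `fake_open_url` are hypothetical names, and the tool-call shape (a dict with `name` and `arguments`) is an assumption. The point is that success is reported only when a registered tool actually ran and confirmed its own outcome, never on the model's say-so.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolResult:
    ok: bool      # did the action actually succeed?
    detail: str   # human-readable description of what happened


def run_with_verification(
    tool_call: Optional[dict],
    tools: dict[str, Callable[..., ToolResult]],
) -> str:
    """Report success only if a registered tool ran and confirmed it."""
    # No structured tool call means the model only *claimed* to act.
    if tool_call is None:
        return "failed: model produced no tool call"

    name = tool_call.get("name")
    tool = tools.get(name)
    if tool is None:
        return f"failed: unknown tool {name!r}"

    # Execute the tool and trust its result, not the model's text.
    result = tool(**tool_call.get("arguments", {}))
    if not result.ok:
        return f"failed: {result.detail}"
    return f"done: {result.detail}"


def fake_open_url(url: str) -> ToolResult:
    # Hypothetical stand-in for a real macOS action; it only checks the scheme.
    ok = url.startswith(("http://", "https://"))
    return ToolResult(ok, f"opened {url}" if ok else f"rejected {url}")
```

With something like this in place, a model response that contains no tool call is surfaced as a failure instead of being relayed to the user as "done".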
To me this is the nut to crack wrt tool calling and locally running inference. This seems like a really cool project and I'm going to dig around a little later, but if it's hallucinating on something as basic as this, it makes me think it's more at the POC stage right now (to echo other sentiment here).