What are the mechanisms to actually ensure that something is done?
For instance, Skyvern and Agent E are awesome open-source frameworks, but they will often say a task is completed when they have not actually performed the action.
Great question. We've implemented a series of extensible hallucination guardrails that are hyper-focused on task completion: they handle various error states and ultimately get the user to the outcome they wanted. While Agent E and Skyvern are great platforms that ease the setup of task-completion agents, they still struggle to get a simple task done. I've tried multiple times to book a restaurant with Skyvern and Agent E, but the agents never reached completion. We've approached this very differently with Toivo.
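
To make the general idea of a completion guardrail concrete, here is a minimal sketch in Python. It is illustrative only and is not Toivo's actual implementation. Every name in it (`AgentResult`, `has_confirmation`, `verify_completion`, the `confirmation_code` key) is hypothetical. The assumption is simply that the agent's "done" claim gets checked against independently observed evidence before the task is reported as complete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical types for illustration; not the API of Toivo, Skyvern, or Agent E.
@dataclass
class AgentResult:
    claimed_done: bool              # what the agent says happened
    observed_state: Dict[str, str]  # what was actually observed after the run


# A guardrail is any check that inspects the result and returns pass/fail.
Guardrail = Callable[[AgentResult], bool]


def has_confirmation(result: AgentResult) -> bool:
    # Passes only if the observed final state contains a non-empty confirmation code.
    return bool(result.observed_state.get("confirmation_code"))


def verify_completion(result: AgentResult, guardrails: List[Guardrail]) -> bool:
    # The agent's own claim is necessary but never sufficient:
    # every guardrail must also pass on the observed state.
    return result.claimed_done and all(check(result) for check in guardrails)
```

With this shape, an agent that claims success but leaves no confirmation in the observed state fails verification, and new checks can be added to the list without changing `verify_completion`.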