| Yea, there's some logs here https://test.naisys.org/logs/ Inter-agent tasks is a fun one. Sometimes it works out, but a lot of the time they just end up going back and forth talking, expanding the scope endlessly, scheduling 'meetings' that will never happen, etc.. A lot of AI 'agent systems' right now add a ton of scaffolding to corral the AI towards success. The scaffolding is inversely proportional to the sophistication of the model. GPT-3 needs a ton, Opus needs a lot less. Real autonomous AI you should just be able to give a command prompt and a task and it can do the rest. Managing it's own notes, tasks, goals, reports, etc.. Just like if any of us were given a command shell and task to complete. Personally I think it's just a matter of the right training. I'm not sure if any of these AI benchmarks focus on autonomy, but if they did maybe the models would be better at autonomous tasks. |
sounds like "a straight shooter with upper management written all over it"