by skhatter
102 days ago
Interesting. Are you instrumenting the agent workflows themselves with OpenTelemetry spans? I was wondering how well the standard o11y stack works once agents start running multi-step workflows (agent → tools → other agents → APIs). Tracing probably helps visualize the steps, but I'm curious how people handle operational things like retries, replaying failed workflows, or containing cascading failures across agents. Those reliability aspects are what I've been exploring.

People too focused on AI and what's popular are missing the forest for the trees. Most of the things I see being talked about as problems & solutions have already been taken care of by mature tooling and engineering practices. History and archeology are underrated skills in software.