by sathish316 147 days ago
+1. I wonder whether tasks like "Add OTel instrumentation" belong more in a coding bench than in an SRE bench. I came here expecting to see something like: this is how models perform at finding the root cause in 50 complicated microservice failure scenarios.

For AI-SRE tasks like finding the root cause of bugs and errors, I believe the key is to give the agent tools to query metrics, logs, and traces so it can understand the problem. I’m working on a similar OSS framework and benchmark (work in progress, using metrics and logs so far; demo: https://youtube.com/playlist?list=PLKWJ03cHcPr3Od1rwL7ErHW1p...). The context comes from a semantic layer and Text2SQL for querying the right metrics and logs, and the benchmark runs over a set of Skills that Claude Code or other agents can execute with these tools to find the root cause of errors:

Codd Semantic/Text2SQL engine: https://github.com/sathish316/codd_query_engine

PreCogs skills and simulated scenarios: https://github.com/sathish316/precogs_sre_oncall_skills
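
To make the "give the agent tools" idea concrete, here is a minimal sketch of what a tool layer could look like. It is not taken from either repository above: the names (LogLine, query_logs, query_metric, call_tool), the JSON request shape, and the in-memory data are all hypothetical stand-ins for a real metrics and logs backend.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class LogLine:
    ts: int        # seconds since the start of the simulated scenario
    service: str
    level: str
    message: str


# Hypothetical in-memory data standing in for a real log/metric store.
LOGS = [
    LogLine(10, "checkout", "INFO", "order accepted"),
    LogLine(42, "payments", "ERROR", "db connection pool exhausted"),
    LogLine(43, "checkout", "ERROR", "upstream payments timeout"),
]
METRICS = {
    ("checkout", "error_rate"): [0.0, 0.1, 0.6],
    ("payments", "error_rate"): [0.0, 0.7, 0.9],
}


def query_logs(service: str, level: str = "ERROR") -> list[dict]:
    """Return log lines for one service at one level, oldest first."""
    hits = [l for l in LOGS if l.service == service and l.level == level]
    return [asdict(l) for l in sorted(hits, key=lambda l: l.ts)]


def query_metric(service: str, name: str) -> list[float]:
    """Return the time series for a metric, or an empty list if unknown."""
    return METRICS.get((service, name), [])


# Registry the agent sees: tool name -> callable.
TOOLS: dict[str, Callable[..., object]] = {
    "query_logs": query_logs,
    "query_metric": query_metric,
}


def call_tool(request_json: str) -> str:
    """Dispatch a JSON request such as {"tool": ..., "args": {...}}."""
    req = json.loads(request_json)
    fn = TOOLS.get(req["tool"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {req['tool']}"})
    return json.dumps({"result": fn(**req.get("args", {}))})


if __name__ == "__main__":
    print(call_tool('{"tool": "query_logs", "args": {"service": "payments"}}'))
```

An agent would call call_tool repeatedly, for example checking error_rate per service first and then pulling ERROR logs for the service that degraded earliest. A benchmark could then score whether the agent's stated root cause matches the one planted in the scenario.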

1 comment

I'm surprised by how many people think that an SRE's job is to debug.

An SRE's job is to make the software reliable, for instance by adding telemetry and by understanding and improving its failure modes, its behavior under load, etc.

So a better SRE test would not be "read the logs and fix the bug", but rather "read the code and identify potential issues".