|
Fully agree - even as a founder of an ‘LLM observability company’. Observability does not need to be reinvented to get detailed traces/metrics/logs of the LLM part of an application. LLM Observability usually means: prompts and completions, which model was used, errors and exceptions (rate limits, network errors), as well as metrics (latency, output speed, time to first token when streaming, USD/token and cost breakdowns). All of this is well suited to be captured in the existing observability stack. OpenLLMetry makes this really easy and interoperable - chapeau. In my view, observability is not the core value that solutions like Baserun, Athina, LangSmith, Parea, Arize, Langfuse (my project) and many others solve for.
Developing a useful LLM application requires iterative workflows and tinkering. That's what these solutions help with and augment. There are specific problems to building an LLM application such as managing/versioning of prompts, running evaluations, blending multiple different evaluation sources, collecting datasets to test/benchmark an application, helping with fine-tuning models on high-quality production completions, debugging root causes of quality/latency/cost issues, ... Most solutions either replicate logs (LLM I/O) or traces at first, as they are a necessary starting point to then build solutions for the other workflow problems. As the observability piece gets more standardized over time, I can see how integrating with the standard makes a ton of sense. Always happy to chat about this. |
Would love to see you integrate and adopt this as soon as it makes sense to you. OpenTelemetry is a great and mature piece of technology and we should all be aligning around it now, while it’s still easy to do so.