| Everything is LLMs these days. LLMs this, LLMs that. Am I really missing out something from these muted models? Back when it was released, they were so much capable but now everything is muted to the point they are mostly autocomplete on steroids. How can adding analytics to a system that is designed to act like humans produce any good? What is the goal here? Could you clarify why would some need to analyze LLMs out of all the things? > Rich text data makes LLM traces unique, so we let you track “semantic metrics” (like what your AI agent is actually saying) and connect those metrics to where they happen in the trace But why does it matter? Because at the current state these are muted LLMs overseen by the big company. We have very little to control the behavior and whatever we give it, it will mostly be 'politically' correct. > One thing missing from all LLM observability platforms right now is an adequate search over traces. Again, why do we need to evaluate LLMs? Unless you are working in a security, I see no purpose because these models aren't as capable as they used to be. Everything is muted. For context: I don't even need to prompt engineer these days because it just gives similar result by using the default prompt. My prompts these are literally three words because it gets more of the job done that way than giving elaborate prompt with precise example and context. |