|
Then there's also monitoring. My notes from the monitoring section are below:

When needed: “it goes hand-in-hand with eval, as you need to be able to turn bad prod generations into failing eval cases for eng to make pass”

Considerations:

1) “Ability to monitor custom metrics (ROUGE [1], Coherence, etc.) and slice-and-dice data - customizability; non-intrusive logging vs proxies, VPC (enterprise-readiness). Plenty of tools w/ basic cost, latency monitoring; very few w/ enterprise-grade customizability, anomaly detection, etc.”

2) “+agent/pipeline tracing, ability to re-purpose data for fine tuning, connection to user feedback, man-in-the-middle approach (proxy) vs SDK integration (we believe SDK is superior so your monitoring vendor can go down without taking down your LLM feature)”

Companies for LLM monitoring: Helicone, Honeyhive, Gentrace, Humanloop, Langsmith, Pezzo

[1]: https://en.wikipedia.org/wiki/ROUGE_(metric) |
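To make the SDK vs proxy point in (2) concrete, here is a rough Python sketch of what "the vendor can go down without taking down your LLM feature" might look like. It is not any listed vendor's actual SDK: `FailOpenMonitor`, `send`, and `generate_with_monitoring` are hypothetical names I made up for illustration. The idea is that the LLM call happens first, and logging is queued to a background thread that swallows upload failures.

```python
import queue
import threading
from typing import Callable


class FailOpenMonitor:
    """Hypothetical SDK-style logger: records are shipped off the request path."""

    def __init__(self, send: Callable[[dict], None], max_pending: int = 1000):
        self._send = send  # assumed vendor upload function (hypothetical)
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # rough count of records lost to failures or a full queue
        threading.Thread(target=self._drain, daemon=True).start()

    def log(self, record: dict) -> None:
        """Queue a record without ever blocking or raising in the caller."""
        try:
            self._pending.put_nowait(record)
        except queue.Full:
            self.dropped += 1  # drop the record rather than slow the feature

    def _drain(self) -> None:
        """Background loop: upload queued records, swallowing vendor errors."""
        while True:
            record = self._pending.get()
            try:
                self._send(record)
            except Exception:
                self.dropped += 1  # vendor outage: lose the log, keep serving
            finally:
                self._pending.task_done()

    def flush(self) -> None:
        """Wait until every queued record has been attempted."""
        self._pending.join()


def generate_with_monitoring(prompt: str, llm: Callable[[str], str],
                             monitor: FailOpenMonitor) -> str:
    output = llm(prompt)  # the LLM call does not route through the vendor
    monitor.log({"prompt": prompt, "output": output})
    return output
```

With a proxy, the equivalent of `send` sits in front of `llm`, so a vendor outage becomes an outage of the feature itself; here the worst case is a gap in the logs.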