
by tikkun 1043 days ago

Then there's also monitoring. My notes from the monitoring section are below:

When needed: “it goes hand-in-hand with eval, as you need to be able to turn bad prod generations into failing eval cases for eng to make pass”
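A minimal sketch of that loop, assuming logged generations carry a user-feedback flag. The record shapes (`LoggedGeneration`, `EvalCase`) and the helper names are hypothetical, not taken from any of the tools in this thread:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LoggedGeneration:
    # Hypothetical shape of one logged production call.
    prompt: str
    output: str
    user_flagged: bool  # e.g. a thumbs-down from user feedback


@dataclass
class EvalCase:
    prompt: str
    bad_output: str  # the generation a fix should no longer reproduce


def to_eval_cases(logs):
    """Keep only flagged generations and convert them into eval cases."""
    return [EvalCase(g.prompt, g.output) for g in logs if g.user_flagged]


def write_jsonl(cases, path):
    """Persist eval cases one JSON object per line so an eval suite can load them."""
    with open(path, "w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(asdict(case)) + "\n")
```

Each case starts out failing by construction, since the current system produced `bad_output` for that prompt; what counts as "passing" is left to the eval suite.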

Considerations:

1) “Ability to monitor custom metrics (ROUGE [1], Coherence, etc.) and slice-and-dice data - customizability; non-intrusive logging vs proxies, VPC (enterprise-readiness). Plenty of tools w/ basic cost, latency monitoring; very few w/ enterprise-grade customizability, anomaly detection, etc.”

2) “+agent/pipeline tracing, ability to re-purpose data for fine tuning, connection to user feedback, man-in-the-middle approach (proxy) vs SDK integration (we believe SDK is superior so your monitoring vendor can go down without taking down your LLM feature)”
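To illustrate the proxy vs SDK point in (2): a rough sketch of SDK-style logging that fails open, so the LLM call never waits on or breaks because of the monitoring backend. `FailOpenLogger`, `send`, and `answer` are hypothetical names for illustration, not any vendor's actual API:

```python
import queue
import threading


class FailOpenLogger:
    """Hypothetical SDK-style logger: events are queued and shipped in the background."""

    def __init__(self, send, max_pending=1000):
        self._send = send  # assumed callable that ships one event to the vendor
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def log(self, event):
        """Never blocks or raises on the request path."""
        try:
            self._pending.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # shed monitoring data rather than stall the feature

    def _drain(self):
        while True:
            event = self._pending.get()
            try:
                self._send(event)
            except Exception:
                self.dropped += 1  # vendor outage: drop the event, keep serving
            finally:
                self._pending.task_done()

    def flush(self):
        """Wait until every queued event has been sent or dropped."""
        self._pending.join()


def answer(prompt, call_llm, logger):
    # The LLM call goes straight to the provider; logging happens off to the side.
    output = call_llm(prompt)
    logger.log({"prompt": prompt, "output": output})
    return output
```

With a proxy, the request itself passes through the vendor, so a vendor outage sits on the critical path. Here a failing `send` only increments `dropped`.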

Companies for LLM monitoring: Helicone, HoneyHive, Gentrace, Humanloop, LangSmith, Pezzo

[1]: https://en.wikipedia.org/wiki/ROUGE_(metric)
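For a sense of what a custom metric plus slicing could look like, here is a simplified sketch of ROUGE-1 recall (unigram overlap with a reference, whitespace tokenization, no stemming) averaged per metadata field. The record keys (`output`, `reference`) and the `mean_by_slice` helper are assumptions for illustration; production code would more likely use an established ROUGE implementation:

```python
from collections import Counter, defaultdict


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / sum(ref.values())


def mean_by_slice(records, key):
    """Average ROUGE-1 recall grouped by a metadata field (the "slice")."""
    totals = defaultdict(lambda: [0.0, 0])  # slice value -> [score sum, count]
    for r in records:
        totals[r[key]][0] += rouge1_recall(r["output"], r["reference"])
        totals[r[key]][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}
```

Slicing by something like model version or prompt template is where a drop in one segment shows up even when the overall average looks flat.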

1 comments

batshit_beaver 1043 days ago

I'd suggest looking into WhyLabs. They've got anomaly detection, a lightweight SDK, complete data privacy, and the ability to ingest custom metrics: https://docs.whylabs.ai/docs/start-here


tikkun 1043 days ago

Do you have an association with them, or are you just a happy user? Either is fine, of course.


batshit_beaver 1043 days ago

Former employee, yes. Stumbled across this thread, thought I'd chime in. Didn't realize how many other folks are working on tools for this problem!
