Hacker News
by anjneymidha 323 days ago
this is a really neat project: "an automated, daily evaluation suite to track model performance over time, monitor for regression during peak load periods, and detect quality changes across flagship LLM APIs."