by gabdiax
101 days ago
I'm also curious about this. In some cases I've seen teams rely on a mix of automated metrics and human review, especially for production systems where reliability matters a lot. But evaluation pipelines for AI still seem much less standardized than traditional software monitoring.