Hacker News
by potatoman22 306 days ago
This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents? It’s hard to imagine it’s safe to entrust model validation and bias assessments to an automated system, especially in healthcare. Validating clinical AI is pretty complex, between finding the right data, ensuring event timings are accurate, simulating the model, etc. That’s why I’m guessing Parachute is a little less automated than the landing page makes it out to be, which is maybe a good thing. Regardless, this is cool. Hope you make AI in healthcare safer.
5 comments

That’s a great point. We don’t use AI agents to grade other models. Instead, we run in-house evaluations tailored to each category of clinical AI, giving hospitals an apples-to-apples comparison between similar vendors.
This line of thinking always leaves me confused about other people's experience in the pre-AI world. People and systems around me fail all the time because evaluation fails. Yes, the failure modes are different, but I don't consider them more favorable without AI. In fact, I consider them better with AI.

For example, consider what happens in this video: https://www.youtube.com/watch?v=AZhCYisIQB8&t=2s

Please don't make the mistake of thinking "aha, but you see, a human intervened!" This will never happen in the real world for the vast majority of humans in a similar scenario.

I'm afraid I don't quite understand your point. What line of thinking are you referencing? Also, risk scores and algorithms have been used in medicine for over 50 years, so evaluating them isn't anything new.
> This is cool, but I’m a little skeptical. If Parachute uses AI agents to evaluate other models, who’s evaluating the AI agents?

Usually you can run human-in-the-loop spot checks to ensure that there's parity between your LLM evaluators and the equivalent specialist human evaluator.
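A rough sketch of what such a spot check could look like, assuming the LLM evaluator and the human specialist assign the same kind of categorical label (e.g. "pass"/"fail") to each case. The names `spot_check`, `llm_labels`, and `human_review` are hypothetical and only for illustration. It draws a random sample of cases, collects the human label for each, and reports raw agreement plus Cohen's kappa, which corrects for agreement expected by chance.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equal-length label lists")
    n = len(a)
    # Fraction of cases where the two raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def spot_check(llm_labels, human_review, sample_size, seed=0):
    """Compare LLM labels with human labels on a random sample of cases.

    llm_labels:   dict mapping case id -> label from the LLM evaluator
    human_review: callable taking a case id and returning the human label
                  (hypothetical; in practice this is a manual review step)
    """
    rng = random.Random(seed)
    ids = rng.sample(sorted(llm_labels), min(sample_size, len(llm_labels)))
    llm = [llm_labels[i] for i in ids]
    human = [human_review(i) for i in ids]
    agreement = sum(x == y for x, y in zip(llm, human)) / len(ids)
    return {"n": len(ids), "agreement": agreement, "kappa": cohen_kappa(llm, human)}


if __name__ == "__main__":
    # Toy data: the "human" disagrees with the LLM on a single case.
    llm = {1: "pass", 2: "fail", 3: "pass", 4: "fail", 5: "pass", 6: "fail"}
    human = dict(llm)
    human[2] = "pass"
    print(spot_check(llm, human.get, sample_size=6))
```

What counts as acceptable parity (and how large the sample needs to be) is a judgment call that depends on the clinical use case; the sketch only measures agreement and does not set a threshold.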

I wouldn't touch a YC company for this use case. All the marketing from the landing pages is just that - blabla.
My friend, you are overthinking! The funding round just came in and it's smooth sailing ahead. Well, for the next six months.
"mummmble murmble ummbble... but that can|will be easily fixed|addressed|solved by future models"