
by troyastorino 614 days ago

(Co-founder of PicnicHealth here; we trained LLMD)

Accuracy and deployment in appropriate use cases are key for real-world use. Building guardrails, validation, continuous auditing, etc. takes more work than model training.

We don't deploy in EHRs or sell to physicians or health systems. That is a very challenging environment, and I agree that it would be very difficult to appropriately deploy LLMs that way today. I know Epic is working on it, and they say it's live in some places, but I don't know if that's true.

Our main production use case for LLMD at PicnicHealth is to improve and replace human clinical abstraction internally. We've done extensive testing (only alluded to in the paper) comparing and calibrating LLMD performance vs trained human annotator performance, and for many structuring tasks LLMD outperforms human annotators. For our production abstraction tasks where LLMD does not outperform humans (or where regulations require human review), we use LLMD to improve the workflow of our human annotators. It is much easier to make sure that clinical abstractors, who are our employees doing well-defined tasks, understand the limitations in LLM performance than it would be to ensure that users in a hospital setting would.


ozborn 614 days ago

Haven't read the whole paper yet, but what are the possibilities for academic and evaluation use of this model?


troyastorino 611 days ago

The answer is a little nuanced.

We train on real records, and even though they are de-identified in training, we still have to keep the model closed and under careful management to protect against the possibility of information leaking.

We are, though, definitely invested in this corner of research, and want to be able to work with others to push medical AI forward.

Given that, the best model for us is to collaborate on an engagement-by-engagement basis. For now we'd look to find ways to do the work directly involving LLMD within our systems.

If you research in the field and have some ideas, I'd love to chat!
