by rkwz 539 days ago
This is nice!

How do you extract the data from PDFs or images? How do you reduce inaccuracies in this process?

It's a combination of using an LLM and some pre- and post-processing. Data extraction itself has been fairly accurate in my experience. The bigger challenge has been biomarker name normalization, because different labs often name the same biomarker quite differently.
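
To make the normalization problem concrete, here is a minimal sketch of one possible post-processing step: an alias lookup that maps lab-specific names to a single canonical key. This is only an illustration, not the actual pipeline. The `ALIASES` table, the canonical keys, and the `normalize_biomarker` function are all hypothetical.

```python
import re

# Hypothetical alias table: canonical key -> name variants seen on lab reports.
ALIASES = {
    "hemoglobin_a1c": ["HbA1c", "Hemoglobin A1c", "Glycated Hemoglobin", "A1C"],
    "ldl_cholesterol": ["LDL", "LDL-C", "LDL Cholesterol", "LDL Chol Calc"],
    "vitamin_d_25_oh": ["Vitamin D, 25-Hydroxy", "25-OH Vitamin D", "25(OH)D"],
}


def _key(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse index: simplified alias -> canonical key.
LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def normalize_biomarker(raw_name: str):
    """Return the canonical key for a raw lab name, or None if it is unknown."""
    return LOOKUP.get(_key(raw_name))
```

With this table, `normalize_biomarker("LDL-C")` and `normalize_biomarker("ldl cholesterol")` both resolve to `"ldl_cholesterol"`. Names that are not in the table return `None`, so they can be flagged for manual review instead of being guessed.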
Thanks, sounds interesting!