HN Mirror



	by rkwz 539 days ago
	This is nice! How do you extract the data from PDFs or images? How do you reduce inaccuracies in this process?


essaylor 539 days ago

It's a combination of using an LLM and some pre- and post-processing. Data extraction itself has been fairly accurate in my experience. The bigger challenge has been biomarker name normalization, because different labs often name the same biomarker quite differently.
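
To illustrate the normalization problem, here is a minimal sketch of one possible approach: an alias table with a cleaned-up lookup key. It is not the pipeline described in this comment. The alias table, the `normalize` function, and the `_key` helper are all hypothetical names made up for the example, and a real system would likely need a much larger table plus a fallback for unknown names.

```python
import re
from typing import Optional

# Hypothetical alias table (an assumption for illustration only):
# canonical biomarker name -> spellings that different labs might print.
ALIASES = {
    "Hemoglobin A1c": ["HbA1c", "Hgb A1c", "Glycated Hemoglobin", "A1C"],
    "LDL Cholesterol": ["LDL-C", "LDL Chol", "LDL Cholesterol Calc"],
    "Vitamin D, 25-Hydroxy": ["25-OH Vitamin D", "25(OH)D"],
}


def _key(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse index: cleaned spelling -> canonical name.
# The canonical name itself is indexed too, so it maps to itself.
LOOKUP = {
    _key(spelling): canonical
    for canonical, aliases in ALIASES.items()
    for spelling in [canonical, *aliases]
}


def normalize(raw_name: str) -> Optional[str]:
    """Return the canonical biomarker name, or None if the name is unknown."""
    return LOOKUP.get(_key(raw_name))


if __name__ == "__main__":
    for raw in ["hba1c", "LDL-C", "25 (OH) D", "Ferritin"]:
        print(raw, "->", normalize(raw))
```

Returning `None` for unknown names, rather than guessing, keeps unmatched biomarkers visible so they can be reviewed and added to the table.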


rkwz 538 days ago

Thanks, sounds interesting!
