	by sadiq 455 days ago
	This is good, though it's not clear whether these papers will appear in the PMC Open Access subset (https://pmc.ncbi.nlm.nih.gov/tools/openftlist/) and be bulk downloadable. I've been doing some work with colleagues at Cambridge and Imperial over the last year on using LLMs to improve evidence synthesis, primarily trying to find papers on the effectiveness of certain conservation interventions. It's becoming clear that you really need to move beyond screening papers only by title and abstract: there's often information buried deep within papers that can only be found with access to the full text. My colleague Anil Madhavapeddy has written a bit about our adventures in trying to ingest full-text academic papers: https://anil.recoil.org/notes/uk-national-data-lib

2 comments

shishy 455 days ago

Yes, it depends on what you're doing; for general paper discovery / search tasks, title and abstract can be enough (which is also why Springer and Elsevier have been pulling even their abstracts from sources like OpenAlex).

But for something like that you need full texts to look into results sections. I'm very curious how you're dealing with information contained in tables, or whether you're working only with snippets of text from the full text. Have you poked around Elicit yet?

a_bonobo 454 days ago

I've recently had this problem where the important information (the number of study participants, and how many were filtered out at each step) was only encoded in figures, not in the text. Maddening.

spookie 455 days ago

Do you know of any ready-to-use alternatives to title and abstract screening? I'm wondering because I'm in the weeds of doing exactly that.

tough 455 days ago

What do you mean exactly? I was surprised how easily grobid converts many papers, at least the arXiv ones, to XML, which is much better for processing than PDF.

Most of the papers are built from their LaTeX sources, so I guess there's an easy way to undo it.

https://github.com/kermitt2/grobid
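
For anyone who wants to try it, here is a minimal sketch of reading the TEI XML that grobid produces, using only the Python standard library. The sample document is hand-written and only shaped like grobid output (real files are much richer), and `parse_tei` and `text_of` are names made up for this example, not part of grobid.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample that only mimics the shape of GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">Example paper</title></titleStmt>
    </fileDesc>
    <profileDesc>
      <abstract><div><p>Short abstract.</p></div></abstract>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <div><head>Methods</head><p>We enrolled 120 participants.</p></div>
      <div><head>Results</head><p>After filtering, 95 remained.</p></div>
    </body>
  </text>
</TEI>"""


def text_of(elem):
    """Return all text under an element with whitespace collapsed."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def parse_tei(tei_xml):
    """Pull title, abstract, and (heading, text) sections out of a TEI string."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        paragraphs = [text_of(p) for p in div.findall("tei:p", TEI_NS)]
        sections.append((text_of(head), " ".join(paragraphs)))
    return {
        "title": text_of(title),
        "abstract": text_of(abstract),
        "sections": sections,
    }


doc = parse_tei(SAMPLE_TEI)
for heading, body in doc["sections"]:
    print(heading, "->", body)
```

Once the body is split into sections like this, screening on the Methods or Results text instead of just the abstract is a plain string or LLM pass over `doc["sections"]`.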

shishy 455 days ago

grobid is a wonderful resource; Patrice did an awesome job (I used it at my previous job at scite.ai).

spookie 454 days ago

that's exactly what I needed!

tough 454 days ago

glad to hear!
