by matt1 905 days ago
Great site, thanks for sharing. Can you explain how you're determining how many times a paper is cited? Obviously papers include a list of references, but extracting them accurately from the PDF is difficult in my experience (two column formats, ugh) - though the new HTML versions help. And even if you have a list, many authors just mention arXiv paper titles, not their ids, making identifying specific references tricky.
1 comment

Difficult, yes… but not impossible :)

I just extract the titles and look for their respective ids.

The real challenge was how to do that at scale. In CS alone there are well over half a million papers.
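For anyone curious what "extract the titles and look for their ids" could look like, here is a minimal sketch. It is not the site's actual code. It assumes you already have (id, title) pairs for the corpus and a list of reference titles per paper. The function names and the placeholder ids are made up for illustration. Titles are normalized and stored in a dict, so each lookup is a constant-time hash hit, which is one plausible way to stay fast with hundreds of thousands of papers.

```python
import re
import unicodedata
from collections import Counter


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def build_title_index(papers):
    """Map normalized title -> id. `papers` is an iterable of (id, title)."""
    index = {}
    for paper_id, title in papers:
        # Keep the first id seen when two papers normalize to the same title.
        index.setdefault(normalize_title(title), paper_id)
    return index


def resolve_references(reference_titles, index):
    """Return the matching id for each reference title, or None if unknown."""
    return [index.get(normalize_title(t)) for t in reference_titles]


def count_citations(reference_lists, index):
    """Count how many papers cite each id (at most once per citing paper)."""
    counts = Counter()
    for reference_titles in reference_lists:
        resolved = {r for r in resolve_references(reference_titles, index) if r}
        counts.update(resolved)
    return counts


# Placeholder data: the ids and titles below are invented for this example.
papers = [("0000.00001", "An Example Paper: Title Matching")]
index = build_title_index(papers)
print(resolve_references(["An example paper - title matching."], index))
```

Exact matching on normalized titles will miss references with typos or truncated titles, so a real pipeline would probably need some fuzzy fallback on top of this.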