Hacker News new | ask | show | jobs
by jakderrida 916 days ago
This is exactly what I was using HN for. But, yeah, in kinda sucked compared to yours. Another thing I was trying to create was some sort of NN model that could use the semanticscholar h-index of authors along with the abstract text and T5 to estimate the one-year out citations. Just for personal use, though. That whole thing fell apart because semanticscholar is kinda crap for associating author links to the same author. I frequently ended up with the wrong professors, which I'd think would be easily fixable for them.
2 comments

Just a note to say that factoring authors into the ranking system is high on my todo list. v1 won't be too fancy - just a hardcoded list of prominent authors whose papers warrant extra visibility. A future version will likely automate it to avoid the hardcoded list.

Also, soon-ish I'm going to add the ability for users to follow specific authors, so you can get notified when they publish new papers.

> Also, soon-ish I'm going to add the ability for users to follow specific authors, so you can get notified when they publish new papers.

If you could do it, this would be a dream. My original intent was to be able to look through only papers citing a popular one and filtering the results for ones having at least one author with a set minimum h-index. Using Google Scholar data required using SerpAPI, which has some annoying limitations.

The core goal is obviously just not to miss out on a paper that will very likely be influential while not having to comb through the mountain of irrelevant papers.

What's funny is that Microsoft Academic was the best suited, but was retired in 2021.

I did that (used other features). This is how new papers are ranked here:

https://trendingpapers.com

Great site, thanks for sharing. Can you explain how you're determining how many times a paper is cited? Obviously papers include a list of references, but extracting them accurately from the PDF is difficult in my experience (two column formats, ugh) - though the new HTML versions help. And even if you have a list, many authors just mention arXiv paper titles, not their ids, making identifying specific references tricky.
Difficult, yes… but not impossible :)

I just extract the titles and look for their respective ids.

The real challenge was how to do that at scale. Only in CS there are well over half a million papers