by rehaanahmad
656 days ago
Great idea, we'll look into making the home page the trending page soon. Regarding HTML, our original site actually only supported HTML (because it was easier to build an annotator for an HTML page). The issue is that roughly 25% of these papers don't render properly, which pisses off a lot of academics. Academics spend a lot of time making their papers look nice as PDFs, so when someone comes along and reformats their entire paper in HTML, not everyone is a fan. That being said, I do think HTML makes a lot of sense for papers in the long term. It allows researchers to embed videos and other content (think robotics papers!). At some point we do want to bring HTML papers back to the site (perhaps as a toggle).
|
Did you bulk download the arXiv metadata, PDFs, and/or LaTeX files?
I am trying to figure out how much space is required for just the most recent version of each PDF.
I can find mentions of the total size of their S3 bucket, but it's unclear whether that also includes older versions of the PDFs.
I also wonder whether the Kaggle dataset is kept up to date, since it lists only 1.7M articles instead of the 2.4M I read about elsewhere.
Edit: I just found the answers to my question here: https://info.arxiv.org/help/bulk_data_s3.html