by ivansavz
2243 days ago
If you're going to be doing ML and need to download PDFs, I would recommend getting the bulk data from S3 instead of downloading them one by one: https://arxiv.org/help/bulk_data_s3
It's a little more complicated to use, but you get it ALL ;) In addition to TfIdf, topic modelling is a very good fit for browsing and finding similar papers. Here is a demo of LDA applied to 10% of the quant-ph arXiv papers that I worked on back in the day: https://www.cs.mcgill.ca/~isavov/arxiv_demo/readme.html
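For anyone wondering what the TfIdf baseline for "find similar papers" might look like, here is a rough sketch in plain Python. It is not the code behind the demo above. The three abstracts are made up, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the smoothed IDF formula are my own assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace,
    # keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build one L2-normalized sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed IDF so that terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a, b):
    # Vectors are already normalized, so the dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def most_similar(query_index, vectors):
    # Index of the document closest to the query, excluding the query itself.
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Made-up abstracts, purely for illustration.
abstracts = [
    "quantum error correction codes for noisy qubits",
    "entanglement distillation with noisy quantum channels",
    "stabilizer codes and quantum error correction thresholds",
]
vectors = tfidf_vectors(abstracts)
print(most_similar(0, vectors))
```

Topic modelling would replace these sparse word vectors with a short vector of topic weights per paper, and the nearest-neighbour lookup stays the same.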