by pepon 3461 days ago
I hope it is replaced with something better soon. You cannot see access statistics for the papers you upload, and they give this absurd reason for not providing them: https://arxiv.org/help/faq/statfaq (it seems they think arXiv users are idiots or something, so they have to take care of us). Also, getting the uploaded LaTeX files to compile without errors is a pain, and they don't let you just upload the PDF (this has pros and cons, but I wish there was the freedom to choose... and I guess that 99.999% of the time people just download the PDF).
4 comments

After reading your comment, I was inclined to agree with you about the statistics. After reading their FAQ, I was convinced to side with them.

Their point is that the stats are garbage-level useless. And I can imagine people bragging elsewhere that their paper received X,000 hits when in reality it's all spam or bots. It's not arxiv's responsibility to monitor that, but it wouldn't feel good to facilitate that kind of disinformation or invite hit inflation. Especially as scientists, we want to either publish good data or no data, not data that we know to be garbage.

As a scientist, give me the data and I will know what to do with it. AFAIK http://biorxiv.org/ provides some statistics, and that does not seem to cause any problems.
As a fellow scientist, I'm much more concerned with how others will interpret these access data. I'm not excited about the prospect of yet another unreliable signal for e.g. hiring committees to latch onto, as they often do with journal impact factors and such.

It might be nice if ArXiv would perhaps provide the data to researchers on request. Just curious -- what kinds of questions would you use this data to answer?

I want the data for the same reasons that any content producer on the Internet wants it. Bloggers, YouTubers, any company... everyone. Despite the noise this data might contain, it seems to be useful for everyone except scientists... and I am surprised to hear that it's better not to give it to them in case they misinterpret it. A very risky statement and precedent.
I didn't mean to imply that the data wouldn't be useful, I was more asking to see if you had any specific questions in mind that this data could shed some light on. Relating download rates to citation is the first thing that comes to mind for me, though honestly I'd be much more interested in analyzing the full citation graph for my field, which generally doesn't post papers to the ArXiv.

It's not that I am personally concerned with misinterpreting the data. I just think there could be some downsides to releasing the data without limiting access in some way. For one, I think there are already issues with how citation metrics are used and interpreted, for example in tenure evaluations. I don't think it would be a step in the right direction if this data were used towards the same end...

Not providing raw download counts seems like a good thing; it's strongly privacy preserving.

On the other hand, perhaps a way for registered users to star papers that they like (similar to how Github lets you star projects) might be a good thing. It serves much the same purpose as a rough measure of popularity, but is entirely voluntary.

What's the privacy advantage of not providing anonymous download counts?
Requiring error-free latex is almost certainly a reasonable proxy for real curation effort.
The issue is that their LaTeX installation is fairly old, so there's a real chance of running into old bugs that have long since been fixed. It's a bit tiresome to work around those. I've had issues with their pgfplots version and had to resort to compiling the figures to pdf locally and including those.
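For anyone hitting the same thing, here is a rough sketch of that workaround. It assumes the `standalone` class is available in your local TeX installation; the file names `fig1.tex` and `paper.tex` and the plotted function are made up for illustration, not taken from any real submission.

```latex
% fig1.tex (hypothetical name): compile this locally, e.g. with pdflatex,
% using your own up-to-date pgfplots. It produces a tightly cropped fig1.pdf.
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=newest}
\begin{document}
\begin{tikzpicture}
  \begin{axis}[xlabel={$x$}, ylabel={$y$}]
    \addplot {x^2}; % placeholder plot
  \end{axis}
\end{tikzpicture}
\end{document}
```

```latex
% paper.tex (hypothetical name): upload this together with fig1.pdf.
% The server then only has to include a finished PDF, not run pgfplots.
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{figure}
  \centering
  \includegraphics{fig1.pdf}
  \caption{Plot precompiled locally.}
\end{figure}
\end{document}
```

The trade-off is that the figure source no longer compiles as part of the paper, so you have to rebuild the PDF yourself whenever the plot changes.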
Nah, I mean that it is a pain to upload error-free files. Because of package dependencies and other reasons, a file that compiles on your computer often fails to compile on arXiv.
There is one HUGE reason for not using PDFs -- PDFs are very blind-inaccessible, whereas TeX is perfect.

For that reason alone, arXiv is really helping the blind community in academia.

EDIT: Add missing 'not' :)

> There is one HUGE reason for using PDFs -- PDFs are very blind-inaccessable

I think you may have mistyped this. ;-)