| HN Mirror



	by pepon 3461 days ago
	I hope it is replaced with something better soon. You cannot see access statistics for the papers you upload, and they give this absurd reason for not providing them: https://arxiv.org/help/faq/statfaq (it seems they think arXiv users are idiots or something, so they have to take care of us). Also, getting the uploaded LaTeX files to compile without errors is a pain, and they don't let you just upload the PDF (this has pros and cons, but I wish there was the freedom to choose... and I guess that 99.999% of the time people just download the PDF).

4 comments

beering 3461 days ago

After reading your comment, I was inclined to agree with you about the statistics. After reading their FAQ, I was convinced to side with them.

Their point is that the stats are garbage-level useless. And I can imagine people bragging elsewhere that their paper received X,000 hits when in reality it's all spam or bots. It's not arxiv's responsibility to monitor that, but it wouldn't feel good to facilitate that kind of disinformation or invite hit inflation. Especially as scientists, we want to either publish good data or no data, not data that we know to be garbage.

pepon 3461 days ago

As a scientist, give me the data and I will know what to do with it. AFAIK bioRxiv (http://biorxiv.org/) provides some statistics and it does not seem to be a problem.

rsfern 3461 days ago

As a fellow scientist, I'm much more concerned with how others will interpret these access data. I'm not excited about the prospect of yet another unreliable signal for e.g. hiring committees to latch onto, as they often do with journal impact factors and such.

It might be nice if ArXiv would perhaps provide the data to researchers on request. Just curious -- what kinds of questions would you use this data to answer?

pepon 3461 days ago

I want the data for the same reasons that any content producer on the Internet wants it. Bloggers, YouTubers, any company... everyone. Despite the noise this data might contain, it seems to be useful for everyone except scientists. I am surprised to hear that it's better not to give them the data in case they misinterpret it. That is a very risky statement and precedent.

rsfern 3461 days ago

I didn't mean to imply that the data wouldn't be useful; I was asking whether you had any specific questions in mind that this data could shed some light on. Relating download rates to citations is the first thing that comes to mind for me, though honestly I'd be much more interested in analyzing the full citation graph for my field, which generally doesn't post papers to arXiv.

It's not that I am personally concerned with misinterpreting the data. I just think there could be some downsides to releasing the data without limiting access in some way. For one, I think there are already issues with how citation metrics are used and interpreted, for example in tenure evaluations. I don't think it would be a step in the right direction if this data were used towards the same end...

skybrian 3461 days ago

Not providing raw download counts seems like a good thing; it's strongly privacy preserving.

On the other hand, perhaps a way for registered users to star papers that they like (similar to how GitHub lets you star projects) might be a good thing. It serves much the same purpose as a rough measure of popularity, but is entirely voluntary.

robotresearcher 3461 days ago

What's the privacy advantage of not providing anonymous download counts?

javajosh 3461 days ago

Requiring error-free latex is almost certainly a reasonable proxy for real curation effort.

lorenzhs 3461 days ago

The issue is that their LaTeX installation is fairly old, so there's a real chance of running into old bugs that have long since been fixed. It's a bit tiresome to work around those. I've had issues with their pgfplots version and had to resort to compiling the figures to pdf locally and including those.

pepon 3461 days ago

Nah, I mean that it is a pain to upload error-free files. Because of package dependencies and other reasons, a file that compiles on your computer often fails to compile on arXiv.

CJefferson 3461 days ago

There is one HUGE reason for not using PDFs -- PDFs are very blind-inaccessible, whereas TeX is perfect.

For that reason alone, arXiv is really helping the blind community in academia.

EDIT: Add missing 'not' :)

btym 3461 days ago

> There is one HUGE reason for using PDFs -- PDFs are very blind-inaccessible

I think you may have mistyped this. ;-)
