

	by vl 87 days ago
	What is unclear is why they need a staff of 27 and 6.7 million to operate an essentially static hosting website in 2026.


swiftcoder 87 days ago

The "essentially static hosting" isn't the cost centre (although with 5 million MAU, it's nothing to sneeze at). The real costs are on the input side - they have an ingestion pipeline that ensures standardised paper formatting and so on, plus at least some degree of human review.


bonoboTP 87 days ago

Do you mean that the CPU compute cost of turning latex into pdf/HTML is the main cost?


swiftcoder 87 days ago

No, I mean that the pipeline requires software engineers to build and maintain it, and salaries are (as in basically every tech organisation) the dominant cost.


bonoboTP 87 days ago

Then drop it and make people upload a pdf and a zip of the latex sources.

Most people I talk to hate that pipeline and spend a lot of hours debugging it when arXiv can't compile what Overleaf and your local LaTeX install can.


domoritz 87 days ago

Arxiv can recompile latex to support accessibility and html. Going to pdf submissions would be a major step backward.


bonoboTP 87 days ago

Make it an external service then, and leave the thing that's already working great to just be.

The reason authors like and use arxiv is that it gives 1) a timestamp, 2) a standardized citable ID, and 3) stable hosting of the pdf. And readers like the no-nonsense single click download of the pdf and a barebones consistent website look.

All else is a side show.


lou1306 87 days ago

The PDF formatting is anything but standardised. They ingest LaTeX sources, which are formatted according to the authors' whims (most likely, according to whatever journal or conference they just submitted the manuscript to). I'll concede that the (relatively novel) HTML formatter gives papers a more uniform appearance. They also integrate a bunch of external services for, e.g., citation metrics and cross-references. Still hard to justify such a high cost to operate, but eh.

Also, the "human review" is a simple moderation process [1]. It usually does not dig into the submission's scientific merits.

[1] https://info.arxiv.org/help/moderation/index.html


planetoftofu 85 days ago

https://info.arxiv.org/about/reports/2024_arXiv_annual_repor...

A critical component of the arXiv-CE project is moving our services entirely off of Cornell University’s infrastructure — this goal is also known as Milestone 1. Milestone 1 completion is projected for the end of fiscal year 2026.

Imagine you are a library, and every day 3000 half-baked so-called books are brought to the librarians, who have to make sure each one is meaningful, readable and printable. They accept them and put them on the right bookshelf, and then the entire internet (AI bots, search engines and researchers) reads every one of them on the shelf multiple times.

They are not only building a new library; they are also maintaining both and keeping the two in sync, because Cornell cannot handle the volume of access by bots.

It is not static. It is essentially running two ships side by side, and the two ships need to appear as one from the outside. And the new ship is still only half built: it is being designed and built at the same time. A staff of 27 seems small to me.


OtherShrezzing 87 days ago

I don't see it as an especially extravagant structure or budget. I've seen larger teams with bigger budgets struggle to maintain smaller applications.

I've contracted into some consultancy teams which you could uncharitably describe as "15 people and $4mn/yr to create one PDF per month".
