Hacker News
by ansk 918 days ago
When I open a large pdf on arxiv (100+ MB, not uncommon for ML papers focused on hi-res image generation), there is a significant load time (10+ seconds) before anything is rendered at all other than a loading bar. Does anyone know what the source of this delay is? Is it network-bound or is Chrome just really slow to render large PDFs? Do PDFs have to be fully downloaded to begin rendering? In any case, this delay is my only gripe with arxiv and a progressively rendered HTML doc that instantly loads the document text would be a huge improvement.

> Does anyone know what the source of this delay is? Is it network-bound or is Chrome just really slow to render large PDFs? Do PDFs have to be fully downloaded to begin rendering? In any case, this delay is my only gripe with arxiv and a progressively rendered HTML doc that instantly loads the document text would be a huge improvement.

The default PDF layout puts the cross-reference (xref) table at the end of the file, which generally forces a full download before rendering can begin. PDF 1.2 onwards supports linearized PDFs, which put the first page's objects and a hint table at the start of the file. Most PDF export tools have some way of enabling this (usually an option like "optimize for web").
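If you want to check whether a particular paper is linearized, a rough heuristic is to look for the `/Linearized` key near the start of the file (the spec requires the linearization dictionary to sit within the first 1024 bytes) and to read the `startxref` offset from the trailer at the end. This is a quick sketch, not a validator; `qpdf --check-linearization file.pdf` does a real check, and `qpdf --linearize in.pdf out.pdf` rewrites a file in linearized form. The function names here are my own.

```python
import os
import re
from typing import Optional, Tuple


def is_linearized(head: bytes) -> bool:
    """Heuristic: a linearized PDF declares /Linearized in its first object,
    which must lie within the first 1024 bytes of the file."""
    return b"/Linearized" in head[:1024]


def startxref_offset(tail: bytes) -> Optional[int]:
    """Return the offset recorded after the last 'startxref' keyword, if any."""
    matches = list(re.finditer(rb"startxref\s+(\d+)", tail))
    if not matches:
        return None
    return int(matches[-1].group(1))


def inspect_pdf(path: str) -> Tuple[bool, Optional[int]]:
    """Read only the first and last KiB of a local PDF file."""
    with open(path, "rb") as f:
        head = f.read(1024)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 1024))
        tail = f.read()
    return is_linearized(head), startxref_offset(tail)
```

A non-linearized file will usually report `False` together with a large `startxref` offset near the end of the file. That offset is exactly the part a viewer must reach before it can locate any page.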

I have the same issue. From what I can tell it's just network-bound and the arXiv servers are slow. They theoretically allow you to set up a caching server, but after spending a while trying to set one up, I haven't been able to get it to work.

https://info.arxiv.org/help/faq/cache.html
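One way to test the "network-bound" theory is to time the raw download separately from the viewer. If fetching the PDF alone takes about as long as the loading bar, the network is the bottleneck. If it finishes quickly and the viewer still stalls, rendering is. A minimal sketch using only the standard library; the helper names are my own, and "first byte" here means the first body chunk after headers, so it includes connection setup:

```python
import time
import urllib.request
from typing import NamedTuple


class FetchTiming(NamedTuple):
    first_byte_s: float  # seconds until the first body chunk arrived
    total_s: float       # seconds until the whole body was read
    n_bytes: int         # number of body bytes received


def time_download(url: str, chunk_size: int = 64 * 1024) -> FetchTiming:
    """Download `url` in chunks and record when data started and finished."""
    start = time.perf_counter()
    first_byte = None
    n_bytes = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            if first_byte is None:
                first_byte = time.perf_counter() - start
            n_bytes += len(chunk)
    total = time.perf_counter() - start
    # An empty body never sets first_byte; fall back to the total time.
    return FetchTiming(first_byte if first_byte is not None else total, total, n_bytes)
```

Pass the PDF's URL and compare `total_s` with how long the browser shows the loading bar.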

Maybe it'll be faster now with Fastly.

https://news.ycombinator.com/item?id=38723373

It may even be that the time is spent generating the PDF.

Articles are typically submitted to and stored in arXiv as LaTeX source, and the PDF is generated automatically from it.

Presumably arXiv caches the generated PDFs so they don't have to be regenerated on every request, but I don't know how that caching works.
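For illustration only, here is the generic "build once, serve many" pattern that kind of cache would follow. This is not arXiv's actual implementation. `render` stands in for a hypothetical LaTeX-to-PDF build step, and the paper ID in the usage note is made up.

```python
from typing import Callable, Dict, Tuple


class PdfCache:
    """Generate-once cache keyed by (paper_id, version).

    `render` is a hypothetical stand-in for the slow LaTeX-to-PDF build;
    it is only called on a cache miss.
    """

    def __init__(self, render: Callable[[str, int], bytes]) -> None:
        self._render = render
        self._store: Dict[Tuple[str, int], bytes] = {}
        self.misses = 0

    def get(self, paper_id: str, version: int) -> bytes:
        key = (paper_id, version)
        if key not in self._store:
            # Miss: pay the expensive build cost once for this version.
            self.misses += 1
            self._store[key] = self._render(paper_id, version)
        # Hit (or just filled): serve the stored bytes directly.
        return self._store[key]
```

Keying on the version means a revised submission gets its own entry, so nothing has to be invalidated. Calling `cache.get("2101.00001", 1)` twice builds the PDF once. Only the first requester of a new version would see the generation delay.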