by chaxor
1547 days ago
One thing I would love to see from the arXiv sites is a publicly available download of an SQLite database. They have a bunch of PDFs and LaTeX source, but the real killer would be a database with just the text for each section, plus the ability to generate* the PDF in various styles. This would save an enormous amount of space and make things far tidier.
I suppose the images could be stored in the SQLite database as blobs, but there's probably a better way with vector DBs or something. That's what the future will probably look like: the SQLite database decentralized on IPFS or torrent, where each computer stores only what it queries, so more popular queries load faster (more peers). *(Or maybe an archive of a ton of zstd-compressed Parquet files for each table? Not sure yet what the best way to organize several tables in Parquet is.)
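To make the idea concrete, here is a minimal sketch of what such a database might look like, using Python's built-in sqlite3 module. Everything here is an assumption for illustration: the table names (paper, section, figure), the helper functions, and the paper id "example-0001" are all made up, and arXiv publishes nothing in this form. The plain-text render stands in for a real styled PDF generator.

```python
import sqlite3


def build_paper_db(path=":memory:"):
    """Create a hypothetical schema: one row per paper, section, and figure."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE paper (
            id    TEXT PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE section (
            paper_id TEXT    NOT NULL REFERENCES paper(id),
            position INTEGER NOT NULL,  -- order of the section in the paper
            heading  TEXT    NOT NULL,
            body     TEXT    NOT NULL,
            PRIMARY KEY (paper_id, position)
        );
        CREATE TABLE figure (
            paper_id TEXT NOT NULL REFERENCES paper(id),
            name     TEXT NOT NULL,
            data     BLOB NOT NULL,     -- raw image bytes stored as a blob
            PRIMARY KEY (paper_id, name)
        );
        """
    )
    return conn


def add_paper(conn, paper_id, title, sections, figures=()):
    """Insert a paper with (heading, body) sections and (name, bytes) figures."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO paper VALUES (?, ?)", (paper_id, title))
        conn.executemany(
            "INSERT INTO section VALUES (?, ?, ?, ?)",
            [(paper_id, i, h, b) for i, (h, b) in enumerate(sections)],
        )
        conn.executemany(
            "INSERT INTO figure VALUES (?, ?, ?)",
            [(paper_id, name, data) for name, data in figures],
        )


def render_plain(conn, paper_id):
    """Rebuild a document from stored sections (plain text, not a PDF)."""
    (title,) = conn.execute(
        "SELECT title FROM paper WHERE id = ?", (paper_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT heading, body FROM section WHERE paper_id = ? ORDER BY position",
        (paper_id,),
    )
    parts = [title]
    for heading, body in rows:
        parts.append(f"\n{heading}\n{body}")
    return "\n".join(parts)
```

Something like `add_paper(conn, "example-0001", "A Title", [("Introduction", "...")])` followed by `render_plain(conn, "example-0001")` would give back the text in section order. Swapping render_plain for a LaTeX template is where the "various styles" part would come in.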
|
Why? The output PDF is typically smaller than the input that produces it. Storing rendered PDFs seems simple and very natural, and at worst it doubles the total amount of space used.