by l3x 636 days ago

From the FAQs on GitHub [1]

> What about PMTiles?

> I would have loved to use PMTiles; they are a brilliant idea!

> Unfortunately, making range requests in 80 GB files just doesn't work in production. It is fine for files smaller than 500 MB, but it has terrible latency and caching issues for full planet datasets.

> If PMTiles implements splitting to <10 MB files, it can be a valid alternative to running servers.

[1] https://github.com/hyperknot/openfreemap

4 comments

bdon 636 days ago

See my response here: https://news.ycombinator.com/item?id=41638031

apitman 636 days ago

That's an interesting claim. I make range requests to 100GB+ files (genomics) all the time for work and it works great. I've never considered total file size as directly related to latency in this respect, assuming you have some sort of an index of course.

bdon 636 days ago

You can test this claim directly against an AWS S3 bucket.

First 100KB of a 100GB+ file:

curl -H "Range: bytes=0-100000" https://overturemaps-tiles-us-west-2-beta.s3.amazonaws.com/2... --output tmp -w "%{time_total}"

First 100KB at the 100GB mark:

curl -H "Range: bytes=100000000000-100000100000" https://overturemaps-tiles-us-west-2-beta.s3.amazonaws.com/2... --output tmp -w "%{time_total}"
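The same comparison can be scripted. The following is a minimal Python sketch, not anything from the thread: the function names are made up for illustration, and the URL is left as a parameter because the one in the curl commands is truncated. Note that HTTP byte ranges are inclusive at both ends, so `bytes=0-100000` asks for 100,001 bytes; the helper below requests exactly `length` bytes.

```python
import time
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value; the end byte is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def time_range_request(url, offset, length, opener=urllib.request.urlopen):
    """Fetch `length` bytes starting at `offset`.

    Returns (elapsed_seconds, bytes_received). `opener` is injectable so the
    function can be exercised without touching the network.
    """
    req = urllib.request.Request(url, headers={"Range": range_header(offset, length)})
    start = time.perf_counter()
    with opener(req) as resp:
        body = resp.read()
    return time.perf_counter() - start, len(body)


def compare_offsets(url, offsets=(0, 100_000_000_000), length=100_000,
                    opener=urllib.request.urlopen):
    """Time one range request per offset and return a list of result tuples."""
    results = []
    for offset in offsets:
        seconds, received = time_range_request(url, offset, length, opener)
        results.append((offset, received, seconds))
    return results
```

Calling `compare_offsets(url)` with the address of any large object you control returns one `(offset, bytes_received, seconds)` tuple per offset. A single run per offset is noisy, so repeating it a few times and comparing medians gives a fairer picture.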

hyperknot 635 days ago

Here the requests are really really small, on average 405 bytes each. I guess in your genomics work you are making larger requests, so probably it's not so much of an issue.

BTW, we are discussing latency with bdon in this issue, it seems to be specific to Cloudflare: https://github.com/hyperknot/openfreemap/issues/16

apitman 634 days ago

I just tried @bdon's curl examples above with 100 byte requests. Similar results. I think the Cloudflare explanation is more likely.

tobilg 635 days ago

If you store the PMTiles in S3 or any other object store that supports HTTP Range Requests, that's a no-brainer... On a normal disk on your own server, this might become interesting, yes.

mistrial9 636 days ago

ok except "full planet datasets" make little sense for terrestrial features. Splitting .. aka sharding the files into basic continents would make SO much sense. Asia is big, but no requests for Africa mixed in.. Australia would be manageable?

hyperknot 636 days ago

PMTiles could come up with a version in the future where instead of one 90 GB file, they have 9 thousand 10 MB files. That would work well I believe.
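To make the arithmetic concrete, here is a rough Python sketch of how byte offsets in one large archive could map onto fixed-size shard files. This is purely hypothetical: PMTiles has no such layout, the names are invented for illustration, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
SHARD_SIZE = 10 * 1000**2    # 10 MB per shard (assumed, decimal units)
ARCHIVE_SIZE = 90 * 1000**3  # 90 GB archive (figure from the comment above)


def shard_count(archive_size: int = ARCHIVE_SIZE, shard_size: int = SHARD_SIZE) -> int:
    """Number of shards needed, rounding up for a partial last shard."""
    return -(-archive_size // shard_size)


def locate(offset: int, shard_size: int = SHARD_SIZE) -> tuple:
    """Map an archive byte offset to (shard_index, offset_within_shard)."""
    return divmod(offset, shard_size)


def split_range(offset: int, length: int, shard_size: int = SHARD_SIZE) -> list:
    """Split one archive byte range into per-shard (index, offset, length) pieces."""
    pieces = []
    while length > 0:
        index, inner = locate(offset, shard_size)
        take = min(length, shard_size - inner)  # bytes left in this shard
        pieces.append((index, inner, take))
        offset += take
        length -= take
    return pieces
```

With these numbers `shard_count()` is 9000, matching the "9 thousand" figure. A 405-byte tile that happens to straddle a shard boundary would need two fetches, which is one cost a split layout has to account for.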

bdon 636 days ago

The latency for small files and ranges of large files is pretty similar on most storage platforms, but there are some exceptions like Cloudflare R2.

The main reason PMTiles is one file and not two or more files is that it enables atomic updates in-place (which every mature storage platform supports) as well as ETag content change detection in downstream caches. All of the server and serverless implementations at http://github.com/protomaps support this now for AWS, S3-compatible storage, Google Cloud, and Azure.
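As an illustration of why a single file plus an ETag helps with change detection, here is a small Python sketch of one way a client could guard range reads. It is an assumption for illustration, not the Protomaps implementation, and the function names are made up. It relies only on standard HTTP semantics: a request carrying `If-Match` fails with status 412 when the object's current ETag differs from the one supplied.

```python
def conditional_range_headers(offset: int, length: int, etag: str) -> dict:
    """Headers for a range read that should only succeed if the file is unchanged.

    `etag` is the value seen on an earlier response for the same archive,
    for example when its header and directory were first fetched.
    """
    return {
        "Range": f"bytes={offset}-{offset + length - 1}",  # inclusive end byte
        "If-Match": etag,
    }


def is_stale(status: int) -> bool:
    """True if the server reports that the archive changed (412 Precondition Failed)."""
    return status == 412
```

On a 412 the client would discard its cached directory data and start over from the new file. With two or more files there is no single ETag that describes the whole dataset, so this kind of check becomes harder to express.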

wcedmisten 636 days ago

Now I'm curious, what causes the latency for range requests with R2?

bdon 636 days ago

I don't have any insight into this other than observing how their storage system works, but here's some scripts I made last year to test:

https://github.com/bdon/cloudflare-r2-latency

mannyv 635 days ago

Range requests mean work and logic. Getting a file requires no logic.

Also, I'm pretty sure range requests are going to be difficult to cache. That implies going to origin every request which is bad.
