by shutupalready 4155 days ago
Thank you, but I still don't get this part:

> The Landsat images seem to be much larger than they need to be for the level of detail they provide.

For the level of detail I'm seeing, the same information could be represented in a 2MB download (rather than the 600-700MB that's currently downloaded).

2 comments

Each image is a different band. Landsat imagery is multispectral imagery. For Landsat 8 data there are 11 bands and for Landsat 7 data there are 8 (if I recall correctly?). Each band is typically distributed as an uncompressed, 16-bit TIFF.

JPEG and similar compression techniques are very effective at compressing image data without losing _visual_ information. However, it's very important not to use lossy compression (such as JPEG) for scientific data such as this. Our eyes may not mind, but classification algorithms do.
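To make the "classification algorithms do mind" point concrete, here's a toy sketch. It is not JPEG: it uses simple rounding to a coarser step as a stand-in for a lossy codec, and the pixel values, step size, and threshold are all made up for illustration. The point is only that a change far too small to see can push a pixel across a classification threshold.

```python
def quantize(values, step):
    """Stand-in for lossy compression: round each value to the nearest multiple of step."""
    return [(v + step // 2) // step * step for v in values]


def classify(values, threshold):
    """Toy classifier: label a pixel True if it is at or above the threshold."""
    return [v >= threshold for v in values]


def flipped_labels(values, step, threshold):
    """Return indices of pixels whose class changes after the lossy step."""
    before = classify(values, threshold)
    after = classify(quantize(values, step), threshold)
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


if __name__ == "__main__":
    # Hypothetical 16-bit pixel values sitting near a hypothetical threshold.
    pixels = [1000, 1023, 1024, 1025, 1100]
    print("lossy values:", quantize(pixels, 16))
    print("pixels that changed class:", flipped_labels(pixels, 16, 1024))
```

A lossless round trip returns the original values exactly, so nothing can change class.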

For various reasons, Landsat data is typically distributed and stored as a zip of uncompressed TIFFs. The actual bit depth is often a bit lower than 16 bit (depends on the band), but it's easiest to distribute it as 16 bit to avoid the need for unusual file formats.
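As a rough back-of-the-envelope check on what padding to 16 bit costs, here's a small sketch. The scene dimensions (7,700 x 7,900 pixels) and the 12-bit depth are assumptions picked for illustration, not figures from the USGS documentation; actual scenes and bands vary.

```python
def band_size_bytes(width, height, bits_per_sample):
    """Size of one uncompressed band in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_sample + 7) // 8


if __name__ == "__main__":
    # Assumed dimensions for a single band; real scenes differ.
    width, height = 7700, 7900
    stored = band_size_bytes(width, height, 16)   # how it is shipped
    packed = band_size_bytes(width, height, 12)   # hypothetical tight packing
    print(f"16-bit: {stored / 1e6:.1f} MB")
    print(f"12-bit packed: {packed / 1e6:.1f} MB")
    print(f"padding overhead: {1 - packed / stored:.0%} of the stored size")
```

So even perfect bit packing only gets you a modest saving, at the cost of a format most tools can't read directly.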

At any rate, you can certainly compress these images quite a bit. However, lossless compression algorithms are always at a disadvantage compared to lossy compression (e.g. JPEG). There are plenty of lossless options more advanced than putting uncompressed images in a zip file (e.g. JPEG2000 or MrSID, which are built for this exact use case). Nonetheless, they only shave ~10-20% off the file size compared to zip (ballpark: it varies a lot), though they have other huge advantages when it comes to accessing the data.

However, the data needs to be easily used by a wide range of applications and users. It's easy to unzip a file and then have "raw" uncompressed images. For distribution, the drop-off in usability just isn't worth the savings in file size. Also, there are a lot of legacy applications out there that wouldn't react well to the USGS suddenly changing the way it distributes Landsat data.

Show us / try it yourself! Take one of the bands (~125 Megabytes) and losslessly compress it as small as you can. Then share the results. Science is fun!
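If you want a starting point, here's a minimal sketch using only the Python standard library. It runs on a small synthetic 16-bit "band" (a made-up gradient plus noise, not real Landsat data), so the ratios it prints say nothing about what you'd get on an actual scene. To try a real band you'd swap `synthetic_band()` for the raw pixel bytes read from the TIFF.

```python
import bz2
import lzma
import random
import struct
import zlib


def synthetic_band(width=256, height=256, seed=0):
    """Return raw little-endian uint16 bytes for a made-up 12-bit 'band'."""
    rng = random.Random(seed)
    values = []
    for y in range(height):
        for x in range(width):
            # Smooth gradient plus a little noise, kept within 12 bits (0..4095).
            values.append((x * 8 + y * 4 + rng.randint(0, 15)) % 4096)
    return struct.pack("<%dH" % len(values), *values)


def compare_lossless(raw):
    """Compress raw bytes with three stdlib codecs; return {name: compressed size}."""
    codecs = {
        "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),
        "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    sizes = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(raw)
        # Lossless means the round trip is bit-for-bit identical.
        assert decompress(packed) == raw
        sizes[name] = len(packed)
    return sizes


if __name__ == "__main__":
    raw = synthetic_band()
    print("raw:", len(raw), "bytes")
    for name, size in compare_lossless(raw).items():
        print(f"{name}: {size} bytes ({size / len(raw):.0%} of raw)")
```

zlib is the same DEFLATE family that zip uses, so it's the baseline to beat.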

Also be aware that uncompressed TIFF is super fast to read, since there's no decode step, which is important for some applications. And the scene compilation is already compressed with bzip2.