by jofer 4155 days ago
Each image is a different band, because Landsat imagery is multispectral. Landsat 8 data has 11 bands and Landsat 7 data has 8 (if I recall correctly). Each band is typically distributed as an uncompressed 16-bit TIFF.

JPEG and similar compression techniques are very effective at compressing image data without losing _visual_ information. However, it's very important not to use lossy compression (such as JPEG) for scientific data such as this. Our eyes may not mind, but classification algorithms do.
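To make the lossy vs. lossless point concrete, here's a rough sketch using only the Python standard library. The pixel values are made up, and the "lossy" step is plain quantization standing in for JPEG (it is not JPEG, just the simplest way to throw information away):

```python
import zlib
from array import array

# Hypothetical 16-bit pixel values standing in for a few pixels of one band.
pixels = array("H", [10234, 10235, 10236, 10240, 10251, 10252])

# Lossless: a zlib round trip gives back exactly the same values.
restored = array("H")
restored.frombytes(zlib.decompress(zlib.compress(pixels.tobytes())))
lossless_ok = restored == pixels

# Lossy stand-in (an assumption, NOT real JPEG): round down to multiples of 8.
step = 8
quantized = array("H", [(p // step) * step for p in pixels])

# Values that were distinct before may collapse into the same value afterwards.
distinct_before = len(set(pixels))
distinct_after = len(set(quantized))

print(lossless_ok, distinct_before, distinct_after)
```

Six distinct values go in and fewer come out of the quantized version. A plot of either would look identical, but anything that thresholds or clusters on the raw numbers sees different data.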

For various reasons, Landsat data is typically distributed and stored as a zip of uncompressed TIFFs. The actual bit depth is often a bit lower than 16-bit (it depends on the band), but it's easiest to distribute it as 16-bit to avoid the need for unusual file formats.
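As a quick illustration of "lower bit depth in a 16-bit container", here's a minimal sketch. The sample values are hypothetical, and `effective_bits` is just a helper name I made up; whether real products fill the full 16-bit range depends on the sensor and processing level:

```python
from array import array

def effective_bits(samples):
    """Smallest number of bits that can hold the largest sample value."""
    return max(samples).bit_length()

# Hypothetical samples stored as unsigned 16-bit integers ("H").
band = array("H", [0, 17, 1023, 4095])

print(band.itemsize * 8)     # size of the container, in bits
print(effective_bits(band))  # bits actually needed for these values
```

The container is 16 bits per sample either way; the unused high bits are part of why these files compress reasonably well even with something as simple as zip.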

At any rate, you can certainly compress these images quite a bit. However, lossless compression algorithms are always at a disadvantage compared to lossy compression (e.g. JPEG). There are plenty of lossless options more advanced than putting uncompressed images in a zip file (e.g. JPEG2000 or MrSID, which target this exact use case). Nonetheless, they only shave ~10-20% (ballpark: it varies a lot) off the file size compared to zip (they have other huge advantages when it comes to accessing the data, though).
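If you want a feel for what zip-style (DEFLATE) compression does to 16-bit raster data, something like this works as a rough sketch. The "band" here is a synthetic gradient, which is far smoother than real imagery, so the ratio it prints says nothing about actual Landsat scenes:

```python
import zlib
from array import array

width, height = 256, 256

# Synthetic, very smooth 16-bit "band" (an assumption: real scenes are noisier).
band = array("H", [(x + y) * 16 for y in range(height) for x in range(width)])

raw = band.tobytes()
packed = zlib.compress(raw, 9)  # DEFLATE, the same family of compression zip uses

ratio = len(packed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, ratio={ratio:.3f}")

# Lossless: decompressing gives back the original bytes exactly.
roundtrip_ok = zlib.decompress(packed) == raw
```

Running the same thing on the bytes of a real band would give a more honest number, and it will vary a lot from scene to scene.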

However, the data needs to be easily used by a wide range of applications and users. It's easy to unzip a file and then have "raw" uncompressed images. For distribution, the drop-off in usability just isn't worth the savings in file size. Also, there are a lot of legacy applications out there that wouldn't react well to the USGS suddenly changing the way it distributes Landsat data.