Hacker News new | ask | show | jobs
by jakebol 3163 days ago
Jake from TileDB, Inc. wenc and srean are right. techniques such as those used in zfp and fpzip wenc mentioned are also used to compress real world las file (point cloud) datasets. For the moment we are only focused on lossless compression (scientists are paranoid about losing data), but there is definitely room to explore integration with lossy compression as well. Machine learning applications often do not need full precision so intelligent forms of lossy compression are useful.

Another cool research application of TileDB that extends the storage library with the VP9 codec can be seen here: https://homes.cs.washington.edu/~magda/papers/haynes-sigmod1...

1 comments

You can get very good lossless compression with floating point numbers, Facebook's Gorilla paper comes to mind. I usually use it for delta-of-delta encoding which provides very high compression for time series. While that won't really help in your case, their floating point encoding could help compressing matrices quite efficiently.

http://www.vldb.org/pvldb/vol8/p1816-teller.pdf [page 5]