by xs83 2349 days ago

I would like to see comparisons involving CH files. Specifically, I would challenge their compressibility versus ORC, which pretty much maxes out current compression techniques.
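For anyone who wants to sanity-check compression claims like this, here is a rough sketch of the kind of measurement I mean. It does not read real ORC or CH files; it just runs zlib over the same toy data laid out row-wise and column-wise, so the function names and the sample data are all made up for illustration. A real comparison would write the same dataset in each format and compare on-disk sizes.

```python
import random
import zlib


def make_rows(n, seed=0):
    """Build toy (id, country, score) rows; the schema is purely illustrative."""
    rng = random.Random(seed)
    countries = ["DE", "FR", "GB", "US"]
    return [(i, rng.choice(countries), rng.randint(0, 9)) for i in range(n)]


def row_layout(rows):
    """Serialize one record per line, CSV style."""
    return "\n".join(",".join(map(str, r)) for r in rows).encode()


def column_layout(rows):
    """Serialize one column per line, so similar values sit next to each other."""
    columns = zip(*rows)
    return "\n".join(",".join(map(str, c)) for c in columns).encode()


def ratio(raw):
    """Compression ratio: raw bytes divided by zlib-compressed bytes."""
    return len(raw) / len(zlib.compress(raw, 9))


rows = make_rows(10_000)
row_ratio = ratio(row_layout(rows))
col_ratio = ratio(column_layout(rows))
print(f"row-wise ratio:    {row_ratio:.2f}")
print(f"column-wise ratio: {col_ratio:.2f}")
```

Both layouts contain exactly the same bytes' worth of values and separators, so any difference in ratio comes from the ordering alone. Real columnar formats add per-column encodings on top of this, which a toy like this does not capture.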

As soon as I see the CH format become widespread enough to interact with the multitude of other tools that are available, I would consider getting on board. For now, a "loadable" data warehouse does little for the kind of workflows we deal with, as the loading would take longer than the processing.

With regard to item two: we use a standard consumer GPU (GTX 1060) to handle the conversion from CSV to ORC / Parquet, and it is much faster and cheaper than a 20+ node Spark cluster, hence the preference to work on files.

As everything else runs off these files, working on files is kind of integral to our workload.