by xs83
2349 days ago
I would like to see comparisons of CH files against other formats. I would specifically challenge their compressibility vs ORC, which pretty much maxes out current compression techniques. Once the CH format is widespread enough to interact with the multitude of other tools that are available, I would consider getting on board. For now, a "loadable" data warehouse does little for the kind of workflows we deal with, as the loading would take longer than the processing. With regard to item two: we use a standard consumer GPU (GTX 1060) to handle the conversion from CSV to ORC / Parquet, and it is much faster and cheaper than a 20+ node Spark cluster, hence the preference to work on files. As everything else runs off these files, they are kind of integral to our workload.