by mulmen 1989 days ago
Yeah, I think this is a common misconception with columnar stores: that because they handle wide tables well (they enable wide tables), wider must always be better.

Sorting (or partitioning) is one of the most powerful optimizations in your toolbox, but only when it is tuned for a specific access pattern. When you combine domains to get more width, you have to compromise on the sort order. Then the wheels come off.
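
To make the sort-order point concrete, here is a rough sketch of block pruning with per-block min/max metadata (often called zone maps). It is a toy model, not how any particular engine implements it. The column names, block size, and data are all made up for illustration. The table is sorted on `event_time`, so a filter on that column touches only a few blocks, while a filter on `customer_id` has to scan nearly everything.

```python
import random

BLOCK_SIZE = 1000  # hypothetical number of rows per storage block


def build_zone_map(rows, column):
    """Record the (min, max) of `column` for each fixed-size block of rows."""
    zones = []
    for start in range(0, len(rows), BLOCK_SIZE):
        values = [r[column] for r in rows[start:start + BLOCK_SIZE]]
        zones.append((min(values), max(values)))
    return zones


def blocks_to_scan(zones, lo, hi):
    """Count blocks whose [min, max] range overlaps the filter range [lo, hi]."""
    return sum(1 for zmin, zmax in zones if zmax >= lo and zmin <= hi)


random.seed(0)  # fixed seed so the toy data is reproducible

# Made-up event table with two columns that are unrelated to each other.
rows = [
    {"event_time": random.randrange(1_000_000),
     "customer_id": random.randrange(10_000)}
    for _ in range(100_000)
]

# The table can only be physically sorted one way; here we pick event_time.
rows.sort(key=lambda r: r["event_time"])

time_zones = build_zone_map(rows, "event_time")
cust_zones = build_zone_map(rows, "customer_id")

# A narrow time-range filter matches the sort order and prunes most blocks.
time_scan = blocks_to_scan(time_zones, 500_000, 509_999)

# A single-customer filter does not: that customer's value falls inside
# the min/max range of almost every block.
cust_scan = blocks_to_scan(cust_zones, 4_200, 4_200)

print(f"blocks total:               {len(time_zones)}")
print(f"scanned for time filter:     {time_scan}")
print(f"scanned for customer filter: {cust_scan}")
```

Sorting on `customer_id` instead would just flip which filter is cheap. That is the compromise you are forced into when one wide table has to serve several access patterns.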

You still need different tables for clicks and orders and payments, even if they are very wide. You may or may not physically conform your dimensions in pure Kimball style but logically you (should) still start there.

1 comments

Yeah, I really wish I could understand all of these. There are too many terms. Vertica, Kafka, Spark, and we use all of them. Figured I've got to at least know their fundamentals to make good choices.
Yeah I don’t have experience with all the tools. I’m sure they are great and have their strengths. My current setup uses both EMR and Redshift but the data model is the same on both.