|
|
|
|
|
by markus_zhang
2101 days ago
|
|
I'm wondering what you guys think about columnar databases and wide tables. We use Vertica and from senior colleagues and even Vertica developers I got the impression that big wide tables are good because it eliminates the needs of join. Thus we don't use star schema and just wide tables. However I think data modelling is also about embedding proper business logic and it would be a lot more confusing if two unrelated columns are stuffed in the same table. For example let's say my event based game has a base for each player and he can do a lot of things in his base. If we use a wide table then we wilk see columns related to factory building side by side with pig feeding, and because each small feature has some unique columns, a lot of those columns are NULL simply because this event json doesn't even have th fields. I'm wondering if we should use Vertica for a transactional type table and then use say sql server for dwh and build more traditional data modelling. But this could be awfully wrong maybe... |
|
- Wide tables in columnar DBs can make some analytics queries easier to write and sometimes more performant.
- Wide tables can come at high storage cost and make other queries less performant (like SELECT *)
- How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard?
There's no free lunch when it comes to this stuff. A perfect solution doesn't exist, but the technology is getting better all the time. I've personally never used sql server as a data warehouse but plenty of folks do.
The stuff I use that I recommend very highly - Snowflake, TimescaleDB, vanilla Postgres. Also, use dbt.