Hacker News new | ask | show | jobs
by teej 2102 days ago
A few comments based on lots and lots of experience:

- Wide tables in columnar DBs can make some analytics queries easier to write and sometimes more performant.

- Wide tables can come at high storage cost and make other queries less performant (like SELECT *)

- How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard?

There's no free lunch when it comes to this stuff. A perfect solution doesn't exist, but the technology is getting better all the time. I've personally never used sql server as a data warehouse but plenty of folks do.

The stuff I use that I recommend very highly - Snowflake, TimescaleDB, vanilla Postgres. Also, use dbt.

1 comments

Thanks teej for the answer.

>How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard?

I think most of my frustration comes from not knowing these stuffs. I work as a BA-BI hybrid as I found my data analysis skills are good complements to data modelling/airflow type of work, so I persuaded my manager to give me some BI work. But that also means I don't have the technical knowledge such as DB internals (and TBH I can't even find a book for Vertica on that matter).

Our DB and DBAs and all ETL processes are located in HQ and we actually don't own our databases. This, I guess, adds an extra layer of discomfirt as we are effectively cut off from database-level optimization. Our data engineer is about to leave because he has no DE work to do (every ETL has to go through HQ's process and we only need to submit some configuration files).

We don't have access to the databases you recommended (again HQ controls that), but I do believe I could try DBT, may I ask how do you use it? From my understanding it is mostly a transformation tool, but what makes it stand out?

No amount of technology can fix a dysfunctional org.

A large amount of the work of BI is getting data into the right shape in order to ask questions about it.

- What’s level on average do new players drop off?

- How are sales of the new skins we introduced performing vs the last set?

- How often are players engaging with feature A vs B?

You might be able to answer all of these questions with just event data in a wide table, but you’ll write very different queries to answer them. dbt is a way to write queries on your source data that leads to systematic, repeatable, and reusable new tables for answering analytics questions.