|
|
|
|
|
by teej
2102 days ago
|
|
A few comments based on lots and lots of experience: - Wide tables in columnar DBs can make some analytics queries easier to write and sometimes more performant. - Wide tables can come at high storage cost and make other queries less performant (like SELECT *) - How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard? There's no free lunch when it comes to this stuff. A perfect solution doesn't exist, but the technology is getting better all the time. I've personally never used sql server as a data warehouse but plenty of folks do. The stuff I use that I recommend very highly - Snowflake, TimescaleDB, vanilla Postgres. Also, use dbt. |
|
>How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard?
I think most of my frustration comes from not knowing these stuffs. I work as a BA-BI hybrid as I found my data analysis skills are good complements to data modelling/airflow type of work, so I persuaded my manager to give me some BI work. But that also means I don't have the technical knowledge such as DB internals (and TBH I can't even find a book for Vertica on that matter).
Our DB and DBAs and all ETL processes are located in HQ and we actually don't own our databases. This, I guess, adds an extra layer of discomfirt as we are effectively cut off from database-level optimization. Our data engineer is about to leave because he has no DE work to do (every ETL has to go through HQ's process and we only need to submit some configuration files).
We don't have access to the databases you recommended (again HQ controls that), but I do believe I could try DBT, may I ask how do you use it? From my understanding it is mostly a transformation tool, but what makes it stand out?