| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by markus_zhang 2147 days ago

Thanks teej for the answer.

>How much of these things happen is extremely DB dependent. How does the DB's underlying storage mechanism work? How is the data partitioned and distributed? How sophisticated and storage-aware is the query planner? How large is your data? How fast is your data growing? How fast do you need a new data point to be reflected in your dashboard?

I think most of my frustration comes from not knowing these stuffs. I work as a BA-BI hybrid as I found my data analysis skills are good complements to data modelling/airflow type of work, so I persuaded my manager to give me some BI work. But that also means I don't have the technical knowledge such as DB internals (and TBH I can't even find a book for Vertica on that matter).

Our DB and DBAs and all ETL processes are located in HQ and we actually don't own our databases. This, I guess, adds an extra layer of discomfirt as we are effectively cut off from database-level optimization. Our data engineer is about to leave because he has no DE work to do (every ETL has to go through HQ's process and we only need to submit some configuration files).

We don't have access to the databases you recommended (again HQ controls that), but I do believe I could try DBT, may I ask how do you use it? From my understanding it is mostly a transformation tool, but what makes it stand out?

1 comments

teej 2147 days ago

No amount of technology can fix a dysfunctional org.

A large amount of the work of BI is getting data into the right shape in order to ask questions about it.

- What’s level on average do new players drop off?

- How are sales of the new skins we introduced performing vs the last set?

- How often are players engaging with feature A vs B?

You might be able to answer all of these questions with just event data in a wide table, but you’ll write very different queries to answer them. dbt is a way to write queries on your source data that leads to systematic, repeatable, and reusable new tables for answering analytics questions.

link