
My experience is similar: extract process -> raw data -> clean/merge -> model

Normally you extract from source, then load to destination. There is no business logic in this process.

From raw, you do all of your transforms to clean up and merge the data and then get it into a usable model. With big data sets I've done this with Hadoop and then moved the cleaned/merged data to a standard or MPP DB for analysts. For normal-sized sets this can all be done in a standard DB.
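To make the flow concrete, here is a minimal sketch of that ELT shape, assuming Python's built-in `sqlite3` as a stand-in for the "standard DB". The source rows, table names (`raw_crm`, `clean_customers`, `model_customer_revenue`, etc.) and the revenue model are all hypothetical. Extract/load copies source rows verbatim with no business logic, and every transform then runs inside the database, reading from the raw layer rather than overwriting it.

```python
import sqlite3

# Hypothetical source extracts from two systems with inconsistent formatting.
CRM_ROWS = [(" 1 ", "Alice@Example.com", "2021-03-01"), ("2", "bob@example.com ", "2021-03-02")]
BILLING_ROWS = [("1", "120.50"), ("2", "80.00"), ("2", "19.50")]


def extract_load(conn):
    """E + L: copy source rows into raw tables as-is, with no business logic."""
    conn.execute("CREATE TABLE raw_crm (id TEXT, email TEXT, signup TEXT)")
    conn.execute("CREATE TABLE raw_billing (customer_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_crm VALUES (?, ?, ?)", CRM_ROWS)
    conn.executemany("INSERT INTO raw_billing VALUES (?, ?)", BILLING_ROWS)


def transform(conn):
    """T: clean/merge and model inside the database, reading from the raw layer."""
    # Clean layer: trim, normalize case, cast types. Raw tables stay untouched.
    conn.execute("""
        CREATE TABLE clean_customers AS
        SELECT CAST(TRIM(id) AS INTEGER) AS customer_id,
               LOWER(TRIM(email))        AS email,
               DATE(signup)              AS signup_date
        FROM raw_crm
    """)
    conn.execute("""
        CREATE TABLE clean_payments AS
        SELECT CAST(TRIM(customer_id) AS INTEGER) AS customer_id,
               CAST(amount AS REAL)               AS amount
        FROM raw_billing
    """)
    # Model layer: an analyst-facing view that merges the clean tables.
    conn.execute("""
        CREATE VIEW model_customer_revenue AS
        SELECT c.customer_id, c.email, COALESCE(SUM(p.amount), 0) AS revenue
        FROM clean_customers c
        LEFT JOIN clean_payments p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id, c.email
    """)


conn = sqlite3.connect(":memory:")
extract_load(conn)
transform(conn)
rows = conn.execute("SELECT * FROM model_customer_revenue ORDER BY customer_id").fetchall()
print(rows)
```

Because the raw and clean tables are kept alongside the model, analysts can query any layer, not just the final view. At Hadoop scale the clean/merge step would run there instead, with only its output loaded into the analyst-facing DB.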

The other part is that all the data in the raw and clean/merge layers is kept and available for analysts to use, on the thinking that storage costs are extremely low and heading to zero. In a traditional DW, by contrast, analysts used only the modeled sets, and depending on the data, the earlier sets are deleted because they exist for operational purposes only. Storage there is considered expensive and limiting.

The move to ELT and to a declarative dataops tool has been mind-bending and a multiplier in how quickly we get to something usable. I don't want to see another DW again.