by josteink 1489 days ago
I thought the common denominator for this was (and has been, for decades) ETL? Extract, Transform, Load and in that order, because what other order would make sense?

Getting such basic things wrong doesn’t exactly give the reader the impression that the writer knows the subject particularly well.

5 comments

It's not a typo; it describes a product difference. ELT is a common term for a newer type of data engineering workflow that doesn't map onto ETL. In most ELT products the raw data is loaded into the data warehouse and then transformed by the warehouse itself, as opposed to the ETL pipelines many of us are used to, where the data gets transformed in a separate process before being dropped into the DW.
Data warehouses are now processing engines that scale compute independently of storage. You can simply dump your raw data into the warehouse and do your transform there. Hence ELT.
Using a data warehouse as a processing engine is one of the dumbest things to come out of the data world.

ELT is genius marketing though!

why is it dumb?
It allows providers to double and triple bill a user when under ETL they would only be charged once.
Not sure I follow this. Why wouldn't you pay more if you are doing more inside the warehouse? Also, setting up systems to run the pre-warehouse transform somewhere else isn't free.
They aren’t the only ones using ELT instead of ETL. I hear they are different but I have no motivation to even Wikipedia the topic to find out.
ETL: dumb data warehouse. You do your own transform before inserting the data.

ELT: smart data warehouse. You dump the raw data and transform in the data warehouse later.

Or, DWH is just storage vs DWH is storage + transform.
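
To make the distinction concrete, here is a minimal sketch of both orderings. It assumes an in-memory SQLite database as a stand-in for a real warehouse, and the table names, column names and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    ("alice", "12.50"),
    ("bob", "7.25"),
    ("alice", "3.00"),
]


def run_etl(conn):
    """ETL: transform in application code, then load only the result."""
    totals = {}
    for customer, amount in RAW_ROWS:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_totals (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO etl_totals VALUES (?, ?)", sorted(totals.items()))
    return conn.execute(
        "SELECT customer, total FROM etl_totals ORDER BY customer"
    ).fetchall()


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the store."""
    conn.execute("CREATE TABLE raw_events (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE elt_totals AS "
        "SELECT customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_events GROUP BY customer"
    )
    return conn.execute(
        "SELECT customer, total FROM elt_totals ORDER BY customer"
    ).fetchall()


# Separate in-memory databases so the two approaches do not share state.
etl_result = run_etl(sqlite3.connect(":memory:"))
elt_result = run_elt(sqlite3.connect(":memory:"))
```

Both paths end with the same totals. The difference is where the compute happens, and that in the ELT case the raw rows are still sitting in the store afterwards.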

When I built an ETL pipeline before the term ELT existed, we had multiple pipelines that transformed data in the data warehouse as well as transforms of the raw data on the way in. The transforms on the way in were minor, like renames or format changes, while the transforms within the DWH were much more involved.
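
A rough sketch of that kind of split, again assuming SQLite in place of a real warehouse. The source field names, the date format and the monthly rollup are hypothetical examples, not the pipeline described above.

```python
import sqlite3
from datetime import datetime

# Hypothetical source rows with source-specific field names and date format.
SOURCE_ROWS = [
    {"cust_nm": "alice", "order_dt": "03/01/2021", "amt": 10},
    {"cust_nm": "alice", "order_dt": "03/15/2021", "amt": 5},
    {"cust_nm": "bob", "order_dt": "04/02/2021", "amt": 7},
]


def light_transform(row):
    """Minor transform on the way in: rename fields and normalize the date."""
    iso_date = datetime.strptime(row["order_dt"], "%m/%d/%Y").date().isoformat()
    return (row["cust_nm"], iso_date, row["amt"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [light_transform(r) for r in SOURCE_ROWS],
)

# More involved transform inside the store: monthly totals per customer.
monthly = conn.execute(
    "SELECT customer, substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY customer, month ORDER BY customer, month"
).fetchall()
```

The loader only touches names and formats; the aggregation is left to SQL running where the data lives.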

Every pipe from DB to DW is going to be some form of ETL; it's just that with "raw" data the transform is minimal. Unless you're going apples-to-apples on data storage, there are going to be some transforms in the actual underlying data types.

I agree with a comment down below, it's a genius marketing term.

> we had multiple pipelines that transformed data in the data warehouse

I've done this in the past too. But that isn't simply what ELT is. In the past compute was tied to storage which made these transforms interfere with load on the actual system.

ETL is the standard terminology that I learned, going back decades.
See dbt (data build tool) for an example of ELT workflow.