by jaylevitt 5266 days ago
I've read a few books on data warehousing, and maybe you can confirm my suspicion:

Isn't ETL just an acronym that means "I wrote this Perl script to populate the database"?

How on earth is that even an industry?

2 comments

Simple ETL jobs are mostly just E & L: extract the data from one system, load it into another.
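As a minimal sketch of that simple case, assume two SQLite databases and a hypothetical `customers` table with identical columns on both sides (the table, columns and function name are made up for illustration). The whole job is then roughly a read and a bulk insert:

```python
import sqlite3

def extract_load(src, dst, table="customers"):
    """Copy every row of `table` from src to dst and return the row count.

    Assumes both connections already have the same (hypothetical) table
    with identical columns, so no Transform step is needed.
    """
    # Extract: read everything from the source system.
    rows = src.execute(f"SELECT id, name FROM {table}").fetchall()
    # Load: bulk insert into the target; `with dst` commits on success
    # and rolls back if the insert raises.
    with dst:
        dst.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    return len(rows)

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    for db in (src, dst):
        db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    print(extract_load(src, dst))
```

This is about as far as the "Perl script" view of ETL goes: it works while the two schemas match and the data fits in memory.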

Where things get complex is in the Transform aspect of some jobs. Mapping disparate schemas is complex, often messy work, especially when one (or both) sides of the ETL job have poor or missing primary keys and foreign keys, or are just "mostly standard" CSV files [shudder].
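To make the Transform pain a bit more concrete, here is a rough sketch. It assumes a hypothetical "mostly standard" CSV export with no primary key and inconsistent whitespace and case, and a target schema that wants a unique `email` key plus split names. The column names and cleanup rules are invented for illustration, not taken from any real system:

```python
import csv
import io

def transform(csv_text):
    """Map rows of a keyless, messy CSV onto a hypothetical target schema.

    Source columns (assumed): "Full Name", "E-mail".
    Target fields (assumed): email (natural key), first_name, last_name.
    """
    seen = {}
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        email = (row.get("E-mail") or "").strip().lower()
        if not email:
            # No usable key: a real job would log or quarantine this row.
            continue
        first, _, last = (row.get("Full Name") or "").strip().partition(" ")
        # The source has no primary key, so dedupe on the normalized email;
        # the last occurrence of a duplicate wins.
        seen[email] = {
            "email": email,
            "first_name": first,
            "last_name": last.strip(),
        }
    return list(seen.values())
```

Even this toy version has to make policy decisions (what counts as the key, which duplicate wins, what to do with rows that have no key), and those decisions multiply with every extra table and source.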

Also: some ETL jobs can get quite large. I know one guy who had to create an ETL system that continuously moved data from one 1200-table system into some other system. Crazy.

The term "ETL" itself is often used in place of "Data Integration", which is a much larger topic, particularly when it comes to data warehouse design. The Wikipedia article is a good jumping-off point: http://en.wikipedia.org/wiki/Data_integration

It may be difficult to understand how this is an industry coming from a web development/startup angle (big supposition there), but there are literally thousands of companies with lots of databases varying in age, size and complexity that need integrating, and plenty of companies competing for that work as either implementors or software providers. A Perl script might do the job, but most products focus on performance, reuse, ease of maintenance and compatibility across many different database/file types.