by jesseryoung 1982 days ago
Four years ago I moved from a role where I primarily wrote C# as an architect on a web application to one as an architect helping to build a data warehouse. The contrast in the tooling, discipline, and information available for building anything in the data world was so stark it had me questioning my career decisions. Sure, you can read Kimball and Inmon, and I'm sure there are a handful of others out there, but there are drastically fewer resources than you can find in the application development space.

Things are getting better: visual ETL tools are falling out of favor compared to properly coded ETL (Spark, dbt, etc.), and data teams are starting to see the value of actually engineering a solution instead of just throwing it over the wall to a DBA to deal with. But tooling and general information on the web are still lacking. Favoring data engineers over "ETL developers" or "BI developers" (or "data scientists") will drastically improve any organization's ability to actually deliver real analytics, and hopefully an industry-wide push will lift all boats.


Why do you think that coded ETL is winning over the click-and-drag variant? I'd say the latter makes things a lot easier, no?
My bias has always been against click-and-drag programming, and I believe it mostly comes from my application developer background, where the sentiment towards visual-style application development tools is (almost) unanimously negative.

Coming over to the data world, I noticed the same type of problems click-and-drag app development had appearing in tools like IBM's DataStage and Informatica's PowerCenter. There's only so much you can do by dragging and dropping items on a screen. Eventually you need to take their respective escape hatches and do some programming, and when you do it's almost never ideal. I've also yet to see a visual coding tool produce readable, concise diffs in any source control system. Most of these tools also require some sort of centralized server infrastructure and a thick client, which makes it much more challenging to bootstrap new ETL developers.
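
To make the diff point concrete, here is a rough sketch in plain Python (not Spark or dbt; the function name, field names, and the two versions of the step are all made up for illustration). A coded transform is just text, so a change to it shows up as a couple of readable lines in review:

```python
import difflib


# Hypothetical transform step; the field names are illustrative assumptions.
def clean_orders(rows):
    """Drop rows without an order id and convert the amount to a float."""
    cleaned = []
    for row in rows:
        if not row.get("order_id"):
            continue
        cleaned.append({"order_id": row["order_id"], "amount": float(row["amount"])})
    return cleaned


# Two hypothetical versions of a step, stored as text the way source control sees them.
V1 = """def clean_orders(rows):
    return [r for r in rows if r.get("order_id")]
"""
V2 = """def clean_orders(rows):
    return [r for r in rows if r.get("order_id") and r.get("amount")]
"""


def review_diff(old, new):
    """Return the unified diff lines a reviewer would read for this change."""
    return list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), "v1", "v2", lineterm="")
    )


if __name__ == "__main__":
    print("\n".join(review_diff(V1, V2)))
```

Running it prints a unified diff with one removed line and one added line, which is roughly what a reviewer would see in a pull request. The visual tools I've used tend to serialize the same kind of change into much noisier output.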

I do hear others in the data world who have migrated to Spark or dbt share the same sentiments, but that could just be confirmation bias.

Advances in ETL: Spark and dbt are large improvements over pre-2010 ETL tools. Give it a few years and we'll see really good GUIs for Spark/dbt.