Hacker News
by thenaturalist 1270 days ago
I'd second the notion that the question is far too open.

I'd add that dbt, Databricks and Snowflake are still pretty strong bets, but you have to acknowledge that they're becoming mainstream at an ever-accelerating pace as the companies behind them churn out upskilling courses and meetups and acquire an ever larger share of the market.

If you like to be a specialist, going deep into any of those still holds career value.

If you're taking a more generalist view of where things are headed, the best prediction I've heard for how Data Engineers can set themselves apart is to optimize for operationalizing data: focusing much more on reverse ETL and becoming knowledgeable in building data web apps. The no-code or low-code movement around data apps will make the barrier to entry for setting something up nonexistent, and I can see that driving demand.
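To make "reverse ETL" a bit more concrete, here's a minimal sketch of the idea: take a metric that was modeled in the warehouse and push it back into an operational tool. Everything here is an assumption for illustration. An in-memory SQLite database stands in for the warehouse, and `FakeCrm`, `sync_lifetime_value` and the `orders` table are hypothetical names, not any vendor's API.

```python
import sqlite3


def build_warehouse():
    """In-memory SQLite stands in for the warehouse (assumption for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("c1", 40.0), ("c1", 60.0), ("c2", 15.0)],  # made-up sample rows
    )
    return conn


class FakeCrm:
    """Hypothetical operational tool; a real one would sit behind an HTTP API."""

    def __init__(self):
        self.records = {}

    def upsert(self, customer_id, fields):
        # Merge the new fields into whatever the tool already knows.
        self.records.setdefault(customer_id, {}).update(fields)


def sync_lifetime_value(conn, crm):
    """Push a warehouse-modeled metric back into the operational tool."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    for customer_id, total in rows:
        crm.upsert(customer_id, {"lifetime_value": total})
    return len(rows)  # number of records synced
```

Calling `sync_lifetime_value(build_warehouse(), FakeCrm())` walks the aggregated rows and upserts one record per customer. The hard parts in practice (rate limits, retries, incremental syncs) are exactly what this sketch leaves out.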

Pairing (big) data query/frontend performance with web apps is another beast, though.

For all my initial scepticism, I see the Data Mesh concept picking up pace in the years to come. It's vendor-independent and couples well with Team Topologies and effective, decoupled, functional SWE teams. There will still be a big need for standards and conventions set by a small enabling core DE team. As of now, the knowledge gap between the baseline DE and your average SWE or Product Owner is just way too big in my experience.

Last but not least, I'd throw data lakes out there. Apache Iceberg is getting a lot of attention, and rightfully so. The TCO of a query engine on top of files is so much better than that of any DWH, and any org able to optimize compute on data for its current needs will be able to save massively while the "convenience" gap steadily closes. Again, pretty generic, but there's much to learn around Athena, Trino and the like.
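One reason compute over plain files can stay cheap: table formats such as Iceberg keep per-file column statistics in their metadata, so an engine can skip files that cannot match a filter. Here's a rough sketch of that pruning idea, not how any particular engine implements it. `DataFile`, `prune`, the paths and the sizes are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str     # hypothetical file path
    min_day: str  # lowest event day in the file (ISO date string)
    max_day: str  # highest event day in the file (ISO date string)
    size_mb: int  # illustrative size, not a measurement


def prune(files, start_day, end_day):
    """Keep only files whose [min_day, max_day] range overlaps the query range.

    ISO dates compare correctly as plain strings, so no date parsing is needed.
    """
    return [f for f in files if f.max_day >= start_day and f.min_day <= end_day]


def scanned_mb(files):
    """Total size the engine would have to read for the given files."""
    return sum(f.size_mb for f in files)


# Made-up table layout: one file per month.
FILES = [
    DataFile("events/a.parquet", "2023-01-01", "2023-01-31", 512),
    DataFile("events/b.parquet", "2023-02-01", "2023-02-28", 480),
    DataFile("events/c.parquet", "2023-03-01", "2023-03-31", 530),
]
```

With this layout, a query for mid-February (`prune(FILES, "2023-02-10", "2023-02-20")`) only has to touch `events/b.parquet`. The less you scan, the less compute you pay for.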

I'm personally not a fan of learning a new language, except maybe for Rust. There is an ever-increasing stack of standard "low-code" tools for the typical ETL shtick, and Python won't go anywhere. Again, potential to differentiate will be low and ever lower in many contexts outside of proper big data. This is only me though, and this view is highly context-dependent, so YMMV of course.


About Snowflake, I am really curious. What do you mean by "learn Snowflake"? The way I was told about Snowflake is that it's a cloud-based data warehouse. Are there advanced properties in Snowflake which one has to learn? Or do you mean optimized queries?

Snowflake at its most basic is SQL on cloud VMs; anyone comfortable with SQL should feel at home there. That said, there are many Snowflake-specific features that may take a bit to become familiar with. Just off the top of my head:

- hybrid RBAC, DAC, ABAC security model
- column, row level, and tag based access policies
- multi-account organizations
- cross-account and region data replication
- data shares
- external tables and specialized formats (iceberg, delta)
- pipes and streams
- snowpark API
- streamlit integration
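To give a feel for one item on that list, a row-level access policy is conceptually just a predicate evaluated per row against the current role. The sketch below is a plain-Python mental model under that assumption, not Snowflake's API or syntax. `make_row_policy`, `query`, the role names and the `region` column are all hypothetical.

```python
def make_row_policy(role_regions):
    """Build a predicate that decides whether a role may see a given row.

    role_regions maps a role name to the set of regions it may read;
    "*" means every region. Unknown roles see nothing.
    """
    def policy(current_role, row):
        allowed = role_regions.get(current_role, set())
        return "*" in allowed or row["region"] in allowed

    return policy


def query(rows, current_role, policy):
    """Return only the rows the policy lets the current role see."""
    return [row for row in rows if policy(current_role, row)]


# Made-up data and role mapping for illustration.
SALES = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 20},
    {"region": "EU", "amount": 30},
]
POLICY = make_row_policy({"eu_analyst": {"EU"}, "admin": {"*"}})
```

Here `query(SALES, "eu_analyst", POLICY)` returns only the EU rows, `admin` sees everything, and a role missing from the mapping gets an empty result. The same query text yields different rows depending on who runs it, which is the part that takes some getting used to.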

The nice thing about Snowflake is that for many use cases it requires little management.

Things you can learn regarding Snowflake, other than the obvious (SQL, and Snowflake-specific language extensions to SQL): proper table partitioning, Snowpipe (and the associated cloud messaging pipelines), and query performance tuning. (Complex queries can become a bear; identifying when it's your query/partitioning or when it's something on the Snowflake back end is challenging.)

There are always new additions to the Snowflake tooling ecosystem since the company is in competition with Databricks and others (e.g., Snowpark with Python).