Hacker News

by Rietty 181 days ago

Working in a Data Engineering/Operations role that focuses heavily on financial datasets. Everything is within AWS and Snowflake, and each table can easily have >100M records of all kinds of data (there is a lot of breadth). My general day-to-day is creating jobs that process large amounts of input data and store it in Snowflake, sending out tons of automated reports and emails to decision makers, and gathering more data from the web.
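One generic way to keep a job like this within memory limits is to stream the input and process it in fixed-size batches. The sketch below is only an illustration and is not Rietty's actual setup. `run_job`, `batched`, and the `load` callback are hypothetical names, and `load` stands in for whatever actually writes a batch to Snowflake.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_job(
    rows: Iterable[T],
    transform: Callable[[T], U],
    load: Callable[[List[U]], None],  # hypothetical loader, e.g. a Snowflake writer
    batch_size: int = 100_000,
) -> int:
    """Transform rows one batch at a time and hand each batch to `load`.

    Only one batch is held in memory at once. Returns the number of rows loaded.
    """
    total = 0
    for chunk in batched(rows, batch_size):
        out = [transform(r) for r in chunk]
        load(out)
        total += len(out)
    return total


# Usage: an in-memory list plays the role of the destination table.
sink: List[List[int]] = []
n = run_job(range(10), lambda x: x * 2, sink.append, batch_size=4)
# n == 10; sink == [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

Passing the loader in as a callback keeps the transform logic testable without a warehouse connection.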

All of this is done in a Python environment, with Rust used to speed up critical code and computations. (The Rust code is delivered as Python modules.)
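The thread doesn't say how the Rust modules are built or consumed. PyO3 with maturin is one common way to build them. A pattern that sometimes accompanies compiled extensions is to import the fast version when it is available and fall back to a pure-Python implementation otherwise. In the sketch below, the module `fin_kernels` and its `rolling_sum` function are hypothetical names used only for illustration.

```python
from typing import List, Sequence

try:
    # Hypothetical Rust extension module (e.g. built with PyO3/maturin).
    # The name is illustrative only; nothing here refers to a real package.
    from fin_kernels import rolling_sum  # type: ignore[import-not-found]
except ImportError:

    def rolling_sum(values: Sequence[float], window: int) -> List[float]:
        """Pure-Python fallback: sum of every full trailing window."""
        if window < 1:
            raise ValueError("window must be >= 1")
        out: List[float] = []
        acc = 0.0
        for i, v in enumerate(values):
            acc += v
            if i >= window:
                acc -= values[i - window]  # drop the value leaving the window
            if i >= window - 1:
                out.append(acc)
        return out


print(rolling_sum([1, 2, 3, 4], 2))  # [3.0, 5.0, 7.0] with the fallback
```

Callers import `rolling_sum` from one place and don't need to know which implementation they got. The two versions do need to agree on behavior for that to be safe.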

The work is interesting, and different challenges arise when you have to process and compute over datasets that receive tens of TBs of fresh data daily.

2 comments

doom2 181 days ago

Hello fellow data engineer! I feel like I don't see a lot of us around, or many popular submissions dealing with data engineering. I also work with financial datasets (think aggregated consumer transaction data) for use by investors and corporate clients.

Rietty 180 days ago

Many of my datasets are similar!

jftuga 181 days ago

> General day to day is creating jobs that will process large amounts of input data and storing them into Snowflake

About how long do these typically take to execute? Minutes, tens of minutes, hours?

My work is very iterative, with a feedback loop of only a few minutes.

Rietty 180 days ago

It depends on the dataset: anywhere from seconds to tens of minutes, depending on how much preprocessing is needed.

Rietty 180 days ago

Some of the largest are a few billion rows, so we sample randomly while developing code and then execute it on the full dataset.
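When the data already lives in Snowflake, one way to do this is to let the warehouse do the sampling. Its `SAMPLE BERNOULLI (p)` clause keeps each row with probability p percent. The minimal sketch below only builds the query text and does not claim to match the poster's workflow. The `build_query` helper and the `transactions` table name are hypothetical.

```python
from typing import Optional


def build_query(table: str, dev_sample_pct: Optional[float] = None) -> str:
    """Return a SELECT over `table`, row-sampled only during development.

    `table` is interpolated directly, so it must come from trusted config,
    never from user input.
    """
    query = f"SELECT * FROM {table}"
    if dev_sample_pct is not None:
        if not 0 < dev_sample_pct <= 100:
            raise ValueError("dev_sample_pct must be in (0, 100]")
        # Snowflake row-level sampling: each row kept with probability pct/100.
        query += f" SAMPLE BERNOULLI ({dev_sample_pct})"
    return query


print(build_query("transactions", 1))  # development: ~1% of rows
print(build_query("transactions"))     # production: every row
```

The downstream transformation code stays identical in both cases. Only the input query changes between a development run and the full run.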