| The saddest part about this article being from 2014 is that the situation has arguably gotten worse. We now have even more layers of abstraction (Airflow, dbt, Snowflake) applied to datasets that often fit entirely in RAM. I've seen startups burning $5k/mo on distributed compute clusters to process <10GB of daily logs, purely because setting up a 'Modern Data Stack' is what gets you promoted, while writing a robust bash script is seen as 'unscalable' or 'hacky'. The incentives are misaligned with efficiency. |
I think a lot of people don't realize machines come with TBs of RAM and hundreds of physical cores. One machine is fucking huge these days.