HN Mirror

by tgdn 1707 days ago

Have you looked into Spark? There are managed Spark options on AWS/GCP (for example Databricks). Spark lets you do exactly what you are saying.

Define the minimum/maximum number of nodes and the machine capacity (RAM/CPU), and let Spark handle the scaling for you.

It gives you a Jupyter-like runtime for working on possibly massive datasets. That said, Spark is perhaps too much for what you're looking for. Kubernetes could also be used with Airflow/DBT, for example for ETL/ELT pipelines.
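
To make the min/max scaling point a bit more concrete, here is a rough sketch of the Spark properties involved. The property names are standard Spark dynamic allocation settings, but the helper functions (`autoscaling_conf`, `to_submit_args`) and all the values are made up for illustration. Note that these settings scale the number of executors; how the underlying nodes are added or removed depends on the managed platform you pick, so treat this as a starting point rather than a complete setup.

```python
def autoscaling_conf(min_executors, max_executors,
                     executor_memory="8g", executor_cores=4):
    """Build Spark properties for dynamic allocation.

    Hypothetical helper: the defaults are illustrative, not recommendations.
    """
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Allows scaling down without an external shuffle service (Spark 3.0+).
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        # Per-executor capacity (RAM/CPU).
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
    }


def to_submit_args(conf):
    """Turn a properties dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = autoscaling_conf(min_executors=2, max_executors=20)
    print(" ".join(to_submit_args(conf)))
```

The same key/value pairs can be passed to `SparkSession.builder.config(key, value)` if you start the session from a notebook instead of `spark-submit`.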

1 comment

ekns 1707 days ago

Ideally I'd like to extend at least the illusion of an ad hoc PC/workstation to the cloud. For me it seems like it would be less effort until I reach some ridiculous scale that requires more engineering and setup anyway.
