| HN Mirror

In a nutshell, data and AI workloads require fast re-building and vertical scaling:

1) you should not need to redeploy a Lambda if you you're running January and February vs only January now. In the same vein, you should not need to redeploy a lambda if you upgrade from pandas to polars: rebuilding functions is 15x faster than lambda, 7x snowpark (-> https://arxiv.org/pdf/2410.17465)

2) the only way (even in popular orchestrators, e.g. Airflow, not just FaaS) to pass data around in DAGs is through object storage, which is slow and costly: we use Arrow as intermediate data format and over the wire, with a bunch of optimizations in caching and zero-copy sharing to make the development loop extra-fast, and the usage of compute efficient!

Our current customers run near real-time analytics pipelines (Kafka -> S3 / Iceberg -> Bauplan run -> Bauplan query), DS / AI workloads and WAP for data ingestion.