by esafak 429 days ago
It is a service, not an open source tool, as far as I can tell. Do you intend to stay that way? What is the business model and pricing?

I am a bit concerned that you want users to swap out both their storage and workflow orchestrator. It's hard enough to convince users to drop one.

How does it compare to DuckDB or Polars for medium data?

2 comments

- Yes, it is a service, and at least the runner will stay that way for the time being.

- We are not quite live yet, but the pricing model is based on compute capacity and is divided into tiers (e.g. small = 50GB for concurrent scans = $1500/month; large can get up to a TB). Infinite queries, infinite jobs, infinite users. The idea is to have very clear pricing with no sudden increases due to volume.

- You do not have to swap your storage: our runner comes to your S3 bucket, and your data never has to live anywhere other than your S3.

- You do not have to swap your orchestrator either. Most of our clients are actually using it with their existing orchestrator. You call the platform's APIs, including run, from your Airflow/Prefect/Temporal tasks: https://www.prefect.io/blog/prefect-on-the-lakehouse-write-a...

Does it help?

Yep, staying a service.

RE: workflow orchestrators. You can use the Bauplan SDK to query, launch jobs, and get results from within your existing platform. We don't want to replace it entirely if that doesn't fit for you, just to augment it.

RE: DuckDB and Polars. It literally uses DuckDB under the hood, but with two huge upgrades. One, we plug into your data catalog for really efficient scanning, even on massive data lakehouses, before the query hits the DuckDB step. Two, we do efficient data caching: query results and intermediate scans can be reused across runs.

More details here: https://www.bauplanlabs.com/blog/blending-duckdb-and-iceberg...
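
For anyone who wants the two upgrades in concrete terms, here is a rough sketch of the general idea in plain Python. It is not Bauplan's implementation or API: `FileStats`, `prune_files`, `ScanCache`, and `scan` are all hypothetical names, the "catalog" is just a list of per-file min/max values, and the cache is an in-memory dict. It only illustrates pruning files from catalog statistics before reading them, then reusing a scan result on a later run.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max statistics, as a table catalog might record them (hypothetical shape)."""
    path: str
    min_ts: int
    max_ts: int


def prune_files(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


class ScanCache:
    """In-memory cache keyed by a hash of the scan request (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def key(paths, lo, hi):
        # A stable key: same files and same range always hash to the same value.
        payload = json.dumps({"paths": sorted(paths), "lo": lo, "hi": hi}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_compute(self, paths, lo, hi, compute):
        k = self.key(paths, lo, hi)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = compute(paths, lo, hi)
        self._store[k] = result
        return result


def scan(files, lo, hi, cache, read_file):
    """Prune by catalog stats first, then read (or reuse) the rows in range."""
    paths = [f.path for f in prune_files(files, lo, hi)]

    def compute(paths, lo, hi):
        rows = []
        for p in paths:
            rows.extend(r for r in read_file(p) if lo <= r <= hi)
        return rows

    return cache.get_or_compute(paths, lo, hi, compute)
```

Here `read_file` is whatever function actually loads a file's rows; in the sketch it is passed in so that a second `scan` with the same range can be seen to skip it entirely. A real system would also need cache invalidation when the underlying table changes, which this sketch leaves out.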

As for Polars, you can use Polars itself within your Python models easily by specifying it in a pip decorator. We install all requested packages within Python modules.