Hacker News

	by Boxxed 906 days ago
	One thing I'm confused about: why does Iceberg need a Spark deployment to function? Or am I wrong about that? I would rather avoid that ecosystem if I can.

jamesblonde 906 days ago

You don't need a Spark deployment. The first reference implementations for reading and writing were simply built for Spark.

Now, with PyIceberg, there is read support in Python. Write support should be merged very soon (https://github.com/apache/iceberg-python/pull/41), so you will soon be able to both read and write Iceberg tables from Python. I look forward to doing data transformations in Polars for data of reasonable scale (up to 100GB or so) and writing the results to Iceberg tables with PyIceberg. No Spark.

Boxxed 892 days ago

Well, what about other languages? Does every language need bindings or a re-implementation? (i.e., are Iceberg tables written and queried in-process rather than through a network API?)
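
A minimal sketch of the in-process model, assuming the Iceberg v2 table-metadata JSON layout (`current-snapshot-id`, `snapshots`, `manifest-list`) and using a trimmed, made-up metadata file. The reader parses the metadata file itself and follows its pointers; no Iceberg server sits in between, and only the catalog lookup that locates the metadata file may go over the network (e.g. a REST catalog). A full reader would continue by decoding the Avro manifest list and manifests and then reading the Parquet data files, which is why each language needs either bindings or its own implementation of the spec. The function name `current_manifest_list` is hypothetical.

```python
import json
from typing import Optional

# Trimmed, made-up table-metadata file shaped like the Iceberg v2 spec.
# Real metadata files carry many more fields (schemas, partition specs, refs...).
METADATA_JSON = """
{
  "format-version": 2,
  "location": "s3://bucket/warehouse/db/events",
  "current-snapshot-id": 3051729675574597004,
  "snapshots": [
    {"snapshot-id": 2971237212870426104, "timestamp-ms": 1515100955770,
     "manifest-list": "s3://bucket/warehouse/db/events/metadata/snap-2971.avro"},
    {"snapshot-id": 3051729675574597004, "timestamp-ms": 1555100955770,
     "manifest-list": "s3://bucket/warehouse/db/events/metadata/snap-3051.avro"}
  ]
}
"""


def current_manifest_list(metadata_text: str) -> Optional[str]:
    """Return the manifest-list path of the table's current snapshot.

    Everything happens in this process: parse the metadata file, pick the
    current snapshot, follow its pointer. No cluster is involved.
    """
    metadata = json.loads(metadata_text)
    snapshot_id = metadata.get("current-snapshot-id")
    # An empty table has no current snapshot (older writers store -1).
    if snapshot_id is None or snapshot_id == -1:
        return None
    for snapshot in metadata.get("snapshots", []):
        if snapshot["snapshot-id"] == snapshot_id:
            return snapshot["manifest-list"]
    raise ValueError(f"current snapshot {snapshot_id} missing from metadata")


print(current_manifest_list(METADATA_JSON))
```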

benjaminwootton 906 days ago

It tends to be a matter of library dependencies rather than live clusters.

A lot of data lakes are managed with Hadoop and Spark, so I think it’s just an artefact of that.

In the end I can’t see why you wouldn’t be able to create and manage Iceberg files directly from standard Python, JS, or Java code without that legacy.
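
Managing the files directly mostly comes down to following the spec's optimistic-concurrency rule: a writer builds a new metadata file from the current one, then asks the catalog to atomically swap the table's pointer from that base to the new file, retrying against the fresh metadata if another writer committed first. The sketch below models only that commit step, using a hypothetical in-memory `ToyCatalog` and a hypothetical `write_metadata` callback in place of real file writes; it is not any library's API.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog: table name -> current metadata file path.

    Real catalogs (Hive metastore, JDBC, REST, ...) provide the same core
    operation: an atomic compare-and-swap on this single pointer.
    """
    pointers: Dict[str, str] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def current(self, table: str) -> str:
        with self._lock:
            return self.pointers[table]

    def swap(self, table: str, expected: str, new: str) -> bool:
        # Succeeds only if nobody committed since `expected` was read.
        with self._lock:
            if self.pointers.get(table) != expected:
                return False
            self.pointers[table] = new
            return True


def commit(catalog: ToyCatalog, table: str,
           write_metadata: Callable[[str], str], max_retries: int = 3) -> str:
    """Optimistic commit: write new metadata based on the current file, then swap.

    `write_metadata(base)` is a hypothetical callback that writes a new
    metadata file derived from `base` and returns its path.
    """
    for _ in range(max_retries):
        base = catalog.current(table)
        new = write_metadata(base)
        if catalog.swap(table, base, new):
            return new
        # Another writer won the race: loop and rebuild on the new current file.
    raise RuntimeError(f"commit to {table} failed after {max_retries} attempts")


def next_version(base: str) -> str:
    # Hypothetical naming scheme: "v1.metadata.json" -> "v2.metadata.json".
    n = int(base[1:].split(".")[0])
    return f"v{n + 1}.metadata.json"


catalog = ToyCatalog({"db.events": "v1.metadata.json"})
print(commit(catalog, "db.events", next_version))  # v2.metadata.json
```

In this model the only shared state is one pointer swapped atomically, so the writer can be a plain library call in any language rather than a job on a cluster.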

link