One thing I'm confused about is why Iceberg needs a Spark deployment to function. Or am I wrong about that? I would rather avoid that ecosystem if I can.
You don't need a Spark deployment. The first reference implementations for reading and writing were in Spark.
Now, with PyIceberg, there is read support in Python. Write support should be merged very soon - https://github.com/apache/iceberg-python/pull/41
So, very soon, you will be able to read/write Iceberg tables in Python.
I look forward to doing data transformations in Polars for data of reasonable scale (up to 100GB or so) and writing to Iceberg tables with PyIceberg. No Spark.
Well, what about other languages? Does every language need bindings or a re-implementation? (i.e., are Iceberg tables written/queried in-process as opposed to via a network API?)