by rubenvanwyk 507 days ago
And yet there's still no straightforward way to write directly to Iceberg tables from JavaScript, as far as I know.
3 comments

Writing to catalogs is still pretty new. Databricks has recently been pushing delta-kernel-rs, which DuckDB has a connector for, and there's support for writing from Python with the Polars package through delta-rs. As a small-time developer I've found this pretty helpful, and it was influential in picking Delta Lake over Iceberg.
> influential in picking delta lake over iceberg

Can you expand on those reasons a bit?

The dependency on a catalog in Iceberg made it more complicated for simple cases than Delta, where a directory hierarchy was sufficient - if I was understanding the PyIceberg docs correctly.
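
To illustrate why a plain directory can be enough, here is a toy sketch loosely modeled on Delta's `_delta_log` layout, where numbered JSON commit files live inside the table directory itself. It is not the real Delta protocol: real commits are newline-delimited JSON carrying several action types and point at Parquet files. The `commit`, `live_files`, and `demo` helpers and the file names are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def commit(table_dir: Path, data_file: str) -> int:
    """Record a new data file by writing the next numbered JSON commit."""
    log_dir = table_dir / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    entry = {"add": {"path": data_file}}  # simplified: one action per commit
    # Exclusive create ("x") raises if another writer already took this version.
    with open(log_dir / f"{version:020d}.json", "x") as f:
        json.dump(entry, f)
    return version


def live_files(table_dir: Path) -> list[str]:
    """Replay the log in version order to list the table's current data files."""
    files = []
    for log_file in sorted((table_dir / "_delta_log").glob("*.json")):
        action = json.loads(log_file.read_text())
        if "add" in action:
            files.append(action["add"]["path"])
    return files


def demo() -> list[str]:
    # Hypothetical table name and part files, used only for this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "events"
        commit(table, "part-0000.csv")
        commit(table, "part-0001.csv")
        return live_files(table)
```

A reader only needs the directory path to find the current state of the table. Iceberg, as I understand it, instead relies on a catalog to record which metadata file is current, which is the extra moving part mentioned above.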

for some reason it's really cumbersome to access this tech
I agree. As a longtime Business Intelligence developer, I'm still confused and astounded by all the tooling and bits and pieces seemingly necessary to create analytics/dashboards with open source tools.

For years I used a proprietary solution like Qlik Sense for the whole journey from data extraction to a finished dashboard (mostly on-prem). Going from raw data to a finished dashboard is a matter of days (not weeks or months) with one single tool (and maybe some scripts for supporting tasks). There is some "scripting" involved for loading and transforming data, but if you already understand data models (and maybe have some SQL experience) it is very easy. The dashboard creation itself does not need any coding at all: just drag and drop and some formulas like sum(amount).

But this is a standalone tool, and it is hard to integrate it into your own piece of software. In my experience, software developers have a much more complicated view of data handling. Often this is just the complexity of their use cases; sometimes it is just a lack of knowledge of data preparation for analytics use cases.

Another part which complicates stuff greatly is the focus on use-cases involving cloud storage and doing all the transformations on distributed systems.

And it is often not clear what amount of data we are talking about, or whether it is realtime (streaming) data or not. The possible approaches differ a lot depending on whether you have 6 hours to prepare the data or it has to be refreshed every second (or whenever new data arrives, etc.).

Long story short: yes, it is complicated to grasp. There is also a big difference between using the data for normal analytics use cases in a company (mostly read-only data models) and using the data in a (big tech) product.

I would suggest starting simple: look into a "query engine" to extract some data from somewhere, then do some transformations with pandas/polars/cubejs to get a basic understanding. You will need some schedulers and orchestration on the way forward, but this will depend on your real use cases and the environment you are in.
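
A minimal sketch of that starting point, using only the Python standard library: `sqlite3` stands in for the query engine, and a plain Python loop stands in for the pandas/polars transformation step. The `sales` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def extract(conn):
    """Extraction step: let the query engine filter rows at the source."""
    return conn.execute(
        "SELECT region, amount FROM sales WHERE amount > 0"
    ).fetchall()


def transform(rows):
    """Transformation step: total the amounts per region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def demo():
    # In-memory database with hypothetical sample data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", -1.0)],
    )
    try:
        return transform(extract(conn))
    finally:
        conn.close()
```

Calling `demo()` gives per-region totals, roughly what a dashboard formula like sum(amount) would show. In a real setup the extract step would point at an actual source, and a scheduler would run both steps on whatever refresh interval the use case needs.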

I would argue that stuff like Iceberg is really aimed at Data Platform Engineers, not BI analysts. Companies I've worked with in the past have 10-15 people on a Platform team that work directly with stuff like this, to offer analysts and data scientists a view into the company's data.
What's your use case? Iceberg is meant for analytical workloads.