
I appreciate this.

Most software devs are used to working with 1-dimensional collections like lists, or tree-like abstractions like dicts, or some combination of those. This is why most abstractions and complex data types are built on these. Objects are a natural progression.

But in the data world, high-dimensional data is modeled using dataframes (analogously, tables). This is a paradigm shift for most pure software people because manipulating tables typically requires manipulating sets and relations. Joining two tables requires knowing how the join columns relate and which join type you want (inner, full outer, left, right, cross). Aggregation and window functions require thinking in terms of sub-groupings and subsets.
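To make the join, aggregation, and window-function ideas concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in so it runs without installing anything (the window query needs SQLite 3.25 or newer); the same SQL should look much the same in DuckDB. The `users` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

# Hypothetical toy tables; note order 13 belongs to a user_id with no users row,
# and user 3 ('cy') has no orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0), (13, 9, 4.0);
""")

# Inner join: only rows whose user_id appears in BOTH tables survive.
inner_rows = con.execute("""
    SELECT u.name, o.amount
    FROM users u JOIN orders o ON u.user_id = o.user_id
    ORDER BY o.order_id
""").fetchall()

# Left join: every user is kept; users without orders get NULL (None) amounts.
left_rows = con.execute("""
    SELECT u.name, o.amount
    FROM users u LEFT JOIN orders o ON u.user_id = o.user_id
    ORDER BY u.user_id, o.order_id
""").fetchall()

# Aggregation: collapse each user_id sub-group down to a single row.
totals = con.execute("""
    SELECT user_id, SUM(amount)
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# Window function: keep every row, but compute over its sub-group.
running = con.execute("""
    SELECT order_id, user_id,
           SUM(amount) OVER (PARTITION BY user_id ORDER BY order_id) AS running_total
    FROM orders
    ORDER BY order_id
""").fetchall()

print(inner_rows)
print(left_rows)
print(totals)
print(running)
```

The difference between the last two queries is the one that tends to trip people up: `GROUP BY` returns one row per group, while the window version returns one row per input row with the group-level value attached.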

It's not super intuitive unless you have to manipulate and reshape large amounts of data every day, which data scientists and data engineers have to do, and which software engineers typically don't do very often. It's just a kind of muscle memory that gets developed.

I definitely had trouble at first getting software engineers to buy into DuckDB because they were unfamiliar and uncomfortable with SQL. Fortunately some of them had a growth mindset and were willing to learn, and those folks have now acquired a very powerful tool (DuckDB) and a new way of thinking about large-data manipulation. Once data reaches a certain size, iterative constructs like "for" loops become impractical.
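As a rough illustration of the two mindsets, the sketch below computes the same per-group totals twice: once with an explicit "for" loop, and once by declaring the result in SQL and letting the engine decide how to compute it. It again uses the standard-library `sqlite3` module as a stand-in for DuckDB so it runs as-is; the data and the function names `totals_with_loop` and `totals_with_sql` are hypothetical. It only shows that the two styles agree on the answer and makes no claim about speed.

```python
import sqlite3
from collections import defaultdict

# Made-up data: (group_id, value) pairs.
rows = [(i % 3, float(i)) for i in range(1, 10)]


def totals_with_loop(rows):
    """Iterative style: visit each row and update state by hand."""
    totals = defaultdict(float)
    for group_id, value in rows:
        totals[group_id] += value
    return sorted(totals.items())


def totals_with_sql(rows):
    """Set-based style: describe the grouping and let the engine do the work."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (group_id INTEGER, value REAL)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return con.execute(
        "SELECT group_id, SUM(value) FROM t GROUP BY group_id ORDER BY group_id"
    ).fetchall()


loop_result = totals_with_loop(rows)
sql_result = totals_with_sql(rows)
print(loop_result)
print(sql_result)
```

With the set-based version, the query text stays the same whether the table has nine rows or nine billion; what changes is the engine underneath it.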