The better question is why DuckDB is so popular when one can use Polars, which has a sane, lintable, type-safe API, compared to the mess that is SQL:
WITH lagged AS (
    SELECT
        *,
        LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
    FROM events
),
sessions AS (
    SELECT
        *,
        SUM(COALESCE((date_diff('minute', prev_time, event_time) > 30)::INT, 1))
            OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
    FROM lagged
)
SELECT
    user_id,
    session_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    COUNT(*) AS event_count
FROM sessions
GROUP BY ALL
ORDER BY user_id, session_start;
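For anyone not fluent in window functions, here is roughly what that query computes, sketched in plain Python (standard library only, so this is neither Polars nor DuckDB). The sample rows and the `sessionize` helper are made up for illustration, and the gap is compared as an exact `timedelta`, which may differ at the edges from what `date_diff('minute', ...)` returns:

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data standing in for the `events` table: (user_id, event_time).
events = [
    ("a", datetime(2024, 1, 1, 9, 0)),
    ("a", datetime(2024, 1, 1, 9, 10)),
    ("a", datetime(2024, 1, 1, 10, 0)),  # 50-minute gap, so a new session starts
    ("b", datetime(2024, 1, 1, 9, 5)),
]


def sessionize(events, gap=timedelta(minutes=30)):
    """Return (user_id, session_id, session_start, session_end, event_count) rows."""
    rows = []
    # Equivalent of PARTITION BY user_id ORDER BY event_time.
    ordered = sorted(events, key=itemgetter(0, 1))
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        prev_time = None
        current = None  # [start, end, count] of the session being built
        for _, event_time in group:
            # The first event of a user (no previous row) or a gap above the
            # threshold opens a new session, mirroring COALESCE(..., 1).
            if prev_time is None or event_time - prev_time > gap:
                if current is not None:
                    rows.append((user_id, session_id, *current))
                session_id += 1
                current = [event_time, event_time, 0]
            current[1] = event_time
            current[2] += 1
            prev_time = event_time
        # Flush the last open session for this user.
        rows.append((user_id, session_id, *current))
    return rows


result = sessionize(events)
for row in result:
    print(row)
```

With the sample data, user "a" ends up with two sessions and user "b" with one. The running `session_id` counter plays the role of the cumulative `SUM(...) OVER (...)` in the SQL.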
Polars type-safe? It doesn't show you any errors until runtime, right?
Kusto Query Language (KQL) is the best I've seen at type safety, and I wish open-source DBs would steal some ideas from it.
That does look nicer if you have a Parquet file and want to analyze it. But DuckDB is also a database - if you want a persistent, reliable, and mutable data store, I don't think Polars would be suitable, would it? (Genuine question - you sound like an expert and I'm not.)
Performance is definitely one of them, but it also has inconsistent and duplicated methods, inconsistent defaults (e.g. some methods are in-place by default), and copy-by-reference issues. I could go on.
It was an early winner in an extremely popular language. That's really the main thing going for it, but alternatives have been a long time coming.
Why would you prefer Python and Pandas over good old SQL? Pandas is so verbose and hard to debug, and most of the time it struggles to be performant on small datasets.
SQL has been around since the dawn of databases. I am happy to see a trend away from pandas.