The pitch is that it's faster and more space-efficient, since column stores are far better for analytics than row stores. Some benchmarks that found a ~5-10x speedup: https://uwekorn.com/2019/10/19/taking-duckdb-for-a-spin.html

Consider someone who analyzes medium-sized volumes of (slowly-changing) data -- OLAP, not OLTP. People who need to do this primarily have 2 alternatives:

* a columnar database (redshift, snowflake, bigquery)
* a data lake architecture (spark, presto, hive)

The latter can be slow and wasteful, because the data is stored in a form that allows very limited indexing. So imagine you want query speeds that require the former.

Traditional databases can be hugely wasteful for this use case -- space overhead due to no compression, slow inserts due to transactions. The best analytics databases are closed-source and come with vendor lock-in (there are very few good open-source column stores -- clickhouse is one, duckdb is another). Most solutions are multi-node, so they come with operational complexity.

So DuckDB could fill a niche here -- data that's big enough to be unwieldy, but not big enough to need something like redshift. It's analogous to the niche SQLite fills in the transactional database world.

1. Is DuckDB similar to having indexes on each column? Because generally when something is slow, the solution is indexes. I have a 100 GB database which records real-time data and is lightning fast thanks to some minor tuning.
2. The example of STDDEV not being available shows the author's unfamiliarity with SQLite, which worries me.
https://docs.python.org/2.7/library/sqlite3.html#sqlite3.Con...
The author could very easily have built a similar interface if necessary.
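
For illustration, here is a rough sketch of what that could look like, assuming the point is Python's `sqlite3` support for user-defined aggregates via `Connection.create_aggregate`. The `StdDev` class, the `samples` table and the sample values are made up for the example; it computes the sample standard deviation with Welford's online algorithm, which is just one reasonable choice.

```python
import math
import sqlite3


class StdDev:
    """Hypothetical user-defined aggregate: sample standard deviation."""

    def __init__(self):
        self.n = 0       # number of non-NULL values seen
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def step(self, value):
        # Called once per row; skip NULLs like the built-in aggregates do.
        if value is None:
            return
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def finalize(self):
        # Sample standard deviation is undefined for fewer than two values.
        if self.n < 2:
            return None
        return math.sqrt(self.m2 / (self.n - 1))


conn = sqlite3.connect(":memory:")
# Register the aggregate under the SQL name "stddev", taking one argument.
conn.create_aggregate("stddev", 1, StdDev)

conn.execute("CREATE TABLE samples (x REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [(v,) for v in (2, 4, 4, 4, 5, 5, 7, 9)],
)

result = conn.execute("SELECT stddev(x) FROM samples").fetchone()[0]
print(result)
```

Once registered, `stddev(x)` can be used in any query on that connection, including with `GROUP BY`. The registration is per connection, so it has to be repeated for each new connection.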