by Shelnutt2 2309 days ago

TileDB is more than a format. At its core, it is an engine for storing and quickly accessing multi-dimensional arrays, both dense and sparse. As with Parquet, its sparse array support can capture dataframes[1]. It is more general than Parquet in that it supports fast multi-dimensional slicing: you define a subset of the columns to act as dimensions. This effectively buys you a primary multi-dimensional index (and in the future we could add secondary indexes as well). TileDB also handles updates, time travel and partitioning at the library level, so you don't need extra services like Delta Lake to manage the numerous Parquet files you may create.
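To make the "columns as dimensions" idea concrete, here is a toy sketch in plain Python. It is not TileDB's API or implementation: the rows, the column names (`timestamp`, `sensor_id`, `temperature`) and the `slice_rows` helper are all made up for illustration. It only shows why keeping rows ordered by the dimension columns lets a range query skip most of the data instead of scanning every row.

```python
from bisect import bisect_left, bisect_right

# Hypothetical dataframe rows: (timestamp, sensor_id, temperature).
rows = [
    (3, 1, 20.5),
    (1, 2, 19.0),
    (2, 1, 21.0),
    (1, 1, 18.5),
    (3, 2, 22.0),
]

# Treat (timestamp, sensor_id) as the dimensions: keep rows sorted by them,
# so the dimension columns act as a primary index over the data.
rows.sort(key=lambda r: (r[0], r[1]))
timestamps = [r[0] for r in rows]


def slice_rows(t_lo, t_hi, s_lo, s_hi):
    """Return rows with t_lo <= timestamp <= t_hi and s_lo <= sensor_id <= s_hi."""
    # Binary search narrows the scan to the matching timestamp range.
    start = bisect_left(timestamps, t_lo)
    stop = bisect_right(timestamps, t_hi)
    # The second dimension is filtered within that narrowed range.
    return [r for r in rows[start:stop] if s_lo <= r[1] <= s_hi]


print(slice_rows(1, 2, 1, 1))
```

A real engine would organize the data into tiles across all dimensions rather than sorting on a single leading column, but the access pattern is the same: the query names ranges on the dimension columns and gets back only the matching cells.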

The good news is that we offer efficient integrations with MariaDB, PrestoDB and Spark, so you can run SQL queries directly on TileDB data via those engines (this works even for dense data). With MariaDB, we even have an embedded version that lets you run SQL queries directly from Python[2]. This combines the ease of use of SQLite with MariaDB's speed and TileDB's fast access to AWS S3 (and Azure Blob Storage in the next version).

[1] https://docs.tiledb.com/main/use-cases/dataframes

[2] https://docs.tiledb.com/developer/api-usage/embedded-sql

Disclosure: I am a member of the TileDB team.