by just_testing
2310 days ago
I am replying here with some constructive criticism, because I found it awesome. As far as I can tell, TileDB is like a better SQLite and a better Parquet, is that it? The webpage does not tell me exactly what TileDB is, so I'm having a hard time understanding what's underneath all the marketing mumbo-jumbo. But if it is indeed a better SQLite/Parquet, DAMN, I'm so going to spread the gospel of this to all my students.
For a lot of what I do, I want a hierarchical containment system: the equivalent of folders with files. The files themselves are leaves in the hierarchy, containing multidimensional array data. This works great when the arrays are composed of fairly straightforward payloads, like float[x][y][z], but it also works if your array values are structs. Much of the value in zarr and TileDB comes from specifically how they arrange the arrays for convenient read access to slices. Access is going to look like: ages = root["user"]["age"][100:200:2]
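To make the "arranged for slice access" point concrete, here is a toy sketch in plain Python. It is not the zarr or TileDB API; the ChunkedArray class, the chunk size of 64, and the nested-dict "root" are all made up for illustration. The idea it shows is that values live in fixed-size chunks, so a slice read only has to touch the chunks that overlap the requested range.

```python
# Toy 1-D chunked array (hypothetical, NOT the zarr/TileDB API).
# Values are split into fixed-size chunks; a slice read only touches
# the chunks that contain requested indices.

class ChunkedArray:
    def __init__(self, values, chunk_size):
        values = list(values)
        self.chunk_size = chunk_size
        self.length = len(values)
        # In a real store each chunk would be a separate file or object.
        self.chunks = {
            start // chunk_size: values[start:start + chunk_size]
            for start in range(0, len(values), chunk_size)
        }
        self.chunks_read = []  # chunk ids touched by the most recent read

    def __getitem__(self, key):
        # This sketch only supports slice keys such as arr[100:200:2].
        start, stop, step = key.indices(self.length)
        self.chunks_read = []
        out = []
        for i in range(start, stop, step):
            chunk_id, offset = divmod(i, self.chunk_size)
            if chunk_id not in self.chunks_read:
                self.chunks_read.append(chunk_id)
            out.append(self.chunks[chunk_id][offset])
        return out


# Nested dicts stand in for the folder-like hierarchy.
root = {"user": {"age": ChunkedArray(range(1000), chunk_size=64)}}

ages = root["user"]["age"][100:200:2]
touched = root["user"]["age"].chunks_read
print(len(ages), "values read from chunks", touched)
```

With 1000 values and 64-value chunks there are 16 chunks in total, and this read only needs three of them. Real array stores do the same thing in N dimensions, usually with compression per chunk.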
Parquet is mostly a columnar file format, but with nesting. I'd use it to store large amounts of structured data with a relatively straightforward schema, although the schema itself can be fairly nested, so some records have very complex structure. Access would often be a loop over all records: for record in records: if record.user.has_age(): print("User age:", record.user.age)
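A runnable version of that loop, as a rough sketch using only the standard library. This does not read an actual Parquet file; the Record and User dataclasses, the has_age() helper, and the sample names are assumptions standing in for whatever a real reader would hand back. The last part shows the same data laid out column-wise, which is roughly how a columnar format thinks about nested fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    name: str
    age: Optional[int] = None  # optional field; absent for some records

    def has_age(self) -> bool:
        return self.age is not None


@dataclass
class Record:
    user: User


# Hypothetical sample data standing in for rows read from a file.
records = [
    Record(User("ana", 31)),
    Record(User("boris")),  # no age recorded
    Record(User("chen", 45)),
]

# Row-wise access: loop over every record and check the optional field.
ages_seen = []
for record in records:
    if record.user.has_age():
        print("User age:", record.user.age)
        ages_seen.append(record.user.age)

# Column-wise view of the same data: one list per leaf field,
# with None marking a missing value.
columns = {
    "user.name": [r.user.name for r in records],
    "user.age": [r.user.age for r in records],
}
```

The column-wise layout is why reading just one field out of a wide schema is cheap: only that one list has to be loaded.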
SQLite is a library/CLI that implements a relational database. It has a SQL interface and stores data using classic relational DB approaches, including secondary indices, etc., and permits joins directly within the engine: SELECT age FROM user WHERE user.country = 'Bulgaria'
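For completeness, a minimal sketch of that query using Python's built-in sqlite3 module with an in-memory database. The table layout, the index name idx_user_country, and the sample rows are invented for the example; only the shape of the query comes from the comment above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE user ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, country TEXT)"
)
# Secondary index so lookups by country need not scan the whole table.
conn.execute("CREATE INDEX idx_user_country ON user (country)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO user (name, age, country) VALUES (?, ?, ?)",
    [
        ("ana", 41, "Bulgaria"),
        ("boris", 28, "Bulgaria"),
        ("chen", 35, "Canada"),
    ],
)

# Parameterized version of the query from the text.
cursor = conn.execute(
    "SELECT age FROM user WHERE user.country = ? ORDER BY age",
    ("Bulgaria",),
)
ages = [row[0] for row in cursor]
print(ages)

conn.close()
```

The ORDER BY is only there to make the result order predictable; without it SQLite is free to return rows in whatever order the chosen index or table scan produces.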