|
|
|
|
|
by ozgune
4765 days ago
|
|
(Ozgun at Citus Data) You're right that you need to have Postgres installed. For running SQL over multiple JSON files, we wanted to keep the blog post short and noted several different ways to go over multiple files in our GitHub Readme. 1. You can create a partitioned PostgreSQL table, and declare one child table per JSON file. You can also declare constraints on the child table to filter out irrelevant files to the query. 2. You can create a distributed foreign table on CitusDB, even if you'd like to run on one node. In this case, we'll collect statistics automatically for you, and filter out irrelevant files. 3. If neither of these fit, you could change the source code to scan over one directory instead of a file. We didn't go down this path to be compatible with file_fdw's semantics. |
|
In the file system, to prevent too many records in a single directory it was split up per 1000 records... base/00001000/(1000-1999).json.gz ... this was mainly for being able to navigate this structure via a gui. I would suggest if your system can't do "basepath//*.json.gz" that you consider it.