Hacker News new | ask | show | jobs
by neilk 4762 days ago
Okay, so if I understand correctly, the benefit here is that I can run some kinds of SQL queries on a large collection of heterogenous JSON documents. Is that right? You're still speaking Postgres-internals-geek, not I-want-to-get-something-done geek.

That is:

- you can take heterogenous JSON documents, and then make them appear to the Postgres world as separate tables. (Which they have to be, because they're not uniform)

- you can then take some kinds of SQL queries and distribute them among these tables, as if they were one big SQL table. If some pseudo-tables have missing or extra columns, no problem.

1 comments

Thanks for the comment. Yes, that's exactly the benefit.

You could in fact have heterogeneous JSON documents within one file, and json_fdw could still handle them. If a field you declared in the table schema doesn't appear in a JSON document, json_fdw would just consider that field to be Null.

The issue is, json_fdw is built such that one table can only associated with one JSON file, and users typically have many JSON files lying around. For example, you'd have one JSON file for each hour's worth of website error logs.

If that's the case, you could create a distributed table. That way, you could run the query one distributed table and have the query run on all your JSON files instead of just one.

That said, it's probably best to associate one type of JSON logs (say your website error logs) with one distributed table. That way, your queries go over your website error logs or your mobile application logs, but not both at the same time.