Hacker News new | ask | show | jobs
by jackschultz 638 days ago
Very cool with the JSON_TABLE. The style of putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse / flatten is something I've been doing for a few years now. I've found it really great to put the json into a table, somewhere safe, and then do the parsing rather than dealing with possible errors on the scripting language side. I haven't seen this style been used in other places before, and to see it in the docs as a feature from new postgres makes me feel a bit more sane. Will be cool to try this out and see the differences from what I was doing before!
3 comments

A personal rule of mine is to always separate data receipt+storage from parsing. The retrieval is comparatively very expensive and has few possible failure modes. Parsing can always fail in new and exciting ways.

Disk space to store the returned data is cheap and can be periodically flushed only when you are certain the content has been properly extracted.

Did you mean "retrieval is comparatively inexpensive"? I think I'm on the same page but this threw me off.
I read it as retrieval being the requests to the scraped site. I can parse a few thousand HTML pages in minutes, but fetching them in the first place takes hours.
Exactly what I intended. Scraping is slow (and may be an irreplaceable snapshot in time). Parsing is fast and repeatable so should be done in a separate process from a stored copy.
I ended up with the same design after encountering numerous exotic failure modes.
> putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse

That’s a very good idea!

It is definitely an improvement on multiple `JSONB_TO_RECORDSET` and `JSONB_TO_RECORD` calls for flattening nested json.