| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jackschultz 638 days ago
	Very cool with the JSON_TABLE. The style of putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse / flatten is something I've been doing for a few years now. I've found it really great to put the json into a table, somewhere safe, and then do the parsing rather than dealing with possible errors on the scripting language side. I haven't seen this style been used in other places before, and to see it in the docs as a feature from new postgres makes me feel a bit more sane. Will be cool to try this out and see the differences from what I was doing before!

3 comments

0cf8612b2e1e 638 days ago

A personal rule of mine is to always separate data receipt+storage from parsing. The retrieval is comparatively very expensive and has few possible failure modes. Parsing can always fail in new and exciting ways.

Disk space to store the returned data is cheap and can be periodically flushed only when you are certain the content has been properly extracted.

link

cjonas 637 days ago

Did you mean "retrieval is comparatively inexpensive"? I think I'm on the same page but this threw me off.

link

franga2000 637 days ago

I read it as retrieval being the requests to the scraped site. I can parse a few thousand HTML pages in minutes, but fetching them in the first place takes hours.

link

0cf8612b2e1e 637 days ago

Exactly what I intended. Scraping is slow (and may be an irreplaceable snapshot in time). Parsing is fast and repeatable so should be done in a separate process from a stored copy.

link

erichocean 638 days ago

I ended up with the same design after encountering numerous exotic failure modes.

link

abyesilyurt 638 days ago

> putting json response (from API, created from scraping, ect.) into jsonb column and then writing a view on top to parse

That’s a very good idea!

link

ellisv 638 days ago

It is definitely an improvement on multiple `JSONB_TO_RECORDSET` and `JSONB_TO_RECORD` calls for flattening nested json.

link