Hacker News new | ask | show | jobs
by ElCapitanMarkla 682 days ago
I'm unfamiliar with the parquet format and trying to understand - are you storing the raw scraped data in that format or are you storing the result of parsing the scraped data?
1 comments

We are storing the result of the parsed scrape as parquet. I would advice to store the raw data as well in a different s3. The database should only have the data it needs and not act as a storage.