Hacker News
by wolfgang42 412 days ago
20 GB of JSON is correct; here’s the entire dump straight from the API up to last Monday:

  $ du -c ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl | tail -n1
  19428360        total
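For anyone checking the arithmetic, here is a rough sketch of the conversion. It assumes `du` is reporting 1024-byte blocks, which is the GNU coreutils default; BSD/macOS `du` defaults to 512-byte blocks, so that is an assumption rather than something the output confirms. The helper name `du_blocks_to_bytes` is made up for illustration.

```python
KIB = 1024  # assumed du block size (GNU default); BSD/macOS du uses 512

def du_blocks_to_bytes(blocks, block_size=KIB):
    """Convert a block count printed by `du` into bytes."""
    return blocks * block_size

total_bytes = du_blocks_to_bytes(19428360)
print(f"{total_bytes / 1e9:.2f} GB")     # decimal gigabytes -> 19.89 GB
print(f"{total_bytes / 2**30:.2f} GiB")  # binary gibibytes  -> 18.53 GiB
```

Under that assumption the total is just under 20 GB in decimal units, which matches the "20 GB" figure.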
Not sure how your SQLite file is structured, but my intuition is that the two sizes coming out roughly the same is plausible: JSON has a lot of overhead from redundant structure and ASCII-formatted values, while SQLite has its own overhead from indexes, B-trees, pointer-map (ptrmap) pages, overflow pages, freelists, and so on.

SQLite also doesn't have fixed column types; instead it stores a type tag alongside each value (a per-value "serial type" in the record header), at least according to what I've read on the topic.
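A minimal sketch of what that looks like in practice, using Python's built-in `sqlite3` module and SQLite's `typeof()` function. The table `t`, column `v`, and helper `storage_classes` are hypothetical names for illustration. The column is declared without a type on purpose, since a declared type gives the column an affinity that can convert values on insert.

```python
import sqlite3

def storage_classes(values):
    """Insert values into one untyped column and report how SQLite tagged each."""
    conn = sqlite3.connect(":memory:")
    # No declared type, so SQLite keeps each value's own storage class.
    conn.execute("CREATE TABLE t (v)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [(v,) for v in values])
    rows = conn.execute("SELECT typeof(v) FROM t ORDER BY rowid").fetchall()
    conn.close()
    return [row[0] for row in rows]

print(storage_classes([42, 3.5, "hello", b"\x00\x01", None]))
# ['integer', 'real', 'text', 'blob', 'null']
```

The same column holds five different storage classes, so the type lives with each value rather than with the column.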