by wolfgang42
412 days ago
20 GB of JSON is correct; here’s the entire dump straight from the API up to last Monday:

  $ du -c ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl | tail -n1
  19428360 total
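To make the units explicit, here is a quick sanity check of that total. It assumes `du` is reporting its usual default of 1 KiB blocks (no `-B` flag and no `BLOCKSIZE` override in the environment); the helper name `kib_to_gb` is just for illustration.

```python
def kib_to_gb(kib: int) -> float:
    """Convert a count of 1 KiB blocks to decimal gigabytes (10**9 bytes)."""
    return kib * 1024 / 1e9


def kib_to_gib(kib: int) -> float:
    """Convert a count of 1 KiB blocks to binary gibibytes (2**30 bytes)."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    total_kib = 19428360  # the "total" line printed by du above
    print(f"{kib_to_gb(total_kib):.2f} GB, {kib_to_gib(total_kib):.2f} GiB")
```

Under that assumption the dump is about 19.9 GB (roughly 18.5 GiB), which is where the "20 GB" figure comes from.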
Not sure how your SQLite file is structured, but my intuition is that the two sizes being roughly the same is plausible: JSON has a lot of overhead from redundant structure and ASCII-formatted values, but SQLite has indexes, B-trees, ptrmaps, overflow pages, freelists, and so on.
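If you want a rough look at part of that overhead, something like the following might help. It is only a sketch: `page_stats` is a made-up helper name and the path is whatever your database file is. It uses the standard `page_size`, `page_count`, and `freelist_count` pragmas, so it only tells you how much of the file is free pages; it does not break space down by table or index.

```python
import sqlite3


def page_stats(path: str) -> dict:
    """Report total and free-page bytes for the SQLite file at `path`."""
    con = sqlite3.connect(path)
    try:
        page_size = con.execute("PRAGMA page_size").fetchone()[0]
        page_count = con.execute("PRAGMA page_count").fetchone()[0]
        free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        con.close()
    return {
        "page_size": page_size,
        "file_bytes": page_size * page_count,
        "free_bytes": page_size * free_pages,  # pages on the freelist
    }


if __name__ == "__main__":
    import sys

    # Usage: python page_stats.py path/to/database.sqlite
    print(page_stats(sys.argv[1]))
```

A large `free_bytes` would suggest that a `VACUUM` could shrink the file. For a per-table and per-index breakdown, the `sqlite3_analyzer` tool is probably the better option.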