by tzury
405 days ago
I’ve found that starting with a plain old filesystem often outperforms fancy services, just as the Unix philosophy (“everything is a file” [1]) has preached for decades [2]. When BigQuery was still in alpha I had to ingest ~15 billion HTTP requests a day (headers, bodies, and metadata). None of the official tooling was ready, so I wrote a tiny bash script that:
1. uploaded the raw logs to Cloud Storage, and
2. tracked state with three folders: `pending/`, `processing/`, `done/`.
A cron job cycled through those directories and quietly pushed petabytes every week without dropping a byte. Later, neither Google’s own pipelines nor third-party stacks like Logstash ever matched that script’s throughput or reliability.
Lesson: reach for the filesystem first; add services only once you’ve proven you actually need them.
[1] https://en.wikipedia.org/wiki/Everything_is_a_file
[2] https://en.wikipedia.org/wiki/Unix_philosophy
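For anyone who wants to try the three-folder idea, here is a minimal sketch in Python rather than bash. It is not the original script: the function names (`init_queue`, `claim_next`, `run_once`) and the `upload` callback are hypothetical, and the real upload to Cloud Storage is left to whatever you pass in. The only trick it relies on is that renaming a file within one filesystem is atomic on POSIX, so a file is always in exactly one of the three folders.

```python
import os
from pathlib import Path
from typing import Callable, Optional

STATES = ("pending", "processing", "done")


def init_queue(root: Path) -> None:
    """Create the three state folders if they do not exist yet."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def claim_next(root: Path) -> Optional[Path]:
    """Move one file from pending/ to processing/ and return its new path."""
    for candidate in sorted((root / "pending").iterdir()):
        target = root / "processing" / candidate.name
        try:
            # Atomic on POSIX when both paths are on the same filesystem.
            os.rename(candidate, target)
        except FileNotFoundError:
            continue  # another worker claimed this file first
        return target
    return None


def run_once(root: Path, upload: Callable[[Path], None]) -> int:
    """Process everything currently pending; return how many files finished."""
    completed = 0
    while True:
        claimed = claim_next(root)
        if claimed is None:
            return completed
        # `upload` is a placeholder (assumption) for the real transfer step.
        # If it raises, the file stays in processing/ for later inspection.
        upload(claimed)
        os.rename(claimed, root / "done" / claimed.name)
        completed += 1
```

A cron entry would then just call `run_once` on the queue root. Anything left in `processing/` after a crash is, by construction, the set of files that need a retry.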
I would add that filesystems are superior to data formats (XML, JSON, YAML, TOML) for many use cases, such as configuration or simple data storage:
- Hierarchy is directories,
- Keys are file names,
- Values are file contents,
- Other metadata lives in hidden files.
It will work forever, and you can leverage ZFS, Git, rsync, and Syncthing far more effectively than with a single structured file. If you want, a fancy shell like Nushell will bring the experience pretty close to a database.
Most importantly, you don't need fancy editor plugins or to learn XPath, jq, or yq.
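To make the mapping concrete, here is a rough sketch of reading such a tree back into a nested dictionary. The function name `load_tree` is made up for illustration, and it assumes every value is UTF-8 text and that dotfiles hold metadata to be skipped.

```python
from pathlib import Path
from typing import Any, Dict


def load_tree(root: Path) -> Dict[str, Any]:
    """Read a directory tree as config: dirs nest, file names are keys."""
    config: Dict[str, Any] = {}
    for entry in sorted(root.iterdir()):
        if entry.name.startswith("."):
            continue  # hidden files are treated as metadata, not values
        if entry.is_dir():
            config[entry.name] = load_tree(entry)
        else:
            # Trailing newlines from editors or `echo` are stripped.
            config[entry.name] = entry.read_text(encoding="utf-8").strip()
    return config
```

With `server/host` containing `localhost` and `server/port` containing `8080`, `load_tree` would give `{"server": {"host": "localhost", "port": "8080"}}`. Note that every value comes back as a string, so typing is left to the caller; that is the price of not having a schema.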