Hacker News
by tmpz22 2333 days ago
How does this work with container/ephemeral services such as typical K8s deployments? Can I trust the file system mounting via resources like StatefulSets or FS mounts? For that matter, Heroku, App Engine, Cloud Functions, whatever?

Our current setup has all our services in Kubernetes but our databases in stateful VMs. I do occasionally stuff job reports and similar data into Postgres rows since it's already there, but I've been unhappy with our ETL setup and would be interested in hearing techniques to improve it.

3 comments

ETL workers themselves are typically ephemeral, plumbing batches between remote storage systems like Postgres, S3, and Hive. You might use local disk as scratch space during the batch, but not as a sink.
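
To make the scratch-versus-sink distinction concrete, here is a minimal sketch of one batch. It assumes the batch fits in a single CSV file; `run_batch` and the `upload` callable are hypothetical names standing in for whatever actually pushes data to S3, Postgres, or Hive.

```python
import csv
import tempfile
from pathlib import Path


def run_batch(rows, upload):
    """Stage one batch on local scratch disk, then hand it to a remote sink.

    `upload` is a hypothetical callable taking a file path; in a real
    worker it would be an S3 put, a Postgres COPY, or similar.
    """
    # The scratch directory exists only for the duration of this batch.
    with tempfile.TemporaryDirectory(prefix="etl-batch-") as scratch:
        staged = Path(scratch) / "batch.csv"
        with staged.open("w", newline="") as fh:
            csv.writer(fh).writerows(rows)
        # The remote system is the sink; local disk never holds the only copy.
        upload(staged)
    # By this point the scratch directory has been deleted.
    return Path(scratch)
```

Because nothing durable lives on the worker, the pod can be killed and rescheduled at any point and the batch simply reruns.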

From experience, and as the SQLite docs confirm, running SQLite against files on an NFS-mounted filesystem is not safe: file locking on NFS is frequently broken, which can corrupt the database. See section 2.1 of this document [1] and the related HN discussion [2].

[1] https://www.sqlite.org/howtocorrupt.html

[2] https://news.ycombinator.com/item?id=22098832

Use a volume container mounted against a persistent storage engine on the node and do pod mounting from those containers. Stateful VMs are often a better choice for production imo.

I'm in favor of leveraging ISP DBaaS and persistence offerings over trying to build something homegrown. It just depends on where you are coming from and/or what you are trying to do... K8s alone avoids so much lock-in, and as long as whatever storage option (container mount) or DBaaS you use is portable, I don't think it's so bad in either case.