by Raminj95 1033 days ago
Are there any more examples or blog posts that talk about this? I find the idea interesting and possibly very applicable to my work, but from this post alone I don't feel I've grasped it well enough to implement it.
1 comments

Apologies for not getting into more detail; I wanted to start by covering things at a high level. There are a few key concepts that might be helpful.

* data state - this is the contents of both your data and metadata at a given point in time. If your data doesn't fit into a single database, this can be difficult to manage. We use lakeFS to help us with this: https://lakefs.io/

* logical state - this is everything you use to process the data in your pipeline (i.e. code, config, info for connecting to external services, etc.). This can all reside in git.

We found the key was associating our logical state (git branch) with our data state (lakeFS branch). We make this association during our branch deployment process.
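
To make that association a bit more concrete, here is a minimal sketch of the idea, not our actual deployment code. The names `lakefs_branch_for`, `PipelineState`, and `deploy_branch`, the repository name `analytics`, and the naming convention are all hypothetical and only for illustration. The one thing borrowed from lakeFS itself is its `lakefs://<repository>/<ref>/<path>` addressing scheme. Nothing here talks to git or lakeFS; it only records which data branch goes with which code branch.

```python
import re
from dataclasses import dataclass


def lakefs_branch_for(git_branch: str) -> str:
    """Derive a data-branch name from a git branch name.

    Assumption: a made-up convention that replaces every run of characters
    other than letters, digits, '-' and '_' with a single '-'.
    """
    return re.sub(r"[^A-Za-z0-9_-]+", "-", git_branch).strip("-")


@dataclass(frozen=True)
class PipelineState:
    git_branch: str     # logical state: code, config, connection info
    lakefs_repo: str    # hypothetical lakeFS repository name
    lakefs_branch: str  # data state: data and metadata

    def data_uri(self, path: str) -> str:
        # lakeFS addresses objects as lakefs://<repository>/<ref>/<path>
        return f"lakefs://{self.lakefs_repo}/{self.lakefs_branch}/{path.lstrip('/')}"


def deploy_branch(git_branch: str, lakefs_repo: str) -> PipelineState:
    """Pair the two states at branch deployment time.

    A real deployment would also create the lakeFS branch; this sketch
    only records the pairing so the pipeline knows where to read and write.
    """
    return PipelineState(git_branch, lakefs_repo, lakefs_branch_for(git_branch))


state = deploy_branch("feature/new-dedup-step", "analytics")
print(state.data_uri("events/part-0.parquet"))
```

With a pairing like this, every read and write in a deployed branch goes through `data_uri`, so code on a feature branch only touches data on the matching data branch.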

Let me know if this helps at all. I was planning to write a follow-up post about what we learned from managing the logical state of a data pipeline. If you have suggestions for a different topic to dive into, I'd love to hear them.