Hacker News new | ask | show | jobs
by enoptix 4431 days ago
So you have data that one day might go into an RDMS but the next day might go into file store? Do you constantly migrate old data between the two stores? What is the use case for this? Not being snarky, I'm genuinely curious. We use more than one datastore but that data usually stays put once its committed to one format. RDMS for most of the app, Redis for quick lists and cacheing. Flat files when necessary. But those models don't change their store ever unless its a major overhaul.
2 comments

It's a combination of 3 things. 1) Storing of the data is not what is central to my business case. It is an operational requirement, not what I'm selling. My major architecture requirements therefore do not get driven by what data store I'm using. 2) I frequently have data migrate from 1 format to another. 3) The actual data store formats don't change often, but each one of them has changed at least once in the last 3 years. That means that every six months we are migrating data store implementations. I don't want this to actually impact my business (see point 1) and therefore data store specifications are highly isolated from the other code.
When I worked at last.fm code that assumed all the data would always be in a single postgres database was a constant source of pain - we spent a lot of time migrating tables out of the big central database, either because the data simply didn't fit any more, or because we wanted it to be available to a Hadoop job. (There were probably other reasons, but those are the ones I remember). Maybe last.fm's an extreme case, but it does happen.
wouldn't a single abstraction over postgres, filesystem, hadoop, etc be either really leaky or really inefficient? different datastores are better suited for certain kinds of queries. It seems like the programmer should be aware of what he/she is querying.
You invert the dependency. The abstraction is over the things that the higher level code needs. I don't need to know about query types, indexes etc. I need some business answer (all log records between x/y, a user matching username x), I program to an interface that provides all the answers necessary for the high level code.

The implementation of that interface is data store aware and implements the interface in the most effective way possible for the data store holding the things I'm interested in.