Is swapping out the database layer so common you really need a complicated abstraction like the Repository model he mentioned? We've been running an app for 3 years and have never dropped ActiveRecord or Postgres.
Spinning this around on you: I worked on a project where swapping data stores wasn't common precisely because of tight coupling with AR. We considered changing data stores for different and evolving use cases and never did, largely because so much of our code felt so tightly tied to AR that swapping would have been very expensive. I think that coupling definitely decreases flexibility, but it's pretty hard to know ahead of time whether you'll want more flexibility.
Pretty much all of these pattern discussions seem to be this way to me - "just do it the simple way! YAGNI!" versus "crap this one time I did need it and it was difficult to change by then! Maybe I should design things more flexibly from the start next time!". It's pretty easy to get burned going either direction, and depends a lot on things like what the project is, what organization is building it, and the level of success it ends up having. The closer a project is to a simple-CRUD, small team/unproven-company, prototype with limited success, the more sense YAGNI makes, and the further from each of those criteria a project is, the more it makes sense to design for more flexibility.
> It's pretty easy to get burned going either direction
Quite true, though I'd argue that YAGNI is still true as a probabilistic maxim. You'll make the "will I need it" decision many thousands of times in your career. If you follow YAGNI consistently[1], it will help you more often than it hurts, and you'll come out ahead in the long run.
[1] But nobody is saying you should ignore concrete evidence that you will need something later. That's its own cargo cult. If there's good reason to believe YAGNI doesn't apply in a particular case, don't follow it in that case.
I think this is a dangerous line of thinking, but I suppose I wouldn't modify it very much. What I would say is that YAGNI should perhaps be weighted higher, but that the probability of it being wrong in particular cases should be considered carefully.
With the "complicated" repository model mentioned, you can transparently introduce other behavior, such as a caching proxy, a retry-on-fail proxy, or a migrate-on-write proxy. It might not be a valid use case for you, but I have seen tangible non-testing benefits from using the repository pattern.
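To make the proxy idea concrete, here is a minimal sketch in Python rather than Ruby. Everything in it is hypothetical and chosen for illustration (`UserRepository`, `find_by_username`, the in-memory store standing in for a real database); it is not the setup described above, just one way the wrapping could look.

```python
from typing import Dict, Optional, Protocol


class UserRepository(Protocol):
    """Hypothetical interface; the only thing calling code depends on."""

    def find_by_username(self, username: str) -> Optional[dict]: ...


class InMemoryUserRepository:
    """Stand-in for a real store (an ORM model, a Postgres table, etc.)."""

    def __init__(self, rows: Dict[str, dict]) -> None:
        self._rows = rows
        self.calls = 0  # counts lookups that reach the underlying store

    def find_by_username(self, username: str) -> Optional[dict]:
        self.calls += 1
        return self._rows.get(username)


class CachingUserRepository:
    """Caching proxy: same interface, remembers earlier lookups."""

    def __init__(self, inner: UserRepository) -> None:
        self._inner = inner
        self._cache: Dict[str, Optional[dict]] = {}

    def find_by_username(self, username: str) -> Optional[dict]:
        if username not in self._cache:
            self._cache[username] = self._inner.find_by_username(username)
        return self._cache[username]


class RetryingUserRepository:
    """Retry-on-fail proxy: retries a failing lookup a fixed number of times."""

    def __init__(self, inner: UserRepository, attempts: int = 3) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._inner = inner
        self._attempts = attempts

    def find_by_username(self, username: str) -> Optional[dict]:
        for attempt in range(self._attempts):
            try:
                return self._inner.find_by_username(username)
            except ConnectionError:
                if attempt == self._attempts - 1:
                    raise  # out of attempts, let the caller see the error
        return None  # unreachable because attempts >= 1


# The proxies stack, and callers only ever see "a UserRepository".
store = InMemoryUserRepository({"alice": {"username": "alice"}})
repo = CachingUserRepository(RetryingUserRepository(store))
first = repo.find_by_username("alice")
second = repo.find_by_username("alice")  # served from the cache
```

After the two lookups, `store.calls` is 1, and none of the calling code had to change to get caching or retries.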
I tend not to work on applications that use a single data store. Some data will be stored on the file system, some in a traditional RDBMS, and some in a NoSQL implementation. What data goes where changes frequently and isn't really a major concern of the system, at least not of the parts that need specification verification.
So you have data that one day might go into an RDBMS but the next day might go into a file store? Do you constantly migrate old data between the two stores? What is the use case for this? Not being snarky, I'm genuinely curious. We use more than one datastore, but the data usually stays put once it's committed to one format: an RDBMS for most of the app, Redis for quick lists and caching, flat files when necessary. Those models never change their store unless it's a major overhaul.
It's a combination of 3 things.
1) Storing of the data is not what is central to my business case. It is an operational requirement, not what I'm selling. My major architecture requirements therefore do not get driven by what data store I'm using.
2) I frequently have data migrate from 1 format to another.
3) The actual data store formats don't change often, but each one of them has changed at least once in the last 3 years. That means that every six months we are migrating data store implementations. I don't want this to actually impact my business (see point 1) and therefore data store specifications are highly isolated from the other code.
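A rough sketch of what that isolation might look like while a migration is in flight, in Python. The names (`DictStore`, `MigratingStore`, `get`, `put`) are made up for the example and the copy-forward-on-read policy is just one possible choice, not a description of the system above.

```python
from typing import Dict, Optional


class DictStore:
    """Stand-in for any data store (file system, RDBMS, NoSQL)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value


class MigratingStore:
    """Hypothetical wrapper used only while moving from old_store to new_store."""

    def __init__(self, old_store: DictStore, new_store: DictStore) -> None:
        self._old = old_store
        self._new = new_store

    def get(self, key: str) -> Optional[bytes]:
        value = self._new.get(key)
        if value is None:
            value = self._old.get(key)
            if value is not None:
                self._new.put(key, value)  # copy forward on first read
        return value

    def put(self, key: str, value: bytes) -> None:
        self._new.put(key, value)  # all new writes land in the new store


old, new = DictStore(), DictStore()
old.put("report-1", b"legacy bytes")

store = MigratingStore(old, new)
legacy = store.get("report-1")      # found in old, copied into new
store.put("report-2", b"fresh bytes")  # written to new only
```

The business code keeps calling `get` and `put` throughout; once the old store is empty of anything still needed, the wrapper can be swapped for the new store directly.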
When I worked at last.fm, code that assumed all the data would always be in a single Postgres database was a constant source of pain. We spent a lot of time migrating tables out of the big central database, either because the data simply didn't fit any more or because we wanted it to be available to a Hadoop job. (There were probably other reasons, but those are the ones I remember.) Maybe last.fm's an extreme case, but it does happen.
Wouldn't a single abstraction over Postgres, the filesystem, Hadoop, etc. be either really leaky or really inefficient? Different datastores are better suited to certain kinds of queries. It seems like the programmer should be aware of what they are querying.
You invert the dependency. The abstraction is over the things that the higher-level code needs. I don't need to know about query types, indexes, etc. I need some business answer (all log records between x and y, a user matching username x), so I program to an interface that provides all the answers the high-level code needs.
The implementation of that interface is data store aware and implements the interface in the most effective way possible for the data store holding the things I'm interested in.
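As a small illustration of that inversion, here is a Python sketch built around the "log records between x and y" example. The interface name, the record shape, and both implementations are assumptions made for the example; the point is only that the interface is shaped by the question being asked, while each implementation answers it in whatever way suits its store.

```python
import sqlite3
from typing import List, Protocol, Tuple

LogRecord = Tuple[int, str]  # (unix timestamp, message); hypothetical shape


class LogQueries(Protocol):
    """Shaped by what the high-level code asks, not by any one store."""

    def records_between(self, start: int, end: int) -> List[LogRecord]: ...


class SqliteLogQueries:
    """Store-aware implementation: lets the database do the range filter."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def records_between(self, start: int, end: int) -> List[LogRecord]:
        rows = self._conn.execute(
            "SELECT ts, message FROM logs WHERE ts BETWEEN ? AND ? ORDER BY ts",
            (start, end),
        )
        return [(ts, message) for ts, message in rows]


class LinesLogQueries:
    """Store-aware implementation for 'ts<TAB>message' lines, e.g. a flat file."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = lines

    def records_between(self, start: int, end: int) -> List[LogRecord]:
        records = []
        for line in self._lines:
            ts_text, message = line.rstrip("\n").split("\t", 1)
            if start <= int(ts_text) <= end:
                records.append((int(ts_text), message))
        return sorted(records)  # ordered by timestamp, then message


def count_errors(logs: LogQueries, start: int, end: int) -> int:
    """High-level code: knows the interface, not the store behind it."""
    return sum(1 for _, message in logs.records_between(start, end) if "ERROR" in message)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(10, "ERROR disk"), (20, "ok"), (30, "ERROR net")],
)
lines = ["10\tERROR disk\n", "20\tok\n", "30\tERROR net\n"]

from_db = count_errors(SqliteLogQueries(conn), 5, 25)
from_lines = count_errors(LinesLogQueries(lines), 5, 25)
```

Both calls give the same answer for the same data, and `count_errors` never learns whether SQL or a line scan produced it. The leakiness concern is handled by keeping the interface narrow: it exposes business questions, not a general query language.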
Yes, if you're building a user-deployable product like TeamCity, Crucible, or another app that might be deployed on a number of database tiers to fit your customers. In that case, a Repo abstraction (or ORM) makes life livable.