by tmountain 2613 days ago
When you're a small startup and you're just starting up, you can create a single MongoDB instance (ignore everything you've heard about Web Scale) and stuff data into it as needed, without thinking much about the structure. You can add in contracts on your database functions, which slowly specify the contract, as you learn more about what your project is really about. To get a sense of that style of development, please see what I wrote in "How ignorant am I, and how do I formally specify that in my code?"
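For readers who haven't seen that style, here is a rough sketch of what "contracts on your database functions" might look like. It is not taken from the linked essay. The in-memory list standing in for a MongoDB collection, the `with_contracts` decorator, and the `has_email` / `has_valid_plan` checks are all hypothetical names made up for illustration.

```python
from functools import wraps
from typing import Any, Callable

Document = dict[str, Any]
Contract = Callable[[Document], bool]


def with_contracts(*contracts: Contract):
    """Hypothetical helper: reject a document that fails any predicate."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(store: list, doc: Document):
            for check in contracts:
                if not check(doc):
                    raise ValueError(f"contract {check.__name__} failed for {doc!r}")
            return fn(store, doc)
        return wrapper
    return decorator


def has_email(doc: Document) -> bool:
    return isinstance(doc.get("email"), str)


def has_valid_plan(doc: Document) -> bool:
    return doc.get("plan") in {"free", "paid"}


# Early on: we know almost nothing, so only one field is required
# and any other keys are stored as-is.
@with_contracts(has_email)
def save_user_v1(store: list, doc: Document) -> Document:
    store.append(doc)  # the list is a stand-in for a schemaless collection
    return doc


# Later: we have learned that every user needs a plan, so the contract tightens.
@with_contracts(has_email, has_valid_plan)
def save_user_v2(store: list, doc: Document) -> Document:
    store.append(doc)
    return doc
```

The storage stays schemaless throughout. Only the list of checks on the write path grows, which is one way to read "slowly specify the contract."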

This strategy seems to forgo what I consider an important step in any project: thinking critically about your data model and getting it right before you start building code on top of that structural foundation.

I could see doing what you're describing to build a prototype, which I would learn from and then toss out. But this seems like a dangerous way to get started with something that will end up in production (and potentially be maintained for years to come), because it glosses over the importance of coming up with a really coherent data model. Let's face it, data is the heart and soul of most projects.

Am I wrong?

1 comment

"I could see doing what you're describing to build a prototype"

It's very much for prototypes, and especially greenfield projects. If I were instead building a new service inside an enterprise that was already using something like the unified log architecture that Jay Kreps has described, then I would certainly think hard about what the schema would be for that particular service. After all, in such situations you're never going to pull all of the data out of Kafka, so you automatically have to figure out which part of the data you want. LinkedIn currently stores 900 terabytes of data in its Kafka instance, and I'm unlikely to write a new service that actually needs all 900 terabytes. So merely by asking "What of this data do I need?" I'm already implicitly thinking about a schema.

Having said all of that, how often have you written a service where you got the schema 100% correct on your first try, and no further changes to the schema were needed? Possibly you are smarter than I am, but I personally have never done that. All of my first attempts have needed later adjustment.