Hacker News new | ask | show | jobs
by ringmaster 4596 days ago
I was on a team that built a web app for primary school standardized testing. The amount of data presented and collected per student per test is large and perfect for a document store. MapReduce operations allow the app to quickly produce cacheable reports across cross-sections based on requested criteria.

Event the tests themselves are composed of multiple parts that randomize for each student, and lend themselves to the document structure that MongoDB provides. Individualized tests could be assembled from components based on student criteria and stored uniquely for a user as of that time, a thing which would be unnecessarily complex within a relational system.

Could this all have been done with a relational database? Yes, I suppose, but I cringe at the complexity of relating test questions with test answers with users with other data elements ad infinitum using JOINs on both read and write. And this doesn't even touch the topics of sharding and replication, which Mongo made easy in comparison to MySQL or MSSQL.

Choosing MongoDB was the correct decision for this dataset and application. I don't advocate it for every app, but for this one, it was the appropriate fit.

1 comments

How would you go getting out data that answered a question like "give me the average score for all maths questions of female students aged 6-7"?
The aggregation framework is ideal for answering these kinds of issues. Mongo has a bunch of aggregation routines that are useful for producing reports on demand, but not on the fly. The trade-off is possible because we know that the collected data (test answers) won't change after it is finalized, and the output of any individual report can be cached virtually indefinitely. (See http://docs.mongodb.org/manual/core/aggregation/)

Also keep in mind that unlike something like a web analytics package that gives you the option to filter and sort your data on any combination of criteria imaginable (for no good reason), the questions that academics/educators tend to need the answers to are generally the same for every new set of tests.

In other words, it's not necessary to enable every possible combination, filter, and sort of output, but merely (ha!) to optimize for the specific results that we know we will need (with a nod toward those results we might expect people to want in the future), and to codify the formulae that will produce those results.

Working with not a school but a testing research company (think of a company like "College Board" vs a "Smallville School District") leads you to produce reports that are significantly more detailed and statistically more valuable than "average score" questions like this, all of which is possible within MongoDB. Though, obviously this treads a little more into work product than would be comfortable to expound upon here. ;)

Sounds reasonable but not ideal. Probably some sort of ETL into a data warehouse would be needed for adhoc analysis.
That's what the aggregation framework is great for. I don't have the code handy, but the free MongoDB for Developers course covers almost this exact use case.