Hacker News new | ask | show | jobs
by kedean 3596 days ago
Having worked on that sort of process many times, I'm of the opinion that a message queue is the ideal solution there, not a database. If you're storing the data for the purpose of processing it again later, it should probably be ephemeral and fast, rather than long-lived and flexible.
1 comments

That doesn't work for us (beyond the first stage), because the fields we extract from the original source are not ephemeral.

We need to store the key/value pairs and explore them in a reasonably productive fashion (i.e using queries) in order to come up with machine learning algorithms. And any new algorithms we write need access to all historical data.