Hacker News
by stingraycharles 1360 days ago
But if you’re not storing data as JSON, can you really say you’re agile? /s

Look, we'll just get it in this way for now; once it's live, we'll have all the time we need to change the schema in the background.
We don’t have a use case yet, but let’s just collect all the data and figure out what to do with it later!

It’s funny how these cliches repeat everywhere in the industry, and it’s almost impossible for people to figure this out beforehand. It seems like everyone needs to deal with data lakes (at scale) at least once in their life before they truly appreciate the costs of the flexibility they offer.

The Data Exhaust approach is simultaneously bad and justifiable. You should measure what matters, and think about what you want to measure and why, before collecting data. On the other hand, collecting data in case what you want to measure changes later is usually a fairly low-cost way of possibly having the right data already on hand.
Oh, I agree; that's why I was careful to put "at scale" in there -- these types of approaches are typically good when you're still trying to understand your problem domain and haven't yet hit production scale.

But I've met many a customer who's spending seven figures a year on data they have yet to extract value from. The rationale is typically "we don't know yet which parameters will matter to the model we come up with later", but even then, you could do better than storing everything as plaintext JSON on S3.
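To make the "you could do better than plaintext JSON" point concrete, here is a rough, stdlib-only sketch. The telemetry records and every function name in it are hypothetical. It compares row-oriented newline-delimited JSON against the same data gzip-compressed and against a simple column-oriented layout where each key is stored once. A real pipeline would more likely use a columnar file format such as Parquet; the JSON-of-columns layout here is only a stand-in for that idea.

```python
import gzip
import json
import random


def make_records(n, seed=0):
    """Generate hypothetical telemetry events with repetitive keys and values."""
    rng = random.Random(seed)
    return [
        {
            "device_id": f"dev-{rng.randrange(100)}",
            "event": rng.choice(["click", "view", "scroll"]),
            "ts": 1_600_000_000 + i,
            "value": rng.randrange(1000),
        }
        for i in range(n)
    ]


def ndjson_bytes(records):
    """Row-oriented plaintext: one JSON object per line, keys repeated on every row."""
    return "\n".join(json.dumps(r) for r in records).encode("utf-8")


def columnar_bytes(records):
    """Column-oriented stand-in: each key stored once, values grouped per column."""
    columns = {k: [r[k] for r in records] for k in records[0]}
    return json.dumps(columns).encode("utf-8")


if __name__ == "__main__":
    records = make_records(10_000)
    raw = ndjson_bytes(records)
    sizes = {
        "plaintext NDJSON": len(raw),
        "gzip NDJSON": len(gzip.compress(raw)),
        "gzip columnar JSON": len(gzip.compress(columnar_bytes(records))),
    }
    for name, size in sizes.items():
        print(f"{name:>20}: {size:,} bytes")
```

Exact byte counts depend on the data. The general pattern, though, is that repeated keys and uncompressed text are pure overhead, and that overhead is paid on every byte stored and scanned.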