by VincentEvans 1273 days ago
I sneer at the emphasis on "1.4 billion records!" in the article as if it's a lot.

At a recent place of employment I created and was responsible for a database that had about that many records. In actuality it was a single 2 TB Postgres DB and completely unremarkable.

I never claimed to have worked with big data.

4 comments

It's not really the quantity of data that is important in migrations like this.

It's what is and isn't in the data - often a lot of junk in my experience if the source system is a legacy system that has evolved

what meaning that data has within a completely different system

what the demands are on the completeness of that data in the new system

how to deal with exceptions

and whether that data can ever be frozen, or whether it is still online (as in the case of banking transactions)

This is unlikely to be simply a technical problem of ETLing tables, changing date ranges from inclusive to exclusive and mapping some address fields.
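For a sense of how small the purely mechanical part is, here is a minimal Python sketch of the two transformations mentioned above. The field names (`addr1`, `addr2`, `postcode`, `street`, `postal_code`) are hypothetical and not taken from the project in question; the hard questions about meaning, completeness and exceptions are exactly what this does not cover.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def inclusive_to_exclusive(start: date, end_inclusive: date) -> Tuple[date, date]:
    """Convert a closed date range [start, end] to a half-open one [start, end)."""
    if end_inclusive < start:
        # An exception case: someone still has to decide what to do with it.
        raise ValueError("end date precedes start date")
    return start, end_inclusive + timedelta(days=1)


def map_address(legacy: dict) -> dict:
    """Map hypothetical legacy address fields onto a hypothetical target schema."""
    parts = [legacy.get("addr1"), legacy.get("addr2")]
    street = ", ".join(p.strip() for p in parts if p and p.strip())
    postcode: Optional[str] = (legacy.get("postcode") or "").strip().upper()
    # Empty strings become None so missing data is explicit in the target.
    return {"street": street or None, "postal_code": postcode or None}
```

Even in this toy version, the inverted date range and the empty address are policy decisions rather than coding problems.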

Of course, past a certain point the size of the data does make a big difference to risk planning and business continuity planning. It's not possible to roll back and try again within the migration window should a catastrophic issue occur, and it's not possible to simply run some bulk updates to fix issues during the go-live validation.

It is noted though in this project that the data migration itself was not found to contribute to the failure.

What were your latency requirements on pulling a record out and how complex were the joins to pull said records?

If you have a simple db structure with a few tables and very clear data/index rules then billions and billions of records is pretty easy. Your indexes cut out 99% of the work and everything runs smooth and efficient.
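As a rough illustration of the index point, here is a small sketch using Python's built-in `sqlite3` module as a stand-in (not Postgres, and nowhere near billions of rows). The `accounts` table and its columns are made up for the example. It asks the query planner how it would run the same lookup before and after an index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, customer_id INTEGER, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts (customer_id, balance) VALUES (?, ?)",
    ((i % 1000, i) for i in range(10_000)),
)

lookup = "SELECT balance FROM accounts WHERE customer_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

conn.execute("CREATE INDEX idx_accounts_customer ON accounts (customer_id)")

# With the index the planner can search for matching rows directly.
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies by SQLite version, but the first should report a table scan and the second a search using the index.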

But then you can have eldritch horrors where your stored procedures look like seedy detective novels where you chase join after join and have scary high memory requirements on execution.

When you say "emphasis on", do you mean the single mention, with no distinction between that stat and any of the other facts of the case? I don't think they really put any weight on whether that's a lot or not.

Personally I would say it is a lot. No other number mentioned in the article even approaches a billion. Billions are big. It might not be true big-data big, but it is still a lot of customer records for a migration project (depending on what exactly they were trying to do within the migration), and it does illustrate why they had so many issues: there was a lot to deal with.

Just wondering if the migration disaster at this scale can be avoided using modern cluster and orchestration technology like Kubernetes?
No technology can compensate for poor planning and technical incompetence. From all I read, that was the root cause of the problem. So, the same people and processes using Kubernetes: no.

(Of course this is just speculation. I have no insider knowledge.)

I think downvotes are unnecessary and this is a finely crafted joke.
Without even having got around to reading the whole report yet, I can promise you that a f*ckup on this scale cannot be avoided solely through technology decisions. The problem was (is always) with the people and the structures they were working in.
To the contrary, switching to a new technology is a favorite reductive excuse of poor management. They choose one early technical decision and try to hang all the failure on that.

As a sibling comment states, a screw-up of this magnitude is never simply a technology issue; it requires bad management at many levels.

Kubernetes is there to help you scale. It does not fix one's incompetence. It increases the complexity of the stack and, if anything, would make things even worse for the incompetent.
k8s does absolutely jack shit when it comes to data migration so not really.

You still need to write all the procedures, test them, then do it all again on the live system.

It might make prototyping easier (...or harder) but that's about it