by VincentEvans
1273 days ago
I sneer at the emphasis on “1.4 billion records!” in the article, as if it's a lot. At a recent place of employment I created, and was responsible for, a database that had about that many records; in reality it was a single 2 TB Postgres database and completely unremarkable. I never claimed to have worked with big data.
It's about:
- what is and isn't in the data (in my experience often a lot of junk, if the source is a legacy system that has evolved over time)
- what meaning that data has within a completely different system
- what completeness the new system demands of that data
- how to deal with exceptions
- whether that data can ever be frozen, or whether it is still online (as in the case of banking transactions)
This is unlikely to be simply a technical problem of ETLing tables, changing date ranges from inclusive to exclusive and mapping some address fields.
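Converting date ranges from inclusive to exclusive is a good example of the mechanical mapping the comment contrasts with the real work. A minimal Python sketch, assuming day-granularity dates and a legacy source that stored closed [start, end] ranges; the function names are hypothetical:

```python
from datetime import date, timedelta


def inclusive_to_exclusive(start: date, end_inclusive: date) -> tuple[date, date]:
    """Convert a closed [start, end] date range to a half-open [start, end) range.

    Assumes day granularity, so the exclusive end is the day after the inclusive end.
    """
    if end_inclusive < start:
        raise ValueError("end_inclusive must not precede start")
    return start, end_inclusive + timedelta(days=1)


def contains(rng: tuple[date, date], d: date) -> bool:
    """True if d falls inside the half-open range [start, end)."""
    start, end_exclusive = rng
    return start <= d < end_exclusive


# January 2020 as a closed range becomes [2020-01-01, 2020-02-01)
jan = inclusive_to_exclusive(date(2020, 1, 1), date(2020, 1, 31))
print(jan)
```

Half-open ranges let consecutive periods meet exactly without overlapping or leaving gaps, which is also the canonical form PostgreSQL uses for its built-in `daterange` type.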
Of course, beyond a certain point the size of the data does make a big difference to risk planning and business continuity planning. If a catastrophic issue occurs, it's not possible to roll back and try again within the migration window, and it's not possible to simply run some bulk updates to fix issues during go-live validation.
It's worth noting, though, that in this project the data migration itself was not found to have contributed to the failure.