by Jefff8 724 days ago

I've done this sort of thing a few times.

- Build a small core team who know the problem in depth. You really need to understand v1 and v2 data and the mappings, as well as the functionality in each.

- Build a test system that is insulated from customers; you want to be able to use this as if it's the real thing, but to be absolutely, completely, dead certain that it will not affect live systems, and that no output from this will reach people it should not. Make sure there are visual indicators as well as logical traps on data exiting this system. Make this repeatable - you are going to use this system a lot to re-run tests. Despite the firebreaks, you will have some brown pants moments.

- The ideal is to move from v1 to v2 gradually, using a passthrough system. However, in my experience, this is often not possible, and at some point there will be a hard switch between systems.

- Develop migration plans with multiple off-ramps and fallbacks, and monitoring. By the time you press the switch you should know exactly how everything will work, and you should have no issues. Despite this, you should have layers of contingency to allow for business as usual when the unplanned happens. This is a mix of technical and business concerns, and it should be properly understood by everyone involved. Monitoring is critically important. Your plans should include aftercare. For example, what happens if you think the migration is successful, and two weeks later you discover an issue with 200,000 transactions? How will you reconcile? How will you communicate with the affected parties?

- Look for classes of things... Can you find 100,000 dead accounts that can be removed? Can you find 500,000 that have only ever had one transaction? Look for classes of errors - and fix them before migration. Keep a record of all of this, and make certain that you have covered all cases and all records. If you are lucky, you will be able to migrate classes from v1 to v2 and have the passthrough transparently manage this.

- Ideally, have a log of transactions that can be replayed on demand, so that you can run systems in parallel and, in the event of issues, unwind.

- Keep written logs of all the things that you and the team do. You _will_ forget stuff you've done. This is true on an hour-to-hour basis as well as a month-to-month basis.

- Work on making migrations fast. Can you organise it so that 16m migrations take 10 minutes? This allows you to test and retest. You want anyone to be able to run on-demand migrations.

- Look to the end-users. For an upgrade to be successfully deployed, both the business and the end-users must be happy; you will want to run test groups, pilots, group conversations, and make your team available to the end-users. Nothing should be surprising by the end. You will also need to know that v2 is performant - catch these problems before they become general issues of dissatisfaction. Look also for pain points and try to ensure that you remove them in v2. Change is painful, but if you can show that there are benefits you will ameliorate much of the criticism.

- Have defined end-points. You do not want to be doing this in 5 years' time.
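
To make the "classes of things" point a bit more concrete, here is a minimal Python sketch of the bookkeeping: sort accounts into classes, flag anything that fits no class, and check that every v1 record ends up either migrated or deliberately removed. The field names (`id`, `tx_count`), the rules in `RULES`, and the helper names are all made up for illustration; real rules would come from knowing your own v1 data.

```python
from collections import defaultdict

# Hypothetical class rules, checked in order; the first match wins.
RULES = [
    ("dead", lambda a: a["tx_count"] == 0),
    ("single_transaction", lambda a: a["tx_count"] == 1),
    ("active", lambda a: a["tx_count"] > 1),
]


def classify(accounts, rules=RULES):
    """Group account ids by class; ids matching no rule are returned separately."""
    classes = defaultdict(list)
    unclassified = []
    for account in accounts:
        for name, matches in rules:
            if matches(account):
                classes[name].append(account["id"])
                break
        else:
            # No rule matched: a class of error to investigate before migrating.
            unclassified.append(account["id"])
    return dict(classes), unclassified


def coverage_gaps(v1_ids, migrated_ids, removed_ids):
    """Return (ids unaccounted for, ids that were both migrated and removed)."""
    v1, migrated, removed = set(v1_ids), set(migrated_ids), set(removed_ids)
    missing = v1 - migrated - removed
    conflicting = migrated & removed
    return sorted(missing), sorted(conflicting)


# Made-up sample data; account 4 is deliberately bad.
accounts = [
    {"id": 1, "tx_count": 0},
    {"id": 2, "tx_count": 1},
    {"id": 3, "tx_count": 42},
    {"id": 4, "tx_count": -1},
]

classes, unclassified = classify(accounts)
missing, conflicting = coverage_gaps(
    [a["id"] for a in accounts], migrated_ids=[2, 3], removed_ids=[1]
)
print(classes)
print("unclassified:", unclassified)
print("missing:", missing, "conflicting:", conflicting)
```

In this toy run, account 4 shows up both as unclassified and as missing from the migration, which is exactly the kind of record you want to find before the switch rather than two weeks after it.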