Hacker News
by tomkwong 1504 days ago
First, I want to say that this is a great post. You always grow stronger when you make mistakes, and writing it up solidifies what you learned.

This story resonates with many people here because many experienced engineers have done something similar before. For me, a destructive batch operation like this is always split into two distinct steps:

1. Identify files that need to be deleted.
2. Loop through the list and delete them one by one.

These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.
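As a rough illustration, here is a minimal Python sketch of that two-step split. The function names `identify` and `delete_listed`, the `should_delete` predicate, and the one-path-per-line manifest format are hypothetical choices for this example, not something from the comment above. It also assumes file names contain no newline characters and that the manifest is written outside `root`.

```python
from pathlib import Path
from typing import Callable


def identify(root: str, should_delete: Callable[[Path], bool], manifest: str) -> int:
    """Step 1: write candidate paths to a manifest file. Nothing is deleted here."""
    count = 0
    # Assumption: `manifest` lives outside `root`, so it never lists itself.
    with open(manifest, "w", encoding="utf-8") as out:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and should_delete(path):
                # One path per line; assumes no newlines inside file names.
                out.write(f"{path}\n")
                count += 1
    return count


def delete_listed(manifest: str) -> int:
    """Step 2: delete exactly the files named in the (reviewed) manifest."""
    deleted = 0
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            name = line.rstrip("\n")
            if not name:
                continue  # tolerate blank lines left over from manual edits
            try:
                Path(name).unlink()
                deleted += 1
            except FileNotFoundError:
                # Already gone (e.g. an earlier partial run), so rerunning is harmless.
                pass
    return deleted
```

A typical run would call something like `identify("/data/scratch", lambda p: p.suffix == ".tmp", "/tmp/to_delete.txt")`, then inspect the manifest with ordinary tools (`wc -l`, `less`, `grep`, or a peer review) before calling `delete_listed` on it. Treating a missing file as already done is what lets step 2 be rerun safely after an interruption.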

Production operations are always risky. A good practice is to always prepare an execution plan with detailed steps, a validation plan, and a rollback plan, and to review the plan with peers before the operation.


> 1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.

> These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.

This is the most underrated comment.

I say this as someone who had ultimate oversight of deleting hundreds of TBs per day, spread across billions of files on different clouds and local storage.

I've never regretted treating tasks like this as a pipeline of discrete steps with explicit outputs and inputs. Sending output to a file, viewing it, then having something process the file is such a great safety net.