> 1. I did not do enough load-testing
Load test constantly. My policy is to (almost) never develop using "sample data". Instead, I take a very large example of real world data (say 95th percentile of what is actually used in the wild) and develop with that as my backing data. If operations are slow enough for me to be annoyed in development, clearly they will be too slow for the (many more) people who have to work with the project once complete.
> 2. Since this service is constantly updating, I frequently fumble with git. like accidentally pushing testing code/hardcoding onto prod.
Lock the `main` branch, only allow commits to it from PR's. Review your own PR's.
> 3. There are lots of flows in the service, so missing out on testing one of them.
Does making a change in one flow tend to adversely affect seemingly unrelated others? That might be an engineering shortcoming you should address. Besides that, automated testing. Some stacks allow "recording" a flow, then automatically making sure that same flow can happen on every PR. See point 2.
> 4. other notable issues like bad queries from analytics team
There are no bad queries, only insufficient validation, timeouts, and/or load balancing.
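
To make the timeout part of point 4 concrete, here is a minimal sketch. It assumes SQLite accessed through Python's standard `sqlite3` module, which is only a stand-in since the thread never names the actual database. `run_with_timeout` and `QueryTimeout` are hypothetical names, and most server databases offer their own statement-timeout setting that would be the better tool in practice.

```python
import sqlite3
import time


class QueryTimeout(Exception):
    """Raised when a query runs past its time budget (hypothetical name)."""


def run_with_timeout(conn, sql, params=(), timeout_s=1.0):
    """Run a query, aborting it if it runs longer than timeout_s seconds."""
    deadline = time.monotonic() + timeout_s

    def check_deadline():
        # A non-zero return value tells SQLite to abort the running query.
        return 1 if time.monotonic() > deadline else 0

    # Call check_deadline every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        if time.monotonic() > deadline:
            raise QueryTimeout(f"query exceeded {timeout_s}s") from exc
        raise  # some other operational error, not a timeout
    finally:
        # Remove the handler so later queries on this connection are unaffected.
        conn.set_progress_handler(None, 0)
```

With something like this in front of the analytics team's queries, an expensive query fails fast with a clear error instead of dragging the service down.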
Interesting point. Will try to incorporate that.
> Does making a change in one flow tend to adversely affect seemingly unrelated others?
It doesn't happen that much, but because those flows overlap a lot, they are somewhat interlinked (to reduce code duplication). Point noted, though. I will see whether they can be separated.
> Lock the `main` branch, only allow commits to it from PR's. Review your own PR's.
Done.
> There are no bad queries, only insufficient validation and/or timeouts.
Validation is a huge issue. When you have hundreds of variables and one of them throws a DivisionByZero error or has an invalid data type, the failure is hard to catch.
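
One way to make such failures easier to catch is to evaluate each variable in isolation and record which one failed, instead of letting a single exception abort the whole batch. A minimal sketch in Python, where the built-in exception is `ZeroDivisionError`. The function name `evaluate_all` and the "formula per variable" layout are assumptions for illustration, not how the service is actually structured.

```python
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]
Formula = Callable[[Row], Any]


def evaluate_all(formulas: Dict[str, Formula], row: Row) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Evaluate every formula against one row, collecting failures by variable name."""
    values: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, formula in formulas.items():
        try:
            values[name] = formula(row)
        except (ZeroDivisionError, TypeError, ValueError, KeyError) as exc:
            # Record which variable failed and why, then keep going.
            errors[name] = f"{type(exc).__name__}: {exc}"
    return values, errors


# Hypothetical variables: one divides by a field that may be zero.
formulas = {
    "ratio": lambda r: r["a"] / r["b"],
    "total": lambda r: r["a"] + r["b"],
}
values, errors = evaluate_all(formulas, {"a": 1, "b": 0})
# "total" lands in values, while "ratio" lands in errors with the exception type.
```

The `errors` dict names the exact variable that failed, which can be logged or returned to whoever submitted the query.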
Loved these suggestions, especially the first one. Any more ideas?