What I have found is that even among talented senior engineers there is a massive Dunning-Kruger effect when it comes to performant architecture. They don't know how to do it, and they don't know that they don't know how to do it.
I have always wanted reasonable performance (though this might appear like “performance programming” to a concerning proportion of the software industry),
This hit me right in the heart. I'm often the only person on the team who cares about performance, so I am drawn to these performance-related challenges... and it has really hurt my career, because I am then often perceived as someone focused on optimization rather than on delivering features.
Like the author, I do not focus on performance for performance's sake. Nearly all of the time the right course of action is "do not optimize this piece of code; focus on maintainability and readability instead." However, sometimes you really need to design for performance from the beginning... or you don't have a product.
At my most recent job they were trying to push lots of data through RabbitMQ/Celery for scientific work. This worked for trivial jobs (tens of megabytes) but not for moderate or large ones (hundreds of gigabytes). To make such a product viable, you really need to consider performance from the start. Celery explicitly tells you not to pass non-trivial amounts of data around: you should be passing pointers to data (database IDs, file paths, S3 URIs, whatever) rather than the actual full-fat data.
The team really struggled with this. Their next proposed solution was "well, okay, we'll store intermediate results in the database and 'optimize' later." Great idea, but this involved 1B+ result objects. Wrong again. You are not serializing 1B+ Python objects, sending them over the wire, and performing 1B+ Redis or Postgres inserts in any reasonable amount of time or memory. Optimize and bulk insert all you want, but that's an absolute dead end.
There aren't a whole lot of options for performantly slinging around hundreds of gigabytes of data. Assuming you can't just run on a monster server with hundreds of GB of RAM (which honestly is often the right answer), you are generally going to be looking at fast on-disk formats like Parquet. In any event, that's something you really need to design around from the start, not something you sprinkle on at the end.
They're on their second iteration of the architecture right now, and it's slower than the first iteration was. Still no viable product. Shame.
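For anyone who hasn't seen the "pass pointers, not data" idea in practice, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: `queue.Queue` stands in for the RabbitMQ/Celery broker, a raw binary file stands in for a real on-disk format like Parquet, and the helper names (`write_chunk`, `make_message`, `handle_message`) are hypothetical. It is not how that team's system worked, just the general shape of the technique.

```python
import json
import os
import queue
import tempfile
from array import array


def write_chunk(values, directory):
    """Producer side: write a chunk of floats to disk and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin", dir=directory)
    with os.fdopen(fd, "wb") as f:
        array("d", values).tofile(f)  # raw doubles; stand-in for Parquet
    return path


def make_message(path, count):
    """Build the small message that actually travels through the broker."""
    return json.dumps({"path": path, "count": count})


def handle_message(message):
    """Worker side: follow the pointer, load the data, compute a result."""
    ref = json.loads(message)
    data = array("d")
    with open(ref["path"], "rb") as f:
        data.fromfile(f, ref["count"])
    return sum(data)


workdir = tempfile.mkdtemp()
broker = queue.Queue()  # stand-in for a real message broker

# The payload goes to disk; only a pointer to it goes on the queue.
values = [float(i) for i in range(100_000)]
path = write_chunk(values, workdir)
broker.put(make_message(path, len(values)))

message = broker.get()
result = handle_message(message)
payload_bytes = os.path.getsize(path)

print(f"message size: {len(message)} bytes")
print(f"payload size: {payload_bytes} bytes")
print(f"result: {result}")
```

The message on the queue stays tiny no matter how large the chunk on disk gets, which is the whole point. In a real deployment the path would need to be on storage that every worker can reach (a shared filesystem, S3, and so on).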
Pro tip: always turn that kind of thing into a dollar value you can put on your annual review. "My update to X let us use Y fewer EC2 instances, saving us $Z per year." Then it's not some Don Quixote obsession, but a clear fiscal benefit to the company.