by bunderbunder
701 days ago
Push less data through wires. The memory hierarchy is so stark on modern hardware that the 30-year-old adage that "the fastest code is the code you don't run" is maybe less important than "the fastest code is the code that doesn't spend much time talking to the memory controller." And it's even worse once we start talking about accessing memory that's on an entirely different computer. Serialization/deserialization, IPC, network calls, and all those other things we do with reckless abandon in modern service-oriented and distributed applications are just unbelievably expensive. Last year I took a slow, heavily parallelized batch job and improved its throughput by 60% by getting rid of both scale-out and multithreading and just taking it all down to a single thread. Everyone expected it to be slower because we were using only a small fraction of the CPU cores, but in truth it was faster because the time savings from having fewer memory fences, less data copying, and less network I/O were just that great. And then the performance gains kept coming because, having simplified things to that extent, I was then in a much better position to judiciously re-introduce parallelism in ways that weren't so wasteful.
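To make the data-copying point concrete, here is a rough sketch, not the batch job described above. It assumes `pickle` as a stand-in for whatever serialization a process or network boundary would use; the function names and the record shape are made up for illustration. The "boundary" version has to build a byte payload and then rebuild every object before doing the same sum.

```python
import pickle
import timeit


def total_in_process(records):
    # Same address space: the list is read directly, nothing is copied.
    return sum(r["value"] for r in records)


def total_across_boundary(records):
    # Hypothetical stand-in for IPC or a network hop: serialize,
    # then deserialize into a complete second copy of the data.
    payload = pickle.dumps(records)
    received = pickle.loads(payload)
    return sum(r["value"] for r in received), len(payload)


if __name__ == "__main__":
    records = [{"id": i, "value": i * 2} for i in range(100_000)]
    direct = timeit.timeit(lambda: total_in_process(records), number=10)
    copied = timeit.timeit(lambda: total_across_boundary(records), number=10)
    print(f"in-process:      {direct:.3f}s")
    print(f"across boundary: {copied:.3f}s")
```

Actual numbers depend on the machine and the data, and a real network hop adds latency on top of the copy, so treat this only as a way to see the overhead for yourself.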
|
Another common one is just poor query performance from a database. Lack of appropriate indexes, or other relatively easy optimizations.
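A quick way to check for a missing index, sketched here with SQLite from the Python standard library. The `orders` table, its columns, and the index name are hypothetical; other databases have their own `EXPLAIN` variants with different output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the last column is the human-readable plan step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, the plan becomes a search using idx_orders_customer.
plan_after = query_plan(conn, sql, (42,))

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```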
Similarly, finding a method of caching that's just bad (an in-memory database queried with SQL instead of a dictionary). It isn't so bad for one call, but very bad when a given request (login) makes over 200 calls to this cache for configuration settings. The cost wasn't a problem per cache request, only in aggregate.
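A minimal sketch of the difference, assuming the configuration lives in a simple key/value table. The table layout, function names, and the use of SQLite are illustrative assumptions, not the system described above. The first lookup function pays for a SQL statement on every call; the second approach pays for one query up front and then does plain dictionary lookups.

```python
import sqlite3


def build_config_db(settings):
    # Hypothetical in-memory database holding configuration settings.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO config VALUES (?, ?)", settings.items())
    return conn


def get_setting_sql(conn, key):
    # One SQL statement per lookup: cheap once, costly 200 times per request.
    row = conn.execute(
        "SELECT value FROM config WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def load_config_dict(conn):
    # One query in total; every later lookup is a plain dict access.
    return dict(conn.execute("SELECT key, value FROM config"))


if __name__ == "__main__":
    settings = {f"setting_{i}": str(i) for i in range(200)}
    conn = build_config_db(settings)
    cache = load_config_dict(conn)
    # A login-style request that reads every setting once:
    via_sql = [get_setting_sql(conn, k) for k in settings]
    via_dict = [cache[k] for k in settings]
    print(via_sql == via_dict)
```

The dictionary version does need some way to refresh when settings change, which is the trade-off to weigh against the per-call savings.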