by idworks1
1752 days ago
One of the proudest moments of my career was when I lowered our app's processing time from ~8 hours to 17 minutes. My first update reduced it to 2 hours. The sysadmin immediately contacted me to say something looked unusual. I confirmed the results, but he was skeptical. Then, after my second update, he told me that the app must be broken or that the script must be dying. There was no way it could complete that fast.

What was the issue? We processed terabytes of data. Every single line processed created a new connection to the database and left it hanging. A try/catch had been added for when the connections failed, and it restarted the process. Moving the connection out of the for loop and handling it properly reduced the time drastically. And... why would you loop through millions of records one at a time when you can use batches?

Also, this was a phperlbashton* script. I turned it into a single PHP script and called it a day. As a consequence, backup time was reduced to 2 hours as opposed to 12 hours (no one was allowed on the website until the backup was done). Modern machines are incredibly fast.

* PHP/Perl/Bash/Python
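To illustrate the fix described above (one connection opened outside the loop, rows written in batches), here is a minimal sketch in Python. It is not the original script, which was PHP. The `records` table, the `batched` and `load_records` helpers, and the batch size of 1000 are all made up for illustration, and an in-memory SQLite database stands in for whatever database the app actually used.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_records(conn, rows, batch_size=1000):
    """Insert rows through ONE existing connection, one transaction per batch."""
    total = 0
    for chunk in batched(rows, batch_size):
        # `with conn` commits on success and rolls back if the batch raises.
        with conn:
            conn.executemany("INSERT INTO records (line) VALUES (?)", chunk)
        total += len(chunk)
    return total


# The connection is opened once, outside any loop, and closed explicitly.
conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE records (line TEXT)")

rows = ((f"line {i}",) for i in range(2500))  # hypothetical input lines
inserted = load_records(conn, rows, batch_size=1000)
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
conn.close()
```

The slow version would call `sqlite3.connect(...)` inside the loop for every line and never close it. Here the connection cost is paid once and each round trip carries up to 1000 rows.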
|
This was for a genomics project and they ran it on a supercomputer. When I looked into it, they were reading the entire input into a giant array before doing one pass and dumping the result out to disk. I made a tiny change (it was a Perl script) to make it stream the I/O instead.
This is the most extreme example I've come across of people using computing power just because it's there. Nobody questioned why the script took so long to run, because the data really was in the TBs and other stuff also took that long to run. Waiting a day for the results was considered normal. I see the same thing in desktop apps etc., on a much smaller scale, of course. When I run an Electron app, it takes several hundred milliseconds to do anything at all. But nobody questions whether it should, because everything takes several hundred milliseconds.
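For anyone wondering what "stream the I/O" looks like in practice, here is a rough sketch in Python rather than the original Perl. The `transform` function is a placeholder for the real per-record work, and `process_stream` is a hypothetical name. The point is only that each line is read, processed, and written before the next one is touched, so memory use stays flat no matter how big the input is.

```python
import io


def transform(line):
    # Placeholder for the real per-record work (an assumption for this sketch).
    return line.upper()


def process_stream(src, dst):
    """Single pass: read one line, write one line, keep nothing in memory."""
    count = 0
    for line in src:  # file objects yield one line at a time
        dst.write(transform(line))
        count += 1
    return count


# In-memory stand-ins for the real input and output files.
src = io.StringIO("acgt\nttga\n")
dst = io.StringIO()
n = process_stream(src, dst)
result = dst.getvalue()
```

The slow version would be something like `lines = src.readlines()` followed by a loop that builds a second giant list before writing anything. With real files, pass handles from `open(path)` and `open(path, "w")` instead of the `StringIO` objects.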