Hacker News
by scottLobster 986 days ago
Yeah, back in college I worked with a Biochemistry grad student on a group project that involved some coding (I was in Computer Engineering). To iterate over a matrix, he used three nested loops with an if-statement to switch between rows and columns. Technically it worked, but it was wildly inefficient, and he was proud of it...

To his credit, once I (as nicely as possible) showed him how to do it with two nested for-loops, he clearly felt stupid and conceded the point. He was otherwise a very smart guy and good to work with, but it goes to show how easily we take our training for granted. Even freshman-level stuff can go over the heads of PhDs, and I'm sure the same would be true if I were dropped into a biochem lab.
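For readers outside CS, here is a minimal sketch of the two-loop version. The original code isn't shown, so the matrix layout (a vector of rows) and the operation (summing every element) are assumptions; matrix_sum is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: visit every element of a matrix stored as a
// vector of rows. The outer loop walks the rows, the inner loop walks the
// columns of the current row; no third loop or branch is needed to
// "switch" between the two dimensions.
double matrix_sum(const std::vector<std::vector<double>>& m) {
    double total = 0.0;
    for (std::size_t r = 0; r < m.size(); ++r) {
        for (std::size_t c = 0; c < m[r].size(); ++c) {
            total += m[r][c];
        }
    }
    return total;
}
```

Each element is touched exactly once, which is all any full traversal needs.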

Similar story: a PI had written some code to convert (row, column) indices in the upper triangle of a matrix (made somewhat tricky by excluding the main diagonal) to a linear index. He used a for loop that started from the beginning and counted up, an O(n^2) algorithm; I was able to give him a constant-time O(1) formula that did the same thing, for a rather dramatic speedup.
I ended up needing this so often for graph processing, including for values where a floating-point version might be inexact, that I saved the formula in a blog post: https://vladfeinberg.com/2020/03/07/subset-isomorphism.html
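The formula itself isn't quoted in the thread. As a sketch, assuming pairs (i, j) with 0 <= i < j < n are numbered row by row, the counting loop and an O(1) closed form might look like this (triu_index_loop and triu_index are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

// Strict upper triangle of an n x n matrix, numbered row by row:
// (0,1), (0,2), ..., (0,n-1), (1,2), ...

// Slow approach: walk the triangle from the start and count, O(n^2).
std::int64_t triu_index_loop(std::int64_t i, std::int64_t j, std::int64_t n) {
    std::int64_t k = 0;
    for (std::int64_t r = 0; r < n; ++r) {
        for (std::int64_t c = r + 1; c < n; ++c) {
            if (r == i && c == j) return k;
            ++k;
        }
    }
    return -1;  // (i, j) is not in the strict upper triangle
}

// Closed form, O(1): rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i)
// = i*(2n - i - 1)/2 entries, and (i, j) is (j - i - 1) entries into row i.
std::int64_t triu_index(std::int64_t i, std::int64_t j, std::int64_t n) {
    assert(0 <= i && i < j && j < n);
    return i * (2 * n - i - 1) / 2 + (j - i - 1);
}
```

For n = 4, both return 4 for the pair (1, 3). Since i*(2n - i - 1) is always even, the integer division is exact, so this direction needs no floating point at all.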

The formula can be "oblivious" to the final size of the matrix too, which is helpful if you're doing some sparse ML training on edges (e.g., GNNs).
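One way to get that size-obliviousness (an assumption here, not necessarily the formula in the linked post) is to number pairs column by column, so (i, j) maps to j(j-1)/2 + i, with no n anywhere. Going back from an index needs a square root, which is where floating point can bite; this sketch corrects a floating-point estimate with exact integer checks. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Pairs (i, j), i < j, numbered column by column:
// (0,1), (0,2), (1,2), (0,3), (1,3), (2,3), ...
// Columns 1..j-1 hold 0 + 1 + ... + (j-1) = j*(j-1)/2 pairs, so the
// index never depends on the matrix size.
std::uint64_t pair_index(std::uint64_t i, std::uint64_t j) {
    assert(i < j);
    return j * (j - 1) / 2 + i;
}

// floor(sqrt(x)): take a floating-point estimate, then fix it with exact
// integer comparisons. Assumes x is far below 2^64 so the squares can't overflow.
std::uint64_t isqrt(std::uint64_t x) {
    std::uint64_t r = static_cast<std::uint64_t>(std::sqrt(static_cast<double>(x)));
    while (r * r > x) --r;
    while ((r + 1) * (r + 1) <= x) ++r;
    return r;
}

// Inverse of pair_index: j is the largest j with j*(j-1)/2 <= k,
// i.e. floor((1 + sqrt(8k + 1)) / 2); i is whatever is left over.
std::pair<std::uint64_t, std::uint64_t> pair_from_index(std::uint64_t k) {
    std::uint64_t j = (1 + isqrt(8 * k + 1)) / 2;
    std::uint64_t i = k - j * (j - 1) / 2;
    return {i, j};
}
```

For example, pair_index(1, 3) is 4 and pair_from_index(4) gives back (1, 3). Growing the matrix only appends new columns, so existing edges keep their indices.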

During my master's thesis in a chemistry lab, I got a side task: look at a data analysis script and make it run faster. It was "C/C++" code (i.e. procedural C-style code using the C++ stdlib for convenience) that read a file line by line, fed each line to a slow processing function, then aggregated the results. It took over a day to run.

Without even looking at the processing function, which I assumed was some sciency science, I set up pthreads, put mutexes on the result array and such, and got almost perfectly linear scaling. So far, so good.
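That first round of changes might look roughly like the sketch below. It uses std::thread and std::mutex (the standard C++ wrappers, which on Linux are typically built on pthreads) instead of raw pthread calls, and process_line and process_all are hypothetical stand-ins for the real analysis.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive per-line analysis.
double process_line(const std::string& line) {
    double score = 0.0;
    for (char ch : line) score += static_cast<unsigned char>(ch);
    return score;
}

// Worker t handles lines t, t + num_threads, t + 2*num_threads, ...
// The per-line work runs in parallel; only the update of the shared
// total is serialized behind the mutex.
double process_all(const std::vector<std::string>& lines, unsigned num_threads) {
    double total = 0.0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t k = t; k < lines.size(); k += num_threads) {
                double r = process_line(lines[k]);
                std::lock_guard<std::mutex> lock(total_mutex);
                total += r;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

Because only the aggregation is locked, this kind of split can scale nearly linearly when the per-line work dominates. It does nothing about what the per-line work itself costs, though.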

Then I ran a profiler to see what was actually taking so long.

... Uh, why are you spending all this time copying strings back and forth?

Turns out they passed all strings by value. Sprinkling in a few const & here and there gave a 1000-fold speedup or so. I felt pretty stupid about my multithreading antics after that.
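The difference is a single character in each signature. Here is a sketch with a hypothetical per-line helper (the original processing code isn't shown):

```cpp
#include <cstddef>
#include <string>

// Parameter taken by value: every call copies the whole string, which
// typically means a heap allocation for anything longer than the
// implementation's small-string buffer.
std::size_t count_commas_by_value(std::string line) {
    std::size_t n = 0;
    for (char ch : line) n += (ch == ',');
    return n;
}

// Same logic with a reference to const: no copy is made, and the compiler
// still rejects any attempt to modify the caller's string.
std::size_t count_commas(const std::string& line) {
    std::size_t n = 0;
    for (char ch : line) n += (ch == ',');
    return n;
}
```

Call sites don't change, because a std::string argument binds to either parameter; only the copy goes away. When a function only reads the text, C++17's std::string_view is another option.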