by joshuaellinger 1885 days ago
I just implemented both a CSV parser and an address standardizer in numba (both CPU and GPU), running in parallel and fed through a message queue, with a bunch of worker subprocesses.

It takes a bit of getting used to, but the performance gains are impressive. Basically, my bottlenecks shift from compute to I/O.

I think you have to weigh it against writing in C/C++. Mentally, it is basically the same work as writing in C (you manage memory, you write complicated for-loops), but you get good array support from numpy. The primary advantage for me is that everything stays in the Python runtime environment. You just run the code without any extra steps.
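
To illustrate the "C-style loops over numpy arrays" point, here is a minimal sketch. It is not the parser described above; the function name `field_starts` and the sample address line are made up for illustration. It assumes numba's `njit` decorator when numba is installed and falls back to plain Python otherwise, so the same file runs either way.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated loop when numba is available
except ImportError:
    def njit(func):  # fallback: run the same loop as ordinary Python
        return func


@njit
def field_starts(buf, out):
    """Write the start offset of each comma-separated field of buf into out.

    buf is a 1-D uint8 array holding one line of text; out is a
    preallocated int64 array. Returns the number of fields found.
    """
    n = 0
    out[n] = 0  # the first field always starts at offset 0
    n += 1
    for i in range(buf.shape[0]):
        if buf[i] == 44:  # 44 is the byte value of ","
            out[n] = i + 1
            n += 1
    return n


# Hypothetical input line; .copy() gives a writable array we own.
line = np.frombuffer(b"12 Main St,Springfield,IL,62704", dtype=np.uint8).copy()
starts = np.empty(line.shape[0] + 1, dtype=np.int64)  # caller manages the memory
count = field_starts(line, starts)
```

As in C, the caller preallocates the output buffer and the function walks the bytes with an explicit index, but nothing leaves the Python process and there is no separate build step.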

...

What is missing from 'toy' timing benchmarks is an understanding that a real problem typically has more than one bottleneck, and it is easy to pick the wrong one to optimize and gain very little.
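
One way to avoid optimizing the wrong stage is to time every stage of the pipeline before touching any of them. The sketch below is only an illustration: the stage functions `load`, `parse`, and `standardize` and the sample data are hypothetical stand-ins, not the pipeline described above.

```python
import csv
import io
import time


def timed(label, timings, func, *args):
    """Run func(*args) and add its wall-clock time to timings[label]."""
    start = time.perf_counter()
    result = func(*args)
    timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)
    return result


# Hypothetical pipeline stages, kept trivial on purpose.
def load(text):
    return io.StringIO(text).read()


def parse(raw):
    return list(csv.reader(io.StringIO(raw)))


def standardize(rows):
    return [[field.strip().upper() for field in row] for row in rows]


text = "12 main st, springfield\n" * 10000  # made-up sample input
timings = {}
raw = timed("load", timings, load, text)
rows = timed("parse", timings, parse, raw)
clean = timed("standardize", timings, standardize, rows)

# The stage worth looking at first is whichever one took the longest.
slowest = max(timings, key=timings.get)
```

Printing `timings` after a run on real data shows where the time actually goes, which is often not the stage a toy benchmark would single out.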

After starting in C (30 years ago now), spending a long time in C#, and then switching to Python a few years ago, I think the unappreciated advantage of Python is that I have to abandon all pretense of caring about speed and just get stuff working. It basically solves the premature optimization problem for me by being a fast interpreted language rather than a slow compiled language.

1 comment

> I think the unappreciated advantage of Python is that I have to abandon all pretense of caring about speed and just get stuff working. It basically solves the premature optimization problem for me

I feel the same way. With Python I just write the simplest algorithm that comes to mind, even though I know it is not the most optimized way of doing things. But most of the time it runs fast enough that I realize I don't actually need to optimize it.

And being able to create and easily manipulate dictionaries and tuples also allows me to create efficient data structures very quickly.
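
As a small illustration of that last point, a dictionary keyed by tuples gives a grouped lookup table in a few lines. The records and field names below are made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (zip code, street, house number).
records = [
    ("62704", "MAIN ST", 12),
    ("62704", "MAIN ST", 14),
    ("62704", "OAK AVE", 3),
]

# Group house numbers under a (zip code, street) tuple key.
by_street = defaultdict(list)
for zip_code, street, number in records:
    by_street[(zip_code, street)].append(number)

# Average-case constant-time lookup by composite key.
numbers = by_street[("62704", "MAIN ST")]
```

Tuples are hashable, so they work as composite keys without defining a class first.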