Hacker News
by weberc2 2339 days ago
No, data science is typically CPU-intensive. Python is fundamentally single threaded and slow at that, so you have to be very clever to work around the issue (e.g., running a separately scalable service for your data crunching work). Contrast that with Go, where the runtime can spread work across multiple cores.

>Python is fundamentally single threaded and slow

Python is not fundamentally single threaded - it just has a lock that stops it from taking advantage of threads in cpu bound scenarios.

Python is used in data science because of the C bindings that make it not slow. Also, when in C, you can take advantage of threads since they live outside the GIL. e.g. Dask.
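A rough way to see the distinction being drawn here, as a sketch rather than a benchmark: the snippet below runs the same thread pool over a pure-Python loop and over `hashlib.sha256`, which is implemented in C and releases the GIL for large inputs. The helper names (`count_down`, `sha256_hex`, `timed`) and the workload sizes are made up for illustration, and the numbers assume a standard GIL-enabled CPython build on a multi-core machine.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n: int) -> int:
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def sha256_hex(data: bytes) -> str:
    """Hashing runs in C; CPython's hashlib releases the GIL for large inputs."""
    return hashlib.sha256(data).hexdigest()


def timed(fn, args, workers):
    """Map fn over args on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    loops = [2_000_000] * 4                  # illustrative workload sizes
    blobs = [bytes(32 * 1024 * 1024)] * 4    # four references to one 32 MiB buffer
    for label, fn, args in (
        ("pure Python", count_down, loops),
        ("hashlib (C)", sha256_hex, blobs),
    ):
        _, one = timed(fn, args, workers=1)
        _, four = timed(fn, args, workers=4)
        print(f"{label}: 1 thread {one:.2f}s, 4 threads {four:.2f}s")
```

On a GIL-enabled build, the pure-Python case would be expected to take about as long with four threads as with one, while the hashing case should get noticeably faster. Exact timings depend on the machine and the Python version.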

> Python is not fundamentally single threaded - it just has a lock that stops it from taking advantage of threads in cpu bound scenarios.

Tomato tomahto

> Python is used in data science because of the C bindings that make it not slow. Also, when in C, you can take advantage of threads since they live outside the GIL. e.g. Dask.

Correct. Python is fast when you aren’t running Python. Of course using C (or anything else) only works in certain situations—there is a cost to crossing the language boundary and very often that cost is greater than what you save by using C. Never mind the added build/package complexity, the security issues, the maintainability issues, etc.

Python is a neat language, but it’s really expensive if your project ever might have tight performance requirements (where “tight” is laughably easy for most other languages). Python can often be made to meet them with enough shenanigans, it’s just costly to implement and maintain said shenanigans.
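As a small illustration of "Python is fast when you aren't running Python", the sketch below times an interpreted loop against the built-in `sum`, whose loop runs in C. The function names and the list size are assumptions chosen for the example. It covers only the first half of the argument and does not measure the cost of crossing the language boundary.

```python
import timeit


def python_loop_sum(values):
    """Add the values one at a time in interpreted bytecode."""
    total = 0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Delegate the whole loop to the C implementation of sum()."""
    return sum(values)


if __name__ == "__main__":
    data = list(range(1_000_000))  # illustrative size
    for fn in (python_loop_sum, builtin_sum):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

Both functions return the same result; only the place where the loop executes differs.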

> Python is not fundamentally single threaded

Yes it is. It was not designed to run fast on multi-core CPUs, because most CPUs were single-core when Guido created the language. This has been a problem ever since multi-core CPUs became common, and it's a front where Python is losing the battle (against Go, for example, because of its much easier concurrency support).

A program that uses subprocess would not have the single core constraint.
No, but very often those programs are slower overall because of the pickling cost. Multiprocessing isn’t a magic bullet / there’s a reason threads exist.
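To get a feel for the pickling cost mentioned here, this sketch times a pickle round trip of a large list, which is roughly what `multiprocessing` does to arguments and results when it sends them between processes, and compares it with a cheap computation over the same data. The helper `pickle_round_trip` and the data shape are hypothetical, chosen only for illustration.

```python
import pickle
import time


def pickle_round_trip(obj):
    """Serialize and deserialize obj; return (copy, payload size, seconds)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    return restored, len(payload), time.perf_counter() - start


if __name__ == "__main__":
    # Illustrative data: one million small rows.
    rows = [(i, float(i), str(i)) for i in range(1_000_000)]

    start = time.perf_counter()
    total = sum(row[1] for row in rows)
    work = time.perf_counter() - start

    _, size, copy_cost = pickle_round_trip(rows)
    print(f"work in-process: {work:.3f}s")
    print(f"pickle round trip: {copy_cost:.3f}s for {size / 1e6:.1f} MB")
```

When the per-item work is this light, the serialization step can easily cost more than the computation it was meant to parallelize. Threads avoid the copy because they share memory.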
Case in point.
If the language makes you jump through unnecessary hoops to get passable performance, the issue is with the language, not the program. Otherwise we could generalize your perspective such that all languages are above criticism (no languages have problems, users just fail to find and implement the proper hacks).