|
Yeah, they're very much doing it. Pandas is huge, and libraries like spaCy, NetworkX, etc. exist. It's a massive and good ecosystem. I'd hazard a guess that for newer students in most of the sciences, Python is the go-to for scientific computing, rather than R or Julia. This will be blindingly obvious if you work in that area. Yes, you can do it in another language, but you're missing out on a lot of stuff that is already done, is state of the art, and is fast because the speedy parts aren't in Python. The complaints about parens in Lisp are superficial, and in my experience the same goes for whitespace in Python. They just don't matter. |
Having worked for months with a slew of senior data scientists, this was a bit painful. Those data scientists were very good at coming up with solutions for the company's problems, but Python is so slow that their implementations (using spaCy, Pandas and other libs) had enough Python in them to make them impractical for the company's use case. They were nice prototypes, which I then had to fix or even rewrite in C/C++ (we tried Rust as well) to make them usable in the company's data pipeline.
I think companies are burning millions (billions in total?) on depressingly slow solutions in this space, throwing massive amounts of compute at them just so the computations complete before the sun dies out.
Example: we needed a specific keyword extraction algorithm for multiple languages; my colleague used spaCy and Python to create it. It took a couple of seconds per page of text; we needed at most a few ms on modern hardware. He spent quite a lot of time rewriting and changing it, but never got it under 1s per page on xlarge AWS instances. My version runs the same algorithm in optimised C/C++ and takes a few ms per page on average.
Sure, we could've spun up a lot more instances, but my rewrite was far cheaper than that, even in the first month.
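To put rough numbers on why more instances gets expensive fast, here's a back-of-the-envelope sketch. The workload (100 pages per second) and the 5 ms figure are made-up assumptions for illustration, not our real numbers, and `instances_needed` is just a hypothetical helper that ignores multi-core workers, queueing and overhead.

```python
import math


def instances_needed(pages_per_second: float, seconds_per_page: float) -> int:
    """Single-threaded workers needed to keep up with a steady page rate."""
    return math.ceil(pages_per_second * seconds_per_page)


# Hypothetical workload: 100 pages arriving per second (assumption).
PAGES_PER_SECOND = 100

# Roughly 1 s per page for the Python prototype, as described above.
python_workers = instances_needed(PAGES_PER_SECOND, 1.0)

# "A few ms" per page for the native version; 5 ms is an assumed value.
native_workers = instances_needed(PAGES_PER_SECOND, 0.005)

print(python_workers, native_workers)
```

Under those assumptions the per-page latency gap turns directly into a gap of about two orders of magnitude in worker count, which is the bill you pay every month.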