Hacker News
by bbulkow 1689 days ago
Another way to analyze the problem: what other language could it have been, given the moment ML hit?

You say 'compared to other scripting languages'. Let's list them.

Ruby: no numeric support
Go: unnecessary typing, modest numeric support, shitty generics
Bash: ha ha ha
Scala, java, c, cpp: not a scripting language, complex
Tcl, php: out of favor
Rust: hadn't happened yet
R: in memory bias, not as simple
Other languages were obscure or owned by monoliths (kotlin, swift, c#)

Python also has multiple implementations, which sounds like a minor thing but really isn't. PyPy keeps CPython on its toes.

C# really could be a contender. I am more productive in C# than in any other language except Python (although I think I will be more productive in Rust).

Python is, almost unarguably, the easiest language to code in right now, period. It has the greatest expressiveness and the simplest syntax. I use it for large-scale open source art projects, and you can use it for AI.

Why are you asking?

4 comments

R is definitely a good language for quant work. In some ways it could have been the natural choice, and there are still places where it's a better choice than Python.

It's just far too fragmented, only really good for numeric work (and thus harder to integrate with production systems), and full of the weirdest gotchas.

https://www.burns-stat.com/pages/Tutor/R_inferno.pdf

R has a few advantages over Python for data science work, but Python has a big one: it's also widely used by software engineers who are not data scientists.

I found it easy to jump from the software side of things to the data side of things because I already knew the quirks and tricks of Python. Having to learn a new language would have made this transition harder.

I left out JavaScript. It is not a simple language, with its service heritage. It is bound up in the Node runtime in a way that doesn't really work right for data processing.
lol, you forgot perl too.
Yeah and what about awk!?

Partially joking, but not totally, and I can appreciate why people might say this: https://news.ycombinator.com/item?id=5725291

(I pasted the HN link because the original seems to be down)

Non ML/AI coder here:

Why does ML/AI work need to be written in a scripting language?

Why can’t it be something like C++ etc instead?

One reason is that you usually need to try a lot of things before you get something to work. Language productivity is at a premium. You really want some sort of interactive shell where you can do calculations and pull up plots etc. This used to be done with IPython, which evolved into Jupyter.
It doesn't need to / you could theoretically do it in C++. It's just that Python (as with other scripting languages) provides really nice, high-level expressiveness and also has a decent module system. You can write code in the REPL or just write a quick-and-dirty script and test it out without write-compile-run cycles.

NumPy is highly optimized for things like matrix math. You get great speed with the C-level module, and you drive it with really simple Python code. So you want to multiply two matrices? The code literally looks no different than multiplying two scalars. That's Python's superpower.

I haven't written C++ in nearly 20 years; maybe it's good enough to be able to do ML work. But the heavy lifting library in C/C++ plus the high-level driving Python is a really good fit.

Well, it is slightly different than multiplying two scalars:

c = a * b

vs

C = A @ B
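
To make the distinction concrete, here is a minimal sketch, assuming NumPy is installed. The values are made up for illustration. Note that `*` on arrays is still valid but means elementwise multiplication, which is why `@` is needed for the matrix product:

```python
import numpy as np

# Illustrative values, not from the thread.
a, b = 2.0, 3.0
c = a * b  # ordinary scalar product

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = A * B  # multiplies matching entries (Hadamard product)
matmul = A @ B       # matrix product: rows of A times columns of B

print(elementwise)
print(matmul)
```

Either way, the loop over rows and columns runs inside NumPy's compiled code rather than in the Python interpreter.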

As another commenter said, speed of experimentation is an important factor. Also, dynamic types are nice when you're dealing with exploratory data work. Combine that with the library ecosystem and Python's ease of use, and there you have it.
Just one reason is that some Python libraries (numpy, tensorflow, pytorch) allow you to work with high dimensional arrays (3-4 dimensions) without for loops.
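
A rough sketch of what "without for loops" looks like in practice, assuming NumPy is installed. The batch of random "images" and its shape are hypothetical, chosen only to have four dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch: 8 images, 4x4 pixels, 3 channels.
images = rng.random((8, 4, 4, 3))

# Per-channel mean over batch, height, and width; result has shape (3,).
channel_mean = images.mean(axis=(0, 1, 2))

# Broadcasting subtracts the (3,) vector from every pixel of every image.
centered = images - channel_mean

# The same operation with explicit loops, for comparison only.
looped = np.empty_like(images)
for n in range(images.shape[0]):
    for i in range(images.shape[1]):
        for j in range(images.shape[2]):
            looped[n, i, j] = images[n, i, j] - channel_mean

print(np.allclose(centered, looped))
```

The one-line version and the triple loop compute the same array; the one-liner is shorter and hands the iteration to compiled code.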

ML also needs reverse-mode automatic differentiation, which would be a real pain in C++.
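
For readers who haven't seen it, here is a toy sketch of the idea in plain Python. It is not how any real framework implements it; the `Var` class and its methods are hypothetical names invented for this example, and it handles only scalar `+` and `*`:

```python
class Var:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological order: inputs come before the nodes using them.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + x * x  # z = x*y + x^2
z.backward()
print(z.value, x.grad, y.grad)
```

Operator overloading lets the forward expression read like ordinary arithmetic while the graph is recorded behind the scenes, which is roughly the ergonomics that libraries such as PyTorch and TensorFlow offer at tensor scale.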

APL/array languages.