| HN Mirror


by threeseed 1688 days ago

I would put it mostly down to Spark.

Originally, it was only available in Scala/Java, but then they added Python support courtesy of Py4J. And since Python was massively simpler than Scala, it exploded in popularity, very quickly becoming the default language.

So then you had data scientists who were already writing a lot of data transformations in Spark looking around at the rest of the Python ecosystem, finding libraries like pandas and tools like Jupyter, and basically staying there since it was so much easier than the alternatives.

Their interests aren't really in computer science and so they look for whatever language can get them to an outcome as quickly and easily as possible. Even if it's not the most optimal, elegant or maintainable.

3 comments

pedrosorio 1688 days ago

Spark started getting industry adoption in ~2013-2014 when it became an Apache project.

The roots of Python as a language for numerical, scientific, and data science use cases are much older than that, with numpy and scipy back in the 90s and early 2000s, followed by pandas and scikit-learn in the late 2000s.


kragen 1688 days ago

By the time Spark was born (02010?) Python had already eclipsed the non-JS alternatives (Scheme, Perl, Tcl, Ruby, awk, BASIC, Lush). I'd put the crossover point around 02002. IPython notebooks and pandas came even later than Spark.


rcarmo 1688 days ago

PySpark certainly helped kill Hadoop for data munging, but I would say it only really got going in 2014.
