by CE02 1201 days ago
One thing I’d add to this conversation, though I’m certain it’s already been stated: As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization. I work as a research assistant for international finance faculty and I would say that compared to the average Hackernews reader, I’m technologically illiterate, but compared to the average 60-80 y/o econ/finance faculty member, I’m practically a Turing award winner.

Most of these applied fields are using Python and R as no more than data gathering tools and fancy calculators, a use for which the benefits of other languages just don't justify their cost.

The absolute beauty of Python for what I do is that I can write code and hand it off to a first year with a semester of coding experience. Even if they couldn’t write it themselves, they can still understand what it does after a bit of study. Additionally, I can hand it off to 75-year-old professors who still send fax memos to the Federal Reserve and they’ll achieve a degree of comprehension.

For these reasons, Python, although not perfect, has been so incredibly useful.


I just want to add to this, I had this exact same experience when working with journalists and other non-technical background programmers.

You’ll find everyone from philosophy PhDs to Biologists to Journalists who use pandas because it’s so easy to learn it and work with it. It’s amazing how you can become productive in python/pandas without any experience or even basic understanding of programming because of how accessible jupyter, colab and blogs/docs on pandas are.

The other thing people don’t talk about is that a lot of these organizations can hire a CS student part time or a full time software engineer/data engineer/data scientist who can optimize their scripts once they are written. Pretty much any software engineer can read and debug python code without needing to learn python. So for example, I know some engineers working in genomics who have turned biologist-written scripts that take several days to run in python into scripts that take hours or minutes to run by doing basic optimizations like removing quadratic algorithms from the script or applying pyspark or dask to add parallelism.
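To make "removing quadratic algorithms" concrete, here is a minimal sketch of the kind of fix that is usually meant. The function names and the gene-ID data are made up for illustration; they are not from any real genomics script. The slow version tests membership against a list inside a loop, and the fast version builds a set once.

```python
def shared_ids_quadratic(sample_ids, reference_ids):
    """O(n * m): every `in` test scans the reference list from the start."""
    return [s for s in sample_ids if s in reference_ids]


def shared_ids_linear(sample_ids, reference_ids):
    """O(n + m): build a set once, then each lookup is O(1) on average."""
    reference_set = set(reference_ids)
    return [s for s in sample_ids if s in reference_set]


# Hypothetical data: both versions must agree, only the running time differs.
samples = [f"gene{i}" for i in range(0, 2000, 2)]
reference = [f"gene{i}" for i in range(0, 2000, 3)]
assert shared_ids_quadratic(samples, reference) == shared_ids_linear(samples, reference)
```

On a few thousand IDs the difference is barely noticeable, but on millions of rows this one-line change is often what separates days from minutes.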

The fact that python can be used as a bridge between technical and non-technical people is amazing and I think it has provided a better bridge between these groups than SQL was ever able to provide.

I couldn’t agree more. And I must say, now that it’s being used as a bridge between technical and nontechnical talent it’s becoming ever more vital from a career perspective. Most people recognize the value of fundamental coding skills and if you’re even just above average at coding in a non-CS field, you seem magnitudes more valuable than you really are. In both industry and research, ears immediately perk up when they realize I have a background in economics but competencies in coding beyond the standard regressions in R that everyone does in econometrics. It’s hilarious because as mentioned prior, I’m rather pathetic compared to most people on this forum.
Yeah, Python is widely used where I work for just that. The "hierarchy" of tools looks somewhat like this, from most to least technically competent users:

1) Languages like Python / R / Julia / etc. + SQL

2) PowerBI, Tableau, or similar tools

3) Excel

The number of users of those tools will be the inverse, with Excel being number 1.

If you're competent using the "stack" above, you could probably work as an analyst anywhere - given that you can pick up domain knowledge.

I hate to admit that I very often start the python repl to just do some simple calculations. I always have multiple terminals open so instead of opening a calculator I just use python in one of the terminals.
Agreed. Python's REPL has basically totally replaced my usage of Emacs calc as a desk calculator, mainly because it is always there and if I don't know the big-brain closed-form solution for something like compound interest, I can just write a loop and figure it out that way.
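For anyone who hasn't done this, a rough sketch of the "just write a loop" approach next to the closed-form version. The principal, rate, and number of years are arbitrary example values, and this assumes interest is compounded once per year.

```python
import math


def compound_loop(principal, annual_rate, years):
    """Grow the balance one year at a time, no formula required."""
    balance = principal
    for _ in range(years):
        balance += balance * annual_rate
    return balance


def compound_closed_form(principal, annual_rate, years):
    """The textbook formula: P * (1 + r) ** n."""
    return principal * (1 + annual_rate) ** years


# Example values only: 1000 at 5% per year for 10 years.
loop_result = compound_loop(1000, 0.05, 10)
formula_result = compound_closed_form(1000, 0.05, 10)
assert math.isclose(loop_result, formula_result)
print(round(loop_result, 2))
```

The loop version is also easier to bend when the problem stops being textbook, for example adding a deposit every year.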
So what you are saying is that Python is Excel for programmers :D
This is a really good line, the VAST VAST majority of programming in the world is done in Excel by people who would be horrified if you told them they were programming.

And I wouldn't be surprised if a large number of python programmers would say they're not programming, it's just scripting.

I also use a python repl as an alternative to excel or SQL. I find myself just downloading the data as a CSV and then quickly cooking up some pandas to get a graph or aggregate some stats, it’s just so much quicker and easier imo.
I’ve migrated to the tidyverse for most of my EDA and plotting - I’ve found dplyr and ggplot to be noticeably more expressive. Pandas always added a ton of friction for me.

It’s still my choice for quick and non-graphical analysis when I’m on a remote.

An alternative to pandas/Python for similar uses is https://www.visidata.org/. You can use Python in it also.
A bit off topic, but what would you use for data "mangling"? Like joining csvs on complex conditions, cleaning tables etc. Pandas seems to be the wrong tool for this, but I still often find myself using it because, in contrast to something like Excel, my steps are at least clearly documented for future use or verification.
If you asked this question 6 or 8 years ago the answer would be it depends on the volume of data (10s of gb, 100s of gb etc.) and I could give you just a single tool that would help you in most cases.

Today honestly most tools are pretty capable, pandas is a great choice and if you have really high volumes of data you might try koalas (spark) or polars.

Honestly the biggest design considerations for data science today are things external to your project: what do you and others on your team know, what tools does your company already have setup, what volume of data are you processing, what are your SLAs, who or what else needs to run this script/workflow, what software do you need to integrate with, how often does it need to be processed, how are you going to assure the quality of your data and what tools are you using for reporting?

I tend to use pandas and SQLite for most use cases cause I can cook up a script in 2 hours and be done, I just code it interactively in a notebook and most people are able to work on a pandas or SQLite script productively if it needs to be maintained even if they don't know python. If it's a large volume of data or a rapid schedule (minutes, seconds) or tight SLAs on quality or processing time, then I start to consider whether pyspark, Apache beam, dask or bigquery might be a good fit.

So it really just depends but for most people who are processing < 100 GB on a 1+ day schedule or ad hoc I would recommend just using pandas or tidyverse in R and getting really good at writing those scripts fast. Today you’ll get the most mileage out of those two tools.
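For the "joining csvs on complex conditions" case asked about above, here is a minimal sketch of the SQLite route using only the standard library. The table names, columns, and CSV contents are invented for the example, and the `load_csv` helper is hypothetical. The join matches each trade to the rating whose validity window contains the trade date, which is awkward to express as a plain equality merge.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents; in practice these would be files on disk.
TRADES_CSV = """trade_id,ticker,trade_date,amount
1,ABC,2021-01-15,100
2,ABC,2021-03-02,250
3,XYZ,2021-02-10,75
"""

RATINGS_CSV = """ticker,valid_from,valid_to,rating
ABC,2021-01-01,2021-01-31,A
ABC,2021-02-01,2021-12-31,B
XYZ,2021-01-01,2021-12-31,C
"""


def load_csv(conn, table, text):
    """Create `table` from CSV text; every value is stored as text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "trades", TRADES_CSV)
load_csv(conn, "ratings", RATINGS_CSV)

# Range join: ISO-formatted dates compare correctly as plain strings.
joined = conn.execute(
    """
    SELECT t.trade_id, t.ticker, t.trade_date, r.rating
    FROM trades AS t
    JOIN ratings AS r
      ON r.ticker = t.ticker
     AND t.trade_date BETWEEN r.valid_from AND r.valid_to
    ORDER BY t.trade_id
    """
).fetchall()

for row in joined:
    print(row)
```

The query doubles as documentation of the join rule, which covers the "clearly documented steps" concern, and the result can still be handed to pandas afterwards for plotting.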

I still use perl for some of that stuff, or even awk, but those are barely reusable or readable.
This is a letter to the general community: please stop writing these scripts in perl and bash one liners. That one off script you thought would only be used once or twice at this nonprofit has been in continuous use for 12 years and every year a biologist or journalist runs your script having no idea how it actually works. Eventually the script breaks after 8 years and some poor college student interns there and has to figure out how perl works, what your spaghetti is doing and eventually is tasked with rewriting it in python as an intern project (true story).
I think your complaint isn't really about perl and bash. It's about knowing your audience.

When writing code that will be used by a particular sort of user base, the code should be written in whatever way best suits that user base. If your users are academics, researchers, journalists, etc. -- yes, avoid anything with complex or obscure semantics like perl or bash.

But if your code is going to be used by programmers or people who are already comfortable with perl/bash/whatever, those tools may be just the ticket.

one line spaghetti ... I remain unsympathetic.
Do you rely on any GitHub repo or gist w/ code snippets?
> I very often start the python repl to just do some simple calculations.

If you use the python repl a lot and haven't heard of it, ptpython is worth checking out as a repl replacement. I find it to be much more ergonomic.

yup, from decimal import Decimal, and get better accuracy than any default calculator
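A quick sketch of what that buys you, using the classic 0.1 + 0.2 case. One caveat worth knowing: the `Decimal` values should be built from strings, since `Decimal(0.1)` would inherit the binary floating-point error.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# Decimals built from strings keep the exact decimal digits.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.3"))   # True
```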
You may like xonsh

https://xon.sh/

No need to fire up a python repl.

I don't see why that's something to be ashamed of. I frequently pop open a Ruby on Rails console for this purpose. (Basically ruby's repl + libraries and language extensions.)
Eh, I type basic operations in Spotlight or Google, whichever is lying on my screen!
I have python on my phone and use it to calculate tips sometimes.
Have you tried ipython? Python repl on steroids!
from time to time yes. Ideally I would also have a jupyter notebook running at all times, but in the end it mostly comes down to vanilla python because that's installed on everything I am using
I do too if I already have a repl open, but otherwise I mostly use bc so I don’t have to wait for the slight lag of the repl to start
What’s to hate about that? It’s a perfectly good use of Python and I do it all the time.
I've seen this too. Python has supplanted what used to be done in a spreadsheet entirely, even the custom VBA macro stuff that was once a high level spreadsheet. Python with/plus viz is a more enjoyable experience than trying to wrangle some general purpose spreadsheet into doing this stuff. And, it's relatively portable and transferable which are major advantages of the spreadsheets.
I'm one of Python's biggest critics (to me it's a Monkey's Paw of software development), but I think this is exactly the appropriate situation to use it. It's great for one-off fancy calculations, system scripts, ideally with no dependencies and/or a short lifetime
> to me it's a Monkey's Paw of software development

This piqued my curiosity. I've worked with Python on and off for the last ~20 years, and while I'm not a fanboy or apologist, and use other tools when appropriate, there's also a reason it remains in my toolbox and sees regular use while many other tools have come/gone/been replaced by something better.

Can you share an example scenario where it's a Monkey's Paw? My suspicion is that this is more of an org issue than a tech issue?

Dependency management/tooling. Python (philosophically) treats the whole system as a dependency by default, in contrast with other modern languages that operate at the project/workspace level. This means it's very hard to isolate separate projects under the same system, or to reproducibly get a project running on a different system (even the same OS, because the system-wide state of one machine vs the next matters so much).

People work around these issues with various kludges like virtual environments, Docker (just ship the whole system!), and half a dozen different package managers, each with their own manifest format. But this is a problem that simply doesn't exist in Go, JavaScript, Rust, and others.

For code that never needs anything except the standard library, or for a script that never needs to be maintained or run on a different machine, Python is fine. Maybe even nice. But I've watched my coworkers waste so many hundreds of developer-hours just trying to wrangle their Python services into running locally, managing virtual environments, keeping them from trampling on each other's global dependencies, following setup docs that don't work consistently, and fixing deployments that fail every other week because the house is built on sand.

No.

Virtualenvs and requirements files have been a thing in Python for ages.

I’ve used tons of languages and while not the best, Python dependency management and project isolation is decent. IMO certainly better than JavaScript.

It's decent if you've been in the loop enough to use it. It's not built-in. It's a good practice, for sure, but it not being built-in at the language level makes it insanely easy for a newcomer to just... Not use virtualenvs at all.

In contrast to Javascript/Node.js/NPM/Yarn/whatever-you-want-to-call-server-js, which maintains a local folder with dependencies for your project, instead of installing everything globally by default.

Heck, a virtual env is literally a bundled python version with the path variables overridden so that the global folder is actually a project folder, basically tricking Python into doing things The Correct Way.

Virtualenvs are a part of the standard library since v3.3[0] and most READMEs do reference them btw.

[0]: https://docs.python.org/3/library/venv.html
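For reference, a small sketch of what the stdlib module does, driven from Python rather than the usual `python -m venv` command line. The `demo-env` directory name and the two helper functions are made up for the example, and pip is deliberately left out to keep it quick. The `sys.prefix` comparison is the standard way to tell whether the running interpreter has been redirected to a project-local environment.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the interpreter's prefix has been redirected to a venv."""
    return sys.prefix != sys.base_prefix


def make_env(target_dir):
    """Create a bare environment (no pip) and return its config file path."""
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


# Build a throwaway environment in a temp dir and confirm it was set up.
with tempfile.TemporaryDirectory() as tmp:
    cfg = make_env(Path(tmp) / "demo-env")
    created = cfg.exists()

print("environment created:", created)
print("running inside a venv:", in_virtualenv())
```

The `pyvenv.cfg` file is what points the environment back at the base interpreter, which is roughly the "path override" described above.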

It's been said, quite correctly, that Python is the second best language for everything.

I feel that it has recently - like many really mature platforms - become very much like the elephant from that old apocryphal story [0]. It is being used for many different purposes, with very different requirements and needs, with users being so focused on their own use that anything outside that is considered "bloat" and "waste".

[0] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant

when it comes to slightly less simple use cases involving parallelism and concurrency, python and its imperative kin start falling quite short of basic needs that are easily satisfied by

fp languages like

ocaml

haskell

racket

common lisp

erlang

elixir

or rust/golang

but even if the code is single threaded and not hampered by GIL limitations python tends to be super slow imho; also debugging dynamic python and imperative stateful python after a certain code base size >10k LOC gets extremely painful

A lot of these problem spaces can get away with single threaded performance because maybe they're generating a report or running an analysis once a day or at an even slower frequency. I work in a field where numerical correctness and readability are important for prototyping control algorithms (I work on advanced sensors) and python satisfies those properties for our analysis and prototyping work.

When we really want or need performance we rewrite the slow part in C++ and use pybind to call into it. For all the real implementations that run in a soft real time system, everything is done in C++ or C depending on the ecosystem.

> debugging dynamic python and imperative stateful python after a certain code base size >10k LOC gets extremely painful

for any meaningful scale you are better served by basic FP hygiene as evidenced in

haskell

elixir

CL/racket

or rust/golang

Just because you say it doesn't make it true. It's not that painful or painful at all really. Good abstractions and planning make writing and maintaining a python codebase easy, just like in any language.
I don’t get it. Go is as imperative as a language can be.
go is imperative but there are functional elegant styles borrowed from otp/erlang in ergo https://github.com/ergo-services/ergo https://memo.barrucadu.co.uk/three-months-of-go.html
Common Lisp, paragon of FP:

  (loop for x across numbers
        when (evenp x)
          do (setf result (+ result x)))
I mean yeah, you can do FP in CL, but it allows you to program in any paradigm which you prefer.
I agree. But most people just need a pickup truck, not to be forming railway consists.
Python is ideal for the non-professional programmer who wants to put their skills and knowledge on wheels.
>As many have mentioned, there is a large subset of the user base that uses Python for applied purposes in unrelated fields that couldn’t care less about more granular aspects of optimization.

Nobody cares about this that much. Even a straight up software developer in python doesn't care. The interpreter is so slow that most optimization tricks are irrelevant to the overall bottleneck. Really optimizing python involves the FFI and using C or C++, which is a whole different ball game.

For the average python developer (not a data scientist) most frameworks have already done this for you.