I hate to admit that I very often start the python repl to just do some simple calculations. I always have multiple terminals open so instead of opening a calculator I just use python in one of the terminals.
Agreed. Python's REPL has basically totally replaced my usage of Emacs calc as a desk calculator, mainly because it is always there and if I don't know the big-brain closed-form solution for something like compound interest, I can just write a loop and figure it out that way.
This is a really good line, the VAST VAST majority of programming in the world is done in Excel by people who would be horrified if you told them they were programming.
And I wouldn't be surprised if a large number of python programmers would say they're not programming, it's just scripting.
I also use a python repl as an alternative to excel or SQL. I find myself just downloading the data as a CSV and then quickly cooking up some pandas to get a graph or aggregate some stats, it’s just so much quick easier imo.
I’ve migrated to the tidyverse for most of my EDA and plotting - I’ve found dplyr and ggplot to be noticeably more expressive. Pandas always added a ton of friction for me.
It’s still my choice for quick and non-graphical analysis when I’m on a remote.
A bit off topic, but what would you use for data "mangling"? Like joining csvs on complex conditions, cleaning tables etc. Pandas seems to be the wrong tool for this, but I still often find myself using it as in contrast to something like Excel, my steps are at least clearly documented for future use or verification.
If you asked this question 6 or 8 years ago the answer would be it depends on the volume of data (10s of gb, 100s of gb etc.) and I could give you just a single tool that would help you in most cases.
Today honestly most tools are pretty capable, pandas is a great choice and if you have really high volumes of data you might try koalas (spark) or polars.
Honestly the biggest design considerations for data science today are things things external to your project: what do you and others on your team know, what tools does your company already have setup, what volume of data are you processing, what are your SLAs, who or what else needs to run this script/workflow, what softwares do you need to integrate with, how often does it need to be processed, how are you going to assure the quality of your data and what tools are you using for reporting?
I tend to use pandas and SQLite for most use cases cause I can cook up a script in 2 hours and be done, I just code it interactively in a notebook and most people are able to work on a pandas or SQLite script productively if it needs to be maintained even if they don't know python. If its a large volume of data or a rapid schedule (minutes, seconds) or tight SLAs on quality or processing time, then I start to consider whether pyspark, Apache beam, dask or bigquery might be a good fit.
So it really just depends but for most people who are processing < 100 GB on a 1+ day schedule or ad hoc I would recommend just using pandas or tidyverse in R and getting really good at writing those scripts fast. Today you’ll get the most mileage out of those two tools.
This is a letter to the general community: please stop writing these scripts in perl and bash one liners. That one off script you thought would only be used once or twice at this nonprofit has been in continuous use for 12 years and every year a biologist or journalist runs your script having no idea how it actually works. Eventually the script breaks after 8 years and some poor college student interns there and has to figure out how perl works, what your spaghetti is doing and eventually is tasked with rewriting it in python as an intern project (true story).
I think your complaint isn't really about perl and bash. It's about knowing your audience.
When writing code that will be used by a particular sort of user base, the code should be written in whatever way best suits that user base. If your users are academics, researchers, journalists, etc. -- yes, avoid anything with complex or obscure semantics like perl or bash.
But if your code is going to be used by programmers or people who are already comfortable with perl/bash/whatever, those tools may be just the ticket.
He has a valid point, though. I've seen (and written!) one-liners that were so complex that nobody, even devs, can deal with them without decoding them first.
They aren't technically "spaghetti", but they are technically impenetrable.
I argue that one-liners like that aren't good for anybody, dev or otherwise.
I don't see why that's something to be ashamed of. I frequently pop open a Ruby on Rails console for this purpose. (Basically ruby's repl + libraries and language extensions.)
from time to time yes. Ideally I would also have a jupyter notebook running at all times, but in the end it mostly comes down to vanilla python because that's installed on everything I am using