by teekert 3284 days ago
I also learned to code after 30. At some point Excel and Origin weren't dealing well with the ever-increasing data sizes in my field (biology). I took a 3-day intro course on Python 2 (basic Python and some NumPy). Back on the job I immediately switched to Python 3, learned about Jupyter, and was lucky enough to have a job where I could take time to learn (although it doesn't take much time to get back up to Excel/Origin-level data analysis skills with Pandas/Seaborn/Jupyter!).

That combination is still gold for me, although bioinformatics is forcing me into VSCode/Bash/Git territory more and more. I can recommend that anyone wanting to do data analysis start with the Jupyter/Python/Pandas/Seaborn combo. The notebook just makes it very easy to write a small code snippet at a time, test it, and move on. Writing markdown instructions and introductions/conclusions in the document itself helps you make highly readable reports, which makes it easy to reproduce what you did years ago.


Would you know of, or can you recommend, any good datasets (or practice exercises) using "Jupyter/Python/NumPy/Pandas/Seaborn" for someone with a similar Excel background (and a basic understanding of Jupyter/Python/Pandas)?
Seaborn has a standard data set (now that I searched for it, I think it is part of scikit-learn) [0]. However, I think what made learning fast for me was that I used the same type of data as I did before and had a clear goal. Excel sheets are easily loaded into pandas:

    import pandas as pd
    file = pd.read_excel('some_excel_file.xlsx')  # read_excel is the pandas reader for .xlsx files
    file  # As the last line of a code block, this displays the table in Jupyter after ctrl-enter
To plot:

    import seaborn as sns
    # This makes the plot appear in the notebook instead of in a separate window
    %matplotlib inline
    sns.violinplot(data=file)
Boom, that is it (assuming the Excel file is a number of columns with labels as the top row).

[0] http://scikit-learn.org/stable/auto_examples/datasets/plot_i...

Such resources are nice, certainly; they give a feel for what can be done. But in my experience you learn when you get your own data loaded and start putting together code based on Stack Overflow (or other) answers, not by "dry-reading" someone else's work. There is no moment where you say: "I'll learn X now". There is a moment when X is the best solution to your problem and you start using it... and you become an expert before you realize it. Imho.

Maybe it's different for you, of course. And I may have been in a nice position, with a job that started to require X at some point. I realize that. But then maybe you can find a problem of your own (maybe you want to plot the data from your fitness tracker?). I once spent a lot of time plotting the details of my mortgage (cumulative amount paid, interest, decreasing debt as a function of monthly payments). Such data is just the result of some input, and you make the table out of it yourself (in Excel if you want, in Pandas if you feel comfortable enough).
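
To make the mortgage idea concrete, here is a minimal sketch of how such a table could be generated from a few inputs. It assumes a fixed-rate annuity mortgage (the comment above doesn't say which type it was), and the loan amount, rate, and term are made-up illustration values. The function name `mortgage_schedule` and the column names are hypothetical too.

```python
def mortgage_schedule(principal, annual_rate, years):
    """Return one dict per month for a fixed-rate annuity mortgage (an assumption)."""
    n_months = years * 12
    monthly_rate = annual_rate / 12

    # Fixed monthly payment from the standard annuity formula
    if monthly_rate == 0:
        payment = principal / n_months
    else:
        payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -n_months)

    rows = []
    debt = principal
    cumulative_paid = 0.0
    for month in range(1, n_months + 1):
        interest = debt * monthly_rate   # interest due on what is still owed
        repayment = payment - interest   # the rest of the payment reduces the debt
        debt -= repayment
        cumulative_paid += payment
        rows.append({
            "month": month,
            "interest": interest,
            "repayment": repayment,
            "cumulative_paid": cumulative_paid,
            "remaining_debt": max(debt, 0.0),  # clamp tiny negative rounding error
        })
    return rows


# Made-up example inputs: 200,000 borrowed at 3% per year for 30 years
rows = mortgage_schedule(200_000, 0.03, 30)
print(rows[0])
print(rows[-1])
```

If pandas is installed, `pd.DataFrame(rows)` should turn the list of dicts into a table that can be displayed and plotted in a notebook like the Excel example earlier in the thread.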