by Eugeleo 2083 days ago
What textbook(s?) would you recommend for a thorough self-learning of statistics? I’m looking for both intuition _and_ mathematical rigor — not all proofs, but not all fluff either.

I’m a bioinformatics student and I will have a semester of combined probability/stats some time this year, but I think that won’t be enough to support me given my preference for DS-based bioinformatics jobs.

I’m reading Feller right now for the probability stuff, but I’m unsure about statistics. I don’t even know what the relation between probability and statistics is — most similar questions I found online (i.e. “How to learn stats?”) are answered with a “Read this probability book and you’re good”.

7 comments

Rather than a textbook, I've had success getting a copy of the course notes directly from the stats department. The best textbooks I've read were history of statistics and philosophy of statistics.

> I’m reading Feller right now for the probability stuff, but I’m unsure about statistics.

Probability is the study of mathematical objects, and nobody is totally sure whether any of them exist, even approximately. Is anything in the universe random? The question is open, and likely to remain so forever. Lots of things look similar to a random variable if viewed from the right perspective, but most of them aren't actually random. That is not really a problem for mathematicians; they feel no special need to study things that exist.

Statistics is roughly the study of how to deal with actual results. If you do a census, those results exist. Statisticians then need to make decisions about how to think about their results, and usually fall back on models rooted in probability. Technically speaking, "a statistic" is "any quantity computed from values in a sample". [0]

Basically, statistics is probability + data.

[0] https://en.wikipedia.org/wiki/Statistic

Can you explain this sentence a bit more: "The best textbooks I've read were history of statistics and philosophy of statistics."?

Are these names of actual books (Google doesn't help) or merely the themes of the stats textbooks you benefited from the most?

Thank you.

Not the above poster, but I concur, and recommend Jaynes' "Probability Theory: The Logic of Science" for the philosophy and history, and "Breakthroughs in Statistics" volumes 1 and 2 for the history as told through original foundational papers, from the 1700s on.

Probability theory is a branch of mathematics. Statistics is the art of processing data to extract information suitable for the human cognitive system or a computer algorithm. Statistics uses mathematical tools, as physics and chemistry do.

Not a text, but I highly recommend the Bland and Altman Statistics Notes in the BMJ. Each is usually a one-page, easy-to-read explanation of a single statistical topic.

Here is one on the odds ratio, for example: https://www.bmj.com/content/bmj/320/7247/1468.1.full.pdf
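For anyone who prefers to see the arithmetic, here is a rough Python sketch of an odds ratio computed from a 2x2 table, with an approximate 95% confidence interval built on the log scale. The function name `odds_ratio` and the counts are made up for illustration and are not taken from the BMJ note; the interval uses the usual large-sample standard error of the log odds ratio, which I am assuming is adequate for counts of this size.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a, b = exposed group with / without the outcome
    c, d = unexposed group with / without the outcome
    z    = normal quantile (1.96 gives roughly a 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Large-sample standard error of log(odds ratio).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts, invented purely for illustration.
estimate, lower, upper = odds_ratio(a=20, b=80, c=10, d=90)
print(f"OR = {estimate:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

The formula breaks down if any cell is zero, so treat this as a sketch rather than something to use on real data unchecked.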

> I don’t even know what the relation between probability and statistics is

That's a great question, and I think the lines are more than a little blurry.

My attempt at an answer would be:

Probability: Given a set of dice and coins and an order for rolling and throwing them, what is the chance of a specific outcome?

Statistics: Given a set of outcomes, what dice were rolled?
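A toy Python sketch of the two directions, in case it helps. The function names and the candidate dice (4, 6 and 8 sides, all assumed fair) are my own invention for illustration. The "statistics" half simply picks the die with the highest likelihood, which is only one of many ways to do inference.

```python
from fractions import Fraction
from itertools import product

def prob_sum(target, sides=6, n_dice=2):
    """Probability: the dice are known, ask for the chance of an outcome."""
    outcomes = list(product(range(1, sides + 1), repeat=n_dice))
    hits = sum(1 for rolls in outcomes if sum(rolls) == target)
    return Fraction(hits, len(outcomes))

def likelihood(rolls, sides):
    """Chance of seeing exactly these rolls from a fair die with `sides` faces."""
    if any(r < 1 or r > sides for r in rolls):
        return Fraction(0)  # impossible on this die
    return Fraction(1, sides) ** len(rolls)

def most_likely_die(rolls, candidates=(4, 6, 8)):
    """Statistics: the outcomes are known, ask which die best explains them."""
    return max(candidates, key=lambda sides: likelihood(rolls, sides))

print(prob_sum(7))                       # 1/6
print(most_likely_die([3, 5, 2, 6, 1]))  # 6
```

In the second call the 4-sided die is ruled out by the 5 and the 6, and the 6-sided die beats the 8-sided one because each observed roll is more probable on it.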

So if you want to know if smoking kills, you tally up medical history, and use statistics to see if there is a relationship between smoking and dying.

If you want to know the probability of smoking killing you, you look at the risk each cigarette brings to the table and tally it up using probability theory.

More elegantly phrased examples can be found on Cross Validated (the statistics Stack Exchange site): https://stats.stackexchange.com/questions/665/whats-the-diff...

I kind of like M.G. Bulmer's "Principles of Statistics". It's short and to the point so there's a chance of getting through it all. I really like the discussion of distributions in terms of raw data, it makes thinking about mean, variance, higher moments etc., much easier. It also doesn't skimp on the mathematical theory, but it doesn't allow itself to get bogged down by it.

That said, there's a chance I just read it late enough in my career to be more ready for its content.
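To make the "moments from raw data" point concrete, here is a small Python sketch. It is not taken from Bulmer's book; the helper name `central_moment` and the sample values are assumptions of mine, and the variance divides by n (the population form) rather than n - 1.

```python
def central_moment(data, k):
    """k-th central moment of raw data: the average of (x - mean)**k."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

# Made-up sample, chosen so the numbers come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
variance = central_moment(data, 2)                     # second central moment
skewness = central_moment(data, 3) / variance ** 1.5   # standardised third moment

print(mean, variance, skewness)
```

Each quantity is just an average of some power of the deviations from the mean, which is what makes the raw-data view easy to reason about.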

I cannot recommend them yet because they are still on my back burner, but I would like to start working through the "Think" Stats series soon [1].

I love the premise: "if you know how to program, you can use that skill to learn other topics."

Perhaps someone here can speak to their experience with some of these books?

1: http://www.allendowney.com/wp/books/

I am doing Harvard's Stat 110 now (free lectures on YouTube) and I am working through the accompanying book, Introduction to Probability, co-written by the lecturer, Joe Blitzstein.

Maybe it is too basic for you, but it is focused on the intuition part and I can recommend it!

Probability and Statistics by DeGroot is the standard text, I believe. It is full of examples and questions.