by roddylindsay 1150 days ago
A high school "data science" course, if designed properly, will be far more useful to students and beneficial to society than calculus.

Every high school student should learn how to grapple with uncertainty, how to evaluate statistical claims and experiments, how to interpret graphs and charts, understand how machine learning models work (at a high level), and internalize concepts like "significance", "error bars", and "expected value."

This training will help all students every single day of their lives, because it teaches them how to think. Society benefits from having more people with the tools to evaluate data and deal with uncertainty, especially as we face a looming epistemological crisis.

Calculus, on the other hand, will be used by very few students, and even for those few, it will not likely be used every day. Yes, it is a prerequisite for some STEM courses as part of a degree program, and so calculus can be taught to undergraduates pursuing a STEM field in their first year (or those who take it as an elective in high school.)

It's a shame that Stanford and Harvard, which set the tone for high schools and high schoolers, are going the wrong direction here.

4 comments

> Every high school student should learn how to grapple with uncertainty, how to evaluate statistical claims and experiments, how to interpret graphs and charts, understand how machine learning models work (at a high level), and internalize concepts like "significance", "error bars", and "expected value."

Pet peeve: can we just go back to calling these things statistics?

While I agree with you that statistics should be more heavily emphasized at the high school level, the issue goes much deeper within American math education than the one class.

I would assume that a data science class is mostly "good old statistics." But if "data science" is the phrase that gets education boards to put more student butts in seats in stats class, I'm all for it.

Wouldn’t a data science curriculum be more multi-disciplinary than a ‘statistics’ course?

Visualization, scripting, data collection, models, simulation. edX had a great course by Guttag and Grimson. Add to this Scott E. Page’s Model Thinking. Add the edX course Data, Analytics, and Learning from UT Arlington. And some Tufte.

I mention these because I work in the accounting field and brought scripting to my firm from my own self-study. It’s been a superpower for me, and it solved several problems that my colleagues had tackled using Excel alone.

I’ve also studied statistics, but found it less generally useful.

>Wouldn’t a data science curriculum be more multi-disciplinary than a ‘statistics’ course?

I would say yes, however, the items listed in the comment I quoted fall squarely within the realm of statistics. I don’t have a problem with calling a curriculum of statistics + data manipulation tools “data science” but that’s not what’s realistically being covered in these high school programs.

> concepts like "significance", "error bars", and "expected value."

Yes. I see what you're responding to--these are squarely in the statistics domain.

> not what’s realistically being covered in these high school programs.

Yes. This is where the rubber meets the road. Who exiting higher education now will have the skills to teach this imagined hybrid course? Realistically, they have to be vetted and hired by the mathematics department and satisfy some state and/or federal standards of education, and the bodies that set those standards are currently staffed by educators who are themselves following the standards of their office.

I was responding to the OP's premise:

> "data science" course, if designed properly, will be far more useful to students and beneficial to society than calculus.

Whether or not that objective is "realistic" given the current boundaries prescribed for high school education is another matter.

There is hope; there are modern thinkers in education out there. I referenced the UT Arlington course that students and instructors called DALMOOC (google it). I took this course thinking it was another data science course, and found a course taught by teachers for teachers. I hung in because their ideas were so fresh and interesting.

DALMOOC's ambition was to train teachers to encourage students to use social media to communicate their learning results, and in turn produce the data that the teachers were being trained in the course to analyze using social media analysis techniques. DALMOOC professors encouraged participants to generate social media responses to DALMOOC coursework. Very modern. Not sure how long it will be before professors like George Siemens, whose brainchild DALMOOC was, get into state and federal positions of authority and influence and see their modern ideas applied at the high school level.

https://en.wikipedia.org/wiki/George_Siemens

> A high school "data science" course, if designed properly, will be far more useful to students and beneficial to society than calculus.

How do you expect students to understand what they are doing with "data science" without learning probability and statistics, and how do you expect students to get probability and statistics without learning calculus?

I mean, Bayes' theorem. How do you get people to get it if they don't know calculus?

I don't recall Bayes' theorem involving calculus. Are you sure you aren't thinking of some other theorem?

Bayes' theorem follows straightforwardly from P(A & B) = P(A|B) P(B) and P(A & B) = P(B & A). The latter tells us that we can swap A and B in the former without changing the value, giving us P(A|B) P(B) = P(B|A) P(A).

Rearranging gives P(A|B) = P(B|A) P(A) / P(B), which is Bayes' theorem.
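
A minimal sketch of that rearrangement in the discrete setting, using nothing but counting and fractions. The 52-card deck and the two events (A = "king", B = "face card") are illustrative assumptions, not something from the comment above:

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: a standard 52-card deck, enumerated explicitly.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Probability of an event (a predicate on cards) for one uniform draw."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_king(card):   # event A
    return card[0] == "K"

def is_face(card):   # event B
    return card[0] in {"J", "Q", "K"}

p_a = prob(is_king)
p_b = prob(is_face)
p_a_and_b = prob(lambda card: is_king(card) and is_face(card))

# Conditional probabilities straight from the definition P(X|Y) = P(X & Y) / P(Y).
p_b_given_a = p_a_and_b / p_a
p_a_given_b_direct = p_a_and_b / p_b

# The same quantity via Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a_given_b_direct, p_a_given_b_bayes)
```

Both routes give the same fraction, and every step is arithmetic a student can check by counting cards.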

You can sidestep calculus by just using the discrete setting rather than a continuous one.

If you want to introduce continuous distributions like the Gaussian one, you can just say "area under the curve" if you need to connect the density to a numerical probability. They don't have to know how to do the integral; in the case of a Gaussian, it's just tabulated anyway.
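
As a rough sketch of what "area under the curve" could look like in a classroom, the snippet below adds up thin rectangles under the standard Gaussian density and compares the total with the "tabulated" value. The function names and the choice of the interval from -1 to 1 are assumptions made for illustration:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Gaussian density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, n=10_000):
    """Approximate the area between a and b with n thin rectangles,
    each as tall as the curve at the rectangle's midpoint."""
    width = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * width) for i in range(n)) * width

def normal_cdf(x):
    """The 'tabulated' value: standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

approx = area_under_curve(-1.0, 1.0)
table = normal_cdf(1.0) - normal_cdf(-1.0)
print(round(approx, 4), round(table, 4))
```

The rectangle sum and the table lookup agree to several decimal places, which is the familiar "about 68% within one standard deviation" rule.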

I'd argue that you could teach a perfectly reasonable high school stats class using this kind of approach.

A "calculus-free" method is mostly what is done for high school physics, with occasional nods in that direction to set the students up for later. And as with physics, the obvious connection of continuous probability to calculus will be a nice motivation later on.

One analogy is how we teach probability to sophisticated engineering undergraduates. I'm not aware of undergrad engineering curricula that use measure theory. This results in awkwardness around delta "functions" and probabilities of certain sets of measure zero (sets that cannot be integrated without the Lebesgue integral).

And sure, some of those undergrads don't ever take that measure theory class, so they escape to the wild without knowing the answers to awkward questions.

> If you want to introduce continuous distributions like the Gaussian one, you can just say "area under the curve" if you need to connect the density to a numerical probability.

What name do you give to this "area under the curve", or the "rate of change" of this area? They are pretty fundamental concepts with important and basic properties, which affect things like local optima and minimization, and expected value and covariance, etc. I mean, you can't cover linear models and least squares without this stuff, and if you don't then I wouldn't really call it learning.

You call “area under the curve”… area under the curve. Expected values, least squares, linear model, etc can all be explained in the discrete case without calculus.

High school math isn’t and doesn’t need to be rigorously proof-based. If you lack some of the tooling necessary to demonstrate a proof, you can tell a student, “the proof requires calculus” and boom, you’ve given them a reason to take an interest in the subject.

You don’t need integration to define expected value or covariance in the discrete case. TBH I’m not sure if you can get around integration in the general continuous case or not.

If not, you could use some limiting argument to handle the moments of a continuous uniform RV, at least, in terms of the discrete analog.
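
A minimal sketch of both points, under stated assumptions: expected value, variance, and covariance are computed as plain weighted sums, and the continuous uniform distribution on (0, 1] is approached through the discrete uniform distribution on 1/n, 2/n, ..., n/n. The helper names and the coin-flip example are hypothetical:

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X] as a weighted sum over a dict {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - E[X])^2], again just a weighted sum."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def covariance(joint):
    """Cov(X, Y) from a joint pmf given as {(x, y): probability}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    return sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

def discrete_uniform(n):
    """Equal weight 1/n on the points 1/n, 2/n, ..., n/n."""
    return {Fraction(k, n): Fraction(1, n) for k in range(1, n + 1)}

# Limiting argument: as n grows, the moments approach 1/2 and 1/12.
for n in (1, 10, 100, 1000):
    pmf = discrete_uniform(n)
    print(n, float(expected_value(pmf)), float(variance(pmf)))

# Two fair coin flips: X = first flip, Y = total number of heads.
quarter = Fraction(1, 4)
joint = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}
print(covariance(joint))
```

The printed means and variances settle toward 0.5 and 0.0833..., the values usually obtained by integration for the continuous uniform case.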

You don’t need calculus to derive least squares estimators. You can follow the logic in this quora answer [1] to show that (e.g.) the mean is the minimum MSE estimator among constant functions, and that the conditional mean is the minimum MSE estimator among “general” (measurable L2) functions.

This derivation is familiar to many who have studied these concepts. It’s clever, it does not need differentiation, just expectation and logic.
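
One standard calculus-free version of the constant-estimator argument (not necessarily the one in the linked answer) rests on the identity E[(X - c)^2] = E[(X - mu)^2] + (mu - c)^2, where mu = E[X]. The cross term vanishes because E[X - mu] = 0, and the right-hand side is clearly smallest at c = mu. Here is a small sketch that checks the identity exactly on a made-up data set; the data values and function names are assumptions for illustration:

```python
from fractions import Fraction

# Hypothetical sample, treated as an equally weighted discrete distribution.
data = [Fraction(v) for v in (2, 3, 5, 10)]

def mean(xs):
    return sum(xs) / len(xs)

def mse(xs, c):
    """Mean squared error of guessing the constant c for every point."""
    return sum((x - c) ** 2 for x in xs) / len(xs)

mu = mean(data)
spread = mse(data, mu)  # the variance, i.e. the MSE of the mean itself

def decomposition_holds(c):
    """Check MSE(c) == MSE(mu) + (mu - c)^2 with exact arithmetic."""
    return mse(data, c) == spread + (mu - c) ** 2

candidates = [Fraction(0), Fraction(4), mu, Fraction(13, 2)]
for c in candidates:
    print(c, mse(data, c), decomposition_holds(c))
```

Since (mu - c)^2 is never negative, no constant can beat the mean, and no derivative was needed to see it.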

It could be that your studies in probability were done using a certain pedagogical path, and that’s blinding you to the fact that other paths are possible.

[1] https://www.quora.com/Why-is-minimum-mean-square-error-estim...

High schools often teach physics without calculus as a prerequisite. It definitely makes it more challenging, but you can still communicate the concepts at a different level of detail.

> High schools often teach physics and without calculus as a prerequisite.

Does it though? For example, you simply cannot teach Newton's laws of motion without knowing what a derivative is.

> For example, you simply cannot teach Newton's laws of motion without knowing what a derivative is.

You absolutely can do that. You might not want to, but you can, and people do.
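
One way a calculus-free course might phrase the second law is "acceleration is the change in velocity over a small time step", which can be stepped forward with arithmetic alone. The sketch below is an illustration under that assumption, not a description of any particular curriculum; the function name, the step size, and the 4.9 m drop height are all hypothetical choices:

```python
def simulate_fall(height_m, dt=0.001, g=9.8):
    """Drop an object from rest and step time forward until it lands.

    Uses only 'change per small time step' reasoning:
      change in velocity = acceleration * dt
      change in position = velocity * dt
    Returns the elapsed time in seconds.
    """
    t = 0.0
    y = height_m
    v = 0.0
    while y > 0:
        v += g * dt   # F = m*a with F = m*g, so a = g
        y -= v * dt   # the object moves down by v*dt during this step
        t += dt
    return t

fall_time = simulate_fall(4.9)
print(round(fall_time, 2))
```

For a 4.9 m drop the stepped answer lands close to the 1 second given by the textbook formula t = sqrt(2h/g), and shrinking `dt` is a natural hook for introducing the derivative later.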

It does, my kid finished calculus-free high school physics last year.

You can definitely explain Bayes' theorem without calculus. I just asked ChatGPT to do it and it came up with a great example using a deck of cards and some fraction math.

>understand how machine learning models work (at a high level)

>Calculus, on the other hand, will be used by very few students,

These two statements do not mesh. Understanding how machine learning models work requires Calculus.

If the goal is to teach them basic statistics to be useful and not to do science with it, then just make them watch a few YouTube videos on the topic as part of their 9th grade math class?