Hacker News
by laGrenouille 1414 days ago
These notes might be a great source for what they cover, but as a whole I find this to be a good example of what is currently wrong with data science education. While the syllabus has bullet points that include "1. data collection", "2. data management", and "5. communication", the content and schedule have a 90%+ overlap with a standard machine learning course. They even use a statistical learning textbook (a good one, but still).

Statistics departments keep trying to latch on to the excitement (and money) around data science by changing the superfluous things like department names and course titles without actually adjusting what they teach. I would love to see a version of this that actually engages at a non-superficial level with topics such as database design, theory(ies) of data visualization, methods for storytelling with data, and interactive design.

>> would love to see a version of this that actually engages at a non-superficial level with topics such as database design, theory(ies) of data visualization, methods for storytelling with data, and interactive design.

I love these discussions and taxonomies in data science. So I have a few genuine/honest questions:

1) Isn't what you describe more oriented toward "analytics" or "analytics engineering" (which is itself a subfield of data science)?

2) I think more and more people are trying to define what "data science" is, especially for marketing purposes, and then put it in a box like any other science (e.g., chemistry: take an undergrad chemistry textbook and it will always cover the same topics). But since data science isn't well defined yet, different courses cover different algorithms/aspects of it, so I think each one ends up looking superficial, and it's hard to please everyone. Would you agree with that? For example, I'm trying to find a good, in-depth course that applies Data Science/Machine Learning to Big Data problems, but I just can't find any serious course covering it.

I completely agree that it's an open question what exactly constitutes data science and what should (or at least could) be covered in a standard introduction. For me, a fairly reasonable, though certainly not definitive, set of topics is the five items listed on this course's syllabus. And that's what makes this so frustrating, personally. The instructors actually have a good proposal for what should be taught, but then just turn around and teach a classical course in statistical learning.

>> the content and schedule have a 90%+ overlap with a standard machine learning course

Note that neural networks are not even mentioned in the content. This is not a good course to learn modern ML.

At the time the comment was made, the link was https://harvard-iacs.github.io/2019-CS109A/pages/materials.h..., where neural networks were mentioned.

See https://news.ycombinator.com/item?id=32295656

The other topics you mentioned aren’t exactly classified as “data science,” so you likely won’t see them in most university data science courses. Database design usually has its own course, but I’ve seen the rest covered more often in college/certificate programs.

The data scientists I've worked with definitely do data visualization and storytelling with data. (Schema design, not so much...)

You're thinking too narrowly about what "schema design" could mean. No, data scientists do not typically design back-end, production database systems. But defining and organizing a multi-sheet spreadsheet for manual data collection is what many data scientists spend much of their time doing (e.g., in the biomedical space). Doing that well definitely requires some understanding of concepts such as functional dependency, normal forms, and data types.
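To make the spreadsheet point concrete, here is a minimal Python sketch under stated assumptions: a hypothetical biomedical sheet where each row is one lab measurement and patient attributes are repeated on every row. The column names (`patient_id`, `birth_year`, `visit`, `glucose_mg_dl`) and the helpers `fd_violations` and `normalize` are illustrative inventions, not taken from the thread or any particular tool. With `(patient_id, visit)` as the row key, `birth_year` depends on only part of that key (a partial dependency, which second normal form rules out), so the sketch first checks that `patient_id -> birth_year` actually holds and then splits the sheet into a patients table and a measurements table.

```python
from collections import defaultdict

# Hypothetical flat sheet: one row per measurement, with the patient's
# birth_year repeated on every row (a denormalized layout).
rows = [
    {"patient_id": "P1", "birth_year": 1980, "visit": 1, "glucose_mg_dl": 95.0},
    {"patient_id": "P1", "birth_year": 1980, "visit": 2, "glucose_mg_dl": 102.0},
    {"patient_id": "P2", "birth_year": 1975, "visit": 1, "glucose_mg_dl": 110.0},
    # Data-entry typo: same patient, different birth year.
    {"patient_id": "P2", "birth_year": 1957, "visit": 2, "glucose_mg_dl": 108.0},
]


def fd_violations(rows, lhs, rhs):
    """Return {lhs value: set of rhs values} for every lhs value that maps
    to more than one rhs value, i.e. rows breaking the dependency lhs -> rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


def normalize(rows):
    """Split the flat sheet into a patients table keyed by patient_id and a
    measurements table that refers to it. Raises ValueError if the
    dependency patient_id -> birth_year does not hold, rather than silently
    keeping one of the conflicting values."""
    bad = fd_violations(rows, "patient_id", "birth_year")
    if bad:
        raise ValueError(f"patient_id -> birth_year violated for {sorted(bad)}")
    patients = {}
    measurements = []
    for row in rows:
        patients[row["patient_id"]] = {"birth_year": row["birth_year"]}
        measurements.append(
            {
                "patient_id": row["patient_id"],
                "visit": row["visit"],
                "glucose_mg_dl": row["glucose_mg_dl"],
            }
        )
    return patients, measurements
```

On the sample data, `fd_violations(rows, "patient_id", "birth_year")` flags `P2` as mapping to two birth years, which is exactly the kind of inconsistency a repeated-column layout invites. Storing `birth_year` once per patient makes that error impossible to enter in the first place, which is the practical payoff of thinking about normal forms when laying out a data-collection spreadsheet.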