by jmde 3592 days ago
This seems like a nice compilation for introductory material in one place.

I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level - its meaninglessness derives from the hyped buzzword trendiness that drove its upswing.

I say this as someone whose expertise is really sitting at the nexus of what would be considered data science. I feel as if I have been doing what might be considered data science for a long time, before there was a label for it, but watching its ascendance in demand and popularity has been troubling. I should be happy, but I feel like it's being driven by fashion rather than fundamentals, which makes me worried about the trajectory going forward, and disturbed by some communities being thrown under the bus.

10 comments

> I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level

All (empirical) science involves data, but not all of the work of science is the domain-neutral skill of analyzing data. I think "data science" is a bit of a misnomer -- or at least, uses an older and less specific definition of "science" than is now typical -- ("Data in science" would be more accurate under the narrower definition of science, and "Data analytics" probably more direct and clear), but it's not *that* bad (it's no worse than, e.g., "computer science").

>I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level - its meaninglessness derives from the hyped buzzword trendiness that drove its upswing.

I couldn't disagree more.

There are a number of terms for domain-independent data analysis:

- data analysis

- statistics

- statistical modeling

- machine learning

- big data

- data journalism

- data science

I think it makes perfect sense that the practice of collecting and analyzing data be qualified and identified as a specific field.

I know of no better resource than these Venn diagrams which identify the 'danger zones' around data science:

- http://datascienceassn.org/content/fourth-bubble-data-scienc...

Is there such a thing as a statistical model which only applies to a certain domain?

Domain knowledge ("substantive expertise"/"social sciences" in the linked Venn diagrams) serves only to logically validate statistical models which may be statistically valid but otherwise illogical, in the context of currently available field knowledge (bias).

Regardless of field, the math is the same.

Regardless of field, the model either fits or it doesn't.

Regardless of field, the controls were either sufficient or they weren't.

As a non-data-science practitioner, I think the term works. "Data science", from what I gather, focuses on the work of collecting, collating, maintaining, and analyzing data. All science may rely on data but not all scientists work with data well.

In contrast, I think the term "data journalism" is poor, because it isn't (typically) about the journalism of data, e.g. what's going on with the use of data. And so to talk about data journalism being a field (never mind a niche field) makes it seem as if other kinds of journalism don't use data. Even the reporters who rely on 3 anecdotes/interviews to make a story are using data, they're just using a very poor form of it (what with data being the plural of anecdote).

I think David Leonhardt, the former editor of the NYT Upshot, said it well:

http://www.nytimes.com/2015/06/20/upshot/death-to-data-journ...

> Data journalism, ultimately, has the same aim as “quote journalism” and “anecdote journalism.” They all aspire to be “fact journalism” or, more eloquently, journalism.

>I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level - its meaninglessness derives from the hyped buzzword trendiness that drove its upswing.

Out of curiosity, how do you feel about the term 'computer science'?

That's an interesting question - I agree it's an interesting parallel and one I hadn't thought of before.

I have always been puzzled by the term "computer science" a bit also, because so much of it isn't really science per se (more math or theory along with engineering). When I've thought about it, I usually come to some peace with it because there is a scientific aspect to the field via the hardware side of things, which is really the foundation, at least historically, and there is a historical emphasis on demonstrating results empirically. It's sort of a crude awkward label but I accept it. But then again I went to a school where/when comp sci and EE were the same department.

"Data science" has bothered me more, though, because it's so vague, "data" and "science" are so inextricably defined relative to one another, and because it's arguably misleading - it's not really the science of data, whatever that means, and to the extent it's science, it's just science, but it's not, it's really just statistics.

More appropriate terms to me would be "computational statistics" or "statistical computing", "informatics", or "quantitative computation" or something. Anything but "data science." It's like some stereotypically ignorant but buzzword-compliant management committee, being unfamiliar with data or science, somewhere commanded HR to "find us some of those... you know... data science people!"

... and now venerable universities have whole departments with that title.

> it's not really the science of data

How isn't it?

> what sort of science doesn't involve data

There are many like theoretical computer science which do not involve data.

> I say this as someone whose expertise is really sitting at the nexus of what would be considered data science. I feel as if I have been doing what might be considered data science for a long time, before there was a label for it, but watching its ascendance in demand and popularity has been troubling. I should be happy, but I feel like it's being driven by fashion rather than fundamentals, which makes me worried about the trajectory going forward, and disturbed by some communities being thrown under the bus.

There will be a time when things will consolidate. During this time, people who really do data science will stick with it, while people who just have it as a title for the sake of it will face problems.

>> what sort of science doesn't involve data

>There are many like theoretical computer science which do not involve data.

That computer science is a 'science' is also pretty contentious! :)

Much of computer science is a science. The contention seems to come from the fact that software engineers tend to come from computer science departments. Maybe more universities should create separate software engineering departments?

I don't feel qualified enough to discuss whether or not CS is truly a science, but I do think there's a strong distinction to be made between it and soft eng, for sure.

> Much of computer science is a science.

AFAICT, it's mostly a subdomain of math, not science in the empirical sense.

Reminds me of this - https://xkcd.com/435/

Math? A lot of mathematics doesn't deal with data.

Here's something on whether CS is a science:

https://www.cs.mtu.edu/~john/jenning.pdf

Welcome to the human world! That's just how things work in this world (esp. the industry). We're not logical (or rational) all the time to pick all the "right" words, and while I agree with you on the ambiguity of the term, I should also mention that "new" words are indeed required to describe new things. The way I see it, to name something new, you have three options:

1) Borrow a word from a foreign language; 2) Coin (invent) a new word yourself (e.g. "foo = bar"); 3) Give new meaning to old words.

DS is made using the 3rd method. It's vague, it's ambiguous, and it's just not "correct"! But that's exactly why people will "remember" it, as a puzzle and an anomaly. That's how the word sticks in the mind.

I can see both sides. DS is an awful lot like good-old-fashioned statistics, especially in describing the shape, patterns, and significance of events. But the rise of vast amounts of raw data of diverse kinds and origins, especially deeply contextual data like English text -- this is new, and I think it warrants a more meaningful label for the formal study and the practice of such analysis.

I also have no problem with the use of "science", since DS is one of the purest applications of the scientific method I know. You observe, you hypothesize, you explore and test, you use statistics to draw or reject a conclusion. Of course, it's almost impossible to eliminate all the confounding factors, but that's part of the fun...

The cornerstone of the scientific method is "hypothesis testing via experiments". Data scientists typically skip the experiments and make models based on pre-existing data. So, I am skeptical of calling it a science.

> I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level - its meaninglessness derives from the hyped buzzword trendiness that drove its upswing.

I tend to rail against the needless creation of new buzzwords myself, but I can actually see some use for "data science". It is a little vague, but I see it as a slightly more concise way of saying "the confluence of applied statistics, machine learning, and analytics" or something roughly to that effect.

The thing is tho', that what we call "machine learning" nowadays, statisticians were already doing for years. 25-year-old "data scientists" think that before they came along all there was was Excel... And don't realize that now most of what they do can be done in Excel...

> I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data

Yes, but data is usually an accessory, a means to an end.

Data science concerns itself specifically with how to better extract meaning from that data.

> before there was a label for it

It used to be called data mining, or business intelligence, even plain ol' statistics. People have been doing it since the 80s if not earlier. But this is how the industry works, take an old concept, slap a new buzzword on it, PROFIT!