One of the first questions you will get if you say you are interested in data science is this: “What is data science, and how is it different from statistics?” or “What is data science, and how is it different from computer science?” Some people will begin a discussion about what sort of stuff is or isn’t data; this is somewhat similar to discussing whether chemicals are natural or not, and usually not very interesting.
So, if you are going to get out there, it’s nice to have an answer prepared.
Let me start by drawing a parallel to another rather new academic discipline: computer science. At the University of Copenhagen, the Department of Computer Science (DIKU) was created in 1970 as an offshoot of the Institute for Mathematical Sciences. The first professor at DIKU was Peter Naur, who was an astronomer. Surely there were jobs in “computer science” in Denmark before 1970, and those pre-computer scientists might well have got the question: “What is computer science, and how is it different from mathematics?”
“Data scientist is the sexiest job title in the 21st century”, said Thomas H. Davenport and D.J. Patil in 2012, and some two years later Thomas H. Davenport wrote in his article “It’s already time to kill the data scientist title” that:
Shortly after the article came out, a woman introduced herself to me at a health care analytics conference. Her business card said “Data Scientist,” but it was clear that she was a quantitative analyst at best. “Who can resist having the sexiest job of the century?” she asked.
That sort of abuse is clearly a problem with inventing a new title that hasn’t yet been associated with any official degree, but I find that it has more to do with the immaturity and novelty of the title than with the title itself. Like everything new, the data scientist title needs to find its shape and place.
I have collected a sample of definitions of data science or data scientist. The first is from the Gartner blogs:
I heard a couple of definitions: a data scientist is 1) a data analyst in California or 2) a statistician under 35. Either make 10% above the salary of common data analysts and statisticians [...].
Am I the only one thinking of the dot-com era here?
The go-to source for definitions, Wikipedia, offers the following:
Data science is, in general terms, the extraction of knowledge from data. The key word in this job title is "science," with the main goals being to extract meaning from data and to produce data products.
The Wikipedia article then continues with a list of techniques and theories employed in data science, drawn from fields within mathematics, statistics and computer science. This seems to land on data science being an independent discipline with roots in statistics, mathematics and computer science.
The Venn diagram below pops up in many places (image source: O'Reilly Labs). It adds unspecified domain knowledge (or substantive expertise) to statistics and computer science. This actually implies a whole range of data science disciplines, i.e., one per domain field.
Having a range of data science disciplines is very much in line with the definition(s) given by New York University.
Here is yet another colourful description of a data scientist. It originates from CMS Wire:
Though it seems more like a description of the ideal job candidate, it does suggest that on top of statistics and computer science, the curriculum of data science should contain some business and communication topics.
In conclusion, everyone seems to agree that data science consists of some statistics, some computer science and something else – a third ingredient.
In my opinion, it is natural to use the term “computational X” if the third ingredient is a certain domain X (e.g., computational health care, computational law or computational economics). The term data science should be reserved for the emerging field with roots in computer science and statistics (and mathematics) that is concerned with the theory and practice of probability models, machine learning, big data (and many other topics to do with extracting knowledge from data), and that has some general business knowledge and communication in its curriculum. My definition of data science would look like this:
What is your definition?