Role of Statistics in Data Science
Statistics and Data Science
Many of us have long wondered about the role that statistics plays in Data Science. The question has been studied extensively and remains a matter of ongoing discussion. Let us look at the gist of those discussions and acquaint ourselves with the role and responsibilities of statistics in Data Science.
Statistics is one of the most important disciplines for providing methods and tools that give structure to data and yield deeper insight into it, and it is the principal discipline for quantifying and analyzing uncertainty. Data Science is a scientific discipline influenced mainly by informatics, computer science, operations research, mathematics, and statistics, as well as the applied sciences.
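To make "quantifying uncertainty" concrete, here is a minimal sketch in Python. It is only an illustration, not a method taken from the sources discussed here. The function name `mean_confidence_interval`, the synthetic data, and the normal approximation with z = 1.96 are all assumptions chosen for the example.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Assumption: the sample is large enough for a normal approximation,
    so z = 1.96 is used instead of a t-distribution quantile.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # The standard error measures how much the sample mean is expected to vary.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


# Synthetic data (an assumption for illustration): 200 draws from a
# normal distribution with mean 50 and standard deviation 10.
random.seed(0)
sample = [random.gauss(50.0, 10.0) for _ in range(200)]

mean, lower, upper = mean_confidence_interval(sample)
print(f"mean = {mean:.2f}, approx. 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The point of the sketch is that a statistical summary reports not just an estimate (the mean) but also a range that expresses how uncertain that estimate is.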
In 1996, the term Data Science appeared for the first time in the title of a statistical conference (International Federation of Classification Societies (IFCS), “Data Science, classification, and related methods”). As early as the 1970s, the ideas put forward by John Tukey shifted the viewpoint of statistics away from a purely mathematical setting, e.g., statistical testing, toward deriving hypotheses from data.
Data Mining, another important field related to Data Science, combines various approaches to knowledge discovery, including inductive learning, (Bayesian) statistics, query optimization, expert systems, information theory, and fuzzy sets.
Today, these ideas are blended with the idea of Data Science, leading to different definitions. One of the most comprehensive definitions of Data Science was recently provided by Cao in the form of the following formula:
Data science = (informatics + computing + statistics + communication + management + sociology) | (thinking + data + environment).
In this formula, sociology represents the social aspects, and (thinking + data + environment) means that all the mentioned sciences act on the basis of data, the environment, and the so-called data-to-knowledge-to-wisdom thinking. A recent, comprehensive overview of Data Science provided by Donoho in 2015 focuses on the evolution of Data Science from statistics. Indeed, as early as 1997, there was an even more radical view suggesting that statistics be renamed Data Science. In 2015, a number of American Statistical Association (ASA) leaders released a statement about the role of statistics in Data Science, saying that “statistics and machine learning play a central role in data science.”
The statement says that statistics is foundational to data science, along with database management and distributed and parallel systems, and that its use in this emerging field empowers researchers to extract knowledge and obtain fruitful results from Big Data and other analytics projects. The statement also encourages close, multifaceted collaboration between statisticians and data scientists to realize the full potential of data science.
Combining mathematical research methods and computational algorithms with statistical reasoning will lead to scientific results based on suitable approaches. In summary, only a balanced interplay of all the sciences involved will lead to successful solutions in Data Science, with statistics playing a major role.