by Gerhard Pilcher
Data Scientist is an unusual term. If you “Google” the words, data means “facts and statistics gathered together for reference or analysis” and scientist means “a person who is studying or has expert knowledge of one or more of the natural or physical sciences.”
Dr. Michael Rappa, founder of the Institute for Advanced Analytics at NC State University, prefers the term Analytic Professional, which I think better captures the role of a person who experiments with applying analytic algorithms to data to enhance the mission or business value of information.
Dr. John Elder, founder of the oldest data science consultancy, says that the tension set up by the definition of data and scientist not fitting together “forces one to confront the new idea that data expertise is complex and valuable enough to be its own science.” So while we may disagree about the role title “data scientist”, we do agree that the application of scientific methods to the analysis of data is extremely valuable.
The Origin of Data Scientists
The term “data science” first appeared in Peter Naur’s book Concise Survey of Computer Methods (1974). According to the Harvard Business Review¹, the term was re-coined in 2008 by D.J. Patil and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook. That article defines a data scientist as “a high-ranking professional with the training and curiosity to make discoveries in the world of big data.” That definition applies to a narrow set of people, but today the label can seem to encompass almost anyone who explores or analyzes data.
Data science has value only if a human being uses the new information the process creates to improve the probability of making a good decision. That statement covers a lot of ground.
These steps require more than analytical skill. Bringing a solution from idea to implementation usually requires a team with a shared goal, diverse skills and perspectives, and a healthy dose of curiosity. The process, sometimes arduous, is not formulaic; it requires depth of insight, creativity, and flexibility.
The skill sets and experience required to perform each of the listed steps vary widely. The people who bring them include a business or mission problem solver and/or systems engineer, a data or software engineer, an analytic professional, a user experience designer, a change management professional, and a communications professional.
Where are we headed?
What may have started as a statistician solving analytic problems has become a far more important role at the intersection of business or mission, computer science, engineering, and statistics. It is rare indeed for a single person to master all of these disciplines to the degree required.
I think the business world is maturing in its understanding of the value of data science. Some are seeing beyond the technical hype and realizing that the analytic part of the process is often straightforward compared to the vital surrounding steps of defining the opportunity, collecting data, and successfully delivering the results to a decision maker or automated system.
Gerhard Pilcher | CEO | Elder Research
A Deeper Look
While the world has begun to realize that a holistic approach requires a multidisciplinary team, it has been slower to recognize the value of deep experience and advanced quantitative training and degrees in understanding the limitations and assumptions of complex mathematical techniques. Without that understanding, practitioners can misapply these techniques and produce misleading results. Some in the software industry have compounded the problem by promising to create “data scientists” in mere weeks. I liken this to creating “citizen heart surgeons” by using robotic surgery devices. I would rather have a true surgeon at the controls!
As is normal in a healthy, growing field, data science is also becoming more specialized. We now have specialists in neural networks, graph analysis, natural language processing, image processing, algorithmic compute efficiency, non-linear optimization, simulation, survival analysis, interpretable machine learning, and more. This luxury makes generalists, who can judge which specialists to call in for a specific problem, valuable gatekeepers, “triagers,” and interpreters.
Data is at the heart of each new discovery in every field of science. Though scientists may be experts in their own fields, they are rarely fluent in data analysis. Data scientists, who have that fluency, are therefore useful additions to every possible team. Because they are also well versed in every known way that data can lead us humans astray, data scientists can save many months of project pain and effort by “heading trouble off at the pass” and grounding research and experimental protocols on a sound foundation from the beginning. They are the best antidote to the current “Crisis in Science,” in which most work, even published journal articles, cannot be replicated! In the working world, replication-quality work translates into models that reliably predict future events in the real world, and not just on paper in the lab.
Finally, applying rational models to carefully curated data is a great way to build ethical models that support bias-free decisions for the betterment of society. In this way, the role of data scientists is steadily changing: to provide careful data curation, problem definition, and critical validation of results, ensuring that the value of data continually increases.
Originally published at https://www.elderresearch.com on March 18, 2021.