By Mike Macedonia
With increasing frequency, we’re hearing our clients’ senior executives say that they need to “get some big data, machine learning, and artificial intelligence” into their enterprises. Their desire is understandable, given ubiquitous reports about the brave new world of autonomous vehicles, cyber warfare, and even AlphaGo, to name just a few dazzling examples of where technology is headed. Some software vendors are even marketing machine learning and artificial intelligence as plug-and-play solutions. They’re purveying the idea that plug-and-play lets enterprises emerge from the analytical Dark Ages, which depended on highly skilled data scientists, into a more economical world that cuts out the expert middleman. Unfortunately, this approach is not going to yield the return on investment that seems so tantalizing at the moment.
Admittedly, I am moved to write on this topic by a line in an article I recently read: “…with automated machine learning, anyone in the federal government can operate as a data scientist, leveraging the predictive models and insights data can provide.” That’s a big statement, and one I’ll dispute.
When I began my career in the data science field almost 20 years ago, the term “data science” did not exist. Back then, resource managers were coming to realize that their growing stash of operational data, if analyzed properly, was the key to better decision making. That is how I became acquainted with the field of Operations Research. In my newly minted job as an Operations Research Analyst, I combined knowledge about how a business worked with its (often poor quality) data and applied what was then referred to as computational science to deliver analytical products. Yesterday’s analyst (today’s Data Scientist) selected the best computational methods for the data and the business problem, and then delivered decision support products that informed the decision-making customer. The analysts were the practitioners and the computational methods were the tools of their trade.
Today, computational methods have advanced and become more accessible, but the analytics value proposition has not changed. Highly skilled individuals are still needed for work that a software tool cannot replace, at least not yet: understanding a business’s complexities, including enterprise data architectures; selecting the right tools for the most accurate, insightful analysis; and developing high-impact analytic products tailored to the problem being solved. Although the marketing and hype around plug-and-play AI provide dazzling demonstrations for senior executives to sit through, the tools are not an end in themselves; the proper goal is a complete analytical package underpinned by the data scientist’s expertise.
While plug-and-play machine learning and AI tools can undeniably increase a skilled analyst’s efficiency, they remain the enablers of analytic value, not the source of it. For that, we still need data scientists to bring a range of abilities to the task, chief among them the ability to wrangle messy data. Data wrangling is hard, time-consuming work, the essential starting point of any credible analysis. And as with most things in life, cutting corners doesn’t pay in the end. According to Gartner, trying to get by with poor quality data costs the average organization $9.7 million per year.
In addition, the automated features of plug-and-play tools can be outright dangerous in the hands of unskilled users, who can all too easily draw inappropriate analytical conclusions from enterprise data. A common example is spurious correlation: two variables move together, often because both respond to some third factor, and a causal relationship between them is wrongly inferred. Such lack of analytical expertise leads to counter-productive decision making, just as surely as two and two make four.
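To make the spurious-correlation risk concrete, here is a minimal Python sketch. The scenario and every name in it (`temperature`, `ice_cream`, `sunburn`, and the coefficients) are invented for illustration; none of it comes from a real dataset or from any particular plug-and-play tool. It assumes a hidden driver that influences two otherwise unrelated measurements, then compares the raw correlation with the correlation that remains once the hidden driver is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical synthetic data: temperature drives both series,
# and neither series has any direct effect on the other.
rng = random.Random(0)
n = 2000
temperature = [rng.gauss(25, 5) for _ in range(n)]
ice_cream = [3.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn = [2.0 * t + rng.gauss(0, 5) for t in temperature]

# What a naive user sees: the two series look tightly linked.
raw = pearson(ice_cream, sunburn)

# What remains after controlling for the hidden driver.
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(sunburn, temperature))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

On this synthetic data the raw correlation is strong, while the adjusted one sits close to zero. An automated tool will happily report the first number; knowing to look for the hidden driver and compute the second is the analyst's contribution.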
Ironically, plug-and-play software makers may be trying to capitalize on skipping the expert analyst precisely because data scientists are in high demand and short supply. The educational pipeline is struggling to keep up, given that data scientist ranks as the top job opening on online job sites, and demand shows no sign of letting up (https://www.infoworld.com/article/3190008/big-data/3-reasons-why-data-scientist-remains-the-top-job-in-america.html). If waving off the data scientist is really the way to go, why were more than 13,700 positions open last year?
I say it’s best to ramp up educational opportunities and keep the expertise flowing.