Separating Hype From Value In Artificial Intelligence

Daniel Morales
Jul 09, 2021


You've probably heard a lot about data science, artificial intelligence and big data. Frankly, there has been a lot of hype around these areas, and that hype has inflated expectations about what data science and data can actually accomplish. Overall, this has been bad for the field. It is worth thinking about the questions you can ask to separate the hype of data science from the reality.

The first question is always "What is the question you're trying to answer with the data?" If someone comes to you with a big data, artificial intelligence or data science project and starts throwing buzzwords at you about the newest technology for distributed computing and machine learning, that is the first thing you should ask. It narrows the discussion and filters out a lot of the hype around tools and technologies. Those can be interesting and fun to talk about, and we like to talk about them too, but on their own they are not going to add value to your organization.


The second question, once you've identified what you're trying to answer, is "Do you have the data to actually answer that question?" Often the question you want to answer and the data you have are not compatible with each other. So you have to ask, "Can we get the data in a form that lets us answer the question?" Sometimes the answer is simply no, in which case you have to give up (for now). Bottom line: to decide whether a project is hype or reality, you have to decide whether the data people are trying to use is actually relevant to the question they are trying to answer.

The third question is "If you could answer the question with the data you have, could you even use the answer in a meaningful way?" Consider the Netflix Prize competition, which produced a solution to the problem of predicting what videos people would like to watch. It was a very, very good solution, but it could not be implemented with the computing resources Netflix had in a way that was financially expedient. Even though they were asking a specific question, had the right data and could answer the question, they couldn't actually implement what they found.

If you ask yourself these three questions, you will be able to decipher very quickly whether a data science project is all hype or whether it is a real contribution that can actually move your organization forward.

How do you determine the success of a data science project?


Small businesses rarely use cutting-edge technology, simply because it is not within their budgets, expertise or resources. However, almost all are called upon to experiment with such technology, because if they don't, someone else will, and ultimately whoever does will gain in competitiveness, cost or profitability.

Defining the success of an AI project (more precisely, a data science or machine learning project) is a crucial part of managing a data science experiment.

Of course, success is often context-specific. However, some aspects of success are general enough to merit discussion. My list of hallmarks of success includes:


New knowledge is created.
Decisions or policies are made based on the outcome of the experiment.
A report, presentation or app with impact is created.
You learn that the data cannot answer the question you are asking.

Some negative outcomes are: decisions are made that ignore clear evidence from the data, the results are equivocal and point in no clear direction, or uncertainty prevents the creation of new knowledge.

Let's talk first about some of the positive outcomes.

New knowledge seems ideal to me. However, new knowledge is not necessarily important knowledge. If it produces actionable decisions or policies, even better. (Wouldn't it be great if there were evidence-based policy, like the evidence-based medicine movement that has transformed medicine?) Having our data science products make a big (positive) impact is, of course, the ideal. Creating reusable code or applications is a great way to increase the impact of a project.

Finally, the last point is perhaps the most controversial.

I consider a data science project to be successful if we can demonstrate that the data cannot answer the questions being asked. I remember a friend telling a story about the company he worked for. They hired many expensive data science consultants to help use their data to inform pricing. However, the prediction results were not helping. 

They could see that the data could not answer the question being studied. There was too much noise, and the measurements did not capture what was needed. Sure, the result was not optimal, since they still needed to know how to price things, but it did save money on consultants. Since then, I have heard this story repeated almost identically by friends in different industries.
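
One rough way to spot this situation early is to compare a model's predictions against a naive baseline that always predicts the average. The sketch below is only an illustration under stated assumptions: the function names, the toy numbers and the 10% improvement threshold are all hypothetical, and a real project would use proper validation on far more data. If the model cannot clearly beat the baseline, that is a hint that the data may not be able to answer the question.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def beats_baseline(train_targets, test_targets, model_predictions,
                   min_improvement=0.10):
    """Return True if the model's error on the test targets is at least
    `min_improvement` (a fraction, hypothetical default 10%) lower than
    the error of always predicting the training mean."""
    # Naive baseline: ignore all features and predict the training average.
    baseline_prediction = mean(train_targets)
    baseline_error = mean_absolute_error(
        test_targets, [baseline_prediction] * len(test_targets)
    )
    model_error = mean_absolute_error(test_targets, model_predictions)

    if baseline_error == 0:
        # The baseline is already perfect, so there is nothing to improve.
        return False

    improvement = (baseline_error - model_error) / baseline_error
    return improvement >= min_improvement


# Toy numbers, invented purely for illustration.
train_targets = [10, 12, 11, 13]
test_targets = [10, 14, 11, 13]

noisy_predictions = [13, 11, 12, 10]        # worse than the baseline
useful_predictions = [10.5, 13.5, 11, 13]   # much better than the baseline

print(beats_baseline(train_targets, test_targets, noisy_predictions))
print(beats_baseline(train_targets, test_targets, useful_predictions))
```

A failed check like this does not prove the data is useless, but it is a cheap signal to raise before spending more on modeling.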

