You've probably heard a lot about data science, artificial intelligence and big data. Frankly, there has been a lot of hype around these areas, and it has inflated expectations about what data science and data can actually accomplish. Overall, this has been bad for the field of data science and for big data. It is useful to think a bit about the questions you can ask to separate the hype of data science from the reality.
The first question is always "What is the question you're trying to answer with the data?" If someone comes to talk to you about a big data project, an artificial intelligence project, or a data science project, and they start with the newest technology for distributed computing and machine learning and throw a bunch of buzzwords at you, that is the first question you should ask. It narrows down what the project is really about and filters out a lot of the hype around the tools and technologies people are using. Those tools can be very interesting and fun to talk about, and we like to talk about them too, but on their own they're not going to add value to your organization.
The second question to ask yourself, once you've identified the question you're trying to answer with the data, is, "Do you have the data to actually answer that question?" Often the question you want to answer and the data you have to answer it with are not really compatible with each other. So you have to ask yourself, "Can we get the data in such a way that we can answer the question we want to answer?" Sometimes the answer is simply no, in which case you have to give up (for now). Bottom line, if you want to decide whether a project is hype or reality, you have to decide whether the data people are trying to use is actually relevant to the question they are trying to answer.
The third thing to ask yourself is, "If you could answer the question with the data you have, could you even use the answer in a meaningful way?" This question goes back to the Netflix competitions, where there was a solution to the problem of predicting what videos people would like to watch. It was a very, very good solution, but it couldn't be implemented with the computing resources Netflix had in a way that was financially expedient. Even though they were answering a specific question, had the right data, and could answer the question, they couldn't actually implement the results of what they found out.
If you ask yourself these three questions, you will be able to tell very quickly whether a data science project is all hype or whether it is a real contribution that can actually move your organization forward.
How do you determine the success of a data science project?
Small businesses rarely use cutting-edge technology, simply because it is not within their budgets, expertise or resources. However, almost all are called upon to experiment with such technology, because if they don't, someone else will, and ultimately whoever does will gain in competitiveness, cost or profitability.
Defining success for an AI project (more precisely, a data science or machine learning project) is a crucial part of managing a data science experiment. Some positive outcomes are:
New knowledge is created.
Decisions or policies are made based on the outcome of the experiment.
A report, presentation or app with impact is created.
You learn that the data cannot answer the question you are asking.
Some more negative outcomes are: decisions are made that ignore clear evidence from the data, the results are equivocal and do not shed light in one direction or another, or uncertainty prevents the creation of new knowledge.
Let's talk first about some of the positive outcomes.
New knowledge seems ideal to me. However, new knowledge is not necessarily important knowledge. If it produces actionable decisions or policies, even better. (Wouldn't it be great if there were evidence-based policy, like the evidence-based medicine movement that has transformed medicine?) Having our data science products make a big (positive) impact is, of course, the ideal. Creating reusable code or applications is a great way to increase the impact of a project.
Finally, the last point is perhaps the most controversial.
I consider a data science project to be successful if we can demonstrate that the data cannot answer the questions being asked. I remember a friend telling a story about the company he worked for. They hired many expensive data science consultants to help them use their data to inform pricing. However, the prediction results were not helping.