Insights from Doug Gray: Why Data Science Projects Fail and the Question I Wanted to Ask
Ever since I heard Doug Gray talk at INFORMS about why data science projects fail, I have been hooked. He is a great speaker, has written a great book, Why Data Science Projects Fail, and draws on his experience at Walmart, Southwest Airlines, American Airlines, and other companies.
Here’s the background for the question I wanted to ask. I’ve heard many versions of this statistic: “80% of data science projects fail.” I was never quite sure why that number was being presented: Was it meant to motivate ways to lower it? Is it even real? What counts as failure? Does it deter people from pursuing low-risk projects?
Here’s the question I asked: What should the failure rate be? If data science is more like venture capital, maybe 80% is OK. A failure rate of 0% would suggest you aren’t taking any risk.
He gave a three-part answer:
First, in his opinion, the failure rate should be 10% for operational data science projects and 50% for R&D projects.
He then gave some stats behind the 80% figure. If you break it down, approximately 75% of organizations are classified as immature, and their projects fail 90% of the time. The failure rate for mature organizations is closer to 50%. The weighted average comes out to 80%. This breakdown gives some insight into where projects are likely to fail.
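As a quick check of that arithmetic, here is the weighted average written out. It assumes that immature and mature are the only two categories (so roughly 25% of organizations are mature) and that both groups contribute projects in proportion to their share; neither assumption was spelled out in the interview.

```latex
\text{overall failure rate}
  = \underbrace{0.75 \times 0.90}_{\text{immature}}
  + \underbrace{0.25 \times 0.50}_{\text{mature}}
  = 0.675 + 0.125
  = 0.80
```

Under these assumptions, the immature group accounts for 67.5 of the 80 percentage points, which is why the headline number says more about organizational maturity than about data science itself.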
Finally, he said that many projects probably wouldn’t start at all if more people understood the lessons from failed ones. This reminded me of a previous conversation with Irv Lustig about the risk factors outlined in the Princeton Twenty.
The interview offered many other valuable insights. Here are two:
One, we learn more from our successes and the failures of others.
That is a good reason we need a book like this.
This one made me think about my capstone class: how do I ensure successful projects so that students maximize learning (and make the clients happy!) while also highlighting lessons from failed projects?
Two, don’t underestimate testing.
He quotes Fred Brooks’s book The Mythical Man-Month and says that a project’s effort should be split roughly 1/6 design, 2/6 coding (and model building), and 1/2 testing. It takes a lot of effort to go from a model to something in production.
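To make that split concrete, here is a small Python sketch that applies the quoted fractions to a project schedule. The phase names, the `allocate_effort` helper, and the 12-week budget are hypothetical, chosen only for illustration; they are not from the interview or the book.

```python
from fractions import Fraction

# Effort split as quoted in the interview (hypothetical phase labels).
EFFORT_SPLIT = {
    "design": Fraction(1, 6),
    "coding and model building": Fraction(2, 6),
    "testing": Fraction(1, 2),
}

# Sanity check: the three shares should cover the whole project.
if sum(EFFORT_SPLIT.values()) != 1:
    raise ValueError("effort shares must sum to 1")


def allocate_effort(total_weeks: float) -> dict[str, float]:
    """Split a hypothetical project schedule (in weeks) across phases."""
    return {phase: float(share * total_weeks) for phase, share in EFFORT_SPLIT.items()}


if __name__ == "__main__":
    # Example: a hypothetical 12-week project.
    for phase, weeks in allocate_effort(12).items():
        print(f"{phase}: {weeks:g} weeks")
```

For a 12-week project, this gives 2 weeks of design, 4 weeks of coding and model building, and 6 weeks of testing. In other words, testing alone takes as long as everything that comes before it.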
Have a listen on Spotify, Apple, or YouTube.