
Weak measurement better than no measurement?

Testing

Why do humans insist on predicting things that are inherently unpredictable?

We can’t predict the stock market. Why not? We have years and years of past data. Data is the key, right? If data allows us to predict the future, then it’s simple: build a model of the market, test it on past data, and get rich. Right? Hmmm, something is flawed in that argument. I suspect it is a little more complex than that. I know that people build their lives around trying to do this, and my experience tells me that very few of them ever become rich.

Jump over to software. There are people who believe it is possible to predict the number of bugs in an upcoming release based on the number of bugs in the past release. The logic goes like this: “We have 5 years of data in the bug system, so we should be able to predict the number of bugs in the next release. We can even predict how quickly we will be able to fix those bugs based on the time it took to close past bugs.” Does this argument sound familiar? Why do people believe this but not believe they can predict the stock market? Or, worse, why do they believe both?

In software I often get asked how long the bug fix cycle of the upcoming release is going to take. I actually have no idea how long it will take and I don’t want to spend my time figuring it out. There are too many variables and too many factors to take into account to make any prediction useful.

The person asking usually appeals to authority or to my ego: “You’ve been around here long enough, you should have an idea.” My answer is always “It depends.” “Oh yeah? On what?” is the usual response. Well, it depends on the number of bugs we find and how long it takes to fix them. Ask development how long it will take them to fix the bugs we find.

“Well we have 5 years of data in the bug database. How long did it take to fix the bugs we’ve already addressed?”

My point is that it doesn’t matter how long it took in the past. What matters is how long it’s going to take to fix the ones in the future.

“Well, we can just average out the fix times across releases. Remove the longest times, as well as those done really quickly, so we have a good statistical average.”

Taking the average of fix times defeats the purpose. Why would you remove the ones that took a really long time? Those are the ones you should pay attention to. And the average won’t help you predict future fix times anyway. Do you really believe that past bug data will help you predict future bugs?
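To make the trimming argument concrete, here is a minimal Python sketch. The fix times are hypothetical numbers invented for illustration, not data from any real bug database, and the names `fix_times`, `mean` and `trimmed_mean` are illustrative only. It shows how a trimmed average quietly discards the slowest bugs, which in this made-up sample account for most of the total fix effort.

```python
# Hypothetical fix times in days (invented for illustration, not real data).
fix_times = [1, 2, 2, 3, 3, 4, 5, 40, 90]

def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)

def trimmed_mean(values, trim):
    """Drop the `trim` smallest and `trim` largest values, then average the rest."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return mean(kept)

# Days of fix effort hidden by dropping the two slowest bugs.
dropped_days = sum(sorted(fix_times)[-2:])

print(mean(fix_times))             # about 16.7 days
print(trimmed_mean(fix_times, 2))  # 3.4 days
print(dropped_days, "of", sum(fix_times), "days discarded by trimming")
```

The "statistical average" of 3.4 days looks reassuring, but it was reached by throwing away the two bugs that consumed 130 of the 150 days.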

“Well, in the absence of a good measure, I’d rather use a weak (read: bad) measure than nothing at all.”

Never give a number to a bureaucrat. Such is the state of (some) software businesses.