Monte Python Simulation: misunderstanding Monte Carlo

I recently found myself in yet another circular Twitter discussion of estimation, in which the One True Way to scope work under uncertainty ranged from abandoning estimation entirely to applying formal Cost Accounting methods, with nothing less sufficing. I’ve talked about this at length and I will happily excise any comments that get into #noestimates.

One topic that came up from the Cost Accounting camp was the use of numerical methods as estimation tools, in particular Monte Carlo Simulation. I questioned the method’s applicability in this case and ended up in a side conversation with the lovely Troy Magennis, who builds open source statistical modelling tools for a hobby. He asked me to elaborate on this, and in particular about where I think Monte Carlo is useful and where it is misapplied.

Here is the relevant part of the Twitter conversation for context, with emphasis added:

Him: Yes, and you need to define the precision and accuracy “needed” for the estimate to be useful before starting the effort to produce the estimate. A ROM may be all that is needed ±10X
An 80% confidence of “on or before” may be needed before starting work
Credible estimating is a continuous process of refining the estimate with actual data, updates to the model that produced the estimate, and corrective and preventive actions from Continuous Risk Management.

Me: Credible estimating is doing the bare minimum you can get away with to materially support the decisions you are trying to make. Any more than that is just work creation for project managers.
”80% confidence of on-or-before” is a meaningless term for a single project. It means “If we carry out this exact project a statistically significant number of times (>20 say) then 80% of those will be within this date.” But we will carry it out exactly once. Ever. Statistics matters.

Him: Most certainly not. This is “proposal submittal” criteria @ NASA.
Number is risk informed (reducible & irreducible) from Monte Carlo simulation using Reference Class Forecasting of previous programs described in the “Past Performance Section” of the proposal
Yes, informed by “value at risk”
You’ve got \$10k at risk, bare minimum is much different than [when] you’ve got \$10B at risk. Context and domain needed before any platitudes are useful.

Me: and any Monte Carlo simulation has parameters that are guesses, with probability distributions that are more guesses. The more times you do something the more likely you are to choose good parameters and curves, but it takes time and is expensive.

So here goes, but first a story.

Statistics for Microbiologists

I did a degree in Mathematics with Computer Science about 100 years ago, or more accurately in the late ‘80s, and I remember my surprise at how badly they were teaching statistics in non-mathematical subjects. I had a friend doing Microbiology, and the rule was that they had to pass the Statistics module at some point during their degree course. If they failed the module in the first year they would repeat it in the second year, and so on until they passed.

My friend Mary was a smart girl and a good student, and she kept failing this stats module. She was convincing herself she was useless at maths and would never make it as a microbiologist. I asked to see the past papers and they didn’t look that difficult, so I asked her to show me her stats textbook. Its title was “Statistics for Microbiologists” and my immediate thought was “Why would microbiologists need their own statistics book? Statistics is statistics!” Unsurprisingly the author was one of the microbiology lecturers, who would make a few royalty bucks each academic year by inflicting their book on all their students.

And it was shockingly bad! It didn’t make any sense, and was clearly written by someone who didn’t understand statistics. To misquote Pauli, it wasn’t just bad, it was not even wrong. To add insult to injury, the faculty considered teaching statistics as a short straw, so some junior lecturer or other would, reluctantly, drone through the content week after week, and would dread seeing the same faces back the next year after they had once again flunked both the module and the retakes.

So I offered to teach Mary statistics in return for coffee and chocolate—they were simpler times—and I remember her response as she sliced her way through one past paper after another. “Is that it? Is that all there is to it?” Part incredulous, part furious at how much of her time and energy she had wasted in these pointless lectures.

Pretty soon I was hosting a stats tutorial for all her microbiology pals, and sure enough they all stormed the stats module too. I’m not telling this story to tell you how great a stats teacher I am, but to suggest the standard of teaching stats was so poor outside of the Maths faculty that even I could do a better job (based on a sample of one college, oh the hypocrisy), which may go some way to explaining how statistics is so often misapplied. I’m sure you’ve seen the many articles talking about how we get Bayesian statistics just as wrong.

So then, on to Monte Carlo. I’ve seen an increase recently in people saying they want to use a Monte Carlo simulation in order to estimate likely project length, or more specifically to “define a 90% confidence level” for a project length.

We are using the wrong tool

Monte Carlo predicts a probability distribution for a number of future trials. We are using it to estimate the result of a single trial.

Monte Carlo is a group of methods for modelling a probability distribution for a given type of event, where that event is controlled by a number of independent parameters. Say you want to decide the location for a new distribution warehouse. You want to site it such that you can be confident 90% of deliveries will be on time. You might use a Monte Carlo simulation to model the distribution of delivery times based on a number of parameters, and use this model to assess which of a number of possible locations you should choose. Once the warehouse is up and running you can build a distribution curve based on real delivery times, and replace your theoretical model with an empirical one.

Monte Carlo modelling allows us to build a theoretical model of the distribution of a set of similar events where it would be impractical to try to build an empirical model. You can use this to build models like the example above, where each event is a delivery that may or may not be on time.

You define a function of several parameters, each of which has its own probability distribution, and use this to carry out a number of simulations. For each simulation you take a random value of each parameter based on its probability distribution, and use that set of values in the Monte Carlo function to derive a sample result. You then build a histogram of these results, and this histogram represents the probability distribution of the event you are modelling.
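As a concrete sketch of that procedure — define a function of parameters, sample each parameter from its assumed distribution, and histogram the results — here is a minimal simulation of the warehouse example. Every parameter, distribution, and number here is invented for illustration:

```python
import random

def delivery_time(loading_mins, drive_mins, traffic_delay_mins):
    """The Monte Carlo function: combines one sample per parameter into one outcome."""
    return loading_mins + drive_mins + traffic_delay_mins

def one_trial():
    # Draw one random value per parameter from its assumed distribution.
    loading = random.gauss(mu=20, sigma=5)   # roughly normal around 20 mins
    drive = random.uniform(30, 50)           # anywhere in a range
    traffic = random.expovariate(1 / 10)     # mostly small, occasional long delays
    return delivery_time(loading, drive, traffic)

# Run many trials; the sorted samples are the histogram of outcomes.
samples = sorted(one_trial() for _ in range(100_000))
on_time_threshold = 90  # minutes; an arbitrary service-level target
on_time_fraction = sum(s <= on_time_threshold for s in samples) / len(samples)
print(f"Estimated P(on time): {on_time_fraction:.0%}")
```

Once the warehouse is running, you could swap these guessed distributions for curves fitted to real delivery times, replacing the theoretical model with the empirical one described above.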

90% confident, or confident 90% of the time?

For a single event the interpretation is a bit different. A single sample is what it is, or rather what it will be, and for any event with uncertainty we can’t know beforehand what the answer will be on that occasion. You can’t know whether any one delivery will be late.

You could use the probability distribution to price an insurance policy or financial option for that event, which is the principle behind Black-Scholes. In other words you could bet against yourself and use that to hedge potential failure. With Black-Scholes you have a model of how you think the value of a financial instrument behaves over time, and you use this to price an option, which is the right but not the obligation to transact at some agreed point in the future. The value of the option varies over time based on the observed values of the parameters, including how much time remains.

The choice to exercise an option is like a single bet: you will make or lose money on it. Options trading works because you make lots of these bets, and your wins and losses balance out over time, ideally in your favour if your models are any good. Likewise, if you can produce a model of the likelihood of success of software projects, and you were to bet on lots of these projects over time, and they conformed sufficiently well to your model, you could be reasonably confident of success across the entire portfolio of projects over time. But this offers no guarantees about any single project.
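To make the hedging idea concrete, here is a sketch of pricing a European call option by Monte Carlo under the standard geometric Brownian motion model. The spot price, strike, rate, and volatility are illustrative numbers, not data from any real instrument:

```python
import math
import random

def mc_call_price(spot, strike, rate, vol, years, trials=200_000):
    """Average the discounted payoff over many simulated futures."""
    total_payoff = 0.0
    for _ in range(trials):
        z = random.gauss(0, 1)
        # One simulated final price for the underlying.
        final = spot * math.exp((rate - 0.5 * vol**2) * years
                                + vol * math.sqrt(years) * z)
        total_payoff += max(final - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * years) * total_payoff / trials

value = mc_call_price(spot=100, strike=105, rate=0.02, vol=0.2, years=1.0)
print(f"Estimated option value: {value:.2f}")
```

Note that no single simulated path tells you what the underlying will actually do; the price only makes sense across the whole distribution of paths, which is exactly the portfolio argument above.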

As an example, look at surgery success rates. A surgeon will have an outcome histogram over time for a particular procedure. They can tell you the likelihood of various outcomes based on observed results, and ideally based on their own personal success rates. For the surgeon this is a distribution. For you, all you care about is this one procedure. Your outcome is binary—you live or you die—and the data can only tell you what the odds look like over a statistically significant sample size. I see this as a false sense of security, because as humans we can’t make a value judgement between 90% and 95%, or between 95% and 97%. We can probably decide between say 65% and 75%, but then how do you interpret your own appetite for risk if someone says there is a 2-in-3 chance of success vs a 3-in-4 chance?

A single software project will run exactly once. Even if you run the same project again with the same people, things will be different. The people will have changed, the organisation will have moved on, the context will be different. “You never cross the same river twice.”

We are using the tool wrong

Choosing valid inputs for a Monte Carlo simulation is hard!

The Monte Carlo function comprises a number of assumptions, which might be empirically derived or might just be guesses depending on the information available. Choosing significant independent parameters, understanding their distributions, and defining a function to predict the behaviour of a complex adaptive system, are all hard, and all differently hard, and all differently prone to errors and biases.

There are three types of assumptions:

  1. We assume a function exists, based on a number of independent parameters, which can represent values of the event.
  2. We assume:
     – we know a set of parameters that completely describes this function,
     – that these parameters are independent, and
     – that we know how to calculate the values (i.e. that we can define the function).
  3. We assume we know the probability distribution of each of these parameters, so we can pick random values for each one and be confident that the value represents a realistic sample for that parameter.

Given all these assumptions, you can carry out a number of simulations. For each simulation you take a random set of values and plug them into your Monte Carlo function to derive a sample result, and you do this again and again to build your histogram.

This approach breaks down if any of the assumptions are incorrect. Specifically:

  1. if there isn’t a function of several parameters, or if there is a relationship between any of the parameters that we don’t understand, such that they aren’t independent.
  2. if we mis-specify the set of parameters that describe the event, either by over- or under-specifying them.
  3. if we choose the wrong probability distribution for any of the parameters, so we choose “random” values that don’t represent that parameter.
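The third failure mode is easy to demonstrate: feed the same Monte Carlo function two different guesses at one parameter’s distribution, and the confidence level you would quote shifts noticeably. This sketch uses invented task durations and distributions:

```python
import random

def project_length(task_a, task_b):
    """The Monte Carlo function: two tasks run back to back."""
    return task_a + task_b

def percentile_95(task_b_sampler, trials=100_000):
    """Simulate many projects and return the '95% confidence' duration."""
    samples = sorted(
        project_length(random.gauss(10, 2), task_b_sampler())
        for _ in range(trials)
    )
    return samples[int(0.95 * trials)]

# Guess 1: task B is symmetric around 10 days.
symmetric = percentile_95(lambda: random.gauss(10, 2))
# Guess 2: task B usually takes ~6 days but has a long tail of overruns
# (same mean of 10 days overall).
long_tail = percentile_95(lambda: 6 + random.expovariate(1 / 4))

print(f"95% confidence duration, symmetric guess: {symmetric:.1f} days")
print(f"95% confidence duration, long-tail guess: {long_tail:.1f} days")
```

Both guesses have the same average, yet the “95% confidence” dates they produce differ by several days — choosing the wrong curve quietly changes the answer.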

We are asking the wrong question

“On time and on budget” is meaningless without an indication of value.

Incidental to all this, although just as important, is what question we are modelling when we use a Monte Carlo simulation. We typically care about whether the project will be on time and on budget, and the PMO and other steering folk track this ruthlessly. Although these numbers are necessary in a cost-accounting world, there is more value in knowing time to initial business impact, and modelling the business impact curve. Ideally we should be modelling Cost of Delay and Risk-Adjusted Return on Capital as the genuine economic indicators of the impact of our work, but these often lag too far behind to be a useful indicator.

For instance we could hypothesise that by doing some work we could simplify a business process (this is the business impact) and that the simpler process would need fewer people, which would then lower our operating costs (the consequent business value). We can’t directly deliver the business value, but we can deliver business impact and track our hypothesis about business value over time.

Legitimate use of Monte Carlo Simulation

This doesn’t mean you can’t or shouldn’t use Monte Carlo in software development. I have seen a number of situations where people are benefiting from the method.

Example 1: Predicting delivery of features

Teams often work from a backlog of features or other work items and track them in a tool like Jira. Some of these teams have been using Monte Carlo simulations to predict throughput of features based on historical data.

A number of conditions are necessary for this to be valid:

  • The past work should be representative of the future work. If the last few months have been about adding new features and the next few months are about integrating with other systems, the data is unlikely to be representative.
  • The future delivery context will not change significantly compared to the recent context. If the team is changing or you have yet another transformation initiative or re-org rolling out, this is likely to affect the delivery histogram.
  • The Monte Carlo function should be a reasonable indicator of the past work. It is easy to define a Monte Carlo function. Identifying the parameters that genuinely capture the behaviour, and using them to define a function that reasonably represents the historical and future data, is hard.
  • The team understands the probability distribution of each of the Monte Carlo function’s input parameters. Even if you can identify the factors that affect the rate of delivery, their respective probability distributions might not be obvious, which means each “random” value you get for a Monte Carlo trial may not produce a representative result. This means the histogram won’t model reality.

On one programme of around 12 or so teams, several of the teams were using this kind of Monte Carlo analysis to model their anticipated delivery rate for their Product Management team. This allowed the product managers to plan a product roadmap and know when to engage external stakeholders. Over time the team and the product managers grew to trust these models and integrated them into their day-to-day product strategy.
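A common shape for this kind of forecast is to bootstrap historical throughput: repeatedly resample past weeks to simulate possible futures, then read confidence levels off the resulting histogram. The weekly counts here are invented; real ones would come from the tracking tool:

```python
import random

# Invented historical data: features completed per week.
weekly_throughput = [3, 5, 2, 4, 6, 3, 4, 2, 5, 4, 3, 5]

def weeks_to_finish(remaining_features, trials=20_000):
    """Bootstrap past weeks to simulate how long the remaining work takes."""
    results = []
    for _ in range(trials):
        done, weeks = 0, 0
        while done < remaining_features:
            done += random.choice(weekly_throughput)  # resample a past week
            weeks += 1
        results.append(weeks)
    results.sort()
    return results

forecast = weeks_to_finish(remaining_features=30)
p50 = forecast[len(forecast) // 2]
p85 = forecast[int(0.85 * len(forecast))]
print(f"50% of simulated futures finish within {p50} weeks")
print(f"85% of simulated futures finish within {p85} weeks")
```

Note that this only holds under the conditions listed above: the resampled weeks must actually be representative of the weeks to come.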

Example 2: Using Monte Carlo to explore alternatives

For a single point project you can use Monte Carlo simulation to explore how different assumptions about the input parameters affect the bigger picture, an activity called Sensitivity Analysis. You can ask questions like “If we can constrain the likely values of this parameter in this way, what impact will that have on the likely result?” This kind of analysis can also uncover unexpected correlation between variables you thought were independent.

Say one of the parameters was the delivery date of a key dependency. This is bounded below by the earliest possible delivery date, but theoretically unbounded above (those pesky vendors!). You might model the parameter as a skewed normal distribution, or an exponentially decaying curve, or some other function that fits the general description. You plug in the probability distribution function for the parameter, run the simulation, and get a histogram plot for the project’s likely end date.

What if you could do something with the vendor to change that curve, by say imposing a penalty or reward? This might make the bell much narrower and the long tail much flatter, which would represent a higher likelihood of the vendor delivering close to the promised date (technically a smaller standard deviation). You would then re-run the Monte Carlo simulation and see how this affects the result.

You can experiment with the controlling functions for the parameters in this way to see the impact on the wider model of changing the assumptions for particular parameters. This tells you which changes might give the highest leverage, and conversely which things aren’t worth going after, even if they are tempting. One set of assumptions might give you a narrower standard deviation in the Monte Carlo histogram for the project delivery date, so you would have more confidence the project would be within a given window under these assumptions. Another might bring forward the earliest date, but fatten the tail as well. This would mean you have made it possible to deliver sooner but in a way that increases the likelihood of being late.
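That sensitivity analysis can be sketched as running the same simulation under two assumptions about the vendor’s slip and comparing the 90th-percentile end dates. The model and all its numbers are invented for illustration:

```python
import random

def project_end(own_work_days, vendor_slip_days):
    # Naive model: the project finishes when the slower of the two strands
    # does; the vendor promised day 60, our own work takes ~70 days.
    return max(own_work_days, 60 + vendor_slip_days)

def percentile_90(vendor_sampler, trials=50_000):
    """Simulate many projects and return the 90th-percentile end date."""
    samples = sorted(
        project_end(random.gauss(70, 5), vendor_sampler())
        for _ in range(trials)
    )
    return samples[int(0.9 * trials)]

# Assumption A: vendor slips with a long tail (no penalty clause).
no_penalty = percentile_90(lambda: random.expovariate(1 / 15))
# Assumption B: a penalty clause narrows the distribution of slips.
with_penalty = percentile_90(lambda: random.expovariate(1 / 5))

print(f"90th percentile end date, no penalty:   day {no_penalty:.0f}")
print(f"90th percentile end date, with penalty: day {with_penalty:.0f}")
```

Comparing the two runs shows how much leverage the penalty clause buys you, which is exactly the kind of question this use of Monte Carlo can answer.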

Conclusion

Monte Carlo modelling can be a powerful tool in situations where it is impractical or uneconomical to build an empirical model. It builds a histogram that represents a likely distribution of future samples or trials.

It doesn’t make sense to build a Monte Carlo model for a single trial, such as a software project or a single feature, unless you have a valid reason to want a specific confidence level, and the ability to discern between, say, 84% and 93%, which in most cases you don’t. There are still cases where Monte Carlo simulation is useful for software development, such as predicting feature throughput for a statistically significant number of future features, or exploring how changes to assumptions of the control variables affect the resulting distribution.

So don’t be seduced by statistical simulations, manipulated by mathematical models, or otherwise blinded by science theatre, and learn to identify the Monte Python circus.

15 comments

  2. The idea that 80% confidence is meaningless for a single project requires a decidedly frequentist approach to probability. But, of course, there are other interpretations of that phrase, for instance if we take an epistemic view of probability, then that statement means “based on the best knowledge that we have, we believe that that statement has an 80% chance of being true”, the primary difference being that, in one case we are talking about relative counts of events, whereas in the other we are talking about our state of knowledge. In an epistemic view of probability, there is no need to assume that some event is a selection from a population, just that we have incomplete knowledge about that event.

    1. The 80% confidence of “on or before a date” or 80% confidence of “at or below” a cost or some other Technical Performance Measure is EXACTLY what MCS is used for.
      See my longer response below
      In many domains, including agile SW development, MCS is mandated

      Dan has mixed the underlying processes of Biology with the process of project activities behaving randomly in a network of interdependent tasks, then wrongly concluded MCS is not appropriate for project work

      MCS is actually mandated in many domains
      https://goo.gl/FQ4ow8
      https://goo.gl/sA1pxb
      https://goo.gl/bN9Huz

      and many others. Google “cost and schedule analysis monte carlo simulation”

  4. llemirtrauts · ·

    It was a touch unclear to me at first whether “Him” in the conversation at the top is Troy or one of the “Cost Accounting Camp”.

    For those equally as confusable as me…it ain’t Troy.

    Great article. Thanks Dan.

  5. Dan, I enjoyed your post. I am also a big fan of Troy Magennis’ work. As a recovering veteran of countless estimation efforts I have found MS and other statistical methods to be a far superior approach to the old school effort based estimates centered around a list of requirements. No method is without its flaws, but in my experience MS gets you much closer to the vicinity of the actual time and cost than effort based techniques. Regardless of method I have found the problem to be more with the thinking that an estimate/forecast is a one-off guarantee or commitment rather than a continuous best guess based on the information and tools available at a given point and time. That kind of thinking gets everyone involved in trouble.

    As I was reading along I was disagreeing with you in my head that MS is not an appropriate tool for forecasting a software project until I reached the part where you said MS can be effective for forecasting delivery of features. I have been doing Agile and Lean for so long that I think of any software project as just a series of feature deliveries that we have drawn a circle around for accounting purposes. For teams operating in that manner along with your aforementioned criteria I think MS is very effective. I suppose in the case of teams doing more waterfall like development MS would not make much sense. Great perspective!
    Cheers!
    Mike Boumansour

    1. There are Plug-Ins for Jira, Rally, and Version One that will Forecast (an Estimate of an outcome in the Future) using past performance, starting with Tasks and Stories and flowing up to Capabilities in the Release Plan
      Dan has mixed the underlying process of Biology with the processes of project work and wrongly concluded MCS is not appropriate on projects.
      See my response and links to tools, processes, and principles of applying MCS to SW development and other project domains

  6. One of the issues I’ve observed is with teams who start using MC for predicting feature delivery, but then start treating the prediction as a target. The resulting outcome is… exactly what you’d predict ;)

    1. That would be a misuse of the tool and the principles of estimating using any process
      NOT the fault of the principles or the tools

  7. Dan,

    The MCS modeling of biological systems is not the same as the modeling of a network of activities in a project, even in an agile project, where the network of activities is raised to the Feature and Capabilities level in the Product Roadmap and Release Plan.

    The underlying processes of “Monte Carlo Simulation” the Latin Hypercube sampling, for example, has many application domains. Here are a few samples from our domain(s)

    https://goo.gl/urF8Z9
    https://goo.gl/KDyNaa
    https://goo.gl/rqo4aL

    But the underlying sampling processes are the same using the Lurie-Goldberg algorithm

    https://goo.gl/vgvPNN
    https://goo.gl/BKLDwZ

    Your statements that MCS is not applicable to projects is not correct
    You’re taking the Biology domain and applying it to the project network of activities domain

    Finally, the MCS tools we use

    https://goo.gl/RxgXKA
    https://goo.gl/xPykqG
    https://goo.gl/dQkMQD

    all are applicable to project work

    These tools make use of “past performance” to define the ranges of samples for each “variable under consideration” OR they can use parametric models to define those ranges

    Here’s a sample of the guidance to perform MCS in a wide range of project domains from design analysis to cost and schedule modeling

    https://goo.gl/bDDyvU
    https://goo.gl/xQdG2p

    With conferences dedicated to the topic with lots of Agile topics

    https://goo.gl/v1FW5Y

    So I’d ask that you expand your Literature Search a bit further before making claims that simply aren’t applicable outside the biology domain to see what you’ve stated is incorrect for the project domain

    1. I’m applying my understanding of Monte Carlo as a statistical tool. “Monte Carlo for projects” is as much a red herring as “Statistics for Microbiologists.”

  8. Your example in the article started with biological systems. No red herring, just wrong domain

    Did you read the materials in the links

    You claim “Monte Carlo predicts a probability distribution for a number of future trials. We are using it to estimate the result of a single trial.”

    Please read materials for how MCS is used to “predict” outcomes for a single project activity

    When you say “It doesn’t make sense to build a Monte Carlo model for a single trial, such as a software project or a single feature, unless you have a valid reason to want a specific confidence level, and the ability to discern between, say, 84% and 93%, which in most cases you don’t.” This is not supported by principles, practices, processes, and evidence of applying MCS to software development projects including Agile SW development projects.

    If you’d like to learn how MCS is used in a wide variety of SW domains where we work, I’d be glad to share our processes and results

    1. “Your example in the article started with biological systems. No red herring, just wrong domain”

      No it didn’t. It started with an anecdote about Microbiology. I’m not going to read a mountain of self-justifying literature, sorry. If you have a brief summary and an argument other than “We do it because we have to do it.” (Which is what your “It is mandated” argument boils down to) then let’s explore that. Otherwise please don’t just blast my comments with links. Thanks.

  9. Thomas Zheng · ·

    I suspect there is an existing topic in statistics that might help elucidate the subject: Ergodicity. It’s a property of stochastic processes, which says that time probability is equal to ensemble probability when the process is ergodic. Of course, testing for ergodicity is hard. Anyway, probability alone will not help us make good decisions; only people know what the reward and costs are for each singular decision. But if we plan to make a million decisions really fast and we know how to manage the cost when the decision has a bad outcome, then yeah, using MC or any other method will make more sense.
