Everything Happens At Once: A statistical principle

Image by Arek Socha from Pixabay
This article started in my mind when I was thinking about the Covid-19 situation here in Australia (and elsewhere where the virus has been close to eliminated) but I’ve since broadened and generalized it to some extent.
It began with my imagining a set of random tables to describe someone’s interaction with Covid-19. Such-and-such a chance of catching the disease, modified for various factors. Such-and-such a chance of mild symptoms, or severe symptoms – with the same modifier as the previous table, and a few more. How would I set the values so that they modeled the real-world clusters that emerge, flourish, and burn out when confronted with countermeasures to prevent spread to the next generation of hosts, even countermeasures that are only somewhat effective? That got me thinking about cases in groups, not individually, because it’s in dealing with groups that statistical analysis thrives. And that’s when things started getting interesting, from the point of view of having something worth sharing in a Campaign Mastery article.
Basics
Above a critical number of cases, all possible outcomes are going to be represented. Below that critical number, RPGs use die rolls to determine which, if any, outcomes are not represented. With me so far?
You would commonly assume that if you had a d100 table, it would take 100 cases to represent the totality of the possible interpretations or events. And, if the input were a non-random value, rising sequentially by 1 each time, you would be correct; every possible outcome, even a 1% chance, would be covered, and covered in direct proportion to the chance of its occurrence.
Reality is a little messier, because the inputs are random – chaotic, not systematic. In reality, achieving the critical case number simply makes it less likely that some outcome will go unrepresented.
If there is a 1%-likely outcome, the absolute minimum critical number is 100 cases – but even if you had 100 cases to test, the distribution of outcomes will be at least a little uneven, so there remains a measurable chance that the low-probability case will be unrepresented.
If the number of cases is 200, it’s a lot less likely that the low-probability category will come up empty. At 300, less likely still. You can actually perform statistical analysis of the statistical analysis.
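To put rough numbers on that, here’s a minimal Python sketch. It assumes independent, fair d100 rolls; under that assumption, the chance that one specific 1%-likely outcome never appears in a given number of cases is just 0.99 raised to that number of cases.

```python
# Chance that a specific 1%-likely outcome never shows up,
# assuming independent, fair d100 rolls.
for cases in (100, 200, 300, 400):
    p_missing = 0.99 ** cases
    print(f"{cases} cases: {p_missing:.1%} chance it goes unrepresented")

# Output (approximately):
# 100 cases: 36.6%, 200 cases: 13.4%, 300 cases: 4.9%, 400 cases: 1.8%
```

So the risk never vanishes; it just shrinks as the pile of cases grows.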
Chasing A Statistical Tail
If there are 100 rolls, for example, every roll after the first has some chance of duplicating an earlier result – 1% for each distinct number already rolled. So, on the first roll, it’s 0%, because there’s nothing to compare it to; on the second, it’s 1%; on the third, there are two previous results to compare against, so it’s two percent; and so on. But it’s a complicated situation, because as soon as there IS a match, that roll adds no new number to the list, so the chance of matching on later rolls stops climbing for a step. Add those percentages up naively and you could conclude that there is a many-times-higher-than-100% chance of a duplicate result. But the low-percent chances of a match aren’t all that relevant compared to the high-percent chances – even if we don’t know exactly what they are.
You can prove this by contemplating the sum of the three smallest and three biggest chances. Three smallest: 0%, 1%, and 2%, which sums to 3%. Three highest: 99% + 98% + 97% = 294%.
In fact, it’s easy to work out the total:
- Start with 100 results;
- taken 2 at a time, that’s 50 pairs.
- When we pair them, always pair the highest with the lowest, then next highest and next lowest, and so on. 0+99=99. 1+98=99. 2+97=99. 3+96=99. Starting to see a pattern here?
- The only question mark is what happens in the middle of the list, where the pairs close in on each other. But the pattern holds: the matching complement of 48 is 99-48=51, so that pair is 48+51=99.
- And the next one is 49+50=99. And that’s all 100 possible values paired up.
- The total of all the chances is 50 pairs x 99 = 4950%.
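A quick Python sanity check of that pairing trick, for anyone who wants it:

```python
total = sum(range(100))     # 0 + 1 + 2 + ... + 99
pairs = 50 * 99             # fifty pairs, each summing to 99
print(total, pairs)         # both print 4950
```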
That’s a meaningless number. Probabilities like this don’t simply add up to a grand total – each roll just reduces our remaining uncertainty. Probabilities are always a measure of our ignorance of the actual outcome.
Let’s work this out again, doing it properly this time, with ten actual random numbers. I rolled 15, 51, 99, 59, 29, 46, 04, 89, 21, and 05 – so let’s see what happened when I did so.
- With one number rolled, there’s a 1% chance that the second one will be the same thing, and a 99% chance that it won’t.
- As would be expected, the next number is different. With two numbers rolled, there’s now a 2% chance that the third number will match one of the first two, and a 98% chance that it won’t.
- Once again, it doesn’t. And so on, through the fourth, fifth, sixth, seventh, eighth, ninth, and – in this case – tenth numbers.
Okay, so another simple pattern. It’s when I go beyond those ten random results and study the effects of additional outcomes as though I had rolled them that we get to the interesting answers:
- When it comes to rolling the 11th number, there are ten possible matches out of 100 – a ten percent chance of a match, a 90% chance that there won’t be one. But let’s say that on the 11th roll, we beat the odds and get another 29. This uses up one of our 100 rolls but doesn’t increase the number of possible matches – so, come the twelfth roll, the odds are still 10% chance of a match, and 90% chance of no match.
- Let’s assume that we roll another twenty numbers without another match occurring – by the end of that run, the chance of a match has climbed from 10% to 30%, or almost one in three, while the chance of no match drops to 70%, not quite two in three.
- That means that by now, one in every three rolls should yield a number that we already have on our list. In fact, we should have had another match by now, statistically speaking.
- Let’s balance things out a bit, and say that of the twenty rolls after that, between 1/2 and 1/3 of them are matches – that’s 6-10, so let’s pick a value in the middle and say 8 of them match.
- That’s eight fewer chances of getting a unique result, so our chance of a match is now up to 30+(20-8)=42%, and our chance of no match is down to 58%.
- Eight rolls later, if there are no matches, it will be 50-50 – but the odds are that there will be three or four matches, so it won’t quite happen that quickly.
- When it does, the chances of getting a match on the remaining rolls will be more than 50% starting with the next roll to be made – and the chances of not getting a match will be less.
- By the time we get to our last roll out of 100, which the simple model said should have a 99% chance of a match, the actual chance of a match will be (99 minus the number of previous matches) percent. If we’ve had 30 matches, that’s 69% – a big difference.
If 33 of our 100 rolls merely duplicate one of the other 67 results, that’s 33% of the possible outcomes left unrepresented – and that’s not some freak result; on average, 100 rolls of a d100 produce only about 63 distinct numbers.
That becomes really significant when it comes to a low-probability result – a 1-in-100 outcome, say. If 33 of the 100 numbers weren’t rolled, there’s a 33-in-100 chance – a full 33% – that our 1%-likely outcome is among them. Intuition, which suggests exactly that figure, gets this one right.
So, let’s say that we keep rolling until only one number – 1% of the list – remains unrepresented. As we rack up the additional rolls, most of them will be duplicates of numbers we already have. Each time we roll a number that we didn’t already have, the chance of the next number being a match for something goes up, so it takes longer and longer to scratch those final stubborn numbers off our list. And to get that last number off the list, where there’s a 99% chance of a match and only a 1% chance of hitting the target, it will take 100 rolls on average – and sometimes several times that – before that one specific number comes up.
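A short simulation makes those stubborn last numbers concrete. This is only a sketch, assuming fair and independent d100 rolls; the function name is my own.

```python
import random

def rolls_to_cover_all(sides=100, rng=random):
    """Keep rolling a d<sides> until every face has appeared at least once;
    return how many rolls that took."""
    seen = set()
    rolls = 0
    while len(seen) < sides:
        seen.add(rng.randint(1, sides))
        rolls += 1
    return rolls

trials = [rolls_to_cover_all() for _ in range(10_000)]
print("average rolls needed to cover all 100 results:", sum(trials) / len(trials))
# Comes out around 519 rolls (100 x (1 + 1/2 + ... + 1/100)) - more than
# five times the 100-roll minimum, and individual runs vary a lot.
```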
Still, the fact remains that with a big enough set of results, every possible outcome will have at least one matching case. With fewer cases than that, we need some sort of discriminating mechanism – our dice – to determine the outcome; but above it, we can simply say that there’s at least one of every class of outcome.
If there are at least that critical number of cases to be considered, we can treat the table as though it were the outcomes; the more cases there are, the more closely one will look like the other. That can be an incredibly useful tool for the GM, because it means that we can ignore the chances part of the table and treat it as a list of all the outcomes. We can analyze in generalities and narratives (which we tend to be good at), instead of mathematics (which some of us are not so good at, and which all of us get wrong every now and then).
The chance of a Lich finding a Ring Of Regeneration may be one in 100 – but if there are 600 liches, it is all but certain (better than a 99.7% chance) that it will happen at least once. So we can ignore the improbability and simply start detailing that particular Lich.
The Size Of The Sample
It’s thus really important to be able to determine the size of that critical number – the point at which individual outcomes are subsumed by the whole, and everything that can possibly happen, does. Unfortunately, this can require really complicated math.
But there are some shortcuts that GMs can use to get their heads around these probabilities and so assess what is most likely to happen, and these can be lifesavers.
If, for example, there are 300 rolls, then (on average) you would expect three rolls of each possible result – which means that you wouldn’t be at all surprised to see two, or four, rolls of any given result, and not all that surprised to see some with five, or six, and some with one – but you are now reaching the point where you would hope to see no result with zero cases. If you did find such a result, you would be disappointed rather than shocked.
When you think about that distribution, you soon realize that you are talking about our familiar old friend, a bell-shaped curve: flat on top, dropping quickly through the second and fifth sixths of the results, and fairly flat again at the outer limits.
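Here’s a rough simulation of that 300-roll scenario – again just a sketch assuming fair, independent d100 rolls, with an arbitrary seed. It shows the hump-shaped spread of hits per result, and also why you can still end up “disappointed” at 300 rolls.

```python
import random
from collections import Counter

random.seed(1)                                   # arbitrary seed, purely illustrative
rolls = [random.randint(1, 100) for _ in range(300)]
times_rolled = Counter(rolls)                    # result -> how many times it came up
counts = [times_rolled.get(result, 0) for result in range(1, 101)]
shape = Counter(counts)                          # how many results came up 0, 1, 2, ... times

for hits in sorted(shape):
    print(f"results that came up {hits} time(s): {shape[hits]}")
# A typical run humps up around 2-4 hits per result and tails off on both
# sides - with a handful of results (about 5 of the 100, on average) not
# rolled at all.
```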
Given that the end points are always ‘anchored’ at zero, what we need is for the lower-probability ends of the fast-change zones to be higher than one. That means that there is very little risk that a 1%-likely outcome will not be represented with at least one result.
It also means that you can actually treat a subset of the results as a statistical representation of the whole. The larger that subset, the greater the certainty and reliability of the outcome measurements. But if you took the results of any six neighboring results, or any six even-numbered results, or – in fact – any six results at all – the outcome-counts in those specific results should map onto that bell curve.
In fact, this is how political opinion polls and television ratings work – they sample a certain number of opinions and from that, extrapolate to get some idea of the whole. Of course, they can only get a perfect representation if they poll every single viewer / voter – and if the responses are all truthful.
In practice, I don’t think that six is an adequate sample.
Imagine that your bell curve is made of Lego blocks viewed end-on. A count of the number of Lego blocks gives you a measure of the reliability of your analysis of the whole – the more blocks that you have, the more representative the ultimate shape of the curve is.
One block doesn’t do a very good job on its own.
With three blocks, at least we get an indication of sloping walls.
With seven blocks, the shape of the curve begins to be reflected in the arrangement.
Eleven blocks is better again – but there’s still a large void on either side at the top.
With sixteen blocks, we reach a critical point: there’s almost enough space in the voids relative to the size of the bricks that the ‘stack’ can move from the top to a central row. Almost – but not quite.
At 23 blocks, there is ample space to begin reflecting the shape of the top of the curve.
From that point on, the correlation between curve and the shape created by the blocks will only get better, as this 31-block example shows.
The ratio of non-sampled to sampled results appears to give a reliability indicator. If we’re talking 100 results and a sample of 10, that gives an unreliability factor of 100/10 = 10 – the same as a sample of 100 from 1000 results.
But the uncertainty tends to be evenly distributed over the results excluded from the sample, so this isn’t actually the right measure, and the sample size can be relatively small. A sample of 5,000 is quite reasonable for predicting 100,000 results, provided that the 5,000 is a ‘fair sample’. That’s where the design of political polls becomes an art as much as a science – you have to actually look at the demographics of the sample and adjust it in various ways to correct the match between sample and total results, and try to separate true trends from statistical anomalies.
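For what it’s worth, the standard statistical shorthand is that the margin of error of a fair random sample shrinks with the square root of the sample size, not with the sampled-to-unsampled ratio – which is why 5,000 out of 100,000 can work. A minimal sketch using the textbook formula for a sampled proportion (the function name and parameters are my own):

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion, using the
    standard simple-random-sample formula (worst case at proportion = 0.5)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 1_000, 5_000):
    print(f"sample of {n}: about +/-{margin_of_error(n):.1%}")
# sample of 100: about +/-9.8%; of 1,000: about +/-3.1%; of 5,000: about +/-1.4%
```

None of which helps if the sample itself is skewed – which is the author’s next point.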
Let’s put that in terms of the TV ratings, which are (generally) far less controversial – if there’s a survey of 100 households, and all 100 happen to be big fans of golf, the survey will show golf rating its socks off, while other sports languish. But this is a very obvious failure – it’s a lot harder to pick up samples in which only a couple of categories are slightly over-sampled. I would tend to be an outlier on almost every survey – I’m more analytic than most, a deeper planner than most, have interests and hobbies that are usually fairly uncommon, read more, research more – the list goes on and on. I’m more – ‘distinctive’, I think is the best term – and that means that I hardly ever agree with the TV ratings, which hardly ever correspond to what I’m actually watching (I actually think that my particular segment of the audience is under-represented in the surveys, but that’s not the point).
So, getting back to our Lego bricks: a sample of one-in-ten might be perfectly adequate for a set of 200 results, will probably be reasonably accurate for a set of 100 or 400 results, but won’t be all that good for a set of 50 results (not enough excluded results to carry the weight of the distributed uncertainty) or for a set of 800 results (too many possible results for the sample to be representative). Hey – wait a minute, that’s another bell curve! But this time the scale on one axis is logarithmic, doubling with each step in one direction and halving in the other.
Which brings me to the subject of logarithms. One mathematical trick that I have found very useful in the past is log(a^b) = b × log(a). Another is log_c(d) = log(d) / log(c). You can put these together to understand how the uncertainty changes along the bell curve as a result of increasing a sample size relative to the number of results.
But that’s too technical for most people (including me) – and we don’t care, anyway. We can use a simpler approach.
Take a look back at those Lego-block curves. Count the number of rows up to the bottom of the quick-rise part of the curve. We need this to be at least one. The overall number of rows in that curve, divided by the number of rows to the reference point of the curve, tells us the average number of any given result that we need within our sample. And that, multiplied by the number of possible results, gives us the total number of results that we need in order to be sure of getting that 1% – using rough rules of thumb.
By my count, the target gets met with a pattern of 1, 3, 5, 9, a total of 18 samples, and 400 results. So if we have that many or more results, we would expect every possible outcome to be represented, even with the noisy variations in individual results that would normally be seen.
Once you go above that number of outcomes, you can actually treat the statistics of prediction as the statistics of outcome, within a small amount of unreliability.
Generations and Iterations Of Headache – back to Viruses
It’s when we start looking at recurring instances of an event that things get complicated. That’s where at least some of the results don’t preclude a repeat event a day later, or a week later, or a month later.
Let’s assume that we have a situation in which the possible outcomes are, respectively, 1, 5, 10, 35, and 49% likely. With 400 cases, those are result counts of 4, 20, 40, 140, and 196, plus-or-minus about 50% – so the “10%” column probably contains about 40 results. It might be as low as 20 or as high as 60 – but it probably isn’t; it’s far more likely to be plus-or-minus 4 or so, because some of the ‘errors’ will cancel out. In fact, most of them will, because there are so many chances for them to do so. The greatest relative error occurs at the low-probability end, where there isn’t enough range of results within a given outcome for that cancellation to be a big enough factor – to the point where, in the 1% case, the potential variation is plus-or-minus 75%, or from 1 to 7. We know that because we defined the number of results to give us that result.
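A quick simulation bears that out. This sketch uses numpy, assumes the 400 cases are independent, and uses an arbitrary seed; it draws 100,000 batches of 400 cases with those five probabilities and looks at the 1% column:

```python
import numpy as np

rng = np.random.default_rng(42)                                # arbitrary seed
probabilities = [0.01, 0.05, 0.10, 0.35, 0.49]
batches = rng.multinomial(400, probabilities, size=100_000)    # 100,000 batches of 400 cases

one_percent_column = batches[:, 0]
print("expected count:", 400 * 0.01)                                           # 4.0
print("95% of batches fall between:",
      np.percentile(one_percent_column, [2.5, 97.5]))                          # roughly [1, 8]
print("share of batches where the 1% outcome is missing:",
      (one_percent_column == 0).mean())                                        # about 0.018
```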
The least-likely outcome is that you get the 1% outcome twice in a row – that will only happen in 1% of 1% of cases, or 0.01%. The most likely outcome is only 49% of 49% – or 24.01% of cases. And, where we had 5 possible outcomes, we may now have as many as 25. One generation on, that’s 125, then 625, and so on.
At first glance, to get representation of all possible cases, we need to increase the number of cases 100-fold – matching the drop from a 1% floor to a 0.01% floor – to 40,000 cases. That gets us back to the 1-7 expected results in that 1%-of-1% category. But, in fact, we don’t need quite that many – because 4 or so expected results is more than enough opportunity for that error-cancellation. Half that number is probably enough – 20,000 cases.
Ah, if only things were always that simple. What if the 35% case meant that you didn’t have to take part in the next iteration? What if the 5% meant the same thing? And the 1%? But that the 10% meant that the number of cases in the next generation doubles?
Now the makeup of the second generation is defined (in part) by the first.
- We start with 100,000 cases for convenience.
- 49% is 49,000 cases – so that’s 49,000 in the next generation.
- 35% is 35,000 cases – so the next generation stays at 49,000 cases.
- 10% is 10,000 cases – so the next generation doubles: the 49,000 carried over becomes 98,000, plus double the 10,000 themselves – another 20,000 – for 118,000.
- 5% is 5000 cases – so the next generation stays at 118,000 cases.
- 1% is 1000 cases – so the next generation stays at 118,000 cases.
All told, this hypothetical eliminates 35,000 + 5,000 + 1,000 = 41,000 potential cases – but replaces them with 59,000 more.
Things get even more complicated if human behavior is a factor. That 1% outcome of about 1,000 cases might be enough to increase the 49% to 69%, at the expense of 2 points from the 10%, 4 points from the 5%, and the rest from the 35%. That means that our second generation would have a completely different percentage breakdown.
- We start with 118,000 cases, not so convenient.
- 69% is 81,420 cases – so that’s 81,420 in the next generation.
- 21% is 24,780 cases – so the next generation stays at 81,420 cases.
- 8% is 9,440 cases – so the next generation doubles. 81,420 + 9,440 = 90,860 – and double that gets 181,720.
- 1% is 1180 cases – so the next generation stays at 181,720 cases.
- 1% is 1180 cases – so the next generation stays at 181,720 cases.
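Here’s a minimal sketch of that generational step, under my reading of the rules above (the ‘doubling’ outcome multiplies the whole carried-forward pool, its own cases included, by a growth factor); the function name and parameters are my own, and it reproduces the 118,000 and 181,720 figures:

```python
def next_generation(cases, carry_fraction, double_fraction, growth=2.0):
    """One reading of the rules above: 'carry' outcomes pass their cases on
    unchanged, the 'doubling' outcome multiplies the whole carried-forward
    pool (its own cases included) by the growth factor, and every other
    outcome drops out of the chain."""
    carried = cases * carry_fraction      # e.g. the 49% outcome
    doubled = cases * double_fraction     # e.g. the 10% outcome
    return growth * (carried + doubled)

# First scenario: 49% carries on, 10% doubles, 35% + 5% + 1% drop out.
print(round(next_generation(100_000, 0.49, 0.10)))   # 118000

# Second scenario, after the behavioural shift: 69% carries, 8% doubles.
print(round(next_generation(118_000, 0.69, 0.08)))   # 181720
```

The growth factor is deliberately a parameter, so milder rates of spread can be dropped straight in.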
If the greatest-likelihood outcome is that someone exposed does not fall ill, but remains susceptible, then this is what could happen when people do the right thing because they are scared – things get worse. What if the 10%-means-doubling rule is also affected – what if it becomes, say 1.1x?
This is easy to determine – 1.1 × 90,860 = 99,946 cases in our third generation.
And that’s down – just barely below the 100,000 we started with – showing how hard it can be to contain a disease of this hypothetical magnitude. Not even the Coronavirus is this infectious, thank goodness!
There are a huge number of assumptions built into this multi-generational model; change one, and you get very different results in the fourth or fifth generation. Not to mention the third – or the 50th.
If we take a 10-day average (which is about right for Coronavirus) to a generation, the world is now in the 39th or 40th generation in most places – in China it might be the 44th or more.
But the key point here is that by looking at a generation-by-generation model, we don’t need even those 20,000 cases – so long as the 1% outcome yields 1 or 2 cases, that’s enough. Maybe 1,600 cases in total to create a representative statistical universe – 3,200 for statistical rigor.
And, of course, human behavior changes. If the change described above includes taking precautionary measures and lockdowns, after a while, people get complacent (reversing some of the changes), and lockdowns get lifted, and we’re back to the original percentages. That’s how you get multiple waves taking place – and each time, it gets harder to respond with the same effectiveness and determination. And that means that the second wave is bigger and the responses, less effective.
At the back of my mind, when I first started thinking about this, was the thought that it only takes one asymptomatic case to restart the whole thing even in a country where the virus is seemingly under control – and that if it’s possible, and the number of cases is above the critical number, then there will be such an asymptomatic case out there, somewhere. Two generations without a confirmed case is generally considered to be elimination – using a 14-day generation for a comfort margin. But two generations clear won’t be enough to eliminate such asymptomatic spread. Four generations will probably be enough – but even that’s uncertain. Because the total number of cases in this context is the number of people exposed, not the number of people who have tested positive.
Each generation without known transmission increases the likelihood that elimination has taken place, but it doesn’t guarantee it. As soon as the critical number exceeds the population base, we’re out of the realm where the statistics can be treated as a list of outcomes that will take place, and into the realm of uncertainty. And you can roll 00 five times in a row.
One More Example
I feel like I should offer one more example of why this matters to the GM. Fortunately, I have a simple one readily to hand.
Last week, I explained how the PCs in my superhero campaign were shortly to begin a trek through the wilds of Arkansas in search of a new place to call home-when-we’re-in-disguise. So far, my notes cover the entire first game day’s travels for both teams, with 28 targeted stops and 33 drive-through locations. This is roughly 1/3 of the total for the state, and there are 4 other states to follow – though an NPC has proposed that they think about abbreviating Kansas and skipping Nebraska altogether, since some of the team are from tropical climes.
But let’s say that doesn’t happen.
28×3×4 = 336
33×3×4 = 396
Those numbers are both high enough that I can treat the chance of something happening as the fact of something happening – at some point in their trip. Rather than 700-odd rolls to see if “X” happens here, though, I can simply roll for when “X” happens – a d12 rolled a few times will handle this nicely – and schedule the event accordingly. This will be a major prep-time saving, as I start to accelerate the pace of the adventure.
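For anyone who wants to automate that shortcut, here’s a rough sketch in Python. The function name, the 2%-per-stop chance, and the stochastic rounding are purely illustrative choices of mine, not anything from the actual campaign notes; strictly speaking the number of events would vary a little more than this, but that’s the point of the shortcut – roll for when, not whether.

```python
import random

def schedule_events(num_stops, chance_per_stop, rng=random):
    """Decide up front at which stops an event fires, instead of rolling
    once per stop. The expected number of events is num_stops * chance,
    so generate roughly that many and scatter them across the stops."""
    expected = num_stops * chance_per_stop
    # Stochastic rounding, so fractional expectations still matter.
    count = int(expected) + (1 if rng.random() < expected % 1 else 0)
    return sorted(rng.sample(range(1, num_stops + 1), count))

# e.g. a hypothetical 2%-per-stop encounter across the 336 targeted stops:
print(schedule_events(336, 0.02))   # something like [51, 112, 198, 240, 269, 301, 322]
```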
If there are enough cases being tested, everything happens – and within the scope of the entirety of the opportunity, it all happens at once. It’s a useful principle when it comes to bulk… well, bulk anything.
PS: I should probably add that this is the fundamental principle upon which my series on handling large armies is based (like, 10,000 Orcs / Helm’s Deep large) – part one of six here, if anyone’s interested.