Guesstimates in RPGs: Measuring Handwavia

A good guesstimate is like a good sketch – you’d never mistake it for the real thing, but it still tells you more-or-less what you need to know about this specific example of the general subject. (Image: pencil sketch and watercolor by Guy MOLL from Faro, Portugal, used under the terms of CC BY 2.0, via Wikimedia Commons.)
The subject today is Approximations and Guesstimations in RPGs.
I’ve got a number of article ideas in various stages of development, intended to break up the series on Economics in RPGs. When the time comes to select between them, one of the key parameters that has to be assessed is how long the article will take to write, relative to the amount of time available.
Criterion: Enthusiasm
If I were to choose on the basis of what I most feel like writing – also a valid criterion – I would probably have chosen to write about a new game mechanic that I thought up a few weeks ago. But although the concepts are quite clear, that article will need quite a lot of time and effort to finish.
Criterion: Preparedness
If the choice were based on the level of structure and organization done in advance, I would be writing an article on a source of plot ideas that pre-dates that game mechanic by a fair period. It’s been waiting around while I look for a third example, because I didn’t think the two that have already come to me were sufficient.
Criterion: Forgetfulness
If I were to choose based on how much of the concept behind the article was beginning to slip away, crowded out by more recent focuses of activity, then it would be an article on diseases that was inspired by something one of my players said, a month or two back.
Most of the content is still clear to me, but a couple of key details are becoming vague. If this rot proceeds too far, it can lead to the article being abandoned completely; I have a couple of other articles that fall into that category – one on Rumors and another on GM Decision-making.
Criterion: Clarity
Another approach would be to choose the idea that seems clearest to me at the moment – which would probably be a short one on the utility of whiteboards; it’s clearest because the idea only came to me this weekend. Or I could spin something off-the-cuff about the plotting difficulties of the adventure currently being worked on for the Adventurer’s Club, because it’s more of a sandboxed concept than most of the more structured plots that I create.
Criterion: Writing Time
Both the game-mechanics article and the disease article fall foul of the writing time limit; I don’t think either would be ready by the time the deadline came around. By the time I factor in the lack of enthusiasm and its impact on my speed of writing, the plot-mechanics article also begins to look a little dubious. The others are either abandoned (at least temporarily) or look okay on that front, but – being new to the queue – they also lack urgency.
Assessing this factor requires practice at guesstimation and hand-waving, and since that’s what this article is about (as mentioned earlier), the stars seemed to align and the choice was made.
An educated Guesstimate
The more structured and prepped an article is, the more reliably the writing time can be guesstimated.
The Game Mechanics Article
The Game Mechanics article has 86 planned sections – some only a paragraph or two long, others involving a lot of statistical work behind the scenes. If I figure an average of 250 words to most of those sections, that’s an estimated 21,500 words.
In a previous article – Lightning Research: Maximum Answers in Minimum Time, I think it was – I estimated that with stream-of-consciousness writing, like what I’m using for this article, I could get through an average of about 1000 words an hour, more on a good day.
I started writing last week’s article on Economics in RPGs: The Age of Steam at 9:30 AM and finished it at 12:30 AM – call it 15 hours – and it came close to 14,000 words, so the average holds up fairly well.
So that estimates a best-case situation of 21 hours of writing – all of Sunday and Monday.
But if there are significant amounts of research or layout challenges like tables or bespoke illustrations, that average goes down – way down. It halves for each of those factors. While that won’t affect every one of those 86 sections, it will affect enough of them that parts of the article will be written at 250 words per hour, maybe less. That adds roughly 3 hours for each of those sections – so figure 8 of them times 3 additional hours, and that’s another 24 hours of writing, for a total of about 46 hours. Call it an even fifty.
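Purely as an illustration, here’s that back-of-the-envelope arithmetic written out as a short Python sketch – the section counts, word rates, and slow-section penalties are the rough figures quoted above, not measured data:

```python
# Back-of-the-envelope writing-time guesstimate (a sketch of the arithmetic above;
# every figure is a rough assumption from the text, not measured data).

def writing_hours(sections, avg_words=250, base_rate=1000,
                  slow_sections=0, extra_hours_each=3):
    """Estimate writing time: bulk of the text at the base words-per-hour rate,
    plus a flat penalty for sections needing research, tables, or illustrations."""
    words = sections * avg_words
    base_hours = words / base_rate
    return base_hours + slow_sections * extra_hours_each

# The game-mechanics article: 86 sections, 8 of them slow.
print(round(writing_hours(86, slow_sections=8)))      # ~46 hours - call it fifty
# The disease article: 36 slightly longer sections, nothing slow, plus 2 'fuzziness' hours.
print(round(writing_hours(36, avg_words=300) + 2))    # ~13 hours
```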
But note this: I didn’t have to calculate this when eyeballing what was going to be possible for today’s article – one glance at what was intended was enough to rule it out. Practicality probably means that I’m going to have to break it up, maybe into three or four parts – and I don’t want to start another series of that length until the Economics series is done.
The Disease Article
This doesn’t suffer from the same handicaps. The outline consists of 36 sections, and there isn’t a whole lot in the way of research / illustration / layout to eat into the 1,000 words an hour estimate.
On the other hand, I have a suspicion that the average section length might be a little greater – call it 300 words on average – so that’s an estimated 10,800 words and 11 hours writing time. But I have to make allowances for the fuzziness factor – figure a couple of extra hours groping around as a result. Thirteen hours is about three too many.
Again, I didn’t calculate this at the time – it was enough to simply eyeball the breakdown and get an uncomfortable feeling about getting it finished in the time available, and that was enough to take it off the table.
Skill & Experience
In effect, what I was doing was eyeballing the proposed articles and utilizing my more-than-ten-years experience to guesstimate the answer to a simple yes-no proposition: “Was I confident of getting the article done in time?”
That poses a significant question – one that is the core subject of this article – and begins to sketch the “how” behind any answer: how accurate should guesstimates be in an RPG?
Guesstimate Standards Of Accuracy
With this as a starting point, I can define a set of yardsticks for the accuracy of a guesstimate.
- Binary
- Within 50%
- Within 20%
- Within 10%
- Within 5%
Binary
Yes / No.
Black / White.
Too much / not enough.
It’s Doable / It’s not practical.
Simple binary assessments are the broadest of the lot, and the easiest to make accurately. They completely ignore any gray fuzziness about the middle – anything fuzzy gets assigned to one of the two categories and is then judged as though that were the projected outcome. So there’s no ambiguity.
Fuzz?
Nor is there any accommodation for fuzzy-making ifs and buts. You either assume a best case, a worst case, or a somewhere-in-the-middle case, and make a hard call based on that assumption.
In fact, you can go further – on any project lasting more than a day, there’s the potential for something to go wrong along the way; on any project lasting more than a week, there’s a fair likelihood of that happening; and on any project lasting more than a year, it’s a near-certainty. Allow for multiple people working on the project and make it man-days, man-weeks, and man-years (with apologies to female readers).
So you can simply assume best case, somewhere-in-the-middle, and worst-case, respectively, throw in a fudge factor to overcome the risk, and eliminate the fuzz.
Within 50%
The next order of reliability is pretty vague, but it’s the first one that gives an answer to a “how many” or “how much” question.
If the correct answer (not known at the time) was “10”, this level of estimate reliability gives you “somewhere between 5 and 15”. You’re almost certain to be correct, but the guesstimate doesn’t have a lot of precision.
In fact, the precision has been sacrificed to obtain reliability; the range is so broad that almost any combination of “if” or “but” can be accommodated; the specific events along the way just steer the outcome toward one extreme or the other, or – more probably – both, more-or-less canceling each other out.
This is like gambling that you won’t roll a 3, 4, 5, 16, 17, or 18 on 3d6. Yes, it will happen from time to time – but most of the time, this would be a pretty safe bet.
Within 20%
There’s a significant increase in accuracy when you go from ±50% to ±20%. Again using an actual result of “10”, this is predicting 8-12. Depending on the circumstances, you might then target the low estimate (8) or the high (12) with your planning.
This is about as close as realistic guesstimates are likely to get; even if you could aim for the next accuracy bracket up, accommodating fuzziness and reverses of fortune by targeting the high end of the 20% range costs so little and gains so much that it’s common practice to stop here.
Within 10%
To get to within 10% accuracy, you are normally obliged to go beyond guesstimating to a more formal estimation process – the equivalent of what I did when analyzing those two articles for expected completion time. This is essentially a more rigorous and formal guess, and is as likely to be thrown off by good or bad luck as it is to land on the mark.
The larger a planned project, though, the more likely it is that changes in fortune (good or ill) will happen often enough to enter the realm of predictable statistics – and that permits formal estimates to incorporate allowances for these events. Those allowances, in turn, are what enable estimates to achieve this level of accuracy.
Nevertheless, in an RPG, it’s not impossible for characters to be able to think fast enough that they could apply such an estimating regime “off the cuff”, without even thinking about it – and that, in anyone else’s language, is simply a more accurate guesstimate.
Within 5%
If 10% accuracy is achieved by breaking a task down into smaller units that can be more accurately forecast, plus making allowances for setbacks along the way, then the logical next step is to apply a formal estimating process to each of those smaller units, breaking them down still further into sub-units if necessary.
This level of accuracy also generally means that a general number plucked out of the air is no longer good enough; that’s what I meant by applying a ‘formal estimating process’. You might, for example, apply formal statistics and industry standards for key parts of the process. Still more likely is a commitment to deploying additional resources as necessary to prevent (or try to prevent) estimate variances greater than this target.
Which means that what you really have is a 10% estimate, plus a promise to work harder if you look like drifting off-target by more than half of that – which essentially guarantees hitting that 5% mark, no matter what happens.
You can’t achieve this level of estimation without a relevant skill; but if a character possesses such a skill, the same logic given in the previous section comes into play. Most characters will need a skill roll, and either a very good result or a significant penalty overcome, in order to pull an estimate of this accuracy out of thin air. Only the rare super-genius with the relevant skill can hope to do so routinely.
Confidence In Guesstimates
The above standards all skirt around the question of reliability of the guesstimate – which can be interpreted more usefully as the level of confidence that a character can have in an estimate. What might initially appear to be a relatively simple function of skill and desired / required accuracy gets complicated somewhat by changing the techniques used to generate the estimate.
I’m going to simplify the problem by separating the two, then getting formal estimates out of the way as simply and quickly as I can.
- Succeed by 1 / Succeed with a modifier of -1 / -5% = Binary with 90% confidence
- Succeed by 2 / Succeed with a modifier of -2 / -10% = 50% accuracy with 75% confidence
- Succeed by 3 / Succeed with a modifier of -3 / -15% = 20% accuracy with 60% confidence
- Succeed by 4 / Succeed with a modifier of -4 / -20% = 10% accuracy with 50% confidence
- Succeed by 5 / Succeed with a modifier of -5 / -25% = 5% accuracy with 40% confidence
- +10% confidence for each additional point of success or each additional -1 /-5% modifier
Formal Estimates
I’m going to simplify the proposition further by taking any additional rigor of process as a given.
This works the problem three different ways for three different types of game system. Which one you use depends on the circumstances of the roll, with the basic mechanics of the system being a secondary consideration.
Succeed by x – you have a fixed skill target. Depending on the game system, you might need to roll more than this target or less than it. The difference between the actual result and what you needed defines the ‘quality of success’, i.e. how much you succeeded by.
For example, Target number 14 or better on d20; actually roll a 17; 17-14=3; so this is ‘success by 3’.
2nd example: Target number 11 or less on 3d6; actually roll a 9; 11-9=2; so this is ‘success by 2’.
Succeed with a modifier of x – means that you are adjusting the skill target to try to achieve a specific desired target. Failure doesn’t mean that you haven’t produced a successful guesstimate, just that it is either less accurate or less reliable than you wanted. Simply go up the table the number of points or 5% increments by which you failed to get the level actually achieved.
3rd example: Target number is 60% or less on d%; actual roll is 37; 60-37=23; so this is ‘success by 23%’, which isn’t enough for ‘success by 25%’ – it’s a ‘succeed by 4’ result.
Use this type of roll when a character wants to make a formal estimate of something. The character should announce, before they roll, what their desired accuracy or confidence level is (they can’t specify both, the other one is determined by the die roll).
For example, a character succeeds by 5, having specified a 20% accuracy target. Achieving that standard of accuracy requires success by 3, and a base 60% confidence. That leaves 2 levels of additional success to be reflected in additional confidence, which is +20%, so the GM can provide a fairly close estimate and specify that the character is 80% sure that the end result will be within 20% of that estimate.
Example 2: Perhaps the character has said that he wants to be 100% confident in his estimate, even if that means the estimate is less precise. Same rolls and level of success. Start with the ‘success by 1’ category; base confidence 90%, so getting that to 100% would use only 1 more of the achieved success level. So, move on to the next level of result, 50% accuracy. Base confidence is 75%, and two levels of success are used in achieving that accuracy. Three more are needed to get to 100% confidence, and that uses up all five. So the character can be 100% confident of his ±50% estimate. Any higher on the accuracy list won’t leave enough levels of success to get to the 100% confidence (which is another way of saying that getting to 100% confidence doesn’t leave enough levels of success for the character to actually succeed at achieving higher accuracy). So the result is 50% accuracy at 100% confidence.
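For anyone who wants to automate the bookkeeping, here’s a minimal Python sketch of that resolution process, assuming the margin-of-success table above and the “spend spare success on confidence” logic from the two examples; the function names and structure are mine, not part of any published system:

```python
# A sketch of the formal-estimate roll interpretation described above.
# Margin of success -> (accuracy band, base confidence %), per the bullet list;
# each spare level of success is worth +10% confidence.

TABLE = {
    1: ("binary", 90),
    2: ("within 50%", 75),
    3: ("within 20%", 60),
    4: ("within 10%", 50),
    5: ("within 5%", 40),
}

def estimate_with_accuracy(margin, accuracy_level):
    """Player named a target accuracy (1-5); spare success becomes extra confidence."""
    if margin < accuracy_level:
        return None                        # didn't make the named accuracy at all
    band, base_conf = TABLE[accuracy_level]
    spare = margin - accuracy_level
    return band, min(100, base_conf + 10 * spare)

def estimate_with_confidence(margin, target_conf=100):
    """Player named a target confidence; find the tightest accuracy that still reaches it."""
    best = None
    for level in range(1, 6):
        band, base_conf = TABLE[level]
        extra_needed = max(0, -(-(target_conf - base_conf) // 10))   # ceiling division
        if level + extra_needed <= margin:
            best = (band, target_conf)
    return best

print(estimate_with_accuracy(5, 3))   # ('within 20%', 80)  - matches the first example
print(estimate_with_confidence(5))    # ('within 50%', 100) - matches the second example
```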
Okay, that’s the bare bones of a functional system for formal estimates, that assumes that the character is doing whatever is necessary to achieve the accuracy and reliability of estimation. Good enough – so let’s move on to the more interesting question of guesstimates.
Guesstimates
With guesstimates, it’s a fairly simple proposition: the greater the margin of error you allow, the more reliable a guesstimate will be.
A Realistic Approach?
The more realistic option would be to multiply the accuracy and the reliability together to get the skill level of the character, expressed as a percentage chance of success. So if you had a 70% chance of success – whether that’s from 7 or better on d20, or 12 or less on 3d6, or whatever – your calculation would be:
Accuracy /100 × Reliability (%) = 70,
or, more usefully,
70 × 100 / Accuracy = Reliability
But what is “Binary”? It’s not 100%, and it’s not 50%. Realism, it seems, has functional limits in playable game mechanics – what a shocker!
Functional
Okay, so let’s go for something that’s more functional and less realistic as necessary, i.e. more abstract.
We can start by counting each level of accuracy as a ‘rank’ or ‘tier’ of results. That immediately kills the ‘binary’ problem by replacing it with an abstract value.
Next problem: should ‘rank 1’ be the best possible result (5%) or should it be the entry-level ‘binary’ result? Well, let’s work on the mechanics and see what would be more convenient:
Skill target (on the d20 / 3d6 scale) – 2 × Accuracy rank = Reliability (out of 10)
That looks like it should work and shouldn’t be too big a problem.
Now, if “binary” is a low rank number, reliability for a given skill level will be high, and each step up the accuracy ladder produces a less reliable result. That’s exactly what we want.
The alternative has reliability going up with increased accuracy, i.e. smaller fudge-factor – which is completely wrong.
So, “Binary” = rank 1; 50% is rank 2; 20% is rank 3; 10% is rank 4; and 5% is rank 5. But, since 10% and 5% aren’t normally available for guesstimates, unless you are exceptional (and hence are likely to have an exceptional skill level), let’s impose some additional difficulty: 10% is rank 5, and 5% is rank 7. There are no rank 4 or 6 results.
Example: So, for a skill of 12 or less required, we get
12 – 2 × Accuracy rank = Reliability out of 10:
- Binary, rank 1: 12 – 2 = 10/10. Perfect reliability, complete confidence.
- 50% accuracy, rank 2: 12 – 4 = 8/10. 80% confident.
- 20% accuracy, rank 3: 12 – 6 = 6/10. 60% confident.
- 10% accuracy, rank 5: 12 – 10 = 2/10. 20% confident. This skill level doesn’t really support this level of accuracy in a guesstimate.
- 5% accuracy, rank 7: 12 – 14 = -2/10. 0% confident. That confirms the previous assessment.
The same results would be produced if the goal were “eight or more on d20” (20 – 8 = 12) or “60% or less on d%” (60 ÷ 5 = 12).
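As a quick sanity check, here’s that formula as a Python sketch, using the rank numbers just defined; the only liberty taken is clamping negative results to zero:

```python
# Guesstimate reliability = skill target - 2 x accuracy rank (out of 10),
# with ranks 1, 2, 3, 5 and 7 as defined in the text.

RANKS = {"binary": 1, "within 50%": 2, "within 20%": 3, "within 10%": 5, "within 5%": 7}

def guesstimate_reliability(skill_target, accuracy):
    """skill_target is the roll-under number (3d6, d20-equivalent, or d% divided by 5)."""
    return max(0, skill_target - 2 * RANKS[accuracy])

for band in RANKS:
    print(band, guesstimate_reliability(12, band), "/ 10")
# binary 10, within 50% 8, within 20% 6, within 10% 2, within 5% 0 - matching the list above
```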
The shape of failure
Now, these are the results that can be expected from a successful skill check – no penalty levels or anything else, a straightforward succeed or fail.
Which raises the question, what does a failure look like? After all, even on a failure, a guesstimate should produce a number, however inaccurate and unreliable it might be.
How about this: on a failure, the actual result is as dictated by the next lowest rank (but the character doesn’t realize it) and the margin of failure subtracts 10% off the resulting confidence level per point.
Example: A character rolls to attempt a 20%-accurate estimate; needing 13 or less, he rolls a 16, and fails by three. The GM delivers an estimate that is somewhere in the 50% range (either high or low) but not the 20% range, and advises the character that he has only 13 – 6 = 7 out of 10, less three for the failure, = 4 out of 10 = 40% confidence in the result – which will eventually prove to be a significant over- or under-estimate.
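Extending that sketch to cover the failure rule just described (again, the names and structure are mine, and RANKS comes from the previous snippet):

```python
# On a failure, the delivered estimate silently drops one accuracy band and the
# margin of failure knocks 10% per point off the confidence, per the rule above.

BANDS = ["binary", "within 50%", "within 20%", "within 10%", "within 5%"]

def resolve_guesstimate(skill_target, roll, accuracy):
    """Roll-under check: returns (band actually delivered, confidence %, character_knows)."""
    reliability = max(0, skill_target - 2 * RANKS[accuracy])   # RANKS from the previous sketch
    if roll <= skill_target:                                   # success
        return accuracy, reliability * 10, True
    margin_of_failure = roll - skill_target
    delivered = BANDS[max(0, BANDS.index(accuracy) - 1)]       # one band vaguer
    confidence = max(0, (reliability - margin_of_failure) * 10)
    return delivered, confidence, False                        # the character doesn't know it slipped

print(resolve_guesstimate(13, 16, "within 20%"))   # ('within 50%', 40, False) - matches the example
```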
Subjects Of Guesstimates
Guesstimates and Estimates are useful for the GM because they permit him to generate an approximation for his own use, on the fly, if one is needed. There are all sorts of values that may need to be guesstimated in this way, and they all have their own unique foibles that should be used to tweak the general accuracy values that have been used to date.
Weights
Estimating weight is a more detailed way of asking “what will it take to lift / move [an object]”. There are a couple of useful facts that I use regularly for estimating weights.
Like-for-like
A typical, solid house door weighs around 45 kg (100 lb) – in my opinion, and without measuring it or looking it up.
A castle door is 6 times as thick, four times as tall, and two-and-a-half times as wide (and there are two of them). There are three steel bands reinforcing it, one of which holds a heavy steel ring for people to grip while opening or closing the door. The doors are closed and barred by a beam that’s 1/10 of a door’s height, four times as thick, and five times as wide. The steel bands etc add a mid-sized motorcycle to the weight – call it 190 kg (420 lb). A character wants to guesstimate the weight of the doors as he wants to lift them off their hinges.
45 is an inconvenient number, so I would use 50 and then trim 10% off at the end.
6 × 4 × 2.5 × 50 = 24 × 2.5 × 50 = 60 × 50 = 3000 kg. So each door would weigh about 2700kg (6000 lb).
Double, because there are two of them = 5400 kg (12,000 lb).
0.1 × 4 × 5 × 50 = 0.4 × 5 × 50 = 2 × 50 = 100 kg, less 10% = 90 kg (200 lb) for the bar.
190 kg (420 lb) for the bands, ring, and what-have-you.
Total: 5400 + 90 + 190 = 5680 kg (12,000 + 200 + 420 = 12,620 lb).
In practice, I would use calculations like this to estimate it – but would round off to 5700 kg or 12,500 lb.
Lifting those doors would take a King Kong. Unless you used a lever – one that wouldn’t break, like a steel beam with a wedge-shaped tongue and a solid slice of tree-trunk as a fulcrum. Doing that would cut the effective weight of a single door to 1/4 of normal, or less – call it 700 kg (roughly 1500 lb). The problem then becomes one of anchoring the character to the ground when he pushes down, because that’s within reach of a really, really strong character.
I don’t have to know the density of wood, or the exact measurements of the doors – all I have to remember is ‘standard solid wood door = 45 kg / 100 lb’.
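Written out, the whole like-for-like trick is just a chain of multiplications; every figure in the sketch below is one of the guesstimate numbers used above, not a measured value:

```python
# Scale a known reference weight by the linear factors of the bigger object.

REFERENCE_DOOR_KG = 45   # 'standard solid wood door = 45 kg / 100 lb'

def scaled_weight(thickness, height, width, reference=REFERENCE_DOOR_KG):
    """Weight scales with volume, so multiply the reference by each linear factor."""
    return thickness * height * width * reference

door = scaled_weight(6, 4, 2.5)       # one castle door: ~2700 kg
bar = scaled_weight(4, 0.1, 5)        # the cross-beam: ~90 kg
fittings = 190                        # steel bands, ring, etc. ('a mid-sized motorcycle')
total = 2 * door + bar + fittings
print(round(door), round(total))      # ~2700 and ~5680 kg - round off to 5700 kg
```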
Water
Lots of things have a density around the same as water. People, for example. It’s probably not all that far off wood, either, to be honest. 1000 kg per cubic meter – or close enough to it. For those stuck in a non-metric system, call it 60 lb per cubic foot – a slight underestimate (the true figure is closer to 62), but close enough for guesstimate purposes.
Liquid Gasses
Liquid Helium = 125 kg per cubic meter. And the tank.
Liquid Oxygen is slightly heavier than water – add 10%. And the tank.
Liquid Nitrogen is about 20% lighter than water. And the tank.
LPG gas gets up to a whole 1.882 kg per cubic meter – about 1/500th the weight of water for a given volume – at room temperature. Liquefied, it’s about half the weight of water by volume, give or take 5%. And the tanks.
Those pesky tanks… online sources, supposedly knowledgeable ones, list empty domestic LPG tanks as weighing 14.8 kg, or 11 kg, or 14 kg, or 12 kg, or 10 kg, or 20 kg.
Looking at the numbers more closely, though, the lower numbers are simply pressurized, while the higher ones appear to be also refrigerated. So, because it’s convenient, I would use estimated weights of 10kg for non-refrigerated tanks and 20kg for refrigerated tanks – the latter being the ones used for liquid helium, oxygen, and nitrogen.
That’s for the full-sized ones that are about human height in length. The little caravan-sized LPG bottles are 5-6 kg in weight, and an empty scuba tank is about 16 kg. Compressed air and a valve will add about 3.5 kg to the latter when it’s full.
Steel / Metal
There’s actually a range of a couple of hundred kg per cubic meter across different steels and alloys. I don’t care about that – the middle-of-the-road value of 8000 kg per cubic meter will do me just fine. Multiply the relative density (8) by 62 to get an approximate figure in lb per cubic foot – call it 500.
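If you want these rules of thumb at the table, a simple lookup is all it takes; the densities below are the round numbers quoted in this section, good enough for guesstimates only:

```python
# Rule-of-thumb densities (kg per cubic meter), as quoted in this section.

DENSITY_KG_PER_M3 = {
    "water": 1000, "people": 1000, "wood (rough)": 1000,
    "liquid helium": 125, "liquid oxygen": 1100, "liquid nitrogen": 800,
    "lpg (liquid)": 500, "steel/metal": 8000,
}

def guess_weight_kg(material, volume_m3):
    """Guesstimate mass from a rule-of-thumb density and an eyeballed volume."""
    return DENSITY_KG_PER_M3[material] * volume_m3

print(guess_weight_kg("steel/metal", 0.01))   # a 10-litre block of steel: ~80 kg
```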
Human ability to guesstimate weight
If we’ve got something to compare with, even without a scale, we can get to around 20% accuracy. But once we go much beyond a couple of kg – 4 or 5 lb – we are pretty appalling at estimating weights even with a known weight to compare against; at best, we’re talking 50% accuracy. Estimating by eye actually tends to produce more accurate results.
How much more accurate? Well, there’s this study to contemplate: 17,205 People Guessed The Weight Of A Cow. Here’s How They Did.
In a nutshell – from a photograph, the average estimate was out by 5%. If you restrict the results to only those who had worked with cattle for a living, they were out by 6%. And the pattern of results is almost identical – right down to the cluster of underestimates around the 900-lb mark. (For the record, my guesstimate from the photo was about 1400 lb.)
But, at the same time, a study of emergency personnel estimating the weights of patients (How accurate is weight estimation in the emergency department?), published by the (US) National Institutes of Health, found that they had only “moderate” accuracy – and that if a patient’s actual weight couldn’t be determined by measurement, dosages would be more accurate if based on the patient’s estimate of their own weight, which tended to be “excellent”. Specifically: patients, 3.9% error; nurses, 7.7%; and doctors, 11%. The percentages who got the result right within a 10% range were 91%, 78%, and 59%, respectively – the equivalent of 90%, 80%, and 60% confidence in the system described above.
So we’re better at estimating the weight of a cow than we are at estimating the weight of a person. Think about that for a while.
It’s also a known fact that manufacturers can trim 10-20% of the serving size out of a product by weight, and a lot of people simply won’t notice unless something clearly calls attention to the fact. If you introduce redesigned packaging at the same time and use that to imply some other cause for the reduction in gross weight (“New eco-friendly packaging!”), even fewer will notice or care.
We aren’t really very good at estimating weights.
Sizes – Lengths and areas
We have huge advantages when estimating small lengths and areas – the human body comes ready-built with all sorts of handy measurement scales (of varying reliability).
The second joint of adult male index fingers is about an inch. Hands are typically about 4 1/2 inches across and 7 inches long. Wrist-to-elbow is about a foot, and people are about 6′ tall. Strides are about a yard. Scale everything down for a female, of course. What’s more, we know fairly accurately whether or not we have longer fingers, longer hands, thinner hands, and so on, and so can adjust our personal scales without thinking about it.
Add a little experience or skill, and you can estimate the length of a two-by-four reasonably well – at least, until you try and cut it to size.
There are factors that can reduce accuracy considerably – if we have to turn our heads to see the far end of a span, or if it’s curved instead of straight, and so on. But our depth perception is a lot more accurate than we often think it is, up to a point – and that point is considerably further out than you might expect.
For distance and length estimates, 10% error is high. Divide the accuracy values by 4, unless one of those accuracy-reducing factors applies – in which case only halve them.
We aren’t so successful at adding additional dimensions to get areas, and are even less successful at interpolating volumes. For areas, divide the accuracy by 2, and for volumes, use the base values.
Temperature
It’s a pet personal theory that might hold no more water than the top of a ball, but my personal impression is that 1°C (roughly 2°F) is about the smallest temperature change that can be felt by the human body strongly enough to cause a desire to modify our clothing choices.
Despite this, we aren’t very good at interpreting and measuring changes in temperature. More than about 2° of fever is “you’re burning up”. It’s as though we count “1, 2, many, ambulance”.
Environmentally, we can employ broad scales based around our comfort – cold, cool, neutral, warm, hot, too hot – but that’s about it. Subjective and relative scales play a bigger role in our thermal perceptions – ‘cold enough for a jacket’, ‘warm enough for a t-shirt’. Right now, my room feels “chilly” – which tells me nothing about the actual temperature, except that it’s slightly cooler than is comfortable for my current clothing choices. My thermometer informs me it’s 18°C (64°F).
There are all sorts of complications regarding acclimatization, too. I vividly remember wearing a short-sleeve shirt to work, many years ago, and coming out at lunchtime to discover “Huh- it’s snowing. Funny, I don’t feel cold.”
Age is also a factor – I know for a fact that I’m more sensitive to the cold now than I was twenty years ago. Not sure about hotter temperatures, though.
Within a temperature band, our perceptions can be fairly accurate – but the edges of the temperature band will be fuzzy, and the whole concept is relative and individual anyway. Above about 43°C (109°F) air temperature, it’s all simply “hot”.
We are more sensitive to water temperatures; not much above that, we stop sensing temperature at all (and move directly to sensing pain) – and not far above that again, even the pain goes away.
The human pain threshold is around 106-108°F (41-42°C) for water temperatures; most adults will suffer third-degree burns if exposed to 150°F (65°C) water for two seconds, or 140°F (60°C) water for six seconds, or 130°F (54°C) water for thirty seconds.
A 32°C (90°F) day and a 38°C (100°F) day may feel similar – depending on the atmospheric humidity. But with water, 32°C is tepid – even slightly refreshing – and 38°C is notably warm, like a hot bath.
So we’re talking about a narrow span of temperatures within which we can make estimates, and those estimates are vague and perceptual. That’s why cars are such death traps on hot days, when the internal temperature can climb 30°C or even 40°C higher than outside – a tolerably-hot 35°C (95°F) outside can be a lethal 65°C (149°F) or 75°C (167°F) inside – and most of that increase occurs within the first half-hour of closing up the car and walking away. “I’ll just pop in [to the store] for some milk” can be a death sentence.
Elapsed Time
Within a span of a second or two, humans can be fairly accurate. If we use some sort of metronome system to count seconds, we can get to about 2 minutes with reasonable accuracy.
Human heartbeats are often cited in fiction as something that can be silently counted to estimate time. In reality, not so much – a normal resting heart rate can be 60 to 100 beats a minute but it can vary from minute to minute. Children often have higher heart rates than this. Any sort of stress or activity can send it skyrocketing to 190 or more beats a minute. The highest ever recorded is 480 beats a minute – comparable to the heart rate of a mouse.
Taking away any such ‘counting mechanism’ throws open the doors of subjective error. No, that’s too mild an expression – total inaccuracy comes closer.
External cues can help – I use albums (typically 42 minutes, or up to 74 minutes for a CD) to tell me when I need to take a break for eye health – I’m just about to do so, in fact! But these trade any real precision for an inaccuracy that is at least reliable and predictable.
Complicating everything is the fact that humans have several different timing mechanisms in parallel, each of which has a different level of susceptibility to various temporal illusions.
Throw in the cognitive variation – direct perception vs estimated temporal distance from the memory of events – and you have a total mess.
Temporal Illusions
Let’s start by quoting part of the summary of an article from the (US) National Library Of Medicine: Human time perception and its illusions, by David M Eagleman:
“Why does a clock sometimes appear stopped? Is it possible to perceive the world in slow motion during a car accident? Can action and effect be reversed? Time perception is surprisingly prone to measurable distortions and illusions.
“… Perceived duration can be distorted by saccades, by an oddball in a sequence, or by stimulus complexity or magnitude. Temporal order judgments of actions and sensations can be reversed by exposure to delayed motor consequences, and simultaneity judgments can be manipulated by repeated exposure to non-simultaneous stimuli.”
Saccades are “rapid, ballistic movements of the eyes that abruptly change the point of fixation. They range in amplitude from the small movements made while reading, for example, to the much larger movements made while gazing around a room”, according to the (US) National Institutes Of Health. In other words, they aren’t just changing the direction in which you are looking – they involve changing what you are looking at.
I can’t do better from that beginning than a direct quotation of the relevant section of Wikipedia’s article on Time Perception:
Main types of temporal illusions
- Telescoping effect: People tend to recall recent events as occurring further back in time than they actually did (backward telescoping) and distant events as occurring more recently than they actually did (forward telescoping).
- Vierordt’s law: Shorter intervals tend to be overestimated while longer intervals tend to be underestimated.
- Time intervals associated with more changes may be perceived as longer than intervals with fewer changes.
- Perceived temporal length of a given task may shorten with greater motivation.
- Perceived temporal length of a given task may stretch when broken up or interrupted.
- Auditory stimuli may appear to last longer than visual stimuli.
- Time durations may appear longer with greater stimulus intensity (e.g., auditory loudness or pitch).
- Simultaneity judgments can be manipulated by repeated exposure to non-simultaneous stimuli.
There’s also the Kappa effect, a form of perceptual time dilation – recurring stimuli, whether spatial, auditory, or tactile, either seem to occur at greater or shorter intervals than is actually the case. For example,
When mentally comparing these two sub-journeys, the part that covers more distance may appear to take longer than the part covering less distance, even though they take an equal amount of time.
…and more besides – there’s the Flash-lag effect, the Oddball effect, and reversal of temporal order judgment. I’m not going to detail these, because I think it’s time to move on to my main point. Besides, I’m running out of time – exposing the accuracy (or lack thereof) of the time estimates with which I opened this article! It’s worth your time to read the whole page. Go ahead, I’ll wait.
…waiting…
…waiting…
…waiting…
…waiting…
…waiting…
… oh, back already? Okay, let’s continue!
Relationship to Optical Illusions
Optical Illusions occur for one of two main reasons: (1) Our brains are hardwired to take shortcuts that give ‘near enough’ answers and let us focus on what we are supposed to be doing, and the illusion exposes and exploits this fact; or (2) we received an evolutionary advantage of some sort and the illusion exploits and exposes an unintended consequence.
I’ve offered numerous examples of the first, notably in Blind Spots and False Illusions: How much can you really see?, but I don’t think I’ve mentioned the second before.
Basically, if you can get everyone in a theater looking at one specific point on the screen, you can have something emerge from the vicinity of the “blind spot” at the edge of their vision, triggering an instinctive impression of a threat and causing people to jump.
Horror movies have been using this for ages, and it’s why some old conversions-to-TV were less successful: the TV screen had different proportions to the movie screen, so the effect landed in the wrong place – the advent of widescreen has largely solved the problem.
It is unsurprising, therefore, that our perception of time – in the form of our perception of dynamic events – derives from exactly the same causes. Viewed in that context, it would be utterly astonishing if our perceptions of time were not subject to temporal illusions and distortions!
I’m short of time, so I’m just going to toss this out there for people to chew over.
1. While there are similarities, there is also the possibility that animals perceive time differently to humans. In particular, our color vision (which is better than that of most animals) may incur processing loads that make us more susceptible to temporal illusions.
2. There could well be species that take advantage of this effect, at least hypothetically. But that requires them to primarily hunt humans as preferred prey.
3. There is every likelihood that aliens and other non-human sentients would experience temporal illusions – but they might not be the same temporal illusions that we perceive.
Impact on Reliability of temporal guesstimates
Extremely short-term temporal estimates can be made by most people with reasonable accuracy through the use of mental timing tricks. You can gauge your personal reliability with two simple experiments:
1. Count to 100 in your head at 1-second intervals, timing how long it really takes (don’t look at the timer). At 100, stop the timer and see how accurate you were. Then repeat the test for counts of 10, 20, 30, and 60. Experiment with resets by taking a few seconds’ break between tests, or not (which will show how the long count has tinkered with your sense of how long a second is).
2. Do the same thing, but this time silently mouth a word that takes about a second to say – the word I generally use is ‘elephant’. Compare with the results of the first experiment. Most people will observe a significantly greater reliability in test 2.
With skill, short-term temporal estimates can be made “reasonably accurately”. If you intend to do something in exactly a minute, without watching a clock, the odds are that you will actually start within 40-90 seconds. But this depends on how much of the interval is spent doing something else and how much is simply waiting around – waiting makes it more likely that you’ll start early, without seeing out the full minute, while being busy makes it easy to underestimate how long it’s been, causing you to start late. And note that an error of up to 50% counts as “reasonably accurate” in this context!
From about 3 minutes upwards, reliability becomes increasingly strained. Without visual cues or references or some sort of alarm, getting someone to do something “in an hour” could mean they do it in 40 minutes or in ninety minutes – the error margin scales!
“In a couple of hours” has an error of more than an hour. “In a week” has an error of more than a day. “In a month” – assuming that you don’t forget entirely – has an error of more than a week. (Conversely, “In 28 days” is much more accurate, because of the pattern imposed by this interval being divisible into weeks). And so on.
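If you want to mechanize this at the table, here’s a rough sketch built from the bands above; the tighter window for intervals that break into countable sub-units is my reading of the 28-days example, not a hard rule:

```python
# Unaided elapsed-time estimation: return a (low, high) window, in minutes,
# for when someone is actually likely to act. The -35%/+50% band comes from
# the 40-90 second figure above; the tighter band for countable sub-units
# (e.g. '28 days' rather than 'a month') is an assumption, not a quoted figure.

def elapsed_time_window(interval_minutes, countable_subunits=False):
    low, high = 0.65, 1.5
    if countable_subunits:
        low, high = 0.85, 1.2
    return interval_minutes * low, interval_minutes * high

print(elapsed_time_window(60))                    # 'in an hour' -> (39.0, 90.0) minutes
print(elapsed_time_window(28 * 24 * 60, True))    # '28 days' -> a much tighter window
```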
Here’s one more experiment to close out this section.
3. You’ll need a small group. Give someone a stopwatch. While the group watches, they start the watch and make some sort of visual signal. At some point 10-120 seconds later, they stop the watch and write down the elapsed time, while everyone else writes down their estimate of how long it was. Repeat (with different intervals) until you have 10-20 measurements for each participant. Then compare.
Time Required
If it’s hard to assess how long it’s been since something happened, it’s even harder to predict how long something will take to happen. Not only are all the temporal illusions still in effect, but you have to estimate the difficulties involved in completing the task and how long they will take to overcome.
I estimated this to be a typical-length article – about 4500 words or so. I passed that number a long time ago – it’s now 7140 words and counting, about 58% more than expected (so far).
That is a failure of the assumption, not the estimating process, but it’s an illustrative point, I think.
That said, the closer you can bring the process of guesstimation to an actual estimate, the more accurate you will be. Even just breaking the task down into a number of roughly-equal sub-tasks will have a significant impact on accuracy.
I touched on that in pointing out that “28 days from now” is a lot more accurate than “a month from now”, because 28 days breaks down into four sub-tasks of equal length and with a recurring pattern; the base error margin is based upon that of the sub-task, not the task as a whole. You have to add a component for compounding errors, but that’s relatively small, and can be expected to mostly cancel out.
There are limits to this trick, though. “In three months” isn’t much worse than “in 13 weeks”, due to the size of the “13” – in a nutshell, “four” is a number that we can directly comprehend, while “13” is a number that we can only comprehend in the abstract. The “three” in “three months” doesn’t substitute for it, because “month” is inherently variable and fuzzy.
Three months ago was February 29th – except there isn’t one of those. It was also February 28th, February 27th, February 26th, March 1, March 2, and March 3. And, in fact, if something happened a week to either side of those dates, we’d probably still call it “about three months ago”.
Travel Time
Travel time is an interesting question to contemplate, given the problems already identified with time. It breaks down into two components: one linked to the speed, which provides the equivalent of the “counting elephants” throughout the trip if it’s consistent, and one relating to delays and interruptions – traffic, red lights, and so on.
A short trip
I live 2.4 km (1.5 miles) from the departure point of the 415 bus. While buses aren’t entirely predictable in speed – sometimes they have to stop to pick up or set down passengers, sometimes they don’t – that’s a relatively short distance, so you would expect the bus to be fairly reliable at my stop.
The trip going the other way is more than twenty times this distance. Buses are frequently 5 minutes early or 7 minutes late; reasonably often, those numbers can be 7 and 10 minutes respectively. On that basis, you would be forgiven for expecting the error at my bus stop to be 1/20th of 10 minutes, or about 30 seconds. Heck, you could be conservative and call it a minute either way.
That’s not what’s observed. While it’s rare for the bus to be more than about 3 minutes early, it’s not uncommon for it to be 3, 5, even 7 minutes late – only ten minutes after departure. That’s a 70% error.
Two factors account for this: the inherent variability, which can also affect the accuracy of the initial departure – call that two minutes of the total – and two critical traffic lights. The first is just before my stop, and accounts for another minute of the error. The remaining four minutes all stem from a single traffic light where the traffic is heavy and the window for transit is small – creating the potential for significant delays. Not every time, but often enough – more than one trip in three, at least.
And, as explained in Sequential Bus Theory and why it matters to GMs, once delays happen, they tend to snowball.
A longer trip
My dad lives around 550 km from my home town; it’s a trip that he makes regularly, and it takes about 6 1/2 hours. The biggest variable is how long and how often he stops for rest breaks – typically two or three times, one of which is to eat. Call that twenty minutes, and the other stops 5-10 minutes each, so that’s 25-40 minutes in stops. Google says that his route should take 5 hrs 59 minutes; add in the stops and you get 6h 24m to 6h 39m, or an average of about 6 hrs 31 min. His estimated error could be as much as 29 minutes from this, or 7.4%, but it’s more likely to be 5 minutes, give or take – a mere 1.28%.
Metronomic regularity, controlled by the speed limits, a predictable loss to traffic, and a minimum of traffic lights – that combination more than outweighs the variability of the number of stops. If he gives an ETA, departure delays are more significant than how long he has to stop along the way, and if he’s not within half an hour of the ETA, something has gone wrong along the way!
Travel Time estimates
Travel time estimates are exactly the opposite of most types of estimates – the longer the trip, the more reliable an estimate will be.
When the NPC doing the planning estimated the travel times for the PCs exploring the towns and cities of Arkansas for a new Base Of Operations in my Superhero campaign, he made reasonable allowances for traffic and worst-case assumptions for other forms of delay. Most of the time, these failed to materialize – and as a result, a planned 10-hour day left the PCs a couple of hours ahead of schedule each day. And that’s with a couple of unpredictable delays cropping up along the way, accounting for another hour or more.
After a single day, the Red Cavalier was so far ahead of schedule that they were able to spend three or four hours exploring neighboring Mississippi in a side-trip – and were STILL ahead of schedule when they resumed the main exploration.
Making anything other than conservative estimates would have been irresponsible of the NPC doing the planning – but nine times in ten or more, those conservative estimates badly overestimated how long things would take.
Score check: the deadline arrived 21 minutes ago. Word count is now 8190.
I allow myself an hour before I consider delivery to be ‘late’, but I still have a few sections to write, and then have to spellcheck, edit, format, and illustrate this magnum opus – which will probably take 30-40 minutes, maybe longer.
Conclusion: I’m not quite going to make deadline unless I finish in the next 10 minutes. Delivery 30-40 minutes late is more likely.
Vaguer Guesstimates
Of course, the discussions above all concern the more precise types of guesstimate. But there is a set of others that are likely to come up from time to time, and they need to be discussed, too.
Weather
There’s a great tendency to slice weather up into discrete daily events with no rhyme or reason behind them. I’ve taken exception to this from time to time and offered alternatives to simply rolling on ‘random weather tables’ that build memory of yesterday into the generation process.
See, for example, Ask The GMs: Weather, Not Climate – and the unfinished series on The Diversity Of Seasons, which at some point I will get back to!
Never is that need felt more strongly than when a PC asks what tomorrow’s weather looks like.
Tomorrow’s weather never starts from zero; it always starts from the conditions that applied today, and then gets modified by the changes that are going to take place in the course of the next 24 hours.
Such meteorology is a pain for the GM because it’s a lot of work that’s rarely required – hence the existence of those random tables in the first place!
Things are a lot easier in Fantasy games, where the state of the art is something along the lines of “Red sky in morning, sailor take warning”. Sure, people knew the seasons, and roughly when they would start, and what the climate said the weather would be in each season – in fairly descriptive language – but that was about it.
From the invention of the thermometer, that starts to change. First, you get written records and precise numbers; and then you get interactions with barometric pressure. The telegraph brings the chance to observe the progressive shift in weather from place to place as changes transit, in something close to ‘real time’. And then weather balloons, and weather satellites, and better weather satellites…. and forecasts become practical, and just keep getting better and better.
Right now, three-day forecasts are 90-95% accurate, at least where I live. This drops over succeeding days until it’s only about 50-50 a week from now. Each day added is an exponential increase in the difficulty of accurate forecasts, so it’s going to take some sort of breakthrough to extend the forecast window much further.
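For GM purposes, a quick interpolation of those two data points is plenty; the curve in between is my guess, not meteorology:

```python
# Chance (0-1) that the forecast the PCs heard turns out to be right,
# interpolating the 90-95% (three days) and roughly 50-50 (seven days) figures above.

def forecast_reliability(days_ahead):
    if days_ahead <= 3:
        return 0.92
    if days_ahead >= 7:
        return 0.50
    return 0.92 - (days_ahead - 3) * (0.92 - 0.50) / 4   # linear fade from day 3 to day 7

for d in range(1, 8):
    print(d, round(forecast_reliability(d), 2))
```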
But here’s an interesting fact: the weather service that I used back where I used to live is not accurate for where I am now, and not accurate for the next suburb out. That’s three different weather patterns in a distance of about 1.65 km – just 1.025 miles. For more on this, look at The Diversity Of Seasons Pt 1: Winter, and specifically, section 4, Winter In Sydney.
Concurrent Patterns
At one point, I had to use the train. Railway stations on my line are only a minute or two apart. I went from rainy to cloudy to sunny in the space of about 4 minutes.
It’s my theory that it was only when travel became fast enough that we could be in two places in a short enough interval of time to notice how different weather could be from one place to the next. Travel by car, and those places were 5 minutes or so apart – enough time for the heavens to open or close. Suddenly, it’s not so obvious. Travel by horse or by carriage, and we’re talking at least 10 and more likely 20 minutes – plenty of time for the weather to turn. And the diversity of weather pattern becomes as clear as mud.
Implications for the GM
Reliability of forecast means that if you keep it narrative, it will generally be as reliable as it can be expected to be – with room for the occasional unexpected divergence.
But that only matters if the PCs are staying put somewhere. As soon as they move, perhaps as little as 1/2 a kilometer (1/3 of a mile), all bets may be off – even if the weather they were experiencing was within spitting distance of the forecast.
Stock Markets
Something else with a memory is movement on a stock market. In fact, I once wrote a software stock-market simulation program which factored in random events, both as deviations from the prevailing trend and as direct effects on the market index. At the end of the ‘day’ it went into ‘overseas markets mode’ and did something similar there, but added a factor describing the relevance to the market being simulated. The last thing it did before the markets ‘opened’ the next day was to compound all these effects and use them to determine (1) an initial market ‘adjustment’ and (2) a revised ‘prevailing trend’ that accommodated the last 24 hours.
Actually, the ‘prevailing market trend’ was three different trends – a short-term trend (daily), a mid-term trend (a ten-day cycle) and a long-term trend (a sixty-day cycle). These then combined in a biorhythm-esque way to create the next short-term trend and update the other trends when the day incremented.
Did I say that weather forecasting was a lot of ultimately-meaningless work for the GM? Well, stock-market forecasts are even more work and even more pointless. So far as I’m concerned, daily stock market movements in an RPG are composed of equal parts ‘the speed of plot’ and 3d6 up, 3d6 down. At most, if yesterday was up, there will be a 50% chance that today will also be up if the general news in the campaign is good, and a 50% chance that it will be down if the news is bad.
If your PCs want to predict what I will roll on 6d6, even with a bias from yesterday and the events scheduled for the day, go right ahead.
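Here’s that approach as a sketch; the optional nudge is one loose reading of the 50% carry-over just described:

```python
# Daily stock market movement: equal parts 'speed of plot' (the GM's call)
# and 3d6 up, 3d6 down, with an optional nudge from yesterday and the news.

import random

def daily_market_move(yesterday_up=None, news_good=None):
    """Today's index move: 3d6 minus 3d6, occasionally steered by the campaign news."""
    move = sum(random.randint(1, 6) for _ in range(3)) - sum(random.randint(1, 6) for _ in range(3))
    if yesterday_up and news_good is not None and random.random() < 0.5:
        move = abs(move) if news_good else -abs(move)   # half the time, the news sets the direction
    return move

print(daily_market_move(yesterday_up=True, news_good=True))
```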
Manpower
How many people do you need to get X finished in Y time? This takes the task-completion estimates of the earlier section and compounds their variability with still more imponderables and unpredictables – relationships, leadership, industrial action, politics.
If all things were to remain equal, it wouldn’t be much more difficult to extend that ‘time required’ guesstimate to derive the manpower figure required to reduce the guesstimate by X%.
The longer the resulting ‘project time required’ is, and the larger the workforce that is required, the more certain it becomes that all things will NOT remain equal.
On top of that, there are practical limits to how much task subdivision there can be. Throwing 10,000 people at a project that should take ten man-days does NOT mean that it will be complete in anything like 1.44 minutes. No way, no how. Not even 15 minutes.
So you assume that things will go wrong 50% of the time, expand the workforce to accommodate that, and put a hard limit on how much time can be saved.
Obviously, the nature of the project is all-important; the more independent parts it can be broken into, coordinated, and supervised, the more simultaneous tracks can be accommodated. Each track is then subject to its own Manpower assessment, with the net effect that quite large projects can be completed in reasonable time. This applies to everything from building a skyscraper in a year or three to the Apollo program.
Don’t forget those administrative functions and overheads in your guesstimating, either!
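As a sketch, that logic might look like the snippet below – the 50% trouble padding comes from the paragraph above, while the hard floor of 10% of the single-worker duration is a placeholder of my own, not a figure from the text:

```python
# Crewed-duration guesstimate: divide the work by the crew, pad for things going
# wrong, and impose a floor on how far the schedule can actually compress.

def crewed_duration(person_days, crew, trouble_factor=1.5, min_fraction=0.10):
    ideal = person_days / crew
    floor = person_days * min_fraction    # no amount of extra labour beats this
    return max(ideal, floor) * trouble_factor

print(crewed_duration(10, 10_000))   # 10 person-days, 10,000 workers: ~1.5 days, not 1.44 minutes
```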
Costs & Budgets
Hand in hand with manpower estimation comes the last of these specific categories.
There’s a simple rule of thumb that I use from my days in IT: “You can have it good, you can have it fast, or you can have it cheap. Pick one.”
The traditional form is “Pick two”, but if there is sufficient obsession with one of the three, the second is also necessarily sacrificed.
So, normally, you might be able to say “I want it good and I want it cheap.” Okay, that makes time the sacrificial lamb; you need to hire university students and promising grade-schoolers, buy their attention with cheap trinkets and promises of street cred in the IT world, and let them work on the code for as many years as it takes.
But if you want it really good, those won’t be enough; ‘cheap’ has to get tossed overboard, and instead you are Google, hiring the best and brightest for whatever it takes and paying them for as long as it takes.
There are, quite frankly, so many variables in this sort of estimate that another rule of thumb comes to mind: estimate a best-case cost and multiply it by ten. Unless you want it good, or fast, or cheap, in which case multiply it by 20, instead.
Shortcuts in engineering and software projects never seem to go where they are supposed to, in the long run.
Guesstimates for the GM
You will need guesstimates, and an understanding of the limitations inherent in them, to answer player questions and requests.
You can either do a lot of work basing these on reasonable and realistic estimates, or you can cheat and base them on guesstimates of your own, which you then modify as events arise that help or hinder.
Sounds like a no-brainer to me.
Okay, my deadline came and went two hours ago. I’m up to 9840 words, and still have that extra work to do, so I’m estimating publication at 3:20 AM, local time – more than 2 hours late.
Update 2, 3:34 AM: Illustration done, spellchecking done. Formatting and final editing underway. Publication estimate is revised to about 4:15AM, about 3 1/4 hours past my (self-imposed) deadline.
Update 3, 5:22 AM: Formatting was a nightmare; for some reason, even though it was automatically generated by the CMS, the link to the Creative Commons License wasn’t resolving, and it took the rest of the caption with it – including the end-of-caption instruction, and all the text up to the next hyperlink. All of that, and everything that followed, was present – but as part of the caption. It’s all done and ready to post now, 4 1/2 hours past deadline.
But I think it’s been worth it.