"Make Your Stealth Roll".
Photo from / Pete Smith

In many RPGs, skill results are a light switch – you either succeed or fail. At best, this is a missed opportunity for the GM; at worst, it can convey a false sense of capability to PCs because they have no idea of how close they may have come to failure, just that whatever they rolled was sufficient.

The worst case can be avoided to some extent by simply telling players what they need to succeed in a skill check, but this does nothing about the best-case failure of the system. Perhaps ‘failure’ is too strong a word, but it will suffice.

It doesn’t have to be that way. There is a very simple solution and one with vast benefits to the campaign. And that’s what today’s article is all about.

Scale Of Success (or failure)

The principle is simple: translate an approximate margin of success or failure into narrative.

In order to actually perform that, you need to understand the probability of success or failure fairly intimately. That means different things depending on the system mechanics. I’ll look at three of the most common approaches:

  • Linear eg d20
  • Bell-curve eg 3d6
  • XdY, count those above a threshold
    Linear eg d20

    d20-based systems make this fairly simple – you simply set a ‘bandwidth’ for each classification of success or failure. There are three fairly common patterns:

    • Fives
    • Threes or Fours
    • Non-linear series

      Fives

      Probably the most common approach is to use intervals of 5, also known as the 5-scale – so that

      • failure by 5 or less is a ‘near miss’, success by 5 or less is a ‘difficult success’;
      • failure by 6-10 is a ‘worse failure’ (relative to the 1-5 band), success by 6-10 is an ‘easier success’;
      • failure by 11-15 is a ‘serious failure’, success by 11-15 is an ‘easy success’;
      • failure by 16-20 is a ‘monumental failure’, success by 16-20 ‘makes it look trivially easy’.

      In addition, many systems incorporate the concepts of a critical success or critical failure / fumble, occurring on a natural extreme result on the die roll. Some GMs may choose to regard these as an additional category, others will simply default to the most extreme category listed above. That’s entirely up to the GM, though he should be consistent or the narrative loses its value as a tool for roleplay.

      Threes or Fours

      Some GMs choose to narrow all but one of these bands, expanding the remaining one to fill the gap. This is usually applied to widen the range of ‘monumental failure’ and ‘trivially easy’ bands.

      Fours, or 4-scale:

      • ‘monumental failure’ = fail by 13 or more.
      • ‘serious failure’ = fail by 9-12.
      • ‘worse failure’ = fail by 5-8.
      • ‘near miss’ = fail by 4 or less.
      • ‘difficult success’ = succeed by 4 or less.
      • ‘easier success’ = succeed by 5-8.
      • ‘easy success’ = succeed by 9-12.
      • ‘trivially easy success’ = succeed by 13 or more.

      Threes, or 3-scale:

      • ‘monumental failure’ = fail by 10 or more.
      • ‘serious failure’ = fail by 7-9.
      • ‘worse failure’ = fail by 4-6.
      • ‘near miss’ = fail by 3 or less.
      • ‘difficult success’ = succeed by 3 or less.
      • ‘easier success’ = succeed by 4-6.
      • ‘easy success’ = succeed by 7-9.
      • ‘trivially easy success’ = succeed by 10 or more.
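The three linear scales above differ only in their band width, so they can be sketched as one small lookup. This is an illustrative sketch rather than part of any published system: the function name `classify_linear` is hypothetical, it assumes a roll-high check where the margin is the roll minus the target, and it assumes an exact success (margin 0) counts as a ‘difficult success’.

```python
FAILURES = ["near miss", "worse failure", "serious failure", "monumental failure"]
SUCCESSES = ["difficult success", "easier success", "easy success",
             "trivially easy success"]


def classify_linear(margin: int, width: int) -> str:
    """Name the band for a margin on a linear scale (width 5, 4 or 3).

    Assumption: margin = roll - target, and margin >= 0 is a success.
    Assumption: success by exactly 0 is lumped in with 'difficult success'.
    """
    if margin >= 0:
        # Success by 0..width is band 0, width+1..2*width is band 1, and so on.
        band = max(margin - 1, 0) // width
        return SUCCESSES[min(band, 3)]
    # Failure by 1..width is band 0, and so on; the last band is open-ended.
    band = (-margin - 1) // width
    return FAILURES[min(band, 3)]


if __name__ == "__main__":
    for width in (5, 4, 3):
        print(width, classify_linear(-5, width), "/", classify_linear(9, width))
```

Switching scales is then a matter of changing one number, which makes it easy to compare how the same roll reads under each scale.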

      To appreciate the reasons why a GM might choose one of these, consider the actual likelihood of each result at three different skill targets: low (needs 6 or better), moderate (needs 11 or better), and high (needs 16 or better):

      • Low (6+): A roll of 1 would be a failure by 5, or possibly a critical failure. Ignoring the latter possibility, that’s a ‘near miss’ on the five-scale, and a ‘worse failure’ on both four and three scales. If critical failures are part of the game system, then a roll of 2 is the worst non-critical failure, which is a failure by 4. That’s still a ‘near miss’ on the 5 scale, becomes a ‘near miss’ on the four-scale, and remains a ‘worse failure’ on the three scale. Going directly from a ‘near miss’ to a ‘critical failure’ bothers some GMs; they would rather have some sort of intermediate failure level in between the two. But there isn’t a lot of room for that on a fairly easy roll, i.e. when the character is highly skilled; most of the room is taken up with the (greater) likelihood of success.
      • Moderate (11+): An average target means that you can fail by as much as 10 or succeed by as much as 9 (one less in each case if criticals are reserved). Failure by 9 is only the second band of failure on the 5-scale, and success by 9 is only the second band of success. Despite looking good in theory, when actually applied, the 5-scale is often considered too blunt. Failure by 9 just scrapes into the ‘serious failure’ category on a four scale, while success by 9 is an ‘easy success’ – in other words, almost the entire range of results is possible. If the Critical Failure/Success descriptions default to the narratives prepared for the most extreme categories, in fact, the entire range is possible – the two most extreme categories in each direction have only 5% chance each of occurrence, but that’s better than none. But the 3-scale is even better in some GMs’ eyes, making the third most-extreme outcome in each direction as probable as the less extreme results.
      • High (16+): If you transpose the words ‘success’ and ‘failure’ in the low-target description, the results are identical to those from this target result. As the difficulty increases relative to the skill level of the character attempting the task, the ‘success modes’ get cramped for room, while the room available for failure modes expands.
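To see how the room for each band shifts with the target, the chances can be tabulated directly. The following is a rough sketch under stated assumptions: the helper names are hypothetical, bands are numbered +1 to +4 for successes and -1 to -4 for failures (1 being the closest result), criticals are ignored, and an exact success is counted in the first success band.

```python
from collections import Counter
from fractions import Fraction


def band_index(margin: int, width: int) -> int:
    """Signed band number: +1..+4 for successes, -1..-4 for failures."""
    if margin >= 0:
        return min(max(margin - 1, 0) // width, 3) + 1
    return -(min((-margin - 1) // width, 3) + 1)


def d20_band_chances(target: int, width: int) -> dict[int, Fraction]:
    """Chance of each band when a d20 must roll `target` or better."""
    counts = Counter(band_index(roll - target, width) for roll in range(1, 21))
    return {band: Fraction(n, 20) for band, n in sorted(counts.items())}


if __name__ == "__main__":
    for target in (6, 11, 16):
        for width in (5, 4, 3):
            row = d20_band_chances(target, width)
            text = ", ".join(f"{band:+d}: {float(p):.0%}" for band, p in row.items())
            print(f"needs {target}+, {width}-scale -> {text}")
```

Bands that never appear in a row are simply unreachable at that target, which is the cramping effect described in the list above.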

      In theory, the best results would be achieved by having different ranges apply depending on what the character needed to roll, but that’s too much hard work to be practical. Some GMs divide their handling up by character level, in the expectation that skill levels will reflect character levels – so they might use the 4-scale up to tenth level and the 3-scale from eleventh level up. But that can get messy when you have some characters who have gone up into the higher level range and some who have not; better to have one system and stick with it throughout.

      Non-linear series

      One method that comes to mind for avoiding many of the problems listed above is to give the different outcomes different likelihoods in the first place. Whoever said that the probabilities had to be evenly distributed, anyway?

      There are two obvious approaches to applying this principle: success or failure by 1 for the narrowest, then increase by 2 for each subsequent category; or success or failure by 2 for the narrowest, increasing by 1 for each subsequent category. To distinguish these from the ‘linear scale’ models described above, I tend to call these the ‘2-mode’ and ‘1-mode’ respectively (referring to the way the categories increase in size, and not the size of the narrowest category of result).


      2-mode:

      • ‘monumental failure’ = fail by 9 or more.
      • ‘serious failure’ = fail by 4-8.
      • ‘worse failure’ = fail by 2-3.
      • ‘near miss’ = fail by 1.
      • ‘difficult success’ = succeed by 1.
      • ‘easier success’ = succeed by 2-3.
      • ‘easy success’ = succeed by 4-8.
      • ‘trivially easy success’ = succeed by 9 or more.


      1-mode:

      • ‘monumental failure’ = fail by 10 or more.
      • ‘serious failure’ = fail by 6-9.
      • ‘worse failure’ = fail by 3-5.
      • ‘near miss’ = fail by 2 or less.
      • ‘difficult success’ = succeed by 1 or 2.
      • ‘easier success’ = succeed by 3-5.
      • ‘easy success’ = succeed by 6-9.
      • ‘trivially easy success’ = succeed by 10 or more.
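Because the bands in these two lists are uneven, a table of band edges is easier to work with than arithmetic. A minimal sketch follows, assuming the edges exactly as listed above; the names `NARROW_EDGES`, `WIDE_EDGES`, and `classify` are hypothetical, and an exact success is treated as success by 1, which is an assumption rather than something the lists specify.

```python
from bisect import bisect_left

FAILURES = ["near miss", "worse failure", "serious failure", "monumental failure"]
SUCCESSES = ["difficult success", "easier success", "easy success",
             "trivially easy success"]

# Upper edge of each of the three inner bands; anything larger is the extreme band.
NARROW_EDGES = (1, 3, 8)  # first list above: by 1, by 2-3, by 4-8, by 9 or more
WIDE_EDGES = (2, 5, 9)    # second list above: by 1-2, by 3-5, by 6-9, by 10 or more


def classify(margin: int, edges: tuple[int, int, int]) -> str:
    """Name the band for a margin (roll minus target, roll-high assumed)."""
    size = max(abs(margin), 1)        # assumption: success by 0 counts as by 1
    band = bisect_left(edges, size)   # index of the first edge >= size
    labels = SUCCESSES if margin >= 0 else FAILURES
    return labels[band]


if __name__ == "__main__":
    for margin in (-9, -4, -1, 0, 3, 10):
        print(margin, classify(margin, NARROW_EDGES), "|", classify(margin, WIDE_EDGES))
```

Keeping the edges in a tuple also means a GM can experiment with other progressions without touching the lookup itself.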

      Once again, to look at the advantages, you need to examine the possible outcomes based on what a character needs in order to succeed.

      At very low chances of success, there’s still a full range of failure modes available, and the room for success still cramps up – but the scales have also been ‘cramped’. If ‘monumental failure’ is the equivalent of a critical failure, then a roll of ‘2’ yielding the second-worst possible result would happen on target rolls of 6-10, with the chance of that outcome going up by 5% with each +1 to the target roll required for success. Similarly, at very high chances of success, the success modes get cramped, but the shrinkage in the likelihood of the ‘close’ results makes room for the full gamut of possible outcomes for most results. It’s a little more work until you get the ranges memorized, but this is the scale that I use for my d20 games.

      Nevertheless, it is more work than the straightforward 5-, 4-, and 3-scale choices, and with a linear roll, you have the choice.

    Bell-curve eg 3d6

    Bell-curves complicate everything. The likelihood of missing by 1 depends on what you need to roll. If you need 6 or better, it’s 6/216, or about 2.8%; if you need 11 or better, it’s 12.5%; if you need 16 or better, it’s back down to about 4.6%. Even experienced GMs can have difficulty visualizing the way the probability curve impacts the chances of success of a given result.

    My Co-GM and I have to do this regularly to determine how big a modifier we need to apply to create a given psychological expectation of a result in the Adventurer’s Club campaign – but it’s worth it; a minus-4 modifier sounds huge (and it is), but if you can use it to ensure that only one or two PCs succeed in a difficult task, you encourage a variety of experiences at the game table. Depending on the character’s skill levels, there can be times when -2 is a bigger penalty than a -4, or even a -6! – but it sounds so much smaller than -4 or -6 that there is a greater expectation of success. This can be manipulated to change the interaction between character and adventure, ensuring that each gets his moment in the spotlight each time we play, that one or two characters get to star in an adventure, and so on – so that no one character dominates play all the time.
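The chance of missing by 1 on 3d6 can be checked by brute force over all 216 combinations. This is a small sketch assuming a roll-high check, as in the figures quoted above, so that missing by 1 means rolling exactly one less than the target; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf_3d6() -> dict[int, Fraction]:
    """Exact chance of each total on 3d6, by enumerating every combination."""
    totals = Counter(sum(dice) for dice in product(range(1, 7), repeat=3))
    return {total: Fraction(count, 216) for total, count in totals.items()}


PMF = pmf_3d6()


def chance_of_missing_by_one(target: int) -> Fraction:
    """Roll-high assumption: missing by 1 means rolling exactly target - 1."""
    return PMF.get(target - 1, Fraction(0))


if __name__ == "__main__":
    for target in (6, 11, 16):
        chance = chance_of_missing_by_one(target)
        print(f"needs {target}+: {chance} = {float(chance):.1%}")
```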

    Graph of X or less on 3d6

    Above is a graph (courtesy of AnyDice) of the chances of rolling (x) or less on an unmodified 3d6. To get a handle on how a -N modifier (not beneficial) would affect the likelihood of success, simply find your target number and count up N bars. A +N modifier (i.e. beneficial) is simply a matter of counting down. I’ve chosen this graphic because – unlike most d20 systems – the rolls of 3d6-based systems are usually ‘x or less’.

    With a bell curve, you have exactly the same options as were outlined for the d20 example earlier, but the effects are disproportionately amplified for extreme die roll targets, and the ranges narrower to begin with (on 3d6 and 4d6 rolls, anyway).

    This means that the 5-, 4-, and 3-scale options don’t – ever – yield an even chance of achieving each category. There is a disproportionate increase in the likelihood of getting whatever result lies nearest the natural average roll, and a disproportionate decrease the farther away from these that you get.
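The uneven spread is easy to demonstrate by enumeration. The sketch below makes several assumptions: a roll-under 3d6 check (roll the target or less to succeed), bands numbered +1 to +4 for successes and -1 to -4 for failures, an exact success counted in band +1, and hypothetical helper names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def signed_band(margin: int, width: int) -> int:
    """+1..+4 for successes (margin >= 0), -1..-4 for failures."""
    if margin >= 0:
        return min(max(margin - 1, 0) // width, 3) + 1
    return -(min((-margin - 1) // width, 3) + 1)


def band_chances_3d6(target: int, width: int) -> dict[int, Fraction]:
    """Chance of each band on a roll-under 3d6 check against `target`."""
    chances = defaultdict(Fraction)
    for dice in product(range(1, 7), repeat=3):
        margin = target - sum(dice)  # roll-under: rolling low gives a positive margin
        chances[signed_band(margin, width)] += Fraction(1, 216)
    return dict(sorted(chances.items()))


if __name__ == "__main__":
    for band, chance in band_chances_3d6(10, 3).items():
        print(f"band {band:+d}: {float(chance):.1%}")
```

With a target of 10 on the 3-scale, the two bands nearest the target account for roughly three-quarters of all rolls between them, while the outermost reachable bands are rare.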

    Ironically, the Mode-2 and Mode-1 patterns actually compensate for these effects in some measure – how much is far too complicated a question to go into here – resulting in something closer to an even distribution of result likelihoods. With one of the two biggest advantages of the linear-scale options left inapplicable, it only strengthens the arguments in favor of one of the two ‘mode’ alternatives.

    XdY, count those above a threshold

    There are an increasing number of systems that work in this way, or so it seems to me. That’s because they embody a more sophisticated probability mechanism that does most of its work ‘below the surface’ where neither GM nor players can see it, yielding a very simple game mechanic. The key is that it provides the GM with two variables to play with: the target threshold for a die to count, and the number of successes (dice ‘counted’) required to achieve overall success in a task. On top of that, there are variables in how skill levels are manifested (more dice or a bonus to each?) and how stats apply to skill checks (more dice or a bonus to each?). Ultimately, what you end up with is nevertheless a bell curve, but with a much smaller range of results, and one that is skewed in the opposite direction to the threshold value – if the threshold is low, the likelihood of a higher number of successes increases, and vice-versa.

    I’m trying hard not to get sidetracked into looking at this in detail, so even though I’ve worked out how to do it at AnyDice, I’m not going to get into probability graphs. (If you want to play around with it for yourself, here’s a link to the code for 10d10 and a threshold of 3:

    Just click the link and take a look. Then change the 3’s to 5’s and hit calculate again. Try changing the number of dice. You’ll soon get a feeling for the way this probability mechanism works).
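For readers who would rather experiment offline, the same kind of distribution can be computed with a few lines of Python. This is not the AnyDice code referred to above, only a sketch under the assumption that a die counts when it shows strictly more than the threshold; the function name `pool_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb


def pool_distribution(dice: int, sides: int, threshold: int) -> list[Fraction]:
    """Chance of 0, 1, ..., `dice` dice counting.

    Assumption: a die counts when it shows strictly more than `threshold`.
    Each die is independent, so the counts follow a binomial distribution.
    """
    p = Fraction(sides - threshold, sides)  # chance that a single die counts
    return [comb(dice, k) * p**k * (1 - p) ** (dice - k) for k in range(dice + 1)]


if __name__ == "__main__":
    dist = pool_distribution(dice=10, sides=10, threshold=3)
    for count, chance in enumerate(dist):
        print(f"{count:2d} counted: {float(chance):6.2%}")
    mean = sum(count * chance for count, chance in enumerate(dist))
    print("average number counted:", float(mean))
```

Changing `threshold` from 3 to 5, or changing `dice`, shows the skew moving in the way described above; the average is always the number of dice times the chance that one die counts.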

    The bottom line here is that we need a smaller mode-style progression in order to fit within the available range of results. It’s like rolling dN where N is the number of dice – but where the chances are distorted into a bell curve.

    With eight dice, the range of possible results runs from 0 (no rolls above the threshold) to 8 (all above the threshold), and the average result will be roughly ½ × [(min + max) + (die size − threshold) − 1]. With a threshold of 3, and a die size of d10, that gives [(0 + 8) + (10 − 3) − 1] / 2, or 14/2 = 7. But this won’t be exact: the true average is the number of dice multiplied by the chance that a single die beats the threshold, which here is 8 × 7/10 = 5.6. If the number of dice is smaller, the average shrinks. I recommend what I’m going to label Mode-Zero:

    • ‘monumental failure’ = fail by 4 or more.
    • ‘serious failure’ = fail by 3.
    • ‘worse failure’ = fail by 2.
    • ‘near miss’ = fail by 1 or less.
    • ‘difficult success’ = succeed by 1.
    • ‘easier success’ = succeed by 2.
    • ‘easy success’ = succeed by 3.
    • ‘trivially easy success’ = succeed by 4 or more.

    If the number of dice being rolled is usually more than 10, you might increase the ‘serious failure’ and ‘easy success’ categories to a band of two results (fail or succeed by 3 or 4), shifting the extremes by 1 in the process; I wouldn’t contemplate it for fewer.

    Success by 0!?

    You may have noticed that none of the above proposals do anything special for an exact success. Some include it in the ‘difficult success’, others don’t mention it at all. There are two options for handling ‘success by zero’ – you can either consider it a ‘difficult success’, or you can let it be GM’s Choice – so long as you end in a success. So you might start out describing a ‘monumental failure’ only to have some twist of fate yield a success at the last possible moment. Or a ‘trivially easy success’ that almost goes drastically wrong at the end.

    Frankly, this choice should be dictated by your improv abilities – if they are good, go with the GM’s choice, because it’s more dramatic. If you aren’t confident, go with the ‘safe’ choice.

Differentiated Narratives For Scales Of Success

For each of these different degrees of success or failure, the next thing needed is a piece of narrative. Then, instead of telling the player what they need, you can relay this narrative after they roll.

This is a heck of a lot better than a “You succeed” or “you fail”, or their equivalents when applied to a particular skill.

But preparing such a list in advance for every skill is a lot of effort. Especially since the ideal would be to not reuse them for a while, afterwards.

It is possible to construct a general list that you then interpret for whatever the skill is to which the narrative is being applied. This takes an impossible task and re-frames it into a practicable solution.

You can also subdivide this general list however you see fit – you might break the total number of skills into “awareness” skills, “analysis” skills, “knowledge” skills, and “action” skills, for example. This would require four lists, but would make the “interpretation” much easier.

How to generate the Differentiated Narratives

Either way, the process that I have devised for generating such lists is the heart of today’s article. I have given the contents of these lists the general title of “Differentiated Narratives” – Narratives that Differentiate between degrees of success or failure.

The process is simple, mostly consisting of short steps that are repeated as often as necessary:

  1. Pick a skill
  2. Describe each degree of failure
  3. Describe each degree of success
  4. Choose a different skill
  5. Translate each description
  6. Generalize each description
  7. Repeat 1-6 at least twice more to generate new descriptions.
    1. Pick a skill

    Start by picking a skill and a typical task that a PC might want to accomplish using that skill. This HAS to be something that you would normally require the player to roll for; no tasks that you would normally hand-wave.

    Let’s Pick “Climb” as an example, and “climb a short cliff” as the task.

    2. Describe each degree of failure

    You may have more degrees of success than I have indicated, or you may have fewer; but I think that four failure and four success modes are about right. For each of them resulting from the chosen skill being applied to the task, create a line of narrative. I always find it easier to think about the ways a task might fail, first.

    • Catastrophic Failure: You almost reach the top before a handhold crumbles and you fall. Everyone who follows (including any second attempt by you) is at a penalty to succeed, and there’s a 1-in-6 chance each that you will knock someone else off the cliff on your way down.
    • Serious Failure: You reach about half-way up before misjudging a hand-hold and falling. There’s a 1-in-6 chance each that you will knock one of your companions off the cliff on your way down.
    • Failure: You almost reach the top when a handhold crumbles. You fall a short distance before you catch yourself, but you wrench your shoulder badly in the process. You need a Cure Light Wounds or a Healing potion, but that will have to wait until you reach the top; in the meantime, you are unable to climb.
    • Almost Succeed: You reach about half-way up before the handhold you are reaching for crumbles. You find yourself stuck, unable to climb any higher. Other climbers will have to use another route to the top, and may then lower a rope to you.
    3. Describe each degree of success

    Having worked out how to fail at the task, it becomes easier to work out how someone can succeed despite almost succumbing to those difficulties.

    • Just Succeed: It was touch-and-go when a ledge collapsed under your weight, but you caught yourself on an outcropping and were able to eventually reach the top, completely out of breath.
    • Succeed with Difficulty: Crumbling handholds made the climb difficult, and matters weren’t helped when your sword fell from its scabbard about half-way up, requiring you to go back down and retrieve it.
    • Succeed Easily: Some of the handholds were hard to reach and none of them were as secure as you would like, but with great care, you climb the cliff.
    • Make it look easy: Your arms were exactly the right length to reach from one hand-hold to the next, and although you loosened several of the ledges you used on the way up with your weight, you make the climb look easy, and aren’t even winded when you reach the top. You wouldn’t expect it to be so easy next time.
    4. Choose a different skill

    If you are creating just one general list, this should be a radically-different skill and task. If you have sliced the overall pool of skill applications up into subtypes, which I recommend doing at least for a while (it makes the rest of the process easier), then you should choose a skill from the same category – so long as it’s a different one to your first choice.

    To make the example as comprehensive as possible, I’m going to take the harder path, and choose “Knowledge: History” as the skill, with the task being to trace the movements of a particular Elf through multiple documents in an attempt to figure out where he hid a treasure that he stole from a Drow Enclave, using a library of rare books.

    5. Translate each description

    Taking each description in turn, work out what the equivalent degree of failure or success would look like for the new skill-and-task pairing. Note that if you do this sort of translation in your head, and then immediately perform the next step in the process for that degree of success or failure, it will be less work.

    • Catastrophic Failure: You almost finish, but when you hold a scroll up to the lantern to see it more clearly, it catches fire. You beat out the flames, doing some minor damage to the scroll, but in your haste, you knock over the ink-pot, spilling ink all over your notes and two of the source documents. You will not only have to start all over, you have made it materially harder for you – or anyone else – to succeed.

    Now, you could proceed with the rest of the degrees of success or failure, but instead of writing the above down, let’s pretend that I’ve done it in my head and proceed directly to step 6 – then come back to this step to work on the next one.

    6. Generalize each description

    The act of ‘translating’ each failure/success narrative into application to a different skill means that you have started, mentally, to translate it into a generalized form – which can then be applied to any skill as needed.

    • Catastrophic Failure: You almost succeed before an accident not only causes this attempt to fail but makes it harder for you or anyone else to succeed in a subsequent attempt at performing the task.

    Now, since you’ve only done the first one, go back to the previous step and process the next degree of failure/success.

    7. Repeat 1-6 twice more to generate new descriptions

    If every catastrophic failure has the same general ‘shape’, they will get boring fairly quickly, and the same is true for all the other degrees of success/failure. The more variety you have, the longer it will be before you have to repeat one. By the time you include interference, misjudgments, accidents, circumstances, wildlife, environment, and potentially many more, it’s not too difficult to generate a lengthy list of possibilities. But why try and do them all at once? Do just enough that you’re covered; when those are all used up, store them and generate some more.

    Based on my experience, you are unlikely to need more than three from any given degree of failure/success in any given day’s play, but you may very well need two – so, two, plus a reserve of 1, is my recommendation for a standard ‘batch’.

Applying Differentiated Narratives

Applying one is fairly simple – identify the degree of success or failure, pick the next generalized description that matches off your list, and apply that general narrative to the task and skill being employed, just as I translated the “climbing” narrative into a research narrative.

Replacing and storing used Narratives

As each is used, I would tick it off in pencil. When I got down to only having the reserve unused, or less, I would generate a new batch to add to the list. If I couldn’t think of new ones, I would erase the ticks and start over – but keep trying to extend the list in between game sessions.
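For GMs who keep their notes electronically, the pencil-tick routine above might be sketched as follows. Everything here is a hypothetical illustration: the class name `NarrativeList`, its fields, and the default reserve of 1 are assumptions chosen to mirror the procedure described, with one such list kept per degree of success or failure.

```python
from dataclasses import dataclass, field


@dataclass
class NarrativeList:
    """Generalized narratives for one degree of success or failure."""

    narratives: list[str]
    reserve: int = 1                               # assumed size of the unused reserve
    used: set[int] = field(default_factory=set)    # the 'pencil ticks'

    def next_narrative(self) -> str:
        """Return the next unticked narrative and tick it off.

        Assumes the list is not empty.
        """
        if len(self.used) == len(self.narratives):
            self.used.clear()  # erase the ticks and start over
        index = next(i for i in range(len(self.narratives)) if i not in self.used)
        self.used.add(index)
        return self.narratives[index]

    def needs_new_batch(self) -> bool:
        """True once only the reserve (or less) remains unticked."""
        return len(self.narratives) - len(self.used) <= self.reserve

    def add_batch(self, batch: list[str]) -> None:
        """Append freshly written narratives to the end of the list."""
        self.narratives.extend(batch)


if __name__ == "__main__":
    catastrophic = NarrativeList(["accident near the end", "outside interference",
                                  "misjudged the conditions"])
    print(catastrophic.next_narrative(), catastrophic.needs_new_batch())
```

The check in `needs_new_batch` is the prompt to write more between sessions; the reset inside `next_narrative` is only the fallback for when no new ones have been added.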

Narrative by Instinct

Eventually, you will discover that you don’t need the lists anymore; the combination of circumstances, task, skill, and degree of success or failure will prompt you to think of a solution on the spot. This is Differentiated Narrative by Instinct. Once you reach that point, habit and repetition become your enemies – and the easiest way to combat them is to add whatever you have come up with to the list (in suitably generic form). This will enable you to continually verify that you aren’t inadvertently repeating yourself.

What’s more, you will find that the exercise of creating and using the lists has greatly enhanced your improv strength in other game areas as well – a side benefit, but a good one.

And now, having seen what is possible, you can fully see what I meant when I described the ‘succeed or fail’ light-switch as a wasted opportunity.

