The Dark Art of Story Points

Brian Shef
12 min read · Aug 9, 2019

Understand the oft-misunderstood, and gain power in the process

It’s much less complicated than… whatever is happening here.

If your team is operating within some flavor of Scrum, it is likely wrestling with the concept of story points. The Scrum Guide instructs teams to use some quantifiable means of estimating the complexity, difficulty, or effort involved in completing a story — and so, teams settle on some arbitrary point scale.

At face value, the concept appears to be fraught with problems. It’s generally agreed-upon that complexity increases according to the Fibonacci Sequence, and so that is the most common scale for story points; while the reasoning makes sense, the outcome is a bizarre scale with discrete values at 1, 2, 3, 5, 8, 13, 21, and so on.

It’s generally agreed-upon that point values will differ from person to person (a 3 for me might not be a 3 for someone else on the team), so the most common solution is for teams to estimate stories together in grooming or planning. Again, the reasoning makes sense, but the outcome is often very bizarre: Scrum Poker, and lots of time spent advocating for why a story is 3 points instead of 5.

It’s generally agreed-upon that story points are not a good metric of performance or success, so managers and scrum masters always introduce burndown and velocity charts with a litany of robotic caveats. The reasoning makes perfect sense, but teams are often left wondering why they spend any time producing and looking at such charts to begin with.

Story Points, in fact, can appear rather arcane to the typical product development team. Their craft, after all, is not in producing estimates of their work, but in producing actual products. Engineers would rather code, designers would rather work out a new UI. But this is an artifact of how various sects of Scrum tend to play out in reality: Work work work, introspection, work work work, and eventually, after all that work, at one point in time, delivery. Then back to work work work, etc.

It becomes a values problem. Why would the team, who spends 99% of their time in a work phase, and 1% of their time actually delivering a finished product to stakeholders, place any value in any kind of process (such as delivery), as opposed to valuing the actual work? Let the Scrum Masters and Project Managers — disciples of the Process Perspective — worry about that, right?

Wrong.

The Dark Art

And this is the Dark Art of Story Points — when understood, they can be leveraged; when leveraged, they can maximize a team’s effectiveness. This means that, with some minor sorcery, Engineers get to spend more time coding, designers get to spend more time polishing the UI, and in general the whole team gets to spend more time working than worrying about the process. To illustrate, let’s look at two hypothetical teams: One will be a typical product development team, which has been going through the motions of assigning story points but never really caring about or understanding them; the other will be a team practicing the Dark Art.

The Team That Careth Not

During grooming, the product owner advocates for the priority of issues in the backlog, while the rest of the team hashes out details, and someone somewhere blandly calls out a point value for the story, which the Jira Jockey dutifully records on the issue. Some stories are rather large, and are estimated at 13 or 21 points; some are rather small, and are thus estimated to be 1 or 2 points only.

During planning, the team selects the stories they wish to pull into the sprint, watching their story point value steadily creep up. Eventually, the Scrum Master notices the sprint has racked up more than 80 points, and she speaks up. “I think we’ve overextended ourselves for this sprint. Is there anything we can leave out of this sprint?” The team isn’t so sure whether or not 80 points is too much. They ask the Scrum Master to review their velocity history, but it’s inconclusive — their velocity fluctuates wildly between 50 and 90 points, so 80 feels fine. They know the pressure they’re under to deliver in a timely manner, and so they might even proffer excuses to avoid pulling anything out of the sprint: Oh, some of the sprint’s stories are partially-completed carryover from the previous sprint. Or, well, we don’t have a quarterly meeting during this sprint, so that will give us a few more hours compared to last sprint.

During the sprint, of course, issues pop up. One story is deemed to no longer be a priority, and returned to the backlog. A major bug is found and takes precedence. It is so urgent that engineers spend a few days fixing it, and slap a story into Jira after the fact, opting to simply not estimate it.

Finally, in the retro, a business stakeholder drops in with a very normal, very common question: “When do you think you’ll be ready to deliver?” The team again looks at their velocity history. They look at their burndown chart. But they know their velocity is too volatile to be a good predictor of anything. And they know their stories are too varied to treat them with any sort of equivalence; they couldn’t say, for instance, that they complete 8 stories on average per sprint. And so they default to their gut instincts, which may be very good instincts, and come up with a number. The business stakeholder knows that the team has given her a number based on gut instinct rather than data, and so, when she reports the answer to her boss, she decides to sugarcoat the number a little bit based on her own gut instincts about how accurate the team’s predictions have been in the past.

And somewhere at the Keurig after the retro, the senior engineer groans to the Scrum Master, “No offense, but why do we even bother with these numbers? We never use them. I could do more work if we stopped talking about points.”

The Team That Wieldeth Story Points

During grooming, the product owner advocates for the priority of issues in the backlog, while the rest of the team hashes out details. Early on, the team had already agreed upon their story point scale, from 1 to 8, with general guidelines on what a 1 meant, what a 5 meant, and what an 8 meant. Their Definition of Ready ensured that every story had sufficient detail for anyone on the team to work it, and thus anyone on the team could call out a story point value. The set of possible story point values was intentionally small, so differences in opinion didn’t matter much. Say the first person felt the story was a 3, and someone else felt it was a 5. There was no point in arguing, because these were all estimates, and all in the same rough ballpark. In fact, the team noted in their working agreement that they would strive to break stories down until they fit within the 3 to 5 point range; size 8 stories were to be the exception rather than the rule. Consistency is power in the Dark Arts.

During planning, the team selects the stories to pull into the sprint. However — and this is paramount — they begin with the most critical objectives first. They do so with the intimate understanding that they are committing to delivering against these stories by the end of the sprint. Once the critical items are taken care of, however, they are free to pull additional stories from the backlog and work them. Based on this practice, the Scrum Master would see that the team has only racked up 30 story points for the upcoming sprint. Someone asks, “This seems a bit low. Shouldn’t we add more?” And the Scrum Master will take a peek at the velocity history and see that, indeed, the team averages between 55 and 65 points per sprint. It’s easy enough, then, for the team to select 5 or 6 more stories to pull into the sprint from the prioritized backlog — it doesn’t have to be exactly in the 55 to 65 point ballpark, just close enough, because the team always has the option of pulling additional stories during the sprint.

Or maybe the Scrum Master would notice that the team has committed to 50 points of critical items right off the bat. That doesn’t leave much room for pulling from the backlog, and in fact, if anyone gets sick, or a meeting runs late, or an emergency comes up, the commitments will be at risk. The team knows the Dark Arts already take the Unexpected as a given, so there are no excuses to be made. The team simply must decide which issues to leave in the backlog this sprint. It’s a decision that can be made quickly, because the team understands that practitioners of the Dark Arts can only leverage so many points successfully, anyway.
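
The planning check described above boils down to one subtraction: average velocity minus what is already committed. Here is a minimal sketch in Python, assuming a made-up velocity history; the function name `remaining_capacity` and the sample numbers are illustrative, not something any particular tool provides.

```python
from statistics import mean

def remaining_capacity(velocity_history, committed_points):
    """Points that still fit in the sprint, judged by average past velocity.

    A negative result means the commitment already exceeds the average.
    """
    if not velocity_history:
        raise ValueError("need at least one completed sprint")
    return mean(velocity_history) - committed_points

# Hypothetical history for a team that averages between 55 and 65 points.
history = [55, 62, 58, 65, 60]

print(remaining_capacity(history, 30))  # plenty of room to pull from the backlog
print(remaining_capacity(history, 50))  # little slack left for the Unexpected
```

The result is a rough guide rather than a target; the team can always pull additional stories mid-sprint.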

And of course issues do pop up during the sprint. They always do. And the Scrum Master can even say with good certainty, based on the metrics, how much unplanned work the team can expect, on average. So the team is prepared for this.

And finally, when the business stakeholder asks the team, “When are you expecting to be ready to deliver?” the team can look at their burndown and see that they have 150 points left in the project; at a rate of 50 points per sprint, it would take only 3 sprints to complete. The project’s stories are largely within the 3–5 story point range, so they can also calculate burndown by number of stories with good certainty, and see that with 35 stories left, at 10 stories per sprint, it would take them a little over 3 sprints (3.5, to be exact) to complete. With consistently sized stories and little overcommitment, the team can perform solid Monte Carlo calculations and find that there is a 95% chance they will have the project completed in 3 sprints, and a 99.9% chance they will have the project completed in 4. The Story Points were used to summon cold, hard numbers for the business stakeholder, which can then be sent directly to her boss without any sugarcoating necessary. High-fidelity visibility is maintained at all levels.
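
For the curious, a Monte Carlo forecast of this kind can be sketched in a few lines of Python. This is one simple way to do it, under stated assumptions: each future sprint is drawn at random from past velocities, and the history below is invented. The function names are hypothetical, and the percentages it prints will not match the illustrative 95% and 99.9% figures above.

```python
import random

def simulate_sprints_needed(remaining_points, velocity_history, trials=10_000, seed=0):
    """Monte Carlo sketch: replay past velocities at random until the work runs out.

    Returns one simulated sprint count per trial.
    """
    if not velocity_history or min(velocity_history) <= 0:
        raise ValueError("velocity history must contain only positive values")
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    outcomes = []
    for _ in range(trials):
        left, sprints = remaining_points, 0
        while left > 0:
            left -= rng.choice(velocity_history)  # draw one past sprint's velocity
            sprints += 1
        outcomes.append(sprints)
    return outcomes

def chance_done_within(outcomes, sprints):
    """Fraction of simulated futures that finish in `sprints` sprints or fewer."""
    return sum(1 for n in outcomes if n <= sprints) / len(outcomes)

# Hypothetical history averaging 50 points per sprint.
history = [45, 52, 48, 55, 50, 47, 53]
outcomes = simulate_sprints_needed(150, history)

print("naive forecast:", 150 / 50, "sprints")
for n in (3, 4):
    print(f"chance of finishing within {n} sprints: {chance_done_within(outcomes, n):.1%}")
```

The tighter the velocity history, the tighter the spread of outcomes, which is why consistent story sizes pay off here.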

The Worst Case Scenario

What happens when a team is given a deadline? Deadlines and flagpole dates aren’t agile at all, I know. We all know. But they are a reality. It’s August, and a team is working against an October 1st deadline. Their CEO is scheduled to be the keynote speaker at MegaAwesomeCon in San Francisco and announce a new flagship product for general availability. Investors are expecting the buzz to lead to over 500k users in the first day. If the CEO instead demos the product and explains that users can sign up “soon,” the hype will be wasted, and competitors will have a chance to make announcements and scoop up users. In other words, the stakes are ultra high; welcome to the worst case scenario.

If you were that CEO, you’d probably be asking once a week, “How’s that project coming along?” And you might be told every week, “Just fine.” But what if that answer is the result of sugarcoating the estimation around the wild ass guess based on the gut instincts of a team under immense pressure? Is that really good information?

It’s hard to be told that a team won’t make the deadline. But this is only August. What if the team could tell you TODAY, with 98% certainty, that although they won’t be ready come October 1st, they WILL be ready on October 17th? That’s time to craft your presentation properly, and line up marketing to fill the 17-day gap, and set up an early access page or a limited beta invite for con attendees. Whatever the salves are to the problems created by missing the deadline, since the team has told you with statistical certainty when they WILL be ready, they have given you a precious, precious gift: Time.
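
Turning “98% certainty” into a calendar date is mostly bookkeeping. The sketch below assumes you already have a list of simulated sprint counts (from a Monte Carlo run, say), two-week sprints, and a made-up start date; every name and number in it is hypothetical.

```python
import math
from datetime import date, timedelta

def sprints_at_confidence(outcomes, confidence):
    """Smallest sprint count that at least `confidence` of the outcomes fit within."""
    if not outcomes or not 0 < confidence <= 1:
        raise ValueError("need outcomes and a confidence in (0, 1]")
    ordered = sorted(outcomes)
    index = math.ceil(confidence * len(ordered)) - 1
    return ordered[index]

def delivery_date(start, sprints, sprint_length_days=14):
    """Calendar date on which the given number of sprints ends."""
    return start + timedelta(days=sprints * sprint_length_days)

# Hypothetical simulated outcomes: most futures finish in 5 sprints, a few need more.
outcomes = [5] * 90 + [6] * 8 + [7] * 2
needed = sprints_at_confidence(outcomes, 0.98)
print("ready by", delivery_date(date(2019, 8, 12), needed))
```

Asking for a higher confidence pushes the date later, which is exactly the trade-off the CEO and the team need to see early.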

Additionally, if you were the team, and you knew with 98% certainty that at your present rate, you’d be 17 days late, isn’t that something you’d want to know as early as possible? Absolutely! Because now you have precious time, too. Scope, of course, is the biggest lever to pull in order to adjust the product delivery timescale. Maybe an hour of that time is well spent in a room with testers and UX folks and product owners making some hard calls about which feature to pull out of the initial release and save for a fast-follow update.

And maybe there is time now for other teams to lend some engineers to the project. Maybe there is time to schedule some early mornings or late nights or weekends. Again — whatever the right calls are, you have the time to make them. There is no sudden panic and dread at 5pm on September 30th.

How Is This Possible?

Let’s all remember that Story Points alone don’t unlock some tremendous power. In fact, some of the original authors of the Agile Manifesto argue against using story points at all. Their point is that Story Points are often so misunderstood or so abused as to make their implementation detrimental to teams. If you’re going to rely on time-based metrics, or even wild-ass guesses, it’s better to dispense with the story point concepts and rituals altogether and at least recoup the time that would otherwise be spent in estimation.

  • Story Points are not performance. Do not use them as such, or they will consume you.
  • Story Points are estimations only. Everything a team does during a sprint is a product of time and effort, but we are all humans in a highly dynamic world with an endless supply of unknown variables. To expect precision out of Story Points is to invite ruin.
  • Story Points are only powerful if you use a small subset of them. It takes a truly experienced and powerful warlock indeed to channel stories with 13, 21, or more points. Besides, good story writing — a topic for another time — involves breaking down stories into the smallest possible deliverables.
  • Story Points are but one tool. Your toolbox at home contains more than a set of hammers, no? Furthermore, Story Points are tools that can be used by other tools. They want to be used. Let them be used. Don’t ever let them use you.
  • Leverage the fact that story points are built into other tools. Jira and other common solutions will be able to spit out all kinds of reports based on story estimation information: burndowns, burnups, velocities, Monte Carlos, etc. You think the exercise of slapping a point value on a story in grooming is a waste of time? What about the time lost to wild-ass guesses, or business people crashing your meetings to check on progress, or performance reviews where you have to make excuses for why things weren’t delivered when expected?
    No. It takes but a single click for anyone to spit out cold hard data. The team doesn’t need to waste time in meetings or crunching numbers themselves. That’s valuable work time gained, in exchange for a ritual that takes but a second per story in grooming.
  • Consistency is power. The more consistent your story sizes, the more accurate your predictions can be. Never forget — interruptions and unplanned work will be a constant, to some degree anyway, no matter where you work. Accept it, plan for it, account for it.
  • Adjust and refine. Consistency doesn’t just happen. It takes introspection, looking at the patterns, and making adjustments. If the team sees that they have a habit of undercommitting (pulled in 30 points, but completed over 50), maybe start pulling in more points during sprint planning. Or if there is a lot of rollover (pulled in 50 points, but only completed 30), suggest that the team estimate higher and/or pull in fewer points in sprint planning.
  • Never correct as you go; only correct in between sprints. It’s tempting to adjust the size of a story up from 3 to 8 once you start working on it and realize it’s more complex than you initially expected. But doing so would make the sprint’s numbers look more accurate than the original estimates really were. It would obfuscate a potential problem that needs correcting. If this were happening regularly, nobody would be able to recognize it: every sprint, it would appear as if the team were doing roughly the amount of work that they had committed to.
    But if the adjustments are not made during the sprint, it could quickly become apparent that the team needs to estimate higher, or work on decomposing stories further; the team would see that every sprint, they aren’t completing the work they had committed to.
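
The “adjust and refine” check from the list above can be sketched as a comparison of committed versus completed points over recent sprints. This is a minimal Python illustration, assuming each sprint is recorded as a (committed, completed) pair using the original, uncorrected estimates; the 15% tolerance and the function names are arbitrary choices for the example.

```python
def commitment_ratio(sprints):
    """Completed points divided by committed points across the given sprints."""
    committed = sum(c for c, _ in sprints)
    completed = sum(d for _, d in sprints)
    if committed <= 0:
        raise ValueError("committed points must be positive")
    return completed / committed

def suggest_adjustment(sprints, tolerance=0.15):
    """Turn the ratio into a retro talking point (tolerance is an assumed threshold)."""
    ratio = commitment_ratio(sprints)
    if ratio > 1 + tolerance:
        return "undercommitting: consider pulling in more points at planning"
    if ratio < 1 - tolerance:
        return "rollover: consider estimating higher or pulling in fewer points"
    return "on track"

print(suggest_adjustment([(30, 50)]))            # the undercommitting example
print(suggest_adjustment([(50, 30)]))            # the rollover example
print(suggest_adjustment([(50, 52), (55, 51)]))  # close enough to leave alone
```

Note that this only works if estimates are left untouched during the sprint; re-pointing stories mid-sprint would pull the ratio toward 1 and hide the pattern.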

Put On Your Robe And Wizard Hat

It sounds like a lot of rules. And as in the beginning of this piece, I’m sure that although the reasoning makes sense, it comes off as rather thorny.

I cannot promise that you will not suffer a few thorn pricks in the beginning. A team must come together on a working agreement, definition of ready, and definition of done. And this must include the team’s agreements on story points — what they are, what they aren’t, and how best to leverage them.

But the benefits will come, as surely as they are invoked, and they will come quickly. The actual act of estimation will become near-instantaneous for a team on the same page. Relevant information can be gleaned from a moment’s glance at a burndown or velocity chart. Time-consuming and stressful wild-ass guesses are replaced by cold hard numbers available to anyone at a moment’s notice.

Put on your robe and wizard hat, and try practicing the Dark Art of Story Points — that is, leveraging them for all they’re worth in order to maximize the time the team may spend developing their products.
