The common sense unit of work

What if we were to model a typical software development lifecycle in code?

The unit of work would be the fundamental abstraction. We’d build state machines and workflows around it, carrying it from specification to deployment through activities performed by product managers, engineers, designers, and others. The process could be customised to each team’s needs, with all the bells and whistles. But fundamentally, its effectiveness and adaptability depend on how good this central abstraction is.
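To make the thought experiment concrete, here’s a minimal sketch of what that central abstraction might look like. Everything in it, the stage names, the fields, the transitions, is illustrative, not a prescription:

```typescript
// A unit of work as a tiny state machine. All names and stages here
// are made up for illustration.

type Stage =
  | "specified"
  | "prioritised"
  | "in_progress"
  | "in_review"
  | "verifying"
  | "served"; // in production, possibly behind a feature flag

interface UnitOfWork {
  id: string;
  title: string;
  stage: Stage;
  valueHypothesis: string;      // the business value this slice should validate
  acceptanceCriteria: string[]; // how we'll know it's done
}

// Legal transitions. Every process we build around the unit
// inherits the shape of this table.
const transitions: Record<Stage, Stage[]> = {
  specified: ["prioritised"],
  prioritised: ["in_progress"],
  in_progress: ["in_review"],
  in_review: ["verifying", "in_progress"],
  verifying: ["served", "in_progress"],
  served: [],
};

function advance(unit: UnitOfWork, next: Stage): UnitOfWork {
  if (!transitions[unit.stage].includes(next)) {
    throw new Error(`cannot move ${unit.id} from ${unit.stage} to ${next}`);
  }
  return { ...unit, stage: next };
}
```

Customising the process to a team then means editing the table. But every consumer of the table depends on the shape of UnitOfWork itself.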

Get this abstraction wrong, and complexity scales exponentially. All the processes built around it inherit the dysfunction. Planning becomes chaotic, progress becomes opaque, and coordination becomes an expensive mess.

We deal with leaky abstractions by periodically refactoring them, so why not do the same with the unit of work? What makes a good unit of work? Let’s walk through the familiar activities of the development lifecycle and observe the properties that emerge.

Breaking it down

We typically start with product or feature requirements. We don’t usually take on a full feature in one shot; it’s “too big”, especially if it’s complex enough to need some technical design and specification written along with it. We break it down into small parts that are easier to solve and give us a steady sense of progress.

Now, a product requirement is really a hypothesis for creating business value, and we want to validate that hypothesis as early as possible. So the small parts need to be valuable to the customer.

In other words, we need the unit of work to be a slice of the cake, not a layer.

[Figure: slice-of-cake]

Of course, bug fixes and refactors don’t provide value in the same way, and that’s okay. Sometimes there are technical tasks that are best left independent. That’s okay too. No need to be dogmatic, as long as the broad needs of value and a sense of progress are being met.

Planning

Before starting work, we want to prioritise, because it saves a lot of time. We want to ship the most valuable slices first, and perhaps discard some low-priority slices. But we can’t prioritise without weighing the business value against the implementation effort. Not all slices are the same size, so we estimate the implementation effort first.

Then, some large slices have low product value, so we want to break them into even smaller slices to prioritise the parts we care about most. Some other large slices can’t be sliced further meaningfully, and that’s okay. Some smaller slices can’t be engineered independently, so we build the larger slice anyway. The unit needs to be negotiable.
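As a sketch of that trade-off, deliberately naive, with both fields being whatever the team agrees on (say, a 1–10 value score and story points, effort assumed to be greater than zero):

```typescript
// Prioritising slices by value per unit of effort.

interface EstimatedSlice {
  id: string;
  businessValue: number; // agreed with product, e.g. 1-10
  effort: number;        // e.g. story points; assumed > 0
}

// Most valuable per unit of effort first.
function prioritise(slices: EstimatedSlice[]): EstimatedSlice[] {
  return [...slices].sort(
    (a, b) => b.businessValue / b.effort - a.businessValue / a.effort
  );
}
```

A large slice with low value sinks to the bottom of this list, which is exactly the signal to split it further or discard it.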

[Figure: planning-with-slices]

And since we’re doing this as a team, we’ll want to ensure that the slices are as independent as possible, so that we can each do our part without waiting, and we don’t step on each other’s toes.

Gathering context

A unit can be specified today, picked up for execution next month, blocked by another task, and then deprioritised into the backlog. Over its life, it gathers context about various things:

  • What value it provides, how to verify it
  • How it needs to be implemented
  • Missing pieces of context that came together after conversations
  • Unknowns that were resolved or unresolved
  • Who worked on it, what issues they ran into
  • What bugs came up in testing and QA before release

[Figure: gathering-context]

Keeping these pieces of context collected in a single place helps in picking the work up from where it was left off. When discussing, implementing, or tracking, it’s useful to have the same artifact in front of us.
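In the running sketch, that could be as simple as an append-only log hanging off the unit (the kinds and field names are made up):

```typescript
// Context accumulates on the unit over its life, in one place.

interface ContextEntry {
  at: Date;
  author: string;
  kind: "conversation" | "decision" | "unknown" | "bug" | "handover";
  note: string;
}

interface UnitContext {
  unitId: string;
  entries: ContextEntry[]; // chronological; nothing scattered elsewhere
}

function addContext(ctx: UnitContext, entry: ContextEntry): UnitContext {
  return { ...ctx, entries: [...ctx.entries, entry] };
}
```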

Solving

Knowing exactly what we’re solving for is very helpful: it lets us build just enough software™️. No more, no less. So we need to define acceptance criteria that we can all agree on.

Then, solve until we meet them.

It’s good to automate checking against the acceptance criteria, because we’re going to be doing that an awful lot while solving.
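One way to do that is to write the acceptance criteria down as automated tests. A sketch using a generic test runner (Vitest here); the applyDiscount module and the feature itself are made up for illustration:

```typescript
// Executable acceptance criteria for a hypothetical slice.

import { describe, it, expect } from "vitest";
import { applyDiscount } from "./checkout"; // hypothetical module

describe("slice: bulk discount at checkout", () => {
  // Acceptance criterion 1: orders of 10+ items get 5% off.
  it("applies 5% off for 10 or more items", () => {
    expect(applyDiscount({ items: 10, subtotal: 100 })).toBe(95);
  });

  // Acceptance criterion 2: smaller orders pay full price.
  it("charges full price below 10 items", () => {
    expect(applyDiscount({ items: 9, subtotal: 100 })).toBe(100);
  });
});
```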

Verifying

Confidence usually doesn’t require checking every possible case, only the key ones that capture most of the impact. Yes, we checked this slice at every step of the way, but it is useful to inspect it one last time before serving.

When is a unit considered done? When the slice has been served. When it’s in the hands of the user, in production, potentially behind a feature flag.
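In the running sketch, “served” might look like this: the new path deployed to production but gated, with a hypothetical feature-flag client:

```typescript
// The slice is live in production, dark behind a flag until we flip it.
// Cart, the flag client, and both checkout paths are hypothetical.

interface Cart { items: number; subtotal: number; }
interface FlagClient { isEnabled(flag: string): boolean; }

declare function checkoutWithBulkDiscount(cart: Cart): number;
declare function legacyCheckout(cart: Cart): number;

function checkout(cart: Cart, flags: FlagClient): number {
  return flags.isEnabled("bulk-discount")
    ? checkoutWithBulkDiscount(cart) // the newly served slice
    : legacyCheckout(cart);
}
```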

[Figure: verifying]

And that’s it. To manage the life cycle of software development, we manage the unit of work. Some would say we need to INVEST in good units of work. And some of you might rightly recognise that it looks like a User Story. But as long as the unit has the properties and affordances described above, it should make for a decent unit of work regardless of what we call it.


Does your unit of work need refactoring?

We’re fairly aware of the penalties of leaky abstractions in software. The incidental complexity of getting our primary real-world abstractions wrong grows exponentially with each layer of software built over it, until the whole system is slow, sludgy slop that’s difficult to work with. We can hack at it here and there, and celebrate minor wins, but the big wins were lost in the ignored opportunities to refactor that central abstraction.

If we apply the same thought process to software development, we’ll see that our core abstraction, the unit of work, might need refactoring.

In this economic weather, big gains in developer productivity are important. Organisations that use DORA metrics measure deploy or commit frequencies, which might be valuable in some dimensions, but they’re not a measure of productivity in terms of outcomes for the customer. I love these last lines in Kent Beck’s writing about measuring developer productivity:

Be suspicious of anyone claiming to measure developer productivity. Ask who is asking & why. Ask them what unit they are measuring & how those units are connected to profit.

I am 100% pro-accountability. Weekly delivery of customer-appreciated value is the best accountability, the most aligned, the least distorting.

And I think a unit of work as defined above could be used to measure productivity holistically. Prioritising by value, eliminating unnecessary work, and validating quickly then become obvious, measurable ways to increase productivity.
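As an illustration of what that could look like, a metric counting value served per week rather than commits or deploys (all fields and names here are assumptions):

```typescript
// Throughput in served units of work, weighted by agreed business value.

interface ServedUnit {
  id: string;
  servedAt: Date;        // when it reached users in production
  businessValue: number; // the value score agreed during planning
}

function weeklyDeliveredValue(units: ServedUnit[], weekStart: Date): number {
  const weekEnd = new Date(weekStart.getTime() + 7 * 24 * 60 * 60 * 1000);
  return units
    .filter((u) => u.servedAt >= weekStart && u.servedAt < weekEnd)
    .reduce((sum, u) => sum + u.businessValue, 0);
}
```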

Productivity gains through the use of AI assistants are also popularly reported and benchmarked in terms of the percentage of code generated, but that’s not a very valuable dimension of measurement. If the benchmarks for AI productivity revolved around units of work valuable to the customer, then we’d be talking about true productivity gains. AI assistants also need small, well-specified slices of work, and hence will also benefit from a well-defined unit of work. My colleague Atharva has written a wonderful blog post about that in detail.

Yeah, this article is mostly about rehashing a two-decade-old pitch for some common sense agile. But I hope it has been worth your time.

Annexes

  • In reality, the workflow isn’t as linear, and there is much back and forth between the steps. I’ve kept it simple to focus on the properties.
  • Yes, I’m aware the classic definition of user stories doesn’t have implementation details.
  • Slicing can happen across many dimensions, and breaking down a hard problem effectively can itself be a very hard problem.
  • If you want to read the OG Agile material, you can read:

  • I like Gergely Orosz and Kent Beck’s response to McKinsey on measuring developer productivity. Gergely’s writing about DORA and SPACE is interesting, but I wonder if metrics can be more granular, built around this unit of work and its affordances. That would shift-left the feedback on productivity, to where it matters.