Monday, August 23, 2004

Manager’s Role for Bug-Weeding

Thanks to Brian Marick, I read Dave Thomas’s Weeding Out Bugs. Much of Bug-Weeding is developer turf. But here’s what managers can do to help:

  • Look at defect counts by module. When you see a module that has more than its fair share of defects, start asking the developers what they’re considering doing about it. You’ll all need to weigh the risks of tossing the module and rewriting it wholesale, but the manager’s job is to start the conversation if the developers haven’t already.
  • Make sure the developers fix broken builds before they go on to more coding. (I like to make sure the project team can build at least once every day. If your project team can’t already build frequently, consider a project to make it easy to build every day. And for those of you with humongous projects where you only build every week or so: continue at your peril. Break apart the builds so the project team can create a new build every day.) Developers have bad-code days. If you discover that someone (or several people) had a bad-code day yesterday, you have an opportunity to give very quick feedback so they can fix their defects. If developers fix defects before they go on to more coding, you’ve reduced the overall defect count. And you’ve actually reached a milestone, not just met a date where people claim they’ve met a milestone (“except for all those bugs we have to fix”).
  • Monitor the cost to fix a defect. Clarke recently wrote “Quantify the cost of reworking bugs?” about measuring how much time bug-fixing added to the schedule. Managers can monitor whether the finding cost is higher than the fixing cost. The higher the finding cost, the more time people are spending in defect-prevention activities. The higher the fixing cost, the more time people spend in defect-fixing activities.
  • Monitor the fault feedback ratio (FFR): the number of bad fixes divided by the total number of fixes. My rule of thumb is that a project can make progress with an FFR of 8% or less. Above 8%, developers are spinning their wheels.
  • Monitor system size. Back when I was a developer working on a substantial system, I was amazed at how my end-of-the-release defect-fixing caused my code to vanish. I now know I was refactoring at the end. It’s better to refactor as you go along, but sometimes you can’t see the simplifications until the end. (A good reason to make lots of little ends.) Software follows an “S” curve of creation: code growth starts slowly, increases fairly dramatically, and then tails off. When developers refactor their code (the tailing off), they remove up to about a third of the total lines of code (my data; yours may vary). If developers tell you they’re done but the total LOC count hasn’t gone down, look for other ways to test that doneness.
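The FFR arithmetic is simple enough to pull straight from your defect tracker every week. Here’s a minimal Python sketch; the function names are hypothetical, and what counts as a “bad fix” (for example, a fix that had to be reopened or that introduced a new defect) is an assumption you’ll need to define for your own team. The 8% threshold is my rule of thumb from above, not a universal constant.

```python
# Fault feedback ratio (FFR): bad fixes / total fixes, checked against
# the 8% rule of thumb. What counts as a "bad fix" is up to your team.

FFR_RULE_OF_THUMB = 0.08  # 8% or less: the project can still make progress


def fault_feedback_ratio(bad_fixes: int, total_fixes: int) -> float:
    """Return bad fixes divided by total fixes (0.0 before any fixes exist)."""
    if bad_fixes < 0 or total_fixes < 0 or bad_fixes > total_fixes:
        raise ValueError("expected 0 <= bad_fixes <= total_fixes")
    return bad_fixes / total_fixes if total_fixes else 0.0


def making_progress(bad_fixes: int, total_fixes: int) -> bool:
    """True when the FFR is at or below the rule-of-thumb threshold."""
    return fault_feedback_ratio(bad_fixes, total_fixes) <= FFR_RULE_OF_THUMB


if __name__ == "__main__":
    # Hypothetical weekly counts, for illustration only.
    weekly = [("week 1", 2, 40), ("week 2", 5, 45)]
    for label, bad, total in weekly:
        ffr = fault_feedback_ratio(bad, total)
        verdict = "making progress" if making_progress(bad, total) else "spinning wheels"
        print(f"{label}: FFR {ffr:.1%} ({verdict})")
```

Watching the ratio week over week matters more than any single reading: a rising FFR is the signal to stop and ask what’s making fixes go bad.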

Managers can’t just monitor the schedule. (I can always claim to be done. The real question is how good my work product is after a given amount of time.) It makes no sense to monitor only product creation and cost; you also need to monitor defect creation and cost. Then you can avoid the weeds.

Wednesday, April 14, 2004

Bret on the Blackout

Read Bret Pettichord’s How Did the Blackout Happen? to see how cutting yourself off from data can damage your ability to perform your job.

Monday, March 1, 2004

Why Defects/KLOC Doesn’t Supply Enough Information about Product Quality

A colleague emailed me a few days ago, and asked “for a code base with a [given size], what can we expect to see for numbers of defects per KLOC (given the actual industry average or given what the industry believes we should expect). We need some way of gauging whether or not our defect rates fall within the industry standard, or if we are better or worse than the industry standard.”

The question of “Are we producing code with fewer, about the same, or more defects than industry standard?” is a reasonable question. Unfortunately, I don’t think it’s a particularly helpful question.

I object to defects/kloc (defects per thousand lines of code) for these reasons:

  • Defects/kloc treats all defects equally. So if you have developers who went to great lengths to make all the code solid but the writers didn’t have enough time to bullet-proof the help, the defects/kloc number is misleading. Or, if the developers prevented a whole bunch of serious errors but missed a bunch of not-so-serious errors, the numbers look the same as if the developers missed errors across the whole project.
  • Defects/kloc changes over the course of the project. Depending on the practices the developers use, they will find defects at a different rate. If the developers are using inspection or peer review or agile practices, they will find many more defects at the beginning of the project. If they aren’t using any of these practices, they will find a great number of defects at the end of the project. If the project stops testing while the defect-finding curve is still climbing (the hockey stick), the total-defects part of the ratio is too low.
  • Defects/kloc assumes that there is an “average” consequence to each defect. Each defect is unique, and sometimes it’s the sum of a bunch of unrelated defects that matters to the customer’s experience of the product.

OK, so what do I recommend instead of defects/kloc? If you must measure defects as part of your measure of how good the product is, measure the post-release defect escape rate. At three months, six months, nine months, one year, and every three months after that, count the number of defects your customers found that you didn’t know about. That’s the numerator. The denominator is the total number of defects found (including these new ones). The lower your defect escape rate, the better your perceived quality; the higher the rate, the worse your perceived quality. If the customers don’t find the defects, then those defects don’t matter to the customers. They may still matter to you, but they don’t affect the customer’s experience.
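Here’s a minimal Python sketch of that calculation at each checkpoint. The `Checkpoint` record and its field names are hypothetical; it assumes you can get cumulative counts of customer-found defects you didn’t already know about, and of defects your own team found, from whatever tracking system you use.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    months_after_release: int  # 3, 6, 9, 12, then every three months
    customer_found: int        # cumulative defects customers found that you didn't know about
    internally_found: int      # cumulative defects your own team found


def defect_escape_rate(cp: Checkpoint) -> float:
    """Customer-found defects divided by all defects found so far."""
    total = cp.customer_found + cp.internally_found
    return cp.customer_found / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    history = [
        Checkpoint(3, 10, 190),
        Checkpoint(6, 18, 192),
        Checkpoint(9, 24, 196),
        Checkpoint(12, 26, 198),
    ]
    for cp in history:
        rate = defect_escape_rate(cp)
        print(f"{cp.months_after_release:>2} months: escape rate {rate:.1%}")
```

Because both counts are cumulative, each checkpoint’s rate includes everything found since release, so you can compare one release’s escape rate at six months against another’s at six months.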

To me, defects/kloc is something to measure when you want to see if your process is catching defects and dealing with them early. Snapshot code growth and defects/kloc weekly, compare the numbers each week, and you have some useful information you can use during the project to adjust course. But don’t use defects/kloc to reward or punish developers.
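A minimal Python sketch of the weekly snapshot comparison, assuming you can pull total lines of code and the cumulative number of defects found from your own tools (the names here are hypothetical). It only reports the trend; deciding what a rising or falling number means for your process is still a conversation with the team, not a score for any developer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklySnapshot:
    week: int
    loc: int            # total lines of code in the snapshot
    defects_found: int  # cumulative defects found so far


def defects_per_kloc(defects: int, loc: int) -> float:
    """Defects per thousand lines of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return defects / (loc / 1000)


def weekly_trend(snapshots: list[WeeklySnapshot]) -> list[tuple[int, float, Optional[float]]]:
    """(week, defects/KLOC, change from the previous week) for each snapshot."""
    rows = []
    previous = None
    for snap in snapshots:
        density = defects_per_kloc(snap.defects_found, snap.loc)
        change = None if previous is None else density - previous
        rows.append((snap.week, density, change))
        previous = density
    return rows


if __name__ == "__main__":
    # Hypothetical snapshots, for illustration only.
    snaps = [
        WeeklySnapshot(1, 12_000, 30),
        WeeklySnapshot(2, 15_000, 45),
        WeeklySnapshot(3, 16_000, 56),
    ]
    for week, density, change in weekly_trend(snaps):
        delta = "n/a" if change is None else f"{change:+.2f}"
        print(f"week {week}: {density:.2f} defects/KLOC (change {delta})")
```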

High or low defects/kloc is not indicative of how good the developers are; it indicates some sort of process problem or success (or cover-up). The process is almost never a developer problem. Management decides where to spend the money. If the developers have a too-high defect rate, it’s almost always because management has overconstrained the problem so that the developers feel they have to take shortcuts. To me, defects/kloc is a way to blame developers for inadequate management. That’s why I feel so strongly about it.

If you want to know about product quality, measure all six sides of the product quality equation. That will tell you about product quality more readily than defects/kloc will.

Wednesday, October 22, 2003

Showing Project Progress (NOT percent complete)

Last night, someone came up to me at the end of my SPIN talk. I’d discussed earned value and inch-pebbles but hadn’t specifically discussed how to avoid the dreaded “percent complete” problem when reporting to management. The percent complete problem occurs when you have to report progress to management as “the project is x% complete.” In my experience, x has two values: 50% and 90%. The project doesn’t stay at 50% for long before it jumps to 90% :-)

The problem with percent complete is that it might reflect the percentage of the schedule you’ve used, but it doesn’t reflect how much of the project is actually complete. To understand actual completion, look at earned value (how much you’ve accomplished, based on completed work) or at the number of inch-pebbles completed.

Earned value works very well on agile projects. Agile lifecycles deliver value at the end of each iteration. A subsequent iteration might reduce the previous value by some small amount, but during that iteration the team also increases the value by implementing other features. Here’s an example. In iteration 2, the form (and the underlying database schema) was “completed” to the best knowledge of the developers, testers, and customer. As the customer reviewed the user stories for iteration 5, he realized he needed to remove one field from the form and add another (with the appropriate database changes). In iteration 5, the team changed the form and the database, first reducing the earned value (removing the previous feature) and then increasing it (adding or modifying the feature). (If this is still confusing, send me email.)
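If it helps, the form example can be replayed as a tiny ledger. This is a minimal Python sketch under loose assumptions: earned value is treated as the sum of the planned value of completed items (“how much you’ve accomplished based on completed work”), values are in story points, and the item names, the fax and email fields, and the point values are all invented for illustration.

```python
class EarnedValueLedger:
    """Earned value as the sum of planned value for completed work.

    Values are in hypothetical units (story points here); use whatever
    unit your plan assigns to each feature.
    """

    def __init__(self) -> None:
        self._done: dict[str, int] = {}

    def complete(self, item: str, value: int) -> None:
        """Record a feature (or piece of one) as done."""
        self._done[item] = value

    def withdraw(self, item: str) -> None:
        """A done item is removed or superseded, so its value no longer counts."""
        self._done.pop(item, None)

    def earned(self) -> int:
        return sum(self._done.values())


def form_example() -> list[int]:
    """Replay the form example; return the earned value after each step."""
    ev = EarnedValueLedger()
    history = []
    # Iteration 2: the form and its schema are "completed".
    ev.complete("order form", 5)
    ev.complete("form field: fax", 1)
    ev.complete("schema column: fax", 1)
    history.append(ev.earned())  # 7
    # Iteration 5: the customer drops one field (value goes down) ...
    ev.withdraw("form field: fax")
    ev.withdraw("schema column: fax")
    history.append(ev.earned())  # 5
    # ... adds another, and the team finishes iteration 5's other stories.
    ev.complete("form field: email", 1)
    ev.complete("schema column: email", 1)
    ev.complete("search page", 8)
    history.append(ev.earned())  # 15
    return history


if __name__ == "__main__":
    print(form_example())
```

The dip from 7 to 5 is the small reduction; the rise to 15 is the iteration’s new and modified work.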

Earned value works well when you have independent product parts: parts that don’t completely change when other pieces of the system change. Agile lifecycles, along with staged delivery, design to schedule, concurrent engineering, and evolutionary delivery, allow you to calculate earned value. If you use a true waterfall, you can calculate earned value, but I’ve found that the early documents don’t carry as much value as delivered code (earned value is not even across the project). If you use a spiral lifecycle, you can’t calculate earned value because you’re allowing for change across the entire project every iteration.

So if earned value doesn’t work or you’re not willing to calculate it, consider inch-pebbles. Inch-pebbles are one- to two-day tasks that are either complete or not complete. No percentage complete allowed! If you count the inch-pebbles and see how many of them are complete (say, 8 out of 100), you are roughly 8% complete. You could still be off, especially if your inch-pebbles at the beginning are smaller than your inch-pebbles at the end. Or, if you’re like me and plan the project iteratively, you don’t know how many inch-pebbles the whole project has; you only know how many you have for this phase.
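Here’s a minimal Python sketch of inch-pebble counting. The count-based number is the one described above. The effort-weighted variant is my assumption, not something the inch-pebble approach requires; it’s one way to reduce the error when early inch-pebbles are smaller than later ones, provided you have day estimates for each task. Names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InchPebble:
    name: str
    est_days: float     # inch-pebbles are one- to two-day tasks
    done: bool = False  # complete or not complete; no partial credit


def completion_by_count(pebbles: list[InchPebble]) -> float:
    """Fraction of inch-pebbles that are complete (8 of 100 -> 0.08)."""
    if not pebbles:
        return 0.0
    return sum(p.done for p in pebbles) / len(pebbles)


def completion_by_effort(pebbles: list[InchPebble]) -> float:
    """Same idea, weighted by estimated days (an assumed refinement)."""
    total = sum(p.est_days for p in pebbles)
    if total == 0:
        return 0.0
    return sum(p.est_days for p in pebbles if p.done) / total


if __name__ == "__main__":
    # Hypothetical plan: 50 one-day pebbles, then 50 two-day pebbles;
    # the first 8 are done.
    plan = [InchPebble(f"task {i}", 1.0 if i < 50 else 2.0, done=i < 8)
            for i in range(100)]
    print(f"by count:  {completion_by_count(plan):.1%}")
    print(f"by effort: {completion_by_effort(plan):.1%}")
```

If you plan iteratively, the list only holds this phase’s inch-pebbles, so either number describes the phase, not the whole project.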

Whatever you choose to do, make sure you don’t use percent complete. Percent complete leads you astray, into the 90% complete problem. (After you’ve done 90% of the project, you have the other 90% to do.)

When you discuss project progress, make sure you’re looking at all sides of the project pyramid so that you can explain how the assigned people are using the budget, time, and work environment to produce the product with its associated defects. That’s the most effective way to discuss project progress.

Tuesday, April 29, 2003

Measure in the Middle

I ended up in the hospital last weekend (facial cellulitis, yuck). The people on my floor weren’t very sick; they needed a few days to recover from an acute problem. Everyone’s prognosis was good, and the average stay on the floor was 3 days.

Recovering from an acute illness takes good nutrition, sleep, and moderate exercise. The kitchen helps patients plan menus so they can make better choices. Sleep and exercise are very difficult to attain in a hospital, surprisingly enough.

Luckily, my previous overnight hospital experiences were limited to childbirth. In OB, they know to let sleeping mothers sleep. They don’t wake you up to take your vital signs (Hey, you still alive?). However, on my floor, the nurses woke us up every 8 hours to make sure we were alive. They took blood pressure, temp, and oxygen levels. If your IV schedule doesn’t line up with your vital sign schedule (as mine didn’t), you have the opportunity to be awakened every 4 hours. I couldn’t wait to go home to get some sleep.

However, the idea of continual trend measurement is a good one. Taking a few vital measurements daily (or weekly or monthly) on a project, and periodically for people management, is a Very Good Idea. Knowing where you start is critical, which is why planning is so useful. Knowing where you end up is also critical, but measuring in the middle is even more important. Measuring in the middle helps you complete the work to obtain the result you want.

Continuous measurement of vital signs helps you see when things are starting to go awry. If my temp had gone up even half a degree, that would have been enough information for my doctors to help me in different ways. If your project has an area where the defects start to increase, or the number of reviews starts to decrease, or the estimations are off (either way), you have an opportunity to continue to watch the project or take some action based on the measurements. If you don’t measure in the middle, you’re surprised by the result.

One of my favorite project measurements compares the number of people I need on a project with the number of people I actually have. I find staffing curves help me organize the work breakdown structure (WBS) in different ways and help me talk with management about potential project organizations. Here’s a project staffing table:

Month                  Planned (people)   Actual (people)
1                      2                  2
2                      4                  2
3                      6                  4
4                      6                  6
5                      6                  6
6                      6                  8
7                      not planned        8
8                      not planned        8
Total (people-months)  30                 48

They’d originally planned a 6-calendar-month, 30-person-month project. By the time I arrived (month 6, when they realized they weren’t going to make it), the best we could do was an 8-calendar-month, 48-person-month project. During the senior management debrief, a bunch of the senior managers wrung their hands and asked why they couldn’t make it. I showed them this table, and asked if they’d checked with the project manager at month 3. At month 3, it was clear the project they had wasn’t the same one they had planned. If they’d measured staffing (as opposed to trying to push to meet milestones), they would have seen this.
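Measuring staffing in the middle doesn’t take more than a running comparison. This minimal Python sketch (the function name is hypothetical) replays the monthly head counts from the staffing table above and reports cumulative planned versus actual people-months for the planned months; at month 3 it shows 12 planned against 8 actual.

```python
# Monthly head counts from the staffing table.
PLANNED = [2, 4, 6, 6, 6, 6]        # months 1-6 (months 7-8 were not planned)
ACTUAL = [2, 2, 4, 6, 6, 8, 8, 8]   # months 1-8


def cumulative_staffing(planned: list[int], actual: list[int],
                        month: int) -> tuple[int, int, int]:
    """(planned, actual, shortfall) in people-months through `month` (1-based)."""
    plan = sum(planned[:month])
    got = sum(actual[:month])
    return plan, got, plan - got


if __name__ == "__main__":
    for month in range(1, len(PLANNED) + 1):
        plan, got, short = cumulative_staffing(PLANNED, ACTUAL, month)
        print(f"month {month}: planned {plan}, actual {got}, shortfall {short}")
```

Reviewing this every month, rather than only at milestones, is what makes the gap visible while there’s still time to replan.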

Product projects aren’t the only ones that require interim measurement. Any cultural change “project” requires interim measurement. In one organization, we changed the culture from “let’s have a meeting, but without agendas or action items” to using meetings to agree on decisions and to track progress on action items and obstacles. Here, we measured the number of meetings per week and the number of action items accomplished per week. As long as the number of action items per week from the meetings continued to go up, it was OK if the number of meetings went up. If the number of meetings went up but the action items didn’t, we sent email like this: “Last week the total number of meetings increased. The number of action items didn’t. Please make sure you track your action items, and if you’re having trouble accomplishing your to-dos, don’t be afraid to ask for help.”
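The rule we used boils down to a week-over-week comparison. Here’s a minimal Python sketch of it; the record and function names are hypothetical, and the weekly counts are invented for illustration.

```python
from dataclasses import dataclass

REMINDER = (
    "Last week the total number of meetings increased. The number of action "
    "items didn't. Please make sure you track your action items, and if you're "
    "having trouble accomplishing your to-dos, don't be afraid to ask for help."
)


@dataclass
class WeekCounts:
    label: str
    meetings: int
    action_items_done: int


def needs_reminder(previous: WeekCounts, current: WeekCounts) -> bool:
    """More meetings than last week, but no more action items accomplished."""
    return (current.meetings > previous.meetings
            and current.action_items_done <= previous.action_items_done)


def weeks_needing_reminder(weeks: list[WeekCounts]) -> list[str]:
    """Labels of the weeks that should trigger the reminder email."""
    return [cur.label for prev, cur in zip(weeks, weeks[1:])
            if needs_reminder(prev, cur)]


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    weeks = [
        WeekCounts("week 1", 10, 12),
        WeekCounts("week 2", 12, 15),  # more meetings, more action items: fine
        WeekCounts("week 3", 14, 15),  # more meetings, same action items: remind
        WeekCounts("week 4", 13, 16),
    ]
    for label in weeks_needing_reminder(weeks):
        print(f"{label}: {REMINDER}")
```

A rise in meetings alone never triggers the email; it’s only the combination of more meetings and flat or falling action items that does.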

Measure your work in the middle, looking for trends that will help you understand progress and health of your effort. Don’t unnecessarily disturb the people, but make sure you’ve incorporated appropriate measurements.

Monday, April 7, 2003

Creating Silos Helps Managers Avoid Seeing the Data

In Sunday’s Boston Globe View from the Cube column, Lisa Liberty Becker claims “Telling the truth can be hazardous to your job”. She goes on to talk about her husband, a performance test engineer, whose manager buried his reports, because “they [the reports] reflect poorly on the job he’s done.” The result? Bad product performance, so of course the performance engineer was laid off.

I don’t understand why this organization chose to separate the developers from the testers, why they didn’t use the same defect tracking system, or why they wrote reports (maybe to see trends). Maybe there just wasn’t enough room in the article to explain fully.

But what stood out for me was this:
The managers are deliberately preventing themselves from seeing data that would help them make better decisions.

  • If the developers and testers were part of the same team, they would be talking among themselves and the managers couldn’t ignore the data.
  • If the developers and testers used the same defect tracking system, it would be close to impossible for the developers and testers to avoid the data. They would bring it to their managers’ attention, and then maybe the
  • If this company used a cross-functional team to evaluate problems during the release, the other people would also see the data, and not allow the managers to avoid the data.

This is a clear case of management incompetence resulting in laying off the competent people, such as the performance engineer.

Data for managers is like code for developers; every piece of data requires multiple eyes. Otherwise you may not realize what you’re seeing. And if you can’t see the data, you can’t choose which actions to take.

Monday, March 10, 2003

Single-Dimension Measurements: How NOT to measure technical staff

I’m facilitating a roundtable, Test Management 101, on Stickyminds, and someone just asked how to show return on investment for testers by measuring the number of defects they find. Ouch. When you measure developers on the number of lines of code, testers on the number of defects, or carpenters on the number of cabinet doors they create, that’s a single-dimension measurement. Single-dimension measurements skew everyone’s thinking about who’s good, who’s not, and what to do about it. Single-dimension measurements lead to Dilbert-esque situations, such as the one where the PHB (Pointy-Haired Boss) announces extra pay for each defect fixed. Wally decides to write himself a minivan (create defect-laden code, which isn’t measured, and then fix it, which is).

You can’t assess technical staff with a single-dimension measurement. You need at least some combination of how the person works, how much they create, and how good that creation is. Fuzzy to start? Yes. Customized to each situation? You bet. Hard work? Yup. And that’s the job of management: to look beyond easy non-answers to define the real measurements.

(I’m working on an assessment mechanism that will require each manager to define the kinds of people, the knowledge required, and how the employee applies that knowledge to the product as part of the assessment. It’s hard. And when I’m done, it will be much better than single-dimension measurements or the useless review sheets companies use now. If you want to help me test the tool, email me.)