Posted Tue, 8 Mar 2011, 11:30am by Shawn D. Sheridan
This series of blog posts originally appeared as a white paper I wrote a number of years ago. Nonetheless, the content is still relevant today, and useful to anyone in the software engineering business, whether commercial software producers or in-house development shops.
What to Measure
Number-of-defects is a poor measure of (lack of) software quality. ‘Size’ of the defect (what it takes to repair it) and severity of the defect (what damage it caused) are better indicators. When severity can be quantified in terms of cost, the sum of cost-to-fix and cost-of-severity gives cost-of-poor-quality.
For example, by numbers alone, a software initiative with five defects looks worse than an initiative of the same effort with one defect. However, suppose the five defects in the former initiative are five minor spelling errors on screen labels, while the one defect in the latter is in a complex computation of fees to be paid to third parties, resulting in material underpayment that opens the company to litigation for breach of contract, with associated penalties and possible fraud proceedings. It is hard to argue that the latter initiative is of better quality than the former simply because it had one-fifth the number of defects.
In addition, cost-of-poor-quality must be looked at as a rate, and not in aggregate, in order to compare against benchmarks, to compare between initiatives of different sizes, and to compare against historical metrics so that improvement can be shown appropriately.
Again, by way of example, consider the following two initiatives.
- Project Alpha built a software product with 90 function points, and had 30 defects, costing in aggregate $60,000.
- Project Beta built a software product with 500 function points, and had 50 defects, costing in aggregate $120,000.
By looking at absolute numbers and costs, one would assume Project Alpha produced a higher-quality product than Project Beta. However, Project Beta’s product was over five times the size of Alpha’s in terms of function points, yet the number of defects was not even double that of Alpha’s. In addition, the total cost-of-poor-quality in Project Beta is only twice that of Alpha’s, despite Beta’s product being over five times the size! So which product was of better quality? If we use a rate based on function points then we get the following:
| | Project Alpha | Project Beta |
|---|---|---|
| Number of Defects | 30 | 50 |
| Aggregate Cost of Defects | $60,000 | $120,000 |
| Number of Defects per Function Point | 0.33 | 0.10 |
| Cost-of-Defects (cost-of-poor-quality) per Function Point | $667 | $240 |
Clearly, Beta had lower rates (per function point) of both numbers and cost of defects than Alpha. In other words, for each function point of software the project produced, it had fewer defects, and the ones that were there cost less to fix per function point. Only by looking at the rates can we determine that the Beta team was more successful at producing quality software than the Alpha team. And that means it’s the Beta team’s behaviours, practices, and processes that we want to learn and emulate, not the Alpha team’s.
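The rate calculation above is simple enough to sketch in a few lines of code. This is a minimal illustration using the Alpha and Beta figures from the example; the dictionary structure and field names are my own, not from the original paper.

```python
# Rate-based quality metrics for the two example projects.
# Figures are taken from the Alpha/Beta example in the text.
projects = {
    "Alpha": {"function_points": 90, "defects": 30, "defect_cost": 60_000},
    "Beta": {"function_points": 500, "defects": 50, "defect_cost": 120_000},
}

for name, p in projects.items():
    # Dividing by function points turns raw counts and costs into rates
    # that can be compared across initiatives of different sizes.
    defects_per_fp = p["defects"] / p["function_points"]
    cost_per_fp = p["defect_cost"] / p["function_points"]
    print(f"{name}: {defects_per_fp:.2f} defects/FP, ${cost_per_fp:,.0f}/FP")
```

Running this reproduces the table: Alpha comes out at 0.33 defects and $667 per function point, Beta at 0.10 defects and $240 per function point.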
Thus, for all software development, a measure of “size” or “volume” of the software needs to be captured. This measure is different from effort. Effort-to-produce is a poor measure of the size of a software product because it is influenced by choices of technology, personnel, and a number of other factors that we may wish to measure against size for other management reasons. While not perfect, some of the best measures available are function points and feature points. These focus on application functionality and capability, not on how that functionality is implemented. Once we have an appropriate size/volume indicator to use as a denominator in our rate calculations, we can start to compute useful metrics from other data we capture for each defect.
When a defect is discovered, key information should be recorded at that time and subsequently. That information should be compiled in a central repository to allow aggregation and the automated production of metrics. The information that should be collected is as follows:
- In what stage of the SDLC was the defect discovered?
- By what method was the defect discovered? For example, was it through code inspection, design walk-through, peer review, unit testing, some other testing, or via production use?
- In what stage of the SDLC was the defect introduced?
- What was the cause of the defect? For example, was it misinterpreted requirements, incomplete requirements, completely missed requirements, a design fault, a coding error, or an improperly constructed test?
- What was its severity?
- What effort was required to repair the defect?
- What other costs were associated with remedying the consequences of the defect, such as compensation to customers, forfeiture of fees, or payment of penalties? (These should be quantified costs, in terms of dollars.)
- In what area of the system (module, class, procedure, etc.) was the defect introduced?
From these eight aspects of a defect, as well as sizing and effort tracked against the individual piece of work, the entire initiative, and against all initiatives over a period of time, we can derive extremely useful metrics to guide process improvement and reduce costs.
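The eight aspects listed above amount to a record structure for each defect. The following is an illustrative sketch of what such a record might look like; the field names, types, and the hourly-rate approach to costing repair effort are my assumptions, not prescriptions from the original paper.

```python
from dataclasses import dataclass

@dataclass
class DefectRecord:
    """One defect, capturing the eight aspects listed in the text."""
    stage_discovered: str       # SDLC stage where the defect was found
    detection_method: str       # e.g. "code inspection", "unit testing"
    stage_introduced: str       # SDLC stage where it was introduced
    cause: str                  # e.g. "incomplete requirements", "coding error"
    severity: str               # quantified where possible
    repair_effort_hours: float  # effort required to repair
    consequence_cost: float     # penalties, forfeited fees, compensation ($)
    system_area: str            # module / class / procedure affected

def cost_of_poor_quality(defects, hourly_rate):
    """Sum of cost-to-fix and cost-of-severity across a set of defects."""
    return sum(d.repair_effort_hours * hourly_rate + d.consequence_cost
               for d in defects)
```

Dividing the result of `cost_of_poor_quality` by the initiative’s function points would yield the rate metric discussed earlier.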
The best way I have found to do this is through a comprehensive time-tracking and work-management system that captures the work to be done at a relatively granular level: a ticket-based system that rolls multiple tickets into releases, tracks work against SDLC stages and activity types within those stages, and records the size of the work in terms of function or feature points. It should also be able to attach defects to the original pieces of work and track time against defect repair. The use of standard categories for attributes of the work, SDLC stages, defect types, detection methods, and the like is preferred over free-form text, as it vastly improves the ability to generate metrics automatically. That being said, it is key that the system be extremely easy and user-friendly for the people who have to record the raw data. If it is not, they will resist doing so, producing unreliable underlying data, and therefore unreliable metrics.
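The preference for standard categories over free-form text can be made concrete with enumerations. This is a sketch only; the specific category values are drawn from the examples in the text, but any real taxonomy would be tailored to the organization’s own SDLC.

```python
from enum import Enum

class SDLCStage(Enum):
    """Illustrative SDLC stages for tagging where defects arise."""
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CONSTRUCTION = "construction"
    TESTING = "testing"
    PRODUCTION = "production"

class DetectionMethod(Enum):
    """Detection methods mentioned in the text, as a closed set."""
    CODE_INSPECTION = "code inspection"
    DESIGN_WALKTHROUGH = "design walk-through"
    PEER_REVIEW = "peer review"
    UNIT_TESTING = "unit testing"
    OTHER_TESTING = "other testing"
    PRODUCTION_USE = "production use"

# Constraining tickets to closed sets like these (rather than free-form
# text fields) is what makes automated aggregation of metrics reliable:
# every record is guaranteed to fall into a countable bucket.
```

An attempt to record a value outside the set fails immediately (e.g. `DetectionMethod("gut feel")` raises `ValueError`), which pushes data quality problems to entry time rather than analysis time.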