16 Critical Software Practices 
 
Use Metrics to Manage
 

 
Management Metrics
 

16-Point Plan Metrics Overview
 
The third of the sixteen Critical Software Practices is "Use Metrics to Manage." This document provides guidance on setting up a metrics program and selecting metrics to use, and it describes the minimal set of management metrics that should be used for every project.
 
A software metrics program has many purposes. Examples include: cost and schedule estimating, identifying and controlling risk, evaluating bids, allocating resources, managing requirements, predicting schedules, reducing defects, assessing progress, and improving processes.

Additional, but often underused, applications of software metrics include:

  1. Employee communication (How is the group doing?)
     
  2. Employee development (Where do we need training to do better?)
     
  3. Customer communication (What are we accomplishing for you?)
     
  4. Tracking activity on action items (Are we doing what needs to be done?)
     
  5. Building a sustainable competitive advantage by focusing on efficiency.

Most software metrics are designed to give a manager an accurate picture of where the project stands and what is going on in the software development. And, while predicting the future is a notoriously difficult undertaking, some metrics can provide advance warnings about what lies ahead. While it seems obvious that a good manager should prefer quantitative information to gut instinct when making management decisions, curiously, this is not always true in the case of software metrics. When managers are questioned about their reluctance to use software metrics, they often complain that the metrics are too complex, too expensive to collect, not measuring the right things, or presenting an inaccurate picture of the situation. The guidance provided here is designed to help you avoid those difficulties with your software metrics program.
 
Software development is a complex undertaking. Successful management requires good management skills and good management information. A sound software metrics program can contribute significantly to providing great management information, but it will not replace the need for good management skills. Successful metrics programs must provide sound management information while ensuring low cost, simplicity, accuracy, and appropriateness.
 

 

 The Keys to Setting Up a Successful Software Metrics Program

The keys to setting up a successful software metrics program, then, are:

  1. Keep the metrics simple. Remember that the goal of a metrics program is management, not measurement. This means that each metric must be directly related to a management issue that you wish to address. Avoid the temptation to collect everything, hoping that something interesting will show up.
     
  2. Understand the different types of metrics indicators. Some metrics are designed to provide an accurate assessment of where you are on a complicated development path, so that you can more accurately chart a course of action to reach your goals from where you are. Other metrics are designed to provide an early identification of processes that have broken down. Still others can provide an advance warning of trouble ahead. All types are important, but they require different management attention. Earned-value metrics, for example, are designed to provide an accurate picture of where the project is with regard to cost and schedule. They measure what has happened so far relative to the plan, and thereby help a manager quantitatively identify what has to be done in the future. Accurate measurement of the size of any problem that has occurred is essential to the creation of a successful recovery plan. An example of a different type of indicator is the measurement of overtime. Industry data indicates that long-term use of excessive overtime results in a loss of productivity. This type of indicator can allow a manager to take action before a problem develops.
     
  3. Keep the metrics measurement cycle short. Getting high-level management attention on a problem that the organization had several months ago, but which has already been noticed and solved by lower-level managers, can drive an organization crazy. If the information isn't current, it is, at best, worthless.
     
  4. Use a balanced set of metrics. Organizations will respond to what is of interest to management, sometimes to the detriment of other important areas. Quality can be traded for cost, and schedule can be improved at the expense of quality. Unbalanced metrics programs can fail because they drive an organization into undesirable behavior.
     
  5. Keep the metrics collection cost down. Normally, software metrics collection program costs should not exceed 5 percent of the development cost. Automated tools can be implemented to collect many metrics. Be sure to take advantage of useful data already being captured (for example, a time reporting system with a good work breakdown structure can be used to capture costs, and a good configuration management tool can collect product volatility data).
     
  6. Focus on a small set of important metrics. Some metrics programs suffer from a tendency to collect more and more information over time. Additional metrics do not always provide sufficient additional information to justify their expense. Metrics will not replace management, and are most effectively used to provide data about potential problem areas to focus management attention. The desire to collect more and more information is often driven by a misguided notion that the metrics will not only provide information, but also unambiguously tell a manager what to do with that information.

With this guidance in mind, let's turn to selecting the proper metrics.
 

 

 Deciding on the Metrics to Collect
 

There are thousands of possible software metrics to collect and possible things to measure about software development, and many books and training programs are available on the subject. Later in this document, we will provide a "minimum set" of top-level metrics for every program; but, in most cases, additional metrics that are specific to the project, type of software development, or organization will be desired. So which of the many metrics are appropriate for your situation? One method is to start with one of the many available published suites of metrics and a vision of your own management problems and goals, and then customize the metrics list based on the following metrics collection checklist. For each metric, you must consider:

  1. What are you trying to manage with this metric? Each metric must relate to a specific management area of interest in a direct way. The more convoluted the relationship between the measurement and the management goal, the less likely you are to be collecting the right thing.
     
  2. What does this metric measure? Exactly what does this metric count? High-level attempts to answer this question (such as "it measures how much we've accomplished") may be misleading. Detailed answers (such as "it reports how much we had budgeted for design tasks that first-level supervisors are reporting as greater than 80 percent complete") are much more informative, and can provide greater insight regarding the accuracy and usefulness of any specific metric.
     
  3. If your organization optimized this metric alone, what other important aspects of your software design, development, testing, deployment, and maintenance would be affected? Asking this question will provide a list of areas where you must check to be sure that you have a balancing metric. Otherwise, your metrics program may have unintended effects and drive your organization to undesirable behavior.
     
  4. How hard/expensive is it to collect this information? This is where you actually get to identify whether collection of this metric is worth the effort. If it is very expensive or hard to collect, look for automation that can make the collection easier, or consider alternative metrics that can be substituted.
     
  5. Does the collection of this metric interact with (or interfere with) other business processes? For example, does the metric attempt to gather financial information on a different periodic basis or with different granularity than your financial system collects and reports it? If so, how will the two quantitative systems be synchronized? Who will reconcile differences? Can the two collection efforts be combined into one and provide sufficient software metrics information?
     
  6. How accurate will the information be after you collect it? Complex or manpower-intensive metrics collection efforts are often short circuited under time and schedule pressure by the people responsible for the collection. Metrics involving opinions (e.g., what percentage complete do you think you are?) are notoriously inaccurate. Exercise caution, and carefully evaluate the validity of metrics with these characteristics.
     
  7. Can this management interest area be measured by other metrics? What alternatives to this metric exist? Always look for an easier-to-collect, more accurate, more timely metric that will measure relevant aspects of the management issue of concern.

Use of this checklist will help ensure the collection of an efficient suite of software development metrics that directly relates to management goals. Periodic review of existing metrics against this checklist is recommended.
 

 

 Managing from Metrics

With the metrics decided upon, a database for the metrics information must be created. Metrics are not useful if they cannot be easily reviewed, analyzed for trends, compared to each other, and displayed in a variety of manners. If you are going through the expense of collecting metrics, don't short-change yourself in the area of making the metrics useful, but be sure to take advantage of free metrics tools available where they are appropriate (example sources of tools include the Software Technology Support Center (STSC), the Software Engineering Institute (SEI), the Software Productivity Consortium (SPC), and the Software Program Managers Network (SPMN)).
 
So if you are collecting sound and timely metrics and can display them in a useful fashion, how do you glean management information from all those numbers? How do you tell a good number from a bad number?
 
If you look at the values for metrics collected in the literature, you can find general industry data. This data is usually shown in simple graphic relationships. The literature normally shows simple regression lines illustrating such well-understood facts as "productivity is lower on large, complex developments," or "error rate is higher under schedule compressed development."
 
Unfortunately, the scatter of the data points around this regression line is often multiple orders of magnitude, meaning that the actual value of the "industry norm" for any specific metric is not likely to apply to your environment at all. Software metric values change by type of project, size of project, type of software, schedule pressures, management style, corporate culture, and a host of other factors. Even financial information for Defense contractors is collected by different practices (for example, there are several acceptable methods of accounting for overhead, management, and administrative expenses), making it nearly impossible to compare the "cost for a line of code" across companies.
 
Software metrics are often erroneously viewed as having one right number for what is good. This is very seldom true. As an extreme example, several years back, while I was managing a large software department, the vice-president that I worked for insisted that I provide productivity figures for all of my engineers, expressed in delivered lines-of-code per month. Fortunately, our existing system permitted a reasonably easy method for collecting this data. Unfortunately, we were in the late stages of a real-time software development, and I had assigned several of my best engineers to improve the performance of some real-time software components. One of the methods that they were using to do this was to reduce the amount of code it took to perform certain tasks. The result was that some of my best engineers were actually reporting a negative number of lines of code per month, causing much confusion about the usefulness of this specific metric.
 
So what should you do? Certainly looking at the "industry norm" data can provide some understanding of the metric, but you should not be too alarmed about where you stand regarding the industry norm. Instead, you should try to research published metrics for projects, companies, and types of software development that are the closest to your situation. The closer the published metric is to your environment, the more relevant it is for comparison.
 
More useful, however, than looking at specific values for any metric is deciding where you want to be relative to where you are, building a plan, executing the plan, and measuring your progress. If you have a metric where it is not obvious whether you should be trying to raise or lower the value of the metric, then you are collecting the wrong stuff and need to revisit the previous section of this document.
 
Proper execution of a metrics program, then, requires that you set reasonable goals for each of the metrics you are collecting, and try to achieve them. If the trend is in the right direction in a balanced metrics program, you are doing the right kind of things. If a trend is not heading where you want it to go, go investigate the causes, evaluate the situation, and make the necessary corrections. Seldom should you ever take action "just based on the metrics." Anytime a metric indicates an undesirable trend, there are many possible causes. Don't assume that you know the "obvious" problem without investigating. A pilot wouldn't shut down the engine of an aircraft based on a bad engine indicator light without first confirming that the problem is not with the indicator light.
 
Finally, share the current information about the metrics and goals with your employees and your customers. Your customers will be able to see what you are doing to try to accommodate their needs (and may also notice when they have done something to adversely affect the progress of the project). Your employees can provide incredible insight into bottlenecks and inefficiencies in the development process if they can see what measurements are important and what goals the organization is trying to achieve.

Example of How to Use Metrics
 

The most effective use of metrics involves the use of "trigger" values. When a trigger value is exceeded, it causes a particular level and type of follow-up action. The exact values of the triggers, and the type of follow-up required, will, of course, vary by the metric involved, the organization, and the project, but this section provides an example of how triggers are used. The example uses one earned-value measure, the Schedule Performance Index (SPI), to establish a set of metric-triggered action plans that identify and address problems and initiate timely corrective action:

  1. When the SPI drops below 0.96, first-level management will initiate a follow-up to assess the actual cause(s) and remedies. Corrective actions available to first-level management will be initiated.
     
  2. When the SPI drops below 0.88, the causes of the schedule performance problem will be tracked in the formal risk management system. The status of activities to recover schedule slippage and avoid further slippage will be a discussion item at monthly risk reviews.
     
  3. When the SPI drops below 0.75, senior management will be advised of the need for significant action to deal with program schedule problems. Alternatives will be prepared and presented to senior management for decision.

Establishing trigger values, and deciding in advance which management levels become involved at each one, focuses the use of metrics and directs management attention to the most significant problems. Following a process like this makes efficient use of metric information.
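The escalation scheme in this example can be sketched in a few lines of Python. This is only an illustration: the thresholds are the ones quoted in the example above, while the function name and the action labels are shorthand invented for the sketch. The function returns only the most severe level reached; in practice the lower-level actions would normally remain in force as well.

```python
from typing import Optional

# Thresholds come from the SPI example in the text; the labels are
# shorthand invented for this sketch. Ordered from most to least severe.
SPI_TRIGGERS = [
    (0.75, "senior management decision"),
    (0.88, "formal risk management tracking"),
    (0.96, "first-level management follow-up"),
]


def spi_action(spi: float) -> Optional[str]:
    """Return the most severe escalation triggered by this SPI, or None."""
    for threshold, action in SPI_TRIGGERS:
        # A trigger fires only when SPI drops strictly below its threshold.
        if spi < threshold:
            return action
    return None
```

An SPI of 0.90, for instance, falls below 0.96 but not below 0.88, so only the first-level follow-up is triggered.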
 

 

Minimum Set of 16-Point Plan Metrics

Often, a single metric can be used to measure progress in several management issues of concern. Other times it requires several different metrics to adequately measure a single area of management concern. The following sections will provide a recommended minimum set of software project metrics that will provide a top-level overview of project status and progress, as well as provide early identification of emerging problems. The minimal set of metrics includes the following list of eight measurement areas, most of which consist of a single metric, but some of which include several tightly related metrics. These measurement areas are:

  1. Earned Value (CPI & SPI)
  2. Staffing (key staff percentage utilization, unfilled positions, turnover)
  3. Schedule compression
  4. Requirements management
  5. Structured peer review action item maturity/turnover
  6. Defect maturity/turnover
  7. Test adequacy (early test definition, test coverage, performance testing)
  8. Risk summary and reserve

Figure 1 provides a mapping of these metrics into the three major areas of the 16-Point Plan.

Figure 1. Metrics Relation to Practice Area

Application of the 16-Point Plan metrics provides all project stakeholders a common and quantitative means to monitor risk and project success in a timeframe that will help the manager to avoid or minimize project impacts and costs of corrective action.
 

Practice Measures
 
This section identifies and discusses each of the metrics included in the minimum metrics for the 16-Point Plan.

1. Earned Value Measurement

Earned-value techniques measure progress against a plan. There are more than a dozen metrics normally computed by earned-value systems, but the two most practical, high-level progress indicators are the Cost Performance Index (CPI) and the Schedule Performance Index (SPI) (both current and cumulative). These metrics measure the percentage performance in achieving your plan's milestones and cost targets. The Defense Systems Management College (www.dsmc.dsm.mil) is a good source of information about establishing earned-value systems.
 
It needs to be reiterated that an earned-value system only measures progress against a plan. Unfavorable values for earned-value metrics can indicate problems in performance, problems in planning, or both. They do not automatically indicate bad performance. The problem may lie in bad planning or in the identification of events that have disrupted the plan.
 
For top-level monitoring of whether a project is proceeding according to plan, current and cumulative CPI and SPI are an integral part of the minimum set of metrics.
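The text does not spell out the formulas, so the sketch below assumes the standard earned-value definitions: CPI is earned value divided by actual cost, and SPI is earned value divided by planned value. The Period type and the function names are hypothetical. Passing the whole history gives the cumulative index; passing only the latest period gives the current one.

```python
from typing import List, NamedTuple


class Period(NamedTuple):
    """One reporting period of earned-value data (hypothetical layout)."""
    planned_value: float  # budgeted cost of work scheduled
    earned_value: float   # budgeted cost of work performed
    actual_cost: float    # actual cost of work performed


def cpi(periods: List[Period]) -> float:
    """Cost Performance Index over the given periods (assumed: EV / AC)."""
    return sum(p.earned_value for p in periods) / sum(p.actual_cost for p in periods)


def spi(periods: List[Period]) -> float:
    """Schedule Performance Index over the given periods (assumed: EV / PV)."""
    return sum(p.earned_value for p in periods) / sum(p.planned_value for p in periods)
```

Values below 1.0 mean the project is earning less than was planned (SPI) or less than was spent (CPI), which, as noted above, may point to a planning problem rather than a performance problem.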

2. Staffing

The second high-level measurement area in the minimal set of metrics relates to staffing. A few key indicators give advance warning about staffing problems. These are key staff percentage utilization, staffing profile, and turnover. "Unfilled positions" is sometimes substituted for staffing profile.

In times of a shortage of skilled software professionals, talented employees are often spread very thin across many projects. Key staff percentage utilization measures, for the few personnel who are identified as key to the success of the project, the percentage of their time that they are spending on a specific project. When employees who are identified as key personnel are spread across more than two projects, it is generally a sign that all their projects will get in trouble.
 
Unfilled staff positions on a project can cause a project to get into trouble early and never recover. The number of unfilled full-time staff positions should be continually monitored as an early indicator of impending project difficulty. This is easily monitored through a simple time chart as illustrated below.

There will always be turnover in a software organization. Some turnover is good. Normal industry turnover rates change with market conditions, locale, and type of software being developed. Turnover can cause project disruption, requiring training of new employees, disruption of learning curves, and inefficient re-start of tasks. Projects should always allow for an expected turnover rate in their plan. When the turnover rate significantly exceeds the expected turnover rate, it should be a major cause of concern for the program.
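Two of these staffing checks are simple enough to sketch in code. The data layout and function names below are hypothetical. The two-project limit comes from the text, but the tolerance used to decide when turnover "significantly exceeds" the planned rate is an assumption chosen only for illustration.

```python
from typing import Dict, List


def overextended_key_staff(
    allocations: Dict[str, Dict[str, float]], max_projects: int = 2
) -> List[str]:
    """Return key staff whose time is split across more than max_projects.

    allocations maps a person to {project name: percent of time}.
    """
    return sorted(
        name
        for name, by_project in allocations.items()
        if sum(1 for share in by_project.values() if share > 0) > max_projects
    )


def turnover_exceeds_plan(
    departures: int,
    average_headcount: float,
    planned_rate: float,
    tolerance: float = 1.5,  # assumed meaning of "significantly exceeds"
) -> bool:
    """True when the actual turnover rate is above planned_rate * tolerance."""
    actual_rate = departures / average_headcount
    return actual_rate > planned_rate * tolerance
```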
 

3. Schedule Compression

Schedule compression is often an indicator of a flawed plan. Excessive schedule compression indicates an expectation of performance far exceeding that which has been experienced in the past. It should be noted that, on very rare occasions, programs have been successful with high levels of schedule compression, but this event is so rare that abnormal schedule compression should be considered as advance notice that the planned schedule will not be achieved.

Software development teams naturally learn with experience. However, it is seldom possible to compress a schedule significantly. Schedule compression to less than 85 percent of the nominal schedule represents high risk. Often tasks will be compressed to make up for slips or delays early in the program. Unless there is a rational basis for the change, any change is probably unwarranted.
 
To compute schedule compression, first compute the nominal schedule in months. This is the statistical norm for projects of similar size and type. The schedule compression is then:
 
Schedule Compression = (calendar time scheduled) / (nominal expected time)
 
If this ratio is less than 0.85, it is very unlikely that you can meet the schedule, and the schedule estimate should be revised to make the schedule longer. It is recommended that the schedule compression be no less than 0.9.
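As a small worked sketch, the calculation and the two thresholds can be expressed as follows. The function names and the wording of the returned labels are hypothetical; the 0.85 and 0.9 limits are the ones given above.

```python
def schedule_compression(scheduled_months: float, nominal_months: float) -> float:
    """Ratio of the calendar time scheduled to the nominal expected time."""
    return scheduled_months / nominal_months


def compression_assessment(ratio: float) -> str:
    """Classify a schedule compression ratio against the stated limits."""
    if ratio < 0.85:
        return "high risk: revise the schedule"
    if ratio < 0.9:
        return "below recommended minimum"
    return "acceptable"
```

For example, a project scheduled for 16 months against a nominal 20 months has a compression of 0.8, which falls in the high-risk range.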
 

4. Requirements Management

System requirements define user needs and expectations. Software requirements bridge the gap between the system-level requirements and the software design. Requirements traceability links system requirements to derived requirements for hardware and software modules, helping ensure that system requirements are implemented. Requirements analysis is often the least understood and most difficult part of the software life cycle. Two key indicators provide a top-level overview of the requirements management effectiveness of the program: requirements completeness and requirements churn.
 
During the development of requirements, sometimes not all requirements can be fully defined at the time of the requirements review. With the speed of technology change and the length of some development programs, some requirements are, in fact, best left undefined until later in the life cycle, when the emerging detailed design can help to finalize trade-off decisions, using the latest technology. With proper design, the effects of such late definition can be minimized or eliminated. With other requirements, however, late definition can be severely disruptive, especially if these requirements exert a strong influence on the fundamental design of the system. TBDs are placeholders for requirements or parts of requirements whose definitions are deferred until later in the life cycle. Requirements deferred as TBDs must be specifically reviewed for program impact on a periodic basis. Requirements completeness is reported as the number of remaining TBDs and the total number of requirements. For each TBD requirement, a planned finalization date must be provided.

Another key indicator of effective requirements management, called requirements churn, is the number of new or changed requirements per month. The requirements churn metric provides a measure of the amount of misdirection provided to the software developers. As requirements change, the impact on lower-level products and project stability increases dramatically. This measure is not an aggregate measure. It reports change only for the reporting period.
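Both requirements indicators reduce to simple counts. The sketch below assumes a hypothetical Requirement record and a list of change dates taken from whatever requirements tool is in use; none of these names come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Requirement:
    ident: str
    is_tbd: bool = False
    planned_finalization: Optional[date] = None  # expected for every TBD


def completeness(requirements: List[Requirement]) -> Tuple[int, int]:
    """Return (remaining TBDs, total requirements)."""
    return sum(r.is_tbd for r in requirements), len(requirements)


def tbds_without_date(requirements: List[Requirement]) -> List[str]:
    """TBDs that are missing the required planned finalization date."""
    return [r.ident for r in requirements if r.is_tbd and r.planned_finalization is None]


def churn(change_dates: List[date], year: int, month: int) -> int:
    """New or changed requirements in one reporting month (not cumulative)."""
    return sum(1 for d in change_dates if (d.year, d.month) == (year, month))
```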
 

5. Structured Peer Review Action Item Age
 
Structured Peer Reviews provide the opportunity to identify software deficiencies early. Not only is it important to identify the deficiencies early, it is important to correct the deficiencies in a timely manner. The age profile of open action items from structured peer reviews provides an effective top-level view of the project's effectiveness in correcting these identified shortcomings. An example of an action item age report is provided below.
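One way to produce such an age profile is to bucket the open action items by the number of days since they were opened. The bucket boundaries and names in this sketch are assumptions; the text does not prescribe particular age bands.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable

# Hypothetical age bands: (upper limit in days, label).
AGE_BUCKETS = [(30, "0-30 days"), (60, "31-60 days"), (90, "61-90 days")]
OLDEST_BUCKET = "over 90 days"


def age_profile(opened_dates: Iterable[date], as_of: date) -> Dict[str, int]:
    """Count open action items per age band as of the reporting date."""
    profile: Counter = Counter()
    for opened in opened_dates:
        age_days = (as_of - opened).days
        # Pick the first band whose upper limit covers this age.
        label = next(
            (name for limit, name in AGE_BUCKETS if age_days <= limit),
            OLDEST_BUCKET,
        )
        profile[label] += 1
    return dict(profile)
```

A growing count in the oldest band suggests that identified deficiencies are not being corrected in a timely manner.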

6. Defect Metrics

A defect is defined as an instance where the product does not meet a specified characteristic. The finding and correcting of defects is a normal part of the software development process. Defects should be tracked formally at each project phase. Data should be collected on the effectiveness of the methods used to discover and correct defects. Through defect tracking, an organization can characterize its error propensity and then focus its resources appropriately. Two metrics provide a top-level summary of defect-related progress and potential problems for a project: defect profile and defect age.

The defect profile chart provides a quick summary of the time in the development cycle when the defects were found and the number of defects still open. It is a cumulative graph. An example is provided below.

The defect age chart provides summary information regarding the defects identified and the average time to fix defects throughout a project. The metric is a snapshot rather than a rate chart reported on a frequent basis. The metric evaluates the "rolling wave" phenomenon, where a project defers difficult problems while correcting easier problems. In addition, this measure provides a top-level summary of the ability of the organization to successfully resolve identified defects in an efficient and predictable manner. If this metric indicates that problems are accumulating in the longer time periods, a follow-up investigation should be initiated to determine the cause.
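The cumulative series behind a defect profile chart can be derived from per-period counts, as in the sketch below. The function name and the choice of reporting period (phase, month, or build) are assumptions left to the project.

```python
from itertools import accumulate
from typing import List, Tuple


def defect_profile(
    found_per_period: List[int], closed_per_period: List[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Return cumulative found, cumulative closed, and still-open counts."""
    cumulative_found = list(accumulate(found_per_period))
    cumulative_closed = list(accumulate(closed_per_period))
    # Defects still open at the end of each period.
    still_open = [f - c for f, c in zip(cumulative_found, cumulative_closed)]
    return cumulative_found, cumulative_closed, still_open
```

Plotting the first and third series against the development phases gives the kind of summary the defect profile describes: when defects were found and how many remain open.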

7. Test Adequacy
(early test definition, test coverage, performance testing)

Testing evaluates a software product to ensure that it satisfies its intended purpose. A test that is tailored to and consistent with development methodologies provides a traceable and structured approach to verifying requirements and quantifiable performance.
 
Three closely related metrics provide a good top-level summary of the efficiency and effectiveness of the test process. They are: early test definition, test coverage, and performance testing progress.
 
Early test definition measures whether test expectations and specific test procedures are prepared in accordance with the schedule. How early these items can be prepared is often dependent upon the specifics of the project, but they should be scheduled as soon as possible. Software development is much more likely to be successful if the developers have a complete picture of the tests that the software is expected to pass. The metric is reported simply as test plans scheduled vs. test plans complete, and test procedures scheduled vs. test procedures complete.
 
As software components are tested, it is essential to track the requirements that are being verified to know the extent to which the requirements are being met. As a component undergoes testing, the requirements that have been tested for successfully must be tracked and reported.
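Tracking of verified requirements can be sketched with plain sets of requirement identifiers. The function names are hypothetical, and the sketch assumes that each successful test is already mapped back to the requirement identifiers it verifies.

```python
from typing import Iterable, List


def verification_coverage(requirements: Iterable[str], verified: Iterable[str]) -> float:
    """Fraction of the component's requirements that have passed testing."""
    all_requirements = set(requirements)
    # Ignore identifiers that are not requirements of this component.
    passed = all_requirements & set(verified)
    return len(passed) / len(all_requirements)


def unverified(requirements: Iterable[str], verified: Iterable[str]) -> List[str]:
    """Requirements that have not yet been tested successfully."""
    return sorted(set(requirements) - set(verified))
```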

An example of the reporting of white-box and black-box testing requirements is illustrated below.

The final test metric relates to technical performance testing. The issues in this area vary by type of software being developed, but top-level metrics should be collected and displayed related to performance for any areas of medium or high technical risk in the development. An example is shown below for a project with medium risk of remaining under the processor allocation budget.

8. Risk Summaries and Reserve
 
Effective continuous risk management requires risk visibility. The best top-level indicators for summary risk management are the risk summary and reserve charts. Cost and schedule risk reserves should be established at the beginning of the project to deal with unforeseen problems. Risk summary and reserve charts show the total risk exposure for cost and schedule compared with the current cost and time risk reserves for the project. The cost and time risk reserve for a project will change over time as some of this reserve is used to mitigate the effects of risks that actually occur and affect the project.

The charts show both the total identified risk values and the probabilistic weightings of occurrence. As risks are actualized without complete abatement, or resources are expended in the risk-abatement process, the risk reserves are adjusted downward accordingly. An example display of a cost risk summary and reserve chart is provided below. Schedule risk summary and reserve charts are similar, but reflect schedule risk instead of cost risk.
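The quantities plotted on a cost risk summary and reserve chart can be sketched as follows. The Risk record and function names are hypothetical, and comparing the probability-weighted exposure against the remaining reserve is one reasonable reading of the text rather than a prescribed calculation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    cost_impact: float  # cost to the project if the risk occurs
    probability: float  # estimated likelihood of occurrence, 0.0 to 1.0


def total_identified(risks: List[Risk]) -> float:
    """Total identified risk value, ignoring likelihood."""
    return sum(r.cost_impact for r in risks)


def weighted_exposure(risks: List[Risk]) -> float:
    """Risk value weighted by each risk's probability of occurrence."""
    return sum(r.cost_impact * r.probability for r in risks)


def reserve_shortfall(risks: List[Risk], reserve: float) -> float:
    """Amount by which weighted exposure exceeds the remaining reserve."""
    return max(0.0, weighted_exposure(risks) - reserve)
```

The same structure applies to schedule risk if the impact is expressed in time rather than cost.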

Summary
 
Section 2 provided a minimal set of top-level metrics that support the implementation of the 16-Point Plan and are applicable to all software development projects. These metrics have the advantages that:

  1. They provide an easy-to-understand snapshot of major indicators of a project's likely success
     
  2. They can be standardized across a large number of projects (minimizing training time and the need for different collection mechanisms)
     
  3. They are relatively easy to collect
     
  4. They have been shown to provide an accurate, balanced assessment of important indicators of success. 

These metrics, however, provide only the top-level view of a project. Good project management requires the addition of metrics directly related to a management issue that you wish to address. Section 1 provided a method to do this and provided guidance on the selection of appropriate metrics.
