© 1997 Johanna Rothman.
Organizations are spending more time and money on their testing and process efforts. But how do you know whether the testing and process improvement efforts are paying off? One way is to define specific metrics to measure the effectiveness of your process, and the efficiency with which the process is carried out. This paper discusses some specific measurements, oriented towards commercial organizations.
Keywords: software metrics, software quality, process improvement
As Software Quality practitioners, we have been asking for more: more people, more tools, more training, and more process improvements. Now that we have received more, our management wants to know more: not just about when the software will ship, but about the value of their investments in quality.
There are any number of investments companies make in quality: people, tools, and training. When more people are hired, after the initial hit to productivity, there is more capacity to implement test frameworks and testing, and to analyze and improve processes. In Figure 1, you can see that experienced people perform the work activities. In the case of software testing, they perform all relevant testing activities: design, implementation, and running of test cases, and they contribute to the productivity of the organization.
Figure 1: Original Productivity
When new people are hired into an organization, time is taken away from the experienced people's product-specific work and diverted to the hiring activities. In addition, even if the experienced people do not play a part in the learning activities of the new people, there is a delay before the productivity of the new people can be added into the productivity of the current organization.
Tools can also affect productivity. For example, Source Configuration Management tools can eliminate the integration process for software development. Instead of the “big-bang” theory of software integration, the software can be integrated as different pieces are written. So, the previous time invested in integration activities can be drastically reduced. For example, one organization previously integrated their software using a sort of fishbone method. Each subsystem was integrated on a specific schedule when it was complete. There was time built into the schedule to “shake out” all the defects introduced by this integration method. At the end of the integration, there was a very long period of time to continue finding and fixing defects. The total integration period was about 4 months for all the releases done before an SCM tool was acquired and used.
Figure 3: Fishbone integration
Now that this organization uses an SCM tool, they essentially integrate every day, and have managed to cut the integration months out of their development schedule.
Training people to do their jobs better can have a tremendous impact on productivity. For example, when people learn how to perform their roles, they can improve their productivity. In addition, they frequently have better ideas about how to perform their roles.
Process improvements can dramatically affect an organization's productivity. Even when code reviews or walkthroughs are done poorly, they can help find defects that might not have been found at all, or only found in testing. 
Software Engineering productivity has two components: efficiency and effectiveness. The traditional productivity ratio of output to input is too imprecise here: there are too many inputs to product development and too many outputs (work products). For our purposes, we will define efficiency as performing what needs to be performed as quickly as possible. Effectiveness is using the resources in the most adept manner. For example, a number of organizations have adopted the “nightly build and smoke test” methodology to verify their ongoing development and integration activities. It is worth investing in that test to make it fully automated and fast enough to complete after each build. An example of effectiveness of the smoke test would be to implement and measure code coverage as part of the smoke test.
Metrics
To measure productivity, you have to know what you want. Table 1 shows the possible quality perceptions over product life. If you are at the start of market penetration, then time to ship is critical for your customers' perceptions of quality (and for your monetary success).
|Product Life/ Market Pressure||Introduction||Early Adopters||Mainstream||Near Obsolescence|
Table 1: Quality Perceptions over Product Lifetime
This paper assumes that time to market is most critical, that the software must be “good enough” to use, and that there is enough time to get most of the features into the release. If you are more feature-bound, or more defect-bound, you will need to think about what makes you most efficient and effective. Table 2 has a possible list of metrics for products that have features as their primary goal. The metrics provide some answers to the timeliness and ease of performing the requisite product development activities.
|Changes in requirements over time: changed requirements, additional requirements, removed requirements||When you want to ensure that the development team is building what the customer wants, and knows what that is. If the feature requirements change a lot over time, especially near the end of a project, you may want to review your requirements process.|
|Product size planned vs. actual (over time)||Does the size follow the typical “S” curve of product development?|
|Test size planned vs. actual (over time)||Did you understand what you had to test?|
|Complexity planned vs. actual (over time)||If product complexity is either higher or lower than you expected, then you may not have understood the original feature requirements.|
|Time to implement and test each feature||Did you understand the feature requirements, and the time required to implement and test it?|
Table 2: Features as the primary goal
Table 3 has a possible list of metrics for products that have very low defects as their primary goal.
|Running count of defects found and fixed during each development phase||To reduce overall defects, you need to know where they are injected, and where they are fixed.|
|Rate of injected defects from fixes (recidivism rate)||How good are your fixes?|
|Count of defects found per inspection; count of defects found per unit test; count of defects found per system test||How good are your defect detection and prevention activities: inspection, unit test, system test? Are you finding most of the defects in inspection and unit test?|
|Number of tests planned, run, passed; estimated vs. actual time to complete system test run||Are defects preventing you from running the system tests, or from running them in a reasonable amount of time?|
Table 3: Low defects as the primary goal
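The defect metrics in Table 3 can be computed from simple per-phase counts. The following is a minimal sketch, not a definitive implementation. It assumes that the recidivism rate means "defects injected by fixes divided by fixes made", which is one plausible reading of the table. The class name, field names, and sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseDefects:
    """Aggregate defect counts for one development phase (hypothetical record)."""
    phase: str
    found: int
    fixed: int
    injected_by_fixes: int  # new defects traced back to a fix made in this phase


def recidivism_rate(p: PhaseDefects) -> float:
    """Assumed definition: defects injected by fixes per fix made (0.0 if no fixes)."""
    return p.injected_by_fixes / p.fixed if p.fixed else 0.0


def early_detection_share(phases, early=("inspection", "unit test")) -> float:
    """Fraction of all found defects that were found in the early activities."""
    total = sum(p.found for p in phases)
    early_found = sum(p.found for p in phases if p.phase in early)
    return early_found / total if total else 0.0


# Illustrative numbers only; they are not data from the paper.
phases = [
    PhaseDefects("inspection", found=40, fixed=38, injected_by_fixes=2),
    PhaseDefects("unit test", found=30, fixed=30, injected_by_fixes=3),
    PhaseDefects("system test", found=30, fixed=25, injected_by_fixes=5),
]

for p in phases:
    print(f"{p.phase}: recidivism rate {recidivism_rate(p):.2f}")
print(f"share found in inspection and unit test: {early_detection_share(phases):.2f}")
```

The counts are kept per phase and per team, never per person, which matches the aggregate-only approach this paper recommends.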
Schedule minimization metrics
Minimizing all schedule time is the first priority for products with time to market pressures. There are a number of areas that can be measured, as seen in Table 4. Once you have decided on your goals for minimizing project time, you can start measuring things to give you not only a historical perspective, but also a way to predict what will happen in the future.
|People entry onto the project, planned and actual||To minimize total schedule time, people must be assigned to the project when they are needed. This simple graph can really explain how a project can never make up the time missed in the beginning. See Figures 4 and 5.|
|Time between first build and first time regression tests pass.||If the previous regression tests do not pass, there are two possible problems: either the product has changed so radically that the tests need to be revamped, or the product is now broken in fundamental ways. In the first case, the time between start|
|Elapsed time between project start and test development start; phase estimates and actuals for each product phase (requirements, architecture, development, integration, system test…)||If the test development effort does not start on time, the total project time will be increased by at least the amount the test development effort is late. The more concurrent the integration phase with the development phase, the shorter the overall schedule|
|Number of defects found per inspection; number of defects found per unit test; number of defects found per smoke test; number of defects found per test run||The earlier defects are found, the easier and faster they are to fix. If you also estimate the total number of detectable defects per activity, you can estimate the number of defects left. You can then make a business decision about when to fix the defect|
|Development rework rate; product complexity; new defects as a result of incorrect fixes||The rate at which the developers rework the product and introduce new complexity is an indication that the product is becoming less under their control. The bad-fix problem can account for up to 50% of all defects.|
|Test rework; test creation rates after the planned test development time||Reworking tests or increasing the number of tests planned may increase integration or system test time, and may be an indication of inadequate planning or requirements management.|
|Planned vs. actual time by phase||The current accuracy of the schedule is a predictor of future accuracy of the schedule. See Figures 6a and 6b.|
|Time spent by the people on the project vs. other things||If people are not spending their time on the project, they cannot be focused on meeting the deadlines. See Figure 7.|
Table 4: Early Ship as the primary goal (schedule minimization)
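Table 4 notes that if you estimate the total number of detectable defects per activity, you can estimate the number of defects left. A minimal sketch of that arithmetic follows. It assumes the estimate is a simple subtraction per activity; the function name and all numbers are hypothetical.

```python
def defects_remaining(estimated_detectable, found):
    """Estimated defects still undetected per activity, never below zero.

    estimated_detectable: activity -> estimated total detectable defects
    found:                activity -> defects actually found so far
    """
    return {
        activity: max(estimate - found.get(activity, 0), 0)
        for activity, estimate in estimated_detectable.items()
    }


# Illustrative estimates and counts only; they are not data from the paper.
estimated = {"inspection": 50, "unit test": 40, "smoke test": 10, "system test": 30}
found_so_far = {"inspection": 40, "unit test": 25, "smoke test": 12}

remaining = defects_remaining(estimated, found_so_far)
for activity, count in remaining.items():
    print(f"{activity}: about {count} defects left to find")
print("total estimated remaining:", sum(remaining.values()))
```

When an activity finds more defects than estimated, as the smoke test does here, the result is clamped to zero. In practice that is a signal to revisit the estimate rather than to assume nothing is left.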
If you look at Figure 4 by itself, it does not look “too bad,” as one senior manager described it. However, once the manager saw Figure 5, he recognized the cumulative effects of delaying people's entry onto the project. He understood why the time could not be made up in the last three months of the project.
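The cumulative effect described above can be reproduced with a running total of planned and actual headcount. This is a sketch under stated assumptions: the monthly headcounts are hypothetical, not the data behind Figures 4 and 5, and effort is counted simply as one person-month per person per month.

```python
from itertools import accumulate

# Hypothetical people on the project in each month.
planned = [2, 4, 6, 8, 8, 8]
actual = [1, 2, 4, 6, 8, 8]


def cumulative_person_months(headcount):
    """Running total of effort, counting one person-month per person per month."""
    return list(accumulate(headcount))


def cumulative_shortfall(planned, actual):
    """Planned minus actual cumulative effort at the end of each month."""
    planned_total = cumulative_person_months(planned)
    actual_total = cumulative_person_months(actual)
    return [p - a for p, a in zip(planned_total, actual_total)]


for month, gap in enumerate(cumulative_shortfall(planned, actual), start=1):
    print(f"month {month}: {gap} person-months behind plan")
```

In this example the monthly headcount catches up to the plan by month 5, yet the cumulative shortfall stays at 7 person-months. The monthly view looks acceptable while the cumulative view shows effort that is never recovered.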
Figure 4: Planned and actual people onto the project
Figure 5: Effect of late entry for people coming onto the project
Figures 6a and 6b represent the same data. They are just different ways of charting the data. They should both raise the same questions:
- Why were the Release 3.1 schedule estimates so far off?
- What happened in Release 3.3 to make such a dramatic improvement?
- What happened to Release 3.2?
- Why were the differences still greater than 0 days in each phase?
With the data in Figures 6a or 6b, you can investigate these questions, and test the possible answers. Without the data, you can't even ask the questions.
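A planned vs. actual comparison by phase needs very little data. The sketch below assumes durations are recorded in days per phase. The phase names are taken from Table 4, but the numbers are hypothetical and are not the data behind Figures 6a and 6b.

```python
# Hypothetical planned and actual durations, in days, for one release.
planned_days = {"requirements": 20, "development": 60, "system test": 30}
actual_days = {"requirements": 35, "development": 90, "system test": 50}


def phase_slip_days(planned, actual):
    """Actual minus planned days per phase; a positive value means late."""
    return {phase: actual[phase] - planned[phase] for phase in planned}


def relative_slip(planned, actual):
    """Slip as a fraction of the planned duration per phase."""
    return {
        phase: (actual[phase] - planned[phase]) / planned[phase]
        for phase in planned
    }


slip = phase_slip_days(planned_days, actual_days)
for phase, days in slip.items():
    print(f"{phase}: {days:+d} days")
print("total slip:", sum(slip.values()), "days")
```

Tracking the same differences for each release gives the historical series needed to ask the questions listed above.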
Figures 6a and 6b use actual data from a small division of a large company. The division was acquired at about the time Release 3.0 shipped, about 12 months before Release 3.1 shipped. The product is a middleware communications product. The code is not as complex as an OS, and is about 90,000 SLOC. The software has a very clean overall architecture, but as customers requested more features, the new features were considered small enough not to require architectural planning. That worked for a while, but during development of Release 3.0, the current implementation of the architecture hit its limits. As the developers fixed more defects, they uncovered more defects. Eventually, the group chose to ship Release 3.0 of the product, even though none of the ship criteria were met. That had the effect of pushing the defect fixing and architectural activities into the next release.
The development group assumed they would have time to do it “right” in Release 3.1, now that they had been acquired by a large company with deep pockets. The group was wrong. The parent company expected its divisions to develop a schedule and live up to it. Although the developers understood their market and the product, they did not understand how critical schedules were to the parent company. Since they had pushed a number of problems into the follow-on release, and they were not skilled at scheduling, they were even less able to make their dates for Release 3.1. They shipped Release 3.1 because they did not accept that the product was not ready, even though the product met none of the ship criteria.
By now, they were so late with their releases that they chose to skip Release 3.2 and put all the 3.2 features into Release 3.3. They thought they had most of the hard problems behind them, they were fully staffed, and they kept believing their schedules. However, after the first milestone, a project plan, was missed, the parent company insisted they get project management assistance.
The new project manager decided to find out whether these people were not competent or not focused on the work. The measure used was the percentage of time spent on the project vs. the percentage spent on customer problems. See Figure 7.
Figure 7: % time spent on the project vs. spent on customer problems.
As suspected, the people were not able to focus on the project work because they were still putting out fires from the previous releases (see up to week 12). After customer negotiation, the engineers were able to increase the amount of time they spent on the project, and reduce the firefighting time. However, the firefighting was real, and needed to be understood. The project manager decided to schedule the defect fixing work, and track the number of defects opened and closed every week, to verify that the ship criteria could be met when the date was met. See Figure 8.
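Tracking defects opened and closed every week reduces to a running balance of open defects. The sketch below assumes weekly aggregate counts for the whole team. The function name, the starting backlog, and the weekly numbers are hypothetical and are not the data behind Figure 8.

```python
from itertools import accumulate


def open_backlog(opened, closed, starting_open=0):
    """Open defect count at the end of each week.

    opened, closed: defects opened and closed in each week (team aggregates)
    starting_open:  defects already open before the first week
    """
    net_change = [o - c for o, c in zip(opened, closed)]
    # accumulate(..., initial=...) yields the starting value first, so drop it.
    return list(accumulate(net_change, initial=starting_open))[1:]


# Illustrative weekly counts only.
opened_per_week = [12, 10, 8, 5]
closed_per_week = [6, 9, 10, 9]

backlog = open_backlog(opened_per_week, closed_per_week, starting_open=20)
for week, count in enumerate(backlog, start=1):
    print(f"week {week}: {count} defects open")
```

If the backlog is still rising as the ship date approaches, the ship criteria are unlikely to be met on that date. A falling backlog can be extrapolated to check whether it reaches the ship criteria in time.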
Figure 8: Defect data by week to verify the ship criteria would be met when the date was met
A number of things happened in Release 3.3. When the engineers realized the project manager was tracking schedule actuals and defects, they reduced work on non-measured activities. This was critical to the success of the project. They also chose to use the available tools (defect-tracking system, SCM tool) to help them with their development activities. In addition, they changed their processes for defect-tracking, code walkthroughs, and estimating schedules.
Notice that all measurements are aggregate measurements: there are no individual measurements anywhere. This is critical to the success of measurements: they have to be meaningful to the people and aggregated at the group level. Since software engineering and quality activities are a team effort, all productivity measures must be aggregated at the team level. People will work on what is measured. As a manager or project manager, you need to be sure of what you want to measure, and choose what is critical to the success of the project or the team. Once you start measuring efficiency (timeliness) and effectiveness (use of resources) for a given organization's projects, you can draw appropriate conclusions and figure out what to do next.
- Abdel-Hamid, Tarek and S. Madnick. Software Project Dynamics: An Integrated Approach, Prentice Hall, Englewood Cliffs, NJ, 1991.
- Grady, Robert. Practical Software Metrics for Project Management and Process Improvement, Prentice Hall, Englewood Cliffs, NJ, 1992.
- Humphrey, Watts. Managing the Software Process, Addison-Wesley, Reading, MA, 1989.
- Humphrey, Watts. “What if your life depended on Software?”, Presentation at Boston SPIN meeting, 3/25/97.
- Moore, Geoffrey. Crossing the Chasm, Harper Collins, New York, 1991.
- Weinberg, Gerald. Quality Software Management: Volume 2, First Order Measurement, Dorset House Publishing, New York, 1993.
- Rothman, Johanna. “Case Study: From Chaos to Repeatable in Four Months”, SEPG '97, San Jose, CA, Mar. 1997.