Achieving a Repeatable Process

by Johanna Rothman. This article was originally published in Software Development Magazine, June 1998.

Large, permanent engineering cultural change–or process improvement–generally takes a long time to implement. However, key repeatable processes can free up time in an organization and let people spend more time on the creative work of software development. I recently worked with a software development organization that had trouble meeting any of their promised product ship dates–all releases were late. In addition to being late, the releases were often missing key pieces of functionality and had significant bugs.

The group I worked with on this project consisted of six developers, three testers, a part-time writer, a project manager, and a product manager. Overall, morale was low. Current management focused on sales, and was not good at, or especially interested in, managing product development. With the original founder gone and no software engineering process, the engineering team was in turmoil and the project was in a state of disarray.

The responsible vice president realized that he needed to take a different approach to finish the project. He called me in as a project and program manager. His directive to me was “Get the project to beta ASAP and ship the final version by July 1.” He didn’t care about process improvement or the culture; he just wanted the project done.

I first assessed where the staff was spending their time, to understand who did what. I discovered several problems at that point: no project manager or schedule; time-consuming customer calls; no public configuration management or automated public bug-tracking system; no formal review process; no unit tests; inadequate system tests; unproductive meetings; and many missed project deadlines.

The group had no repeatable processes in place. My goal was to help them create and define the processes that worked for them. I wanted to make their processes repeatable so the group could be successful without me, once I was done with this project. However, the group was made up of process skeptics–I had my work cut out for me. First, I worked on the most critical problems.

Step 1: Improve Bug Finding and Fixing Work

Reactive bug-fixing work, spurred by customer calls, took much of the technical staff’s time, so the team and I began our improvement activities by focusing on the practices related to finding and fixing bugs. Individual team members had been tracking bugs via e-mail and spreadsheets; there was no database used by the entire group. Eventually, they reluctantly agreed to use one.

The team also held some private design reviews and code walkthroughs, during which some bugs were discovered. However, these closed meetings resulted in detailed code that was a “secret” to the developers who didn't attend the reviews and walkthroughs. Decisions were made that broke other parts of the product because relevant developers could not attend the walkthroughs. And, since the test group did not attend the reviews and walkthroughs either, they could not develop tests effectively.

The exclusive reviews and walkthroughs had understandable origins. The original development team had a wonderful working relationship, and had worked on the product from the beginning. When the work got too big for the three of them, they hired more people without thinking about how to integrate new people into the group. They never provided the new developers with an architectural overview or functional details of the system. The new developers were expected to figure out the system on their own. When the new developers were first informed about the reviews and walkthroughs, they attended, but had many questions. Their questions made the reviews and walkthroughs take longer than the original developers expected, so the newer team members were then excluded.

I began simultaneous work on reducing the bug fix time and making the reviews public. When code is a secret, the rest of the product development team does not know about the existence of bugs or their potential interactions. The more the entire staff knows about the status and nature of bugs, the easier it is to develop tests and estimate the time left to complete the tasks at hand.

Step 2: Configuration Management: Losing the Code Czar

One of the developers had been deemed the “code czar”, and he took this title quite seriously. When people developed code, they handed over the source to him. He then reviewed it, and decided whether to check it into the codeline as is, or to change it. If he didn’t like variable names, he changed them. If he didn’t like the way code was written, he changed it. Then he checked the code in, and built the system. He was the only person who was completely happy with this arrangement. The other developers either tolerated or hated the system, depending on how much the czar changed their code. The effect of the code czar was to privatize the configuration management system. This privatization prevented people from easily fixing bugs.

As a remedy, we transitioned to a public configuration management system (CMS), with plenty of discussion and emotional turmoil. The code czar pointedly told me that he “wanted to be in the middle of things”, that he wanted “the release to revolve around him.” I explained that he could not be the holder of the code, and still have the team meet its deadlines. An argument ensued, and he even threatened to quit. The following day, after we'd cooled down with the help of the architect, the code czar reluctantly agreed to install and use a commercial configuration management system. He was so interested in seeing the product become a commercial reality, as well as receiving public recognition for a successful project, that he was willing to give up his singular code ownership.

We used a commercial system that allows for private areas, multiple branches, and labels. The developers were concerned about having to learn a new tool so close to a release. Fortunately, the code czar took on the role of educating people about how to use the system, so the developers learned the basics within a day. It only took one week of using the public configuration management system to uncover a number of bugs that were preventing the team from getting to the beta milestone. They were thrilled with their progress when they were able to fix 10 to 12 bugs a day.

The Need for Process Definition

With those key areas taken care of, the team could begin defining its own key process. The original product development process was serial. First, the architect would figure out how something should work. Then, the developers would code it and have a private walkthrough. Next, the testers would test it; after a few iterations, the product would be shipped.

All the technical and management staff seemed to be unaware that their actions affected the critical stakeholders of the product. The stakeholders were the customers, senior management, and so on–anyone intimately involved with producing, selling, and supporting the product.

The developers had a “coder” mentality, and were not thinking enough about how to accomplish their work in all of its scope, or the effects of their actions, specifically on the testers. For example, they didn't consider the effects on the testers when they checked in code that wasn't specified in the requirements document. It’s not unusual for written requirements to spawn more requirements when the product is in development. If the testers don't know about the new functionality, they can’t test it. Subsequently, the technical support staff cannot answer questions about it, and the writers cannot document it. The developer's role goes beyond just writing code; it also includes providing information about the process.

On top of this, the testers did not think enough about developing tests that mimic the customers’ applications. All of the people, including the current management staff, were unaware that their individual work was in fact a process, and was part of a larger process–that of product development.

I took a grassroots approach to process definition. During weekly one-on-one meetings, I discussed each technical person’s role, and what that person needed to do to be successful. I gently encouraged the staff to think about their problems in new ways, preferably from their various stakeholders’ perspectives.

That weekly discussion spawned a number of informal peer discussions, where the technical staff talked about what they needed from each other. The technical staff presented their needs, on an ongoing basis, at the weekly project team meetings. We used the team meetings to air issues and discuss resolutions. After the first two months of the project, we had a working process. Everyone knew what their tasks were, and how to perform them. This information was documented as part of the project team meeting minutes, and in e-mail to the group. Everyone knew what the project priorities were, and how to make decisions to satisfy the project needs.

Next, we developed a concurrent development and testing process, as shown in Figure 1. The architect defined an overall architecture of the product, specifying the additional features in this release. A subset of developers started the design with the architect. After the initial design, we had an informal design review with the entire development and software quality assurance staff. Then we started the detailed design and implementation, along with the software quality assurance development of system tests. By the time the code was written and developer-tested (unit tested), everyone was ready for the walkthrough and check in. All code went through an informal walkthrough before being checked in to the configuration management system. After the code was checked in, the software quality assurance tests started on those features. This process worked for all feature additions, as well as for large bug fixes. Small bug fixes, which did not require architecture and design intervention, joined the process around code walkthrough time.

Figure 1: Concurrent Development and Testing Process

We used metrics to verify what we thought about the project: schedules were tracked against original estimates, requirements were managed, and bug metrics were tracked. The scheduling techniques were to lay out the entire project, understand the major milestones, develop multiple milestones for every person each week, verify the individual milestone success via one-on-one and project team meetings, and iteratively develop minor milestones.

With this technique, we could notice when a milestone slipped in as little as two days. We also didn’t fall into the trap of thinking we could make up time in the schedule later. If individual milestones were not met, we replanned the work for what was most important.
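As an illustration only, the milestone check described above could be sketched in a few lines of Python. The article does not say what tool, if any, the team used; the `Milestone` record, the `slipped` helper, the owner and milestone names, and the dates below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Milestone:
    owner: str                    # person responsible (hypothetical names below)
    name: str                     # short description of the deliverable
    due: date                     # planned completion date
    done: Optional[date] = None   # actual completion date, if finished


def slipped(milestones, today):
    """Return (milestone, days_late) pairs for unfinished, overdue milestones,
    most overdue first."""
    late = [
        (m, (today - m.due).days)
        for m in milestones
        if m.done is None and m.due < today
    ]
    return sorted(late, key=lambda pair: pair[1], reverse=True)


# Invented sample plan: several small milestones per person per week.
plan = [
    Milestone("dev_a", "parser design reviewed", date(2000, 1, 10)),
    Milestone("dev_a", "parser unit-tested", date(2000, 1, 14)),
    Milestone("tester_b", "install tests written", date(2000, 1, 11), done=date(2000, 1, 11)),
]

today = date(2000, 1, 12)
late = slipped(plan, today)
for milestone, days_late in late:
    print(f"{milestone.owner}: '{milestone.name}' is {days_late} day(s) late")
```

Because each person has several small milestones every week, a check like this surfaces a slip within a couple of days, which is the point at which the work can still be replanned around what matters most.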

Since there was no flexibility in the schedule, if we missed milestones, people would either have to work harder or we would drop features. We had several absolute requirements for shipment, and worked on those pieces of the product first.

Bug metrics tracked the open bugs and the find and close rates by severity. Once the find rate started going down, and the close rate went up, especially for the higher severity bugs, we knew we were getting close to our beta criteria.
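A minimal sketch of these bug metrics, assuming each bug record carries a severity, a found date, and an optional close date. The `Bug` record, the severity labels, the function names, and the sample data are invented for illustration; the article does not describe the bug-tracking system's actual fields.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bug:
    severity: str                  # e.g. "high" or "low" (hypothetical labels)
    found: date                    # day the bug was reported
    closed: Optional[date] = None  # day the fix was verified, if any


def rates(bugs, start, end):
    """Count bugs found and bugs closed, per severity, within [start, end]."""
    found = Counter(b.severity for b in bugs if start <= b.found <= end)
    closed = Counter(
        b.severity
        for b in bugs
        if b.closed is not None and start <= b.closed <= end
    )
    return found, closed


def open_bugs(bugs, as_of):
    """Count bugs per severity that were found by as_of and not yet closed then."""
    return Counter(
        b.severity
        for b in bugs
        if b.found <= as_of and (b.closed is None or b.closed > as_of)
    )


# Invented sample data, not figures from the project.
bugs = [
    Bug("high", date(2000, 3, 2), date(2000, 3, 10)),
    Bug("high", date(2000, 3, 3)),
    Bug("low", date(2000, 3, 9), date(2000, 3, 11)),
    Bug("high", date(2000, 3, 12)),
]

found, closed = rates(bugs, date(2000, 3, 9), date(2000, 3, 15))
still_open = open_bugs(bugs, date(2000, 3, 15))
print("found:", dict(found), "closed:", dict(closed), "open:", dict(still_open))
```

Running the same counts week over week gives the trend described above: a falling find rate and a rising close rate, watched most closely for the higher severities.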

Ongoing Steps: Continuous Learning

After the team’s first successes (finding bugs before customers did, achieving milestones, fixing more than 100 outstanding bugs from the previous release), its members became quite attached to the process. The technical staff, with little prompting from management, developed a tracking mechanism for the bug fixes in progress. They improved how they reported schedule problems, and the weekly project team meetings took less time.

These may sound like small accomplishments, but they represent a dramatic change in the way people thought about their work. For the first time in this group, people recognized that their work was a process. This understanding was the key to their success within the organization and also for the success of the product. These accomplishments helped them think about how to do work in general, and not just focus on what had to be done immediately. This was the first step to making the product development process repeatable.

In addition to the technical processes, we made improvements in the management processes. Until there was a defined technical development process, it was difficult to know which of the technical staff were or were not successful in product development. Personnel reviews were given more as rewards than to discuss goals and work accomplished. The product development process let management understand product development and everyone’s contribution to it. That gave the non-technical management the opportunity to work with individuals whose work was not acceptable, and to reward people who did outstanding work.

Although the technical staff did not originally contribute to the beta and final shipment criteria, they took ownership of the criteria and became even more adamant about them once they realized their value. For example, at the first beta readiness review we realized that we had not met five of the eight criteria for beta, and that the product was not ready to release. As project manager, I did not release the product for beta. That shocked the technical staff. I did not threaten them into meeting the criteria; I just drew a line in the sand. They had never been challenged to provide what they said they would provide. In previous releases, they shipped the product when it wasn't “too bad” to use, instead of when it met stakeholders' expectations.

When I refused to ship the beta because our criteria had not been met, the technical staff became aware of the criteria's value. They decided to own the criteria, and prepare the product for beta. That feeling of ownership contributed to the continued use of the process for the milestones leading to final shipment. The technical staff owned the final ship technical criteria and developed the individual milestones to achieve those criteria.

The Bottom Line

A tremendous amount of learning occurred during this project. At the end, the team had a program and project manager, whose work could be repeated, a realistic project schedule, fewer customer calls, a public configuration management system, an automated bug-tracking system and mechanism for fixing bugs, public design reviews and code walkthroughs on core parts of the system, adequate system tests, and productive meetings. This lean and nimble process suited the culture well. It was effective, and had aspects of many CMM Level 2 organizations. The process was flexible, and could continue to evolve as necessary. The project team remained small, so for the next year or so, they could continue to make evolutionary changes to the process. The particular process, a concurrent development and test process, met their needs of fast customer response, and quick product development cycle time.

This process implementation succeeded for the following reasons:

The staff was ready to work another way. They were exhausted from mandatory overtime.
No jargon was used. This change was clearly not a management fad.
Project and program management decisions were visible to the staff.
The technical staff, not management, owned the process evolution.
The process work was part of their project work – not overhead they needed to consider in addition to their work.

There was a clear return on investment in the process adoption – the product was released on time, with all of its required features, and with very few bugs. For the first time in this group’s history, the customers raved about the product.

The technical staff were reluctant at first, but once they saw the value in process definition and evolution, they took the process on as their own. This group now has a clear view of its goals, the process required to meet those goals, and the results of using that process.

Evolution of a Chaotic Team toward Repeatable Processes

BEFORE:

No project manager or project schedule
Servicing customer calls consumed about 75% of the development staff’s time (time-consuming conference calls and reactive bug fixes)
No public configuration management or automated public bug-tracking system
No design reviews, public code reviews, or walkthroughs
No unit tests or adequate system tests
Too many unproductive meetings
Many project deadlines missed

AFTER:

A program and project manager, whose work could be repeated
A realistic project schedule, with technical staff who understood how to generate another schedule
Very few customer calls and demands for bug-fixing
A public configuration management system
An automated bug-tracking system and mechanism for fixing bugs
Public design reviews on the core parts of the system
Public code walkthroughs on the core parts of the system
Adequate system tests but no unit tests
Short weekly project team and one-on-one individual meetings between developers and the project manager
