Case Study: From “Chaos” to “Repeatable” in Four Months

© 1997 Johanna Rothman.

Abstract

Recent writing in the software process improvement literature [1,2] discusses organizations that start at an ad hoc level (CMM level 1) and proceed to a repeatable level (CMM level 2) and higher. However, many organizations cannot make progress toward level 2 until they have a firm grasp on level 1 – the knowledge that they perform processes as part of their daily work. This is a case study of lessons learned from an organization’s journey from level “0”, a total lack of knowledge of process, through ad hoc, heroic processes, to a repeatable process.

Keywords: software process improvement, software engineering, software product development, program management, project management, concurrent development and testing.

Introduction to Case Study

The work described here was preceded by a short informal assessment of the project. Given the results of the short assessment, we chose not to conduct a formal assessment. We surmised that any assessment results would be rejected by the development staff, which would lead to a less effective working relationship. Since the purpose of this work was to develop and ship a product, not specifically to improve processes, it was critical to build a positive working relationship with the staff.

This paper is a case study of a small product development group working as an individual division within a large software company. The software process improvement approach here used the CMM as a reference, but not as a roadmap. This case study shows that it is possible to get good results by doing some simple things.

The product is a middleware communications application. For the sake of this discussion, we will call the product “Messenger”. At the beginning of this study, the project was in this state:

  • No project manager or project schedule
  • Servicing customer calls consumed about 75% of the development staff’s time (time-consuming conference calls and reactive bug fixes)
  • No public configuration management or automated public bug-tracking system
  • No design reviews or public code reviews or walkthroughs
  • No unit tests and inadequate system tests
  • Too many unproductive meetings
  • Multiple missed corporate deadlines

The Messenger team was demoralized. The entrepreneurial founder was unsuccessful working in the larger organization that had acquired Messenger, and was subsequently fired. The Messenger management team was focused on sales and marketing issues, and could not fathom how to manage product development. Moreover, they were not especially interested in managing product development. With the original founder gone and without any software engineering process, the engineering team was in turmoil and the project was in a state of disarray.

Although Messenger was part of a larger company, the product and infrastructure remained distant from the parent company. Instead of reporting into an appropriate product organization, the Messenger group reported to the Chief Technical Officer. Originally, the Messenger management team wanted this separation. However, without seeing how to integrate themselves into the larger company, the Messenger technical staff was unable to take advantage of its engineering tools and processes, or even its support infrastructure.

Figure 1 shows the state of the technical work, without factoring in the morale issues. Since the reactive bug-fixing work took so much of the technical staff’s time, we focused on understanding the practices related to knowing about and fixing bugs. Some private design reviews and code walkthroughs were held, during which bugs were discovered. The manual bug-tracking system did not help the staff know which bugs in the product needed to be fixed. The private configuration management system prevented people from easily fixing bugs wherever they occurred. In addition, when the code is a “secret”, the rest of the staff does not know about the bugs or how they interact. The more the staff knows about the bugs, the easier it is to develop tests and estimate the time left to complete the tasks at hand.


Figure 1: Cause and effect diagram of Initial Project State

However, the Messenger technical staff did not know everything they needed to know about the bugs. For example, because the code reviews (really walkthroughs) were privately held, decisions were made for parts of the product that prevented other parts from working. And, even when design reviews were held, not everyone who worked on the product was allowed to attend. In particular, the software test people were not allowed to attend, which had a negative effect on the test development effort.

The manual bug-tracking system consisted of noting bugs in spreadsheets. The spreadsheets were private documents and did not allow the group to see where the bugs clustered, nor to develop formal tests to verify that the bugs were truly fixed. The manual and private configuration management system was completely inadequate for finding bugs. One person was responsible for merging all the code into the appropriate code line. However, this person felt it was his job to “improve” code – by changing variable names, etc. – and there was no way to guarantee the code was merged into all the correct code lines. This private configuration management system had the detrimental effect of burying bugs, and there was no way to guarantee bugs were fixed in the different product versions. The total number of outstanding bugs created a vicious cycle: the developers spent more time fixing bugs, then customers found more bugs and demanded immediate fixes. This cycle kept the developers busy fixing bugs and prevented the release from shipping.

Messenger staff, technical and management, seemed to be unaware that their actions had effects on the important stakeholders of the product. The developers had a “coder” mentality, and were not thinking enough about the full scope of their work or about the effects of their actions, specifically on the testers. The testers did not think enough about developing tests that mimicked the customers’ applications. All of the people, including the current management staff, were unaware that their individual work was a process, and part of a larger process – that of product development.

Unfortunately, there is no silver bullet to educate people about process issues. In fact, we deliberately chose not to talk about repeatable processes. We needed to focus on getting the product shippable. We finessed the idea of talking about what we were going to do; instead, we just did it. We focused on teamwork, and on getting people to understand that if they did not do their work completely and correctly, their customers – the people to whom they handed off their work – would be forced to return it for rework.

We realized that we could not get to a fully functioning team just by saying so. So, we put together a concurrent plan of work, for both management and technical staff. We made certain everyone knew what their roles were, how to execute them, and how to repeat the successes and learn from the failures.

The initial job roles were:

  1. Project Leader: Generate a weekly schedule. Ignore the fact that it was out of date within 2 hours. Determine project state by conducting hour-long daily one-on-one meetings with everyone on the development staff. Ignore the SQA staff.
  2. Architect: Design any component of the system, no matter how large or small. Choose design and code walkthrough attendees for any piece of design or code. Lead all walkthroughs.
  3. Developer: Code what the architect defined. Perform initial testing on the code. Submit code to the “code czar” for integration.
  4. Code Czar: Integrate all code. Maintain build system. Change the code if it wasn’t “good enough”.
  5. SQA Engineer: Define and perform testing on delivered software. Create release masters. Fulfill orders.

These roles changed during the first steps of the project.

First Step: Role Definition via Project Planning

Schedule Development

The technical staff consisted of one program manager and project leader, one architect, seven developers, and two SQA engineers. The first step we took was to define and develop the schedule, project plan, and program plan, all roughly in tandem. The schedule allowed us to break down the work so that everyone knew their responsibilities, and with whom they worked. Within the tasks, the engineers defined who did what, so that the architect was really doing high-level architecture, the developers were assisting with and implementing the design, and the SQA engineers were developing tests for the features. The original project leader lacked sufficient skills to work on the project planning, so was transitioned to work on Beta test preparations. The job roles now looked like this:

  1. Program Manager/Project Manager: Generate a project schedule, and update it weekly. Determine project state via program and project team meetings, formal and informal weekly one-on-ones with engineering staff, of no more than 30 minutes duration. As problems arose, make them visible to the team, and facilitate finding solutions.
  2. Architect: Design large components of system. Guide developers for smaller components.
  3. Developer: Design, implement, and debug components. Run a walkthrough for each component, then submit it to the “code czar”.
  4. Code Czar: Integrate all code as written and submitted. Maintain the build system.
  5. SQA Engineer: Define and perform testing on delivered software, taking into account the product architecture and customer use of the system.

This notion of people being responsible for their work was foreign to the team at the beginning of the project. The developers thought it was their job to write code, not to help with design issues. The SQA engineers were blindly implementing tests, without real product knowledge to know if they were testing the right things. Schedule creation and distribution to the team, weekly one-on-one status meetings, and a weekly project team meeting gave people multiple opportunities to really understand what they were supposed to do, and how to do it.

Project Plan Development

In addition to the task list and schedule, an overall project plan was developed by the program manager in collaboration with the Messenger team. In addition to circulation among management, the project plan and all updates were distributed to all of the technical staff. In many cases, they had never seen a project plan before, so significant management effort was expended in educating people about why a project plan was both necessary and useful.

The project plan was also a useful vehicle for defining the product features in a way the Marketing Requirements Document (MRD) was not. The MRD defined not only the features, but also the possible markets for those features, the selling strategy, and a number of other marketing-focused issues; the Marketing people considered all of this necessary for their work. Instead of referring to the MRD in the project plan, we added the relevant sections to the project plan, and required a marketing sign-off on it. This allowed the project plan to be the only working document and the official arbiter of the product features and performance.

Program Plan Development

There are a number of benefits to working in a large organization – many support processes are often well-defined. In this case, the software manufacturing, product introduction, advertising, and order fulfillment processes were well-defined and worked. So, in addition to the technical project work, we started working on making contacts around the organization, and formed a program team (cross-functional team). The program team served the dual purposes of informing the corporation of the progress in the Messenger group, and informing the Messenger group what the expectations of the corporation were. The program team had representatives from Manufacturing, Service, Training, Product Marketing, and the local Program Manager/Project Manager.

The project team was puzzled by the initiation of a program team. In the past, all decisions, no matter what the level, were made at project team meetings, so those meetings ran far too long. Some of the technical staff were concerned that they were not involved in the program team meetings, and even sending out minutes of every meeting did not allay their concerns for over a month. After that time, the technical staff became comfortable that the program team minutes did reflect the actual discussions.

Program plan development had a number of interesting side effects. For the first time, people’s roles and responsibilities (management and technical) were written down. The primary effect was that people started to really question their roles. For example, because Messenger was so remote from the rest of the corporation, all order fulfillment went through the SQA engineers. The SQA engineers created masters, even between releases, and duplicated media to send the product to customers. Although records were kept, none of the information went back into the customer database, so the Messenger group had to continue supporting the customers themselves.

Messenger staff had serious concerns about the parent company’s ability to perform the manufacturing and order fulfillment tasks. These concerns were rooted in the Messenger staff’s ignorance of the parent company’s ability to perform those tasks. However, since the technical staff was struggling just to perform the necessary product development tasks adequately, they relinquished the non-product development tasks, such as software manufacturing and order fulfillment.

Middle Steps: Process Definition

Process definition took about four weeks to get started after the schedule, project plan, and program plan efforts began. It was not process definition in the classical sense; it was a grassroots effort. During the weekly one-on-one meetings, the project manager discussed each technical person’s role, and what that person needed to be successful. The staff was gently encouraged to think about their problems in new ways, preferably from their stakeholders’ perspectives.

That discussion spawned a number of peer discussions, in which the technical staff talked about what they needed from each other. The technical staff presented their needs on an ongoing basis at the weekly project team meetings. The project manager and the technical staff used the team meetings to air issues and discuss resolutions. After the first two months of the project, we had a working process. Everyone knew what their tasks were, and how to perform them. Everyone knew what the project priorities were, and how to make decisions to satisfy the project needs.

We developed a concurrent development and testing process, which is depicted in Figure 2. The architect defined an overall architecture of the product, incorporating the additional features. A subset of developers would start the design with the architect. After the first design, there was an informal design review with the entire development and SQA staff. Then the detailed design and implementation started, along with the SQA development of tests. By the time the code was written and developer-tested, everyone was ready for it. All code went through an informal walkthrough before being checked in. After the code was checked in, the SQA tests started on those features.


Figure 2: Concurrent Development and Testing Process

This process worked for all feature additions and large bug fixes. Small bug fixes did not require architecture and design intervention, and so joined the process at around code-walkthrough time.
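
To make the flow concrete, the following is a minimal sketch of the stages a work item passed through in this concurrent development and testing process, including the shortened path for small bug fixes. The stage names, the Python language, and the idea of encoding the flow in code are illustrative assumptions; the Messenger team tracked this as a human workflow, not in a script.

    # Hypothetical sketch of the concurrent development and testing flow.
    FULL_FLOW = [
        "architecture",          # architect defines the overall architecture
        "design_review",         # informal review with the whole development and SQA staff
        "design_and_implement",  # detailed design and coding, while SQA develops tests in parallel
        "code_walkthrough",      # informal walkthrough before check-in
        "check_in",              # code is integrated into the public code line
        "sqa_test",              # SQA runs the tests developed concurrently
    ]

    def stages_for(work_item_kind):
        """Return the stages a work item passes through.

        Features and large bug fixes take the full flow; small bug fixes skip
        architecture and design and join at the code walkthrough.
        """
        if work_item_kind in ("feature", "large_bug_fix"):
            return FULL_FLOW
        if work_item_kind == "small_bug_fix":
            return FULL_FLOW[FULL_FLOW.index("code_walkthrough"):]
        raise ValueError("unknown work item kind: " + work_item_kind)

    print(stages_for("small_bug_fix"))
    # ['code_walkthrough', 'check_in', 'sqa_test']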

We used a number of metrics to verify what we thought about the project: schedules were tracked against original estimates, requirements were managed, and bug metrics were tracked. We used the following scheduling technique:

  • Lay out the entire project.
  • Understand the major milestones.
  • Develop multiple milestones for every person for every week.
  • Verify the individual milestone success via one-on-ones and project team meetings.

Using this technique, a missed milestone was obvious within two days. And we did not fall into the trap of thinking we could make up the time later in the schedule. If individual milestones were not met, we replanned the work for the major milestone.
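
A minimal sketch of this kind of milestone tracking appears below. The data layout, dates, and the two-day detection threshold are illustrative assumptions; the actual verification happened through one-on-ones and project team meetings, not a program.

    import datetime

    # Each person has several small milestones per week, each tied to a major milestone.
    milestones = [
        {"owner": "developer_1", "task": "design message router",
         "due": datetime.date(1997, 3, 3), "done": True, "major": "Beta"},
        {"owner": "developer_2", "task": "walkthrough of parser",
         "due": datetime.date(1997, 3, 4), "done": False, "major": "Beta"},
    ]

    def missed(milestones, today):
        """Return milestones that are past due and not yet done."""
        return [m for m in milestones if not m["done"] and m["due"] < today]

    def majors_needing_replan(milestones, today):
        """If individual milestones slip, replan the work for their major
        milestone rather than assuming the time can be made up later."""
        return sorted({m["major"] for m in missed(milestones, today)})

    today = datetime.date(1997, 3, 6)
    for m in missed(milestones, today):
        print("missed:", m["owner"], m["task"])   # visible within days, not weeks
    print("replan:", majors_needing_replan(milestones, today))   # ['Beta']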

For this Messenger release, there was no flexibility in the schedule. If we missed milestones, people would either have to work harder or we would drop features. In fact, we did drop a feature to make the Beta criteria. And because there was no additional time to put the features in, it was quite easy to say “No!” to additional requirements.

Bug metrics tracked the open bugs, and the find and close rates by severity. Once the find rate started going down, and the close rate went up, especially for the higher severity bugs, we knew we were getting close to our Beta criteria.
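
The sketch below illustrates that trend check: a falling find rate combined with a rising close rate for the higher-severity bugs suggests the release is approaching its Beta criteria. The weekly counts and the strictly monotonic test are made-up examples, not the team’s actual data or rule.

    # Weekly counts of bugs found and closed, restricted to the higher-severity
    # (priority 0 and 1) bugs. The numbers are hypothetical.
    weekly = [
        {"week": 1, "found": 30, "closed": 12},
        {"week": 2, "found": 22, "closed": 18},
        {"week": 3, "found": 14, "closed": 21},
    ]

    def trending_toward_beta(weekly):
        """True when the find rate is falling and the close rate is rising
        week over week."""
        finds = [w["found"] for w in weekly]
        closes = [w["closed"] for w in weekly]
        find_falling = all(earlier > later for earlier, later in zip(finds, finds[1:]))
        close_rising = all(earlier < later for earlier, later in zip(closes, closes[1:]))
        return find_falling and close_rising

    print(trending_toward_beta(weekly))   # True for the sample data above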

The Beta (and final shipment) criteria were the objective criteria by which the release was judged, to verify the release was ready for that milestone (Beta or final shipment). See Appendix A for the criteria.

Ongoing Steps: Continuous Learning

After the team’s first successes (finding bugs before customers did, achieving milestones, fixing more than 100 outstanding bugs from the previous release), they became quite attached to the process. The technical staff, with little prompting from management, developed a tracking mechanism for the bug fixes in progress. The technical staff improved their ways of reporting schedule problems, and the weekly project team meetings took less time.

These may sound like small accomplishments, but they represent a dramatic change in people’s thinking about their work. For the first time in this group, people recognized that their work was a process, or part of a larger process. This understanding was key to their ability to succeed in the organization, and to the product development success. It allowed them to think about how to do work in general, not just what had to be done immediately.

In addition to the technical processes, there were improvements in the management processes. Until there was a defined process, it was difficult to know which members of the technical staff were successful in product development and which were not. Once the process was defined, and people started using it, management had the opportunity to work with individuals whose work was not acceptable, and to reward people who did outstanding work.

Although the technical staff did not originally contribute to the Beta and final shipment criteria, they were adamant about making the Beta and final shipment criteria even stronger once they realized the value of the criteria. For example, we had a Beta readiness review, and realized that we had not met five of the eight criteria for Beta, and that the product was just not ready to release. We did not release the product for Beta. That shocked the technical staff, who then decided they would make the criteria theirs, and make the product good enough for Beta.

That feeling contributed to the continued use of the process for the milestones leading to final shipment. The technical staff owned the final ship technical criteria and they developed the individual milestones to achieve those criteria.

Summary

There was a tremendous amount of learning going on during this release. At the end of this study, the project was in this state:

  • A program and project manager, whose work could be repeated.
  • A working project schedule, with technical staff who understood how to generate another schedule.
  • Very few customer calls and demands for bug-fixing.
  • A public configuration management system.
  • An automated bug-tracking system and mechanism for fixing bugs.
  • Design reviews on the core parts of the system.
  • Public code walkthroughs on the core parts of the system.
  • Adequate system tests but no unit tests.
  • Short weekly project team and one-on-one individual meetings.

This lean and nimble process suited the culture well. The process was effective, and has some of the aspects of a level 2 organization.

Process

The process is flexible, and can continue to evolve as necessary. The project team will remain small, so for the next year or so, we can expect evolutionary changes to the process. This particular process, a concurrent development and test process, met the team’s needs for fast customer response and a quick product development cycle.

People

The biggest change was in the people. The technical staff started off as “process cynics”, but quickly became converts to doing things in a defined, repeatable way. They drove a number of the process improvements, and were happy to see the process written down.

Planning

Planning was critically important. The plans made a number of issues obvious to the management and technical staff:

  1. What the tradeoffs were in the project, and how they were made.
  2. What internal stakeholders could expect, and when.
  3. What the marketing and sales personnel could safely communicate to external stakeholders.

Success factors

This process implementation was successful for these reasons:

  • The staff was ready to work another way. They were exhausted from mandatory overtime.
  • No jargon was used. This change was clearly not a management fad.
  • Project and program management decisions were visible to the staff.
  • The technical staff owned the process evolution, not management.
  • The process work was part of their project work – not overhead they needed to consider in addition to their work.
  • There was a clear return on investment in the process adoption – the product was released on time, with all of the required features, and with very few bugs. For the first time in this group’s history, the customers raved about the product.

The technical staff were reluctant at first, but once they saw the value in process definition and evolution, they took the process on as their own. This group of people has a clear view of their goals, the process required to meet those goals, and the results of using that process.

We used the available resources not only to ship the product, but also to make long-term changes. We used knowledge and experience to rapidly design an improvement program that fit the culture. The process is effective and permanent, yet adaptable for the future.

References

  1. Haley, Thomas, “Software Process Improvement at Raytheon,” IEEE Software, Nov. 1996.
  2. Humphrey, Watts, Managing the Software Process, Addison-Wesley, Reading MA, 1989.
  3. Paulk, Mark C., Bill Curtis, Mary Beth Chrissis, and Charles V. Weber, Capability Maturity Model for Software, Version 1.1, CMU/SEI-93-TR-24, Software Engineering Institute, Pittsburgh, 1993.
  4. Weinberg, Gerald, Quality Software Management: Volume 2, First-Order Measurement, Dorset House Publishing, New York, 1993.

Acknowledgements

We gratefully acknowledge all of the reviewers’ comments, and especially thank Brian Lawrence for his comments which greatly improved this paper.

Appendix A: Release Criteria

Beta Criteria used by the Messenger team:

  1. All code must compile and build for all platforms: (platforms removed for confidentiality.)
  2. All developer tests must pass.
  3. All available tests for Beta customer (client side part of the product) must run and pass.
  4. All current bugs are entered into the bug-tracking system.
  5. First draft documentation is available, and shippable to customers.
  6. The code is frozen.
  7. Tech support training plan is in place, and people are in place.
  8. There are fewer than 36 open priority 0 and 1 bugs. (Priority 0 bugs are the most severe bugs – those that prevent the customer from using the product. Priority 0 and 1 bugs were not well-differentiated, so the criterion referred to both.)

Product Shipment Criteria used by the Messenger team:

  1. All code must compile and build for all platforms.
  2. Zero priority 0 and 1 bugs.
  3. For all open bugs, documentation in release notes with workarounds.
  4. All planned SQA tests run, minimum 90% pass.
  5. Number of open bugs decreasing for last three weeks.
  6. All Beta site reports obtained and evaluation documented.
  7. Reliability criterion: Simulate 1 week of usage by sending a minimum of 200 messages of varying sizes to and from varying platforms with varying classes of service.
  8. Final draft doc available, complete and submitted to <corporate organization>.
  9. A working demo runs on <previous release>.
  10. Verify that tokens reduce on-air time by 25% from <previous release>.
  11. At least 2 referenceable Beta sites. (Customers who were sufficiently happy with the product that they would agree to be contacted by potential customers.)
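
As an illustration only, the sketch below checks a few of the objective criteria above against sample project data. The thresholds (36 open priority 0 and 1 bugs, 90% test pass rate, three weeks of decreasing open bugs) come from the lists above; the function names, data layout, and numbers are hypothetical, not the team’s actual tooling.

    def beta_bug_criterion(open_p0_p1_bugs):
        """Beta criterion 8: fewer than 36 open priority 0 and 1 bugs."""
        return open_p0_p1_bugs < 36

    def ship_test_criterion(tests_planned, tests_run, tests_passed):
        """Ship criterion 4: all planned SQA tests run, minimum 90% pass."""
        return tests_run == tests_planned and tests_passed >= 0.90 * tests_run

    def ship_trend_criterion(open_bugs_last_three_weeks):
        """Ship criterion 5: open bug count decreasing for the last three weeks."""
        week1, week2, week3 = open_bugs_last_three_weeks
        return week1 > week2 > week3

    # Example readiness check with made-up numbers:
    print(beta_bug_criterion(28))                 # True
    print(ship_test_criterion(240, 240, 228))     # True (95% pass rate)
    print(ship_trend_criterion([52, 41, 33]))     # True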
