Chapter 19 - Managing the Acceptance Process
The previous chapter introduced the individual test lifecycle and the practices the assessors use for identifying test conditions, designing the tests, executing the tests and assessing the outcomes, and maintaining the tests. This chapter introduces the
management practices we use while executing the tests and while making the acceptance decision.
The process of accepting software involves many activities that generate the data we use as input into the acceptance decision. Each of these activities takes time and consumes valuable human and other resources. A well-managed acceptance process will use these resources wisely, while a poorly managed process has the potential to waste these resources, delay the acceptance decision, and even compromise the quality of the decision made.
While we are executing our test plans, we want to know the answers to two important and inter-related questions:
- First, how is testing progressing? When will we have a high-confidence assessment of the product quality? Will it be in time to make our readiness or acceptance decision?
- Second, what is our current assessment of the product quality? Is it good enough to release, whether to customers or to the acceptance testers? What actions do we need to take to make it good enough?
Test Scheduling Practices
The overall schedule that defines which tests will be run and when should have already been defined as part of the test plan. That includes the strategic decisions about whether all testing is done during a final test phase or incrementally throughout the project.
In this section we focus on the techniques we use for scheduling test execution. As a reminder, test execution includes both dynamic testing, in which test cases are run by executing software in the system-under-test, and static testing, which is performed in the form of reviews or inspections without actually running code in the system-under-test.
Plan-Driven Test Scheduling
The traditional approach to test scheduling involves defining a period of time, sometimes called a test cycle, in which one complete round of testing can be completed. The test plan defines how many test cycles are planned. The details of what happens in each
test cycle often emerge as the project unfolds. One approach is to define a detailed project plan for the first test cycle, one which defines the set of testing activities that will be done on a day-by-day basis. Project management tools such as PERT or Gantt charts may be used to document this detailed test execution plan. (See Plan-Driven Test Management.)
Session-Based Test Scheduling
The alternative to defining a detailed project plan of test activities is to schedule a series of test sessions and create a backlog (prioritized list) of test session charters that are executed in these test sessions. Each session would be of a standard duration, say 90 minutes, and most charters should be executable within one session. This Session-Based Test Management is most typically used for managing exploratory testing but could also be used for execution of scripted testing. When each test session is completed, the tester marks the test charter as either completed or needing one or more additional sessions, and may suggest additional test charters for future test sessions. This provides a good indication of how testing is progressing, as illustrated in Figure K (Test Charter Burndown), which shows the number of test charters dropping over time but with the occasional upward spike as new test charters are identified.
Figure K Test Charter Burndown
In this chart, the bars representing the original charters show the charters that were defined before test execution began. The bars labelled Added Charters stacked on top of them represent new charters that were identified while test execution was occurring. The Total Left line indicates how many test charters remain to be fulfilled, while the Charters Completed line indicates the number of charters already fulfilled. At any point during test execution, we can get a good idea of when testing will be completed by projecting the Total Left line out to the right until it intersects zero charters. That is the earliest probable completion date.
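As a rough sketch, this projection can be automated with a simple linear extrapolation of recent burn rate; the weekly counts below are hypothetical, and `project_completion` is an illustration rather than a real tool:

```python
# Sketch: project when the test-charter backlog reaches zero by extrapolating
# the average burn rate over the most recent reporting periods. The numbers
# are hypothetical; real data would come from the session-based test log.

def project_completion(total_left, window=4):
    """Estimate how many more periods until the backlog hits zero,
    based on the average burn rate over the last `window` periods."""
    recent = total_left[-window:]
    burn_per_period = (recent[0] - recent[-1]) / (len(recent) - 1)
    if burn_per_period <= 0:
        return None  # backlog is flat or growing; no projection possible
    return total_left[-1] / burn_per_period

# Weekly "Total Left" counts, including an upward spike in week 4
# as new charters were identified during test execution.
weekly_total_left = [40, 34, 30, 33, 27, 21, 16]
remaining_weeks = project_completion(weekly_total_left)
print(f"Earliest probable completion: ~{remaining_weeks:.1f} weeks away")
```

A more cautious projection might use only the slowest recent burn rate, since the upward spikes mean the simple average tends to be optimistic.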
Self-Organized Test Scheduling
A third approach is often used by agile or self-organizing teams. All the testing tasks are posted on a wall chart, variously called a “big visible chart” [JeffriesOnBigVisibleCharts] or an “information radiator” [CockburnOnAgileDevelopment], and team members
sign up to do specific testing tasks. As they finish one task, they mark it done and pick another task to perform. This is called Self-Organized Test Scheduling.
Event-Triggered Test Scheduling
Some forms of automated testing can be set up to run automatically at certain times (such as every night at 2 a.m.) or when certain events occur (such as when someone checks in a change to part of the code base). The results can be posted on a web site, automatically e-mailed/IM’d to all team members, or communicated by a multi-colored light display or “lava lamps” in the team workspace, or by an icon in everyone’s computer’s system tray or side gadget (for one such tool see Team Build Monitor [LambOnTeamBuildMonitor]).
This practice is often called Continuous Integration [MartinFowlerOnContinuousIntegration]. Teams that use continuous integration typically adopt a “stop the line” mentality to failing tests. The goal is to promote “code health” early. Whenever a test failure is reported by the automated test harness, fixing the failed test becomes the top priority for everyone on the team. This is the software equivalent of the “stop the line” practice used in lean manufacturing. Once the failure is fixed, everyone can go back to working on their respective tasks. This focus on keeping the product code “healthy” improves quality and reduces the duration of the acceptance test cycle.
Another approach is to prevent changes that break the build from being checked in. The gated check-in feature of Team Foundation Server defers a developer’s check-in until it can be merged and validated by an automated build. Passing automated tests or validating the results of static analysis are examples of such validation policies.
For more information on Team Foundation Build, see [MSDNOnTeamFoundationBuild].
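A minimal event-triggered runner can be sketched in a few lines; the polling approach, paths, and test command below are hypothetical stand-ins for what a real continuous integration server does on check-in:

```python
# Sketch: a tiny event-triggered test runner that polls a source tree and
# re-runs the test suite whenever any file's modification time changes.
# Paths, file pattern, and command are hypothetical.
import subprocess
import time
from pathlib import Path

def snapshot(root):
    """Record the modification time of every Python file under root."""
    return {p: p.stat().st_mtime for p in Path(root).rglob("*.py")}

def watch(root, command, poll_seconds=30):
    """Run `command` every time the tree under `root` changes."""
    last = snapshot(root)
    while True:
        time.sleep(poll_seconds)
        current = snapshot(root)
        if current != last:          # a check-in-like change event occurred
            subprocess.run(command)  # e.g. ["pytest", "tests/"]
            last = current

# Hypothetical usage (runs forever, so not invoked here):
# watch("src", ["pytest", "-q", "tests/"])
```

A production build server would subscribe to version-control events rather than poll, but the control flow is the same: a change event triggers a build-and-test run, and the results are broadcast to the team.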
Test Progress Reporting Practices
When we first start executing the tests we don’t know whether the quality is high or low but we do know that we don’t have a high degree of confidence yet. As testing progresses, we should be getting a better idea of what the quality level is and how much longer
it will take us to get to the required level of confidence. This is very similar to the cone of uncertainty concept that predicts the completion date and/or cost of developing the software. Figure X illustrates the cone of uncertainty for quality for a typical project. The initial guesstimate was anywhere between 10 and 100 person-months. As the requirements were better understood, the range reduced somewhat, but a sudden discovery of additional scope raised the estimates once again. An effort was made to descope the project to recover the original timelines. Work creep slowly raised both the upper and lower limits, and the estimates finally converged when the product was accepted as-is.
Figure X Cone of Uncertainty for Quality
We can, of course, assume that we will have a reasonably accurate assessment of quality when we have finished all our planned testing. This assumption may or may not be correct because it depends on how effectively our planned test activities will find all
the important bugs. This also implies a clear understanding of what is important to the product owner. This is very difficult to assess ahead of time. We certainly need to monitor how much testing work remains to be executed in the current test cycle and when
we expect to have it all completed. This applies to each test cycle we execute.
Session-based testing introduces a feedback mechanism that explicitly allows us to adjust the test plans as we learn new information both about the product and about the product owner and the product owner’s needs. As each test session is completed the testers
add any newly identified areas of concern to the backlog of test charters yet to be executed. Developers may also suggest additional test charters to address the risk associated with the modifications they make to the software as a result of change requests
and bug fixes identified by tests. We monitor the size of the backlog of test charters to get a sense of whether we are making headway. As the confidence of the testers and developers improves, they will suggest fewer and fewer new test charters, and the size of the charter backlog will therefore drop faster.
Predicting when the software will be of good enough quality to deliver is difficult because that involves predicting how many test cycles we will require and how much time the product development team will require between the test cycles to fix the bugs. (See
Chapter 16 – Planning for Acceptance for a more detailed discussion on this topic.) This requires monitoring the bug backlog and the arrival rate of new bugs to assess and adjust the predicted delivery date.
Assessing Test Effectiveness
One of the challenges of testing is assessing how effective our tests really are so that we can know how confident we should be in our assessment of the quality we have. There are several ways to calculate the effectiveness of the tests including the use of
coverage metrics and using the find rate of intentionally seeded bugs.
We can use metrics like test coverage and code coverage to calculate the theoretical effectiveness of our tests. These metrics can tell us what percentage of the requirements has been tested and what percentage of the code has been executed by the tests but
neither of these is a direct measure of what percentage of the bugs we have found. Of course, the primary issue is that we have no idea of how many bugs really exist so it is pretty difficult to say with any certainty what percentage we have found. For a cautious
approach to using code coverage, see Brian Marick’s essay [MarickOnCodeCoverageMisuse].
One technique that can provide a more direct measure is defect seeding. It involves deliberately placing a known set of defects in the software. As bugs are found during testing we can estimate the percentage of defects found by dividing the number of seeded
defects found by the total number of seeded defects as shown in Figure Y.
Figure Y Percentage of Bugs Found
We can project the number of defects yet to be discovered using the formula in Figure Z.
Figure Z Total Bugs Calculation
These calculations are described in more detail in the Test Status Reporting thumbnail.
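The seeding arithmetic can be sketched in a few lines; the counts below are hypothetical, and the projection assumes seeded and real defects are equally likely to be found:

```python
# Sketch of the defect-seeding arithmetic with hypothetical counts.
# If testing rediscovers 18 of 25 seeded defects, we assume it is
# finding real defects at roughly the same rate.

seeded_total = 25   # defects deliberately planted before testing
seeded_found = 18   # planted defects rediscovered by the tests
real_found = 90     # genuine (non-seeded) defects found so far

# Estimated percentage of all bugs found (per Figure Y)
percent_found = seeded_found / seeded_total
print(f"Estimated percentage of bugs found: {percent_found:.0%}")

# Projected total number of real bugs, and how many remain undiscovered
estimated_total = real_found / percent_found
remaining = estimated_total - real_found
print(f"Estimated total bugs: {estimated_total:.0f}, still undiscovered: {remaining:.0f}")
```

The estimate is only as good as the seeded defects: if they are easier (or harder) to find than the naturally occurring ones, the projection will be correspondingly optimistic or pessimistic.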
Bug Management and Concern Resolution
A key output of testing and reviews is a list of known gaps between what the product owner expects of the product and how it actually behaves. While the progress of testing is usually reported in terms of which tests have been run and which haven’t, the gap
between expectations and reality is usually expressed as a list of known bugs or issues that may or may not have to be addressed before the product will be accepted. In Chapter 16 – Planning for Acceptance, we introduced the Concern Resolution Model which
describes how concerns, including software bugs, change requests, issues and documentation bugs, can be managed. Concerns that fall into any of these categories could be considered gating (also known as blocking or blockers), that is, would prevent the system
from being accepted. Once we have finished our first full pass of testing, presumably at the end of our first test cycle, we can think of the set of gating bugs as being the outstanding work list for the product development team. Our goal is to drive the list
of gating bugs down to zero so that we can consider accepting and releasing the product. (Note that just because it reaches zero does not imply that there are no gating bugs, just none we currently know about.) The product owner can influence the gating count
in two ways: the product owner can classify newly found bugs as gating or they can reclassify existing gating bugs as non-gating if they decide that the bug can be tolerated because there is a low enough likelihood (reduced probability as described in Chapter
5 – Risk Model) it will be encountered or there are acceptable work-arounds in the event that it is encountered (reduced impact per the risk model).
It is common practice to classify the severity of bugs based on their business impact. Usually a Severity 1 (or Sev 1 as it is commonly abbreviated) bug is a complete show stopper while a Sev 5 bug is merely cosmetic and won’t impact the ability of users to
use the product effectively. Many product owners insist that all Sev 1 & 2 bugs be fixed before they will accept the product. Some product owners consider Sev 3 bugs critical enough to insist they are resolved before the product can be accepted. Note that
the interpretation of the severity scale is merely a vocabulary for communication of the importance of bugs between the product owner and the product development team; it is entirely up to the product owner to make the decision whether the bug needs to be
fixed. The product development team may influence that decision by pointing out similarities or difference with other bugs that had the same severity rating or by pointing out the potential impact as part of an argument to increase or decrease the severity.
They might also point out potential workarounds or partial fixes that they feel might justify reducing the severity. But ultimately, it is the product owner’s decision as to whether the product is acceptable with the bug still present.
The product owner needs to be reasonable about what bugs should be classified as gating. If there are 1,000 bugs and the team can fix 20 bugs per week, it will take the team 50 weeks to fix all the existing bugs assuming that no new bugs are found and no regression
bugs are introduced by the bug fixes. Both these assumptions are highly optimistic. Therefore, the customer needs to ask “Which of these bugs truly need to be fixed before I can accept the product?” This is purely a business decision because every bug has
a business impact. Some may have an infinitesimally small business impact while others may have a large business impact. The product owner needs to be opinionated about this and to be prepared to live with the consequences of their decision whether that is
to delay acceptance, deployment and usage of the product until a particular bug is fixed or to put the software into use with the known bug present. There is no point in insisting that all bugs must be fixed before acceptance simply to be able to say that
there are no known bugs. Doing so will likely delay accruing any benefits of the system unnecessarily. The process of deciding the severity of each bug and whether or not it should ever be fixed is sometimes called Bug Triage.
Bug Backlog Analysis
This list of known bugs is a collection of useful knowledge about the state of the product. Most bug management tools include a set of standard reports that can be used to better understand the bug backlog. These include:
Figure D Bug Arrival Rate
Figure E Cumulative Bugs Found.
Figure Z Bug Correlation Report
Figure W Bug Debt/Burn Down Report
- Bug Fix Rate Report – Describes the rate at which bugs are being fixed.
- Bug Arrival Rate Report – Describes the rate at which new bugs are being reported. A decreasing arrival rate may be an indication that the return on investment of testing has reached the point of diminishing returns. Or it could just mean that less testing
is being done or that the testing is repeating itself. Long-lived products that release on a regular cycle typically find that the shape of the curve is fairly constant from release to release and this can be used to predict the ready-to-deliver date quite
accurately. See Figure D – Bug Arrival Rate and Figure E – Cumulative Bugs Found.
- Bug Debt or Bug Burn Down Report - Describes the rate at which the number of bugs is being reduced or is increasing. The burn down report aggregates the bug arrival rate and the bug fixing rate to allow us to predict when the number of gating bugs will
be low enough to allow accepting the product and releasing it to users.
- Bug Aging Report – Classifies bugs by how long it has taken to fix them and how long it has been since unresolved bugs were first reported. The latter will give an indication of potential level of customer dissatisfaction if the average age of bugs is large.
- Bug Correlation Report – Classifies bugs based on their relationship with attributes of the system-under-test. The product owner is typically most interested in which features have the most bugs because this helps them understand the business impact of
accepting the system without resolving them. The product development team, on the other hand, is typically interested in which components, subsystems, or development teams have the most bugs associated with them because this helps them understand where their
own internal processes need to be improved most. Figure Z is an example of a Bugs by Area or Bugs by Feature Team report.
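As a minimal sketch (the bug records are hypothetical), a Bugs by Feature correlation report is just a group-and-count over the bug backlog:

```python
# Sketch: group a hypothetical bug backlog by feature area to produce
# a "Bugs by Feature" correlation report, most-affected feature first.
from collections import Counter

bugs = [
    {"id": 101, "feature": "Checkout", "severity": 2},
    {"id": 102, "feature": "Search",   "severity": 3},
    {"id": 103, "feature": "Checkout", "severity": 1},
    {"id": 104, "feature": "Reports",  "severity": 4},
    {"id": 105, "feature": "Checkout", "severity": 3},
]

by_feature = Counter(bug["feature"] for bug in bugs)
for feature, count in by_feature.most_common():
    print(f"{feature:10s} {count}")
```

The same grouping run over a component or team attribute gives the product development team's view of where its internal processes most need improvement.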
Test Asset Management Practices
The artefacts produced while planning and executing the assessment process are assets; these test assets need to be managed. If repeatability of testing is considered important, such as when the same tests need to be used for regression testing subsequent releases,
test scripts and the corresponding test data sets need to be stored in a version-controlled test asset repository. This may be a document repository such as SharePoint or a code repository such as the Team Foundation Server repository, Subversion or CVS. This
is described in more detail in the Test Asset Management thumbnail.
When test assets are expected to be long-lived, such as when the same tests will be used to regression test several releases of a product, it is important to have a strategy for evolving the test assets to ensure maintainability. The Test Evolution, Refactoring
and Maintenance thumbnail describes how we can keep our test assets from degrading over time as the product they verify evolves.
Acceptance Process Improvement
The acceptance process consumes a significant portion of a project’s resources. Therefore, it is a good place to look when trying to reduce costs and improve the effectiveness of one’s processes. Two good candidates for process improvement are improving the
effectiveness of the test practices and streamlining the acceptance process to reduce the elapsed time. The latter is the subject of the next chapter.
Improving Test Effectiveness
Multi-release projects and long-lived products will go through the acceptance process many times in their lifetime. Each release can benefit from lessons learned in the previous release, if we care to apply the lessons. It is worth conducting one or more retrospectives
after each release to better understand which readiness assessment and acceptance testing activities had the most impact on product quality and which had the least. The highly effective activities should be continued in future releases and the ineffective
ones should be replaced with something else. Some teams make it a point to try at least one new practice each release to see if it will detect bugs that previously slipped through the readiness assessment and acceptance testing safety nets. It is also worth
analysing the list of defects found in the usage phase to identify any shortcomings in the safety net. Bug correlation reports can come in handy in this exercise. See Figure Q – Bugs by How Found Report.
Figure Q Bugs by How Found Report.
Like any process, execution of the acceptance process needs to be managed. This includes monitoring the execution of the planned testing and ensuring it results in data that allows for a high-confidence acceptance decision. A key aspect of managing acceptance is managing the bug backlog to get the best possible quality product at the earliest possible time. These two goals are at odds with each other, and deciding which takes precedence should be a business decision. When building products is an ongoing goal of the product owner organization, continuous improvement of the acceptance process should also be managed to further reduce time to usage and improve product quality.
The acceptance process typically takes up a significant portion of a project’s resources and elapsed time. The next chapter looks at ways both elapsed time and resource usage can be reduced.
[CockburnOnAgileDevelopment] Cockburn, Alistair. Agile Software Development: The Cooperative Game. Addison-Wesley Professional.
[JeffriesOnBigVisibleCharts] Jeffries, Ron. Big Visible Charts.
[MarickOnCodeCoverageMisuse] Marick, Brian. “How to Misuse Code Coverage.”
[MartinFowlerOnContinuousIntegration] Fowler, Martin. Continuous Integration.