Chapter 17 - Planning for Acceptance
Part I introduced several models to help us think about the role of acceptance testing in the overall context of the software development lifecycle. It also introduced the abstract roles of the people involved in making the acceptance and readiness decisions.
This chapter introduces practices used by one or more of those roles to plan the activities that will provide the data on which the acceptance decision maker(s) can base their decision.
Acceptance is an important part of the life cycle of a product; it is important enough that it should be the result of a carefully thought-through process. The test plan is the end result of all this thinking. Like most such documents, it can serve an important
role in communicating the plans for testing, but the real value lies in the thinking that went into producing it.
Test planning builds on the work done during project chartering, which defines the initial project scope. In test planning, we define the scope of the testing that will be done, select the test strategy, and drill down to detailed testing plans that define
who will do what, when, and where.
Most projects prepare a test plan that lays out the following, among other things:
- The scope of the acceptance process and the breakdown into readiness assessment and acceptance testing phases
- The overall test strategy including both manual and automated testing
- The activities, testing and otherwise, that will be performed in each phase (the readiness phase and the acceptance phase)
- The skills that are required to perform the activities
- The other resources that will be utilized to carry out the activities (such as facilities and equipment)
- The timeframe and sequence, if relevant, in which each activity will be performed
Defining Test Objectives
We need to have a clear understanding of the scope of the project and the software before we start thinking about how we will accept the software. Most organizations have some kind of project chartering activity that defines the product vision or scope. It
may also include a risk assessment activity. One form of risk assessment activity involves brainstorming all the potentially negative events that could cause grief for the project. Some of the commonly occurring risks that we may address through testing are:
- The Product Development Team misinterpreted the Product Owner's description, resulting in a product that behaves differently from what was expected.
- The Product Owner's description of the product left out important behaviors, resulting in missing functionality.
- The Product Development Team implemented the behaviors poorly, resulting in frequent crashes of the product.
- The product exhibits the right behavior when used by small numbers of users but behaves erratically or responds slowly when the number of users or transactions increases to expected levels.
For each possible event, we classify both the likelihood and the impact as low, medium, or high. Anything ranked medium/high or high/high needs to be addressed. Some risks may cause us to change the way we plan our project, but other risks may cause us to take on specific test planning activities. Together, the vision/scope and risk assessment help drive the test strategy definition and test planning.
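The likelihood/impact ranking described above can be sketched as a small helper. This is a minimal sketch: the risk names and the level encoding are illustrative, though the threshold itself (address anything ranked medium/high or high/high) follows the rule stated above.

```python
# Rank each brainstormed risk on two dimensions, then flag the ones the
# plan must address: medium/high or high/high (in either order).
LEVELS = {"low": 0, "medium": 1, "high": 2}

def must_address(likelihood, impact):
    """True when the risk ranks medium/high or high/high."""
    lo, hi = sorted((LEVELS[likelihood], LEVELS[impact]))
    return hi == LEVELS["high"] and lo >= LEVELS["medium"]

# Hypothetical risks drawn from the list above
risks = {
    "misinterpreted requirements": ("medium", "high"),
    "missing behaviors": ("low", "high"),
    "erratic under load": ("high", "high"),
}
flagged = [name for name, (l, i) in risks.items() if must_address(l, i)]
```

Risks that are flagged feed the test strategy; the rest may be handled by other project-planning measures.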
Traditional Approaches to Testing
Traditionally, most products are tested from the users' perspective. That is, once the product is finished, its behavior is verified via whatever interfaces the users will typically use. Often, very little if any testing is done of the parts of the system before they are assembled into the final product. This results in a distribution of tests that looks like Figure A – Inverted Test Pyramid.
Figure A Inverted Test Pyramid
This figure shows a high-level view of how various kinds of tests (and consequently, test effort) are typically distributed (based on the level of test granularity – from unit to component/subsystem to system/whole product) when testing is done primarily by testers using the same interfaces as end users. System testing is done at the level of the whole product and can include a range of system sizes: individual systems or integrated workflows through multiple systems that have to interact. Similarly, component tests are used for testing packages, individual executables, or subsystems, while the granularity of unit tests can range from little things like methods/functions to bigger things like modules and classes. See the sidebar Testing Terminology Pitfalls for a discussion of what we call various kinds of tests.
The problem with the approach presented in Figure A (where most effort is focused on system testing and very little on unit testing) is that it is hard to verify that the software behaves correctly in all possible circumstances, because many of those circumstances are triggered deep inside the software. Even when this is not the case, much of the logic within the system is cumbersome to verify via the end-user interface. This makes verifying the behavior and finding any bugs very expensive and time consuming. Most of the behavior can be verified, and many of the bugs found, much more cheaply through appropriate unit testing, component testing, and subcutaneous system testing. These sorts of tests can and should be automated, as discussed in the Automated Testing section below.
Sidebar: Testing Terminology Pitfalls
When preparing Figure A (and Figure 4 later in this chapter) we struggled with what to call the tests in the top layer of the upside down pyramid. Some of the alternatives we considered were: Acceptance Tests, Whole
Product Tests, Customer Tests, Functional Tests, and Integration Tests. While these names are in common use to describe the tests that occupy the top layer, we weren’t happy with any of these names because they didn’t clearly convey the difference between
the top layer and the layers below it. The problem with each of these historical names is that they often exist to help differentiate one kind of tests from another kind. For example, Functional Tests as opposed to what? The obvious answer is Non-functional
Tests, but we feel non-functional tests also fit into the top layer of the (inverted) pyramid. If not Acceptance Tests then what? The logical alternative is not Rejection Tests but Regression Tests! Acceptance tests help us verify that new functionality works, while regression tests verify that existing functionality still works. The acceptance tests for this release are likely added to the regression test suite for the next release. So the terms acceptance and regression refer to points of differentiation on the time dimension (new versus existing functionality), not on the granularity dimension of the pyramid.
Examining the potential names with this critical lens helped us eliminate many candidates, leaving only those names that helped us focus on whether we were testing the fully assembled product or its constituent parts. This led us to Whole Product Test, System Test, and Integration Test. Unfortunately, the term Integration Test can be used at various levels depending on whether we are integrating units, components, subsystems, or systems (into a system of systems). Whole Product Test is probably the clearest name, but many teams think of themselves as building solutions to business problems rather than products for sale. This led us to focus on System Tests and the contrast with Component Tests and Unit Tests.
There are similar issues in the terminology around how tests are prepared and executed (Exploratory vs Scripted testing) and how tests are automated (Hand-scripted or programmed vs. record/playback, scripted vs. data-driven vs. keyword driven, etc.) The moral
of this sidebar is that we need to be careful to make sure we understand what someone means when they use one of these words to describe testing. They may well mean something entirely different from what we would have meant had we used the same word.
Flipping the Pyramid Right Side Up
The upside-down test pyramid can be turned right side up by moving much of the testing activity to the unit and component level. But what kinds of testing can be pushed down?
The agile software development community has shown that it is possible to produce consistently high-quality software without significantly increasing the effort by integrating testing throughout the development life cycle. This has led to a rethinking of the
role of testing (the activity) and of test teams.
Brian Marick has defined a model [MarickOnTestingDimensions] (see Figure 1) that helps us understand the purpose behind the different kinds of tests we could execute.
Figure 1 Purpose of Tests
The diagram in Figure 1 classifies various types of testing we can do along two key dimensions:
- Whether the tests are business-facing or technology-facing
- Whether the tests are intended to support development (by helping them get it right) or to
critique the product after it is built
Tests That Support Development
Tests can support development by helping the Product Development Team understand what the product is supposed to do before they build it. These are the tests that can be prepared in advance and run as we build the product. As part of the readiness assessment,
the Product Development Team can run these tests to self-assess whether the product implements the necessary functionality.
The tests in this column fall into two categories: the business-facing tests that describe what the product should do in terms understandable by the business or Product Owner, and the technology-facing tests that describe how the software should work beneath the covers.
The business-facing tests that drive development are the acceptance tests (also known as customer tests). These tests elaborate on the requirements, and the very act of writing them can expose missing or ambiguous requirements. When we (the Product Owner, often together with the Product Development Team) prepare these tests before development starts, we can be sure that the Product Development Team understands what they need to build. This is known as Acceptance Test–Driven Development.
If we prepare the system acceptance tests after development is complete or we prepare them in parallel with development and do not share them (this is referred to as "independent verification"), the tests do not help us build the right product; instead,
the tests act as an alternative interpretation of the requirements. If they fail when we finally run them, we need to determine which interpretation of the requirements is more accurate: the one implemented by the development team in the code base or the one
implemented in the functional tests by the test team. If the latter, time will be consumed while the development team reworks the code to satisfy this interpretation, rework that could have been avoided if we had shared the tests. Either way, we have set up
an adversarial relationship between development and testing. It is highly preferable to prepare the tests before the software is built so that testing can help development understand what needs to be built rather than simply criticize what they have built.
These tests may be run manually or they may be automated. The latter allows the Product Development Team to run them throughout the development cycle to ensure that all specified functionality is correctly implemented. The Product Owner will want to run additional
acceptance tests to make the final acceptance decision, but supplying a set of tests to the Product Development Team early so they can drive development goes a long way toward building the correct product. This is much more likely to happen when the tests
are easy and cheap to run—and that requires automated execution (for more information, see the Automated Functional Test Execution thumbnail). These tests may be implemented as programmatic tests, but they are more typically implemented as keyword-driven tests.
There are many tests used by product development that are not business-facing. Developers may prepare unit tests to verify that the code they wrote has successfully achieved the design intent. This is how they determine that they correctly built the code (as
opposed to building the correct product). Test-driven development (TDD) is when developers implement automated unit tests before they build the code the tests verify. This development process has been shown to significantly improve the quality of the software
in several ways, including better software structure, reduced software complexity, and fewer defects found during acceptance testing [JeffriesMelnikOnTDDEffectiveness]. These tests are ever more frequently automated using members of the xUnit family of testing frameworks. For more information, see [MeszarosOnUnitTestPatterns].
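As a minimal sketch of the TDD style in Python's xUnit member (`unittest`): the business rule and the `apply_discount` function are invented for illustration; in TDD the test methods would be written first and fail until the code exists.

```python
import unittest

def apply_discount(total, percent):
    """Hypothetical business rule: discounts are capped at 50%."""
    percent = min(percent, 50)
    return round(total * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    # In TDD, these tests exist (and fail) before apply_discount is written.
    def test_normal_discount(self):
        self.assertEqual(apply_discount(100.0, 10), 90.0)

    def test_discount_is_capped_at_fifty_percent(self):
        self.assertEqual(apply_discount(100.0, 80), 50.0)

if __name__ == "__main__":
    unittest.main()
```

Run frequently (ideally on every change), such a suite verifies that the code still achieves its design intent.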
Tests That Critique the Product
Assuming that the product has implemented the correct functionality, we need to know whether the product meets the non-functional requirements. These tests support the acceptance decision. We do this by assessing the non-functional attributes of the product after it has been (at least partially) built. These tests critique the product instead of driving the development process. They tell us whether it is good enough from a non-functional perspective. We can divide these tests into two categories: business-facing and technology-facing.
Technology-facing tests that critique the product measure how well the product meets technically oriented quality attributes (scalability, availability, self-consistency, and so on).
These tests provide metrics we can use when deciding whether the product is ready to be shipped. In most cases, these tests will be run as part of readiness assessment because of their technical nature. However, a Product Owner charged with deciding whether
to accept a product may be interested in seeing the results and comparing them with the minimum requirement. They may even hire a third-party test lab to conduct the testing on their behalf. For more information, see the Test Outsourcing thumbnail.
If system acceptance tests are used to drive development to build the product per the requirements, how do we make sure we are building the right product? The business facing tests that critique the product fulfill this role. These tests assess the product
(either as built or as proposed) for “fitness for purpose”. Usability tests are examples of tests that critique the product from a business perspective.
Typically, these tests cannot be automated because they are highly subjective and some even require us to observe people trying to use the product to achieve their goals.
What Testing Will We Do? And Why?
Now that we have been introduced to a way of reasoning about the kinds of tests, we can decide the types of tests we need to run and which tests to automate. This is an overall test strategy that helps us determine how best to address our testing needs.
Defining the test strategy may be considered to be part of the test planning process or a distinct activity. Either way, the purpose of defining a test strategy is to make some high-level decisions about what kinds of testing need to be done and how they will
be executed. Test strategy is tightly linked to the test objectives. For example, you would have a different test strategy if your objectives are to verify and accept functionality as opposed to verify compliance.
One of the key decisions is what kinds of tests should be automated and which approach to testing should be used for manual tests. The goal of these decisions is to try to minimize project risk while also minimizing the time and effort spent testing the software.
Previous sections introduced the concepts of functional and non-functional requirements. As part of the test strategy we need to decide where to focus. Clearly, testing cannot prove that software works correctly; it can only prove that it does not work correctly.
Therefore, we could spend an infinite amount of time testing and still not prove the software is perfect. The test strategy is about maximizing the return on investment (ROI) of testing by identifying the testing activities that will mitigate the risks most
effectively. This implies that some requirements may be tested less thoroughly, by choice.
We also need to decide whether we will do all the acceptance testing at the end of the project (in a separate testing phase, in a test-last or “big bang testing” style) or incrementally as functionality becomes available. The latter approach, incremental acceptance, requires changes in how the project is planned and how the software is developed to ensure that a continuous stream of functionality is delivered starting fairly early in the project. This is the style that we strongly advocate, because big-bang testing carries too many risks and is avoidable in many contexts. The payback of incremental acceptance is early discovery of misunderstood and missed requirements, thereby allowing time for remediation off the critical path of the project.
Another strategic decision may relate to test oracles, our source of truth. How will we define what a correct outcome looks like? Is there a comparable system that we can use as an oracle? (For more information, see Comparable System Test Oracle.) Can we hand-craft expected results? (For more information, see Hand-Crafted Test Oracle.) Or will we need to use a Human Test Oracle? If so, what can we do from a design-for-testability perspective to reduce the dependency on human test oracles?
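For instance, a comparable system test oracle supplies expected outputs by replaying the same inputs through a trusted existing implementation and the new one, then comparing. Both functions below are hypothetical stand-ins for real systems:

```python
def legacy_sales_tax(amount):
    # The comparable system: a trusted existing implementation.
    return round(amount * 0.07, 2)

def new_sales_tax(amount):
    # The implementation under test.
    return round(amount * 7 / 100, 2)

# The oracle check: the old system's answers are the expected results.
for amount in [0.0, 19.99, 100.0, 12345.67]:
    assert new_sales_tax(amount) == legacy_sales_tax(amount), amount
```

The same pattern scales up to replaying production transaction logs through both systems and diffing the outputs.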
Manual Testing Freedom
For functional testing, the key strategy decisions relate to how we will execute the tests. When we manually execute the tests, we need to decide how much freedom to grant the testers. Figure 2 contains the "freedom" scale used by Jon and James Bach
to describe the choices we have.
Figure 2 "Freedom" scale for testers (Jon Bach, James Bach – used with permission)
At one extreme of the test freedom scale, there is freestyle exploratory testing, in which the testers can test whatever they think is important. At the other end of the scale is scripted testing, in which testers attempt to follow a well-defined test script. In between, there is chartered exploratory testing, which uses charters of varying degrees of freedom, including scenarios and user roles/personas. Scripted testing involves having an expert prepare test scripts to be executed much later by someone else (or by a computer, when automated). There is little opportunity for test design during test execution.
Exploratory testing is an effective and efficient approach to testing that leverages the intelligence of the tester to maximize the bugs found in a fixed amount of time. Unlike with scripted testing, testers are encouraged to conceive new things to try while
they are executing tests. Some people describe it as “concurrent test case design and execution with an emphasis on learning”.
For more information on exploratory testing, see [KanerOnExploratoryTesting] and [BachOnExploratoryTesting].
Automated Testing
Automated testing covers a wide range of topics: from automated execution to automated test case generation. Some kinds of non-functional tests require automated execution because of the nature of the testing being performed. A commonly overlooked area for
automation is the use of "power tools" while performing manual testing. Tools can also be used to generate test data or to do data inspection. The various uses of test automation need to be determined on a project-by-project basis. For more information
about this process, see the Planning Test Automation thumbnail of this guide and [FewsterGrahamOnSoftwareTestAutomation].
Maximizing Automation ROI
An effective test automation strategy strives to maximize the return on our investment in automation. Therefore, the tests we automate should cost less, at least in the long run, than we would have spent manually executing the comparable tests. Some tests are so expensive to automate that we will never recoup the investment. These tests should be run manually.
To ensure that we get the best possible ROI for our test automation investment, we need to focus our energies on the following:
- Tests that have to be automated by their very nature
- Tests that are inherently easier to execute using a computer than a human
- Tests that need to be run many times
- Tasks (not tests) that can make manual (or automated) testing faster and more effective
Automated Execution of Functional Tests
Automated functional test execution is a powerful way to get rapid feedback on the quality of the software we produce. When used correctly, it can actually prevent defects from being built into the product; when used incorrectly, it can rapidly turn into a
black hole into which time and effort are sucked. When automated regression tests are run frequently, such as before every code check-in, they can prevent new defects from being inserted into the product during enhancement or maintenance activities. Ensuring
the Product Development Team understands the acceptance tests ahead of time can ensure that the team builds the correct product the first time instead of as a result of test-and-fix cycles. For information about how this works, see the Acceptance Test Driven Development thumbnail in Volume II.
A common strategy on projects that have an extensive suite of automated tests is to run those tests as the first activity in a test cycle, as a form of extended smoke test. This ensures that the software functions properly (to the extent of the automated test coverage) before a human tester spends any time doing manual testing.
The key to effective automated functional testing is to use an appropriate tool for each type of test—one size does not fit all. The two most common approaches to automated test preparation are test recording (see the Recorded Test thumbnail) and test scripting.
Recorded tests are easy to produce, but they are often hard to maintain. Scripted tests can either be programmatic test automation, which involves technical people writing code to test the code, or keyword-driven test automation, which non-technical people
can use to write tests using a much more constrained testing vocabulary. Because keyword-driven tests are typically written in the ubiquitous language defined for the product, they are also much easier to understand than most programmatic tests. Whatever approach
we use, we should think beyond the initial test authoring and also consider the life cycle costs of the tests. Recorded test tools do have some valuable uses. They can be used to quickly record throwaway test suites to support the development team while they
refactor testability into the product under test. They can also be used in a record and refactor style as a way of quickly building up a collection of keywords or test utility methods to be used in keyword-driven tests or programmatic tests, respectively.
Keyword-driven testing involves specifying test scripts in a non-programmatic style. The steps of the test are data interpreted by a keyword language interpreter. Another style of data-driven test automation is the reuse of a test script with multiple data
sets. This is particularly effective when we can generate test data, including inputs and expected outputs, using a comparable system test oracle. Then we run the data-driven test one time for each set of inputs/outputs. Commercial recorded test tools typically
provide support for this style of testing and often include minimal support for refactoring of the recorded test scripts into parameterized scripts by replacing the constant values from the recorded test with variables or placeholders to be replaced by values
from the data file.
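The keyword-driven style described above can be sketched as a tiny interpreter: each test step is data (a keyword plus arguments) rather than code. The `Cart` "product" and the keyword vocabulary here are invented for illustration:

```python
# A minimal keyword-driven test interpreter. Steps are data, so a
# non-programmer can write scripts in the project's vocabulary.
class Cart:
    def __init__(self):
        self.items = {}

    def add(self, item, qty):
        self.items[item] = self.items.get(item, 0) + qty

    def total_items(self):
        return sum(self.items.values())

def run_script(script):
    """Interpret a list of (keyword, *args) steps against a fresh Cart."""
    cart = Cart()
    for keyword, *args in script:
        if keyword == "add item":
            item, qty = args
            cart.add(item, int(qty))
        elif keyword == "check item count":
            (expected,) = args
            assert cart.total_items() == int(expected), f"failed: {keyword}"
        else:
            raise ValueError(f"unknown keyword: {keyword}")

# A test script written in a (hypothetical) ubiquitous language. The same
# script could be re-run data-driven style with different item/count sets.
run_script([
    ("add item", "apple", "2"),
    ("add item", "pear", "1"),
    ("check item count", "3"),
])
```

Because the steps are plain data, the same script can be executed by this interpreter, parameterized with external data files, or handed to a human tester.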
Test Automation Pyramid
The test automation pyramid is a good way to visualize the impact of different approaches to test automation. When test automation is an afterthought, the best we can usually do is to use graphical user interface (GUI)–based test automation tools to drive the
product under test. This results in a distribution of tests as shown in Figure A – Inverted Test Pyramid.
Frequently, these tests are very difficult to automate and very sensitive to any changes in the application. Because they run through the GUI, they also tend to take a long time to execute. So if test automation is an afterthought, we end up with a large number
of slow, fragile tests.
An important principle when automating tests is to use the simplest possible interface to access the logic we want to verify. Projects that use test-driven development techniques attack this problem at multiple levels. They do detailed unit testing of individual
methods and classes. They do automated testing of larger-grained components to verify that the individual units were integrated properly. They augment this with system tests (which include use case or workflow tests) at the whole product level. At each higher
level, they try to focus on testing those things that could not be tested at the lower levels. This leaves them with far fewer system tests to automate. This is illustrated in Figure 4 – Proper Test Pyramid.
Figure 4 Proper Test Pyramid
One way to reduce the effort involved is to minimize the overlap between the unit tests and component tests with the system tests. A specific example of this is the use of business unit tests to test business logic without having to go through the user interface.
Another technique is the use of subcutaneous (literally, “under the skin,” meaning behind the GUI) workflow tests to test business workflows without being forced to access the functionality through the user interface. Both of these approaches require the product to be designed for testability.
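A subcutaneous test can be sketched as a test that drives the service layer directly, skipping the GUI entirely. The `Account` and `TransferService` classes below are invented stand-ins for a real product's business layer:

```python
# Business logic exposed behind the GUI; a subcutaneous test calls it directly.
class Account:
    def __init__(self, balance):
        self.balance = balance

class TransferService:
    def transfer(self, src, dst, amount):
        if amount > src.balance:
            raise ValueError("insufficient funds")
        src.balance -= amount
        dst.balance += amount

def test_transfer_moves_money():
    a, b = Account(100), Account(0)
    TransferService().transfer(a, b, 40)
    assert (a.balance, b.balance) == (60, 40)

test_transfer_moves_money()
```

A GUI test of the same workflow would have to script screens, waits, and widget lookups; the subcutaneous version is faster and far less fragile, which is exactly why testability has to be designed in.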
Automated Testing of Non-functional Requirements
Many types of non-functional tests require the use of automated test tools. Many of these tools are specially crafted for the specific purpose of assessing the product with respect to a particular kind of non-functional requirement. Common examples include
performance testing tools that generate load to see how the product copes with high transaction rates or usability testing tools that record test sessions and analyze data on users’ performance (e.g. successful task completion, time on task, navigation paths,
error rates, etc.).
Automation as Power Tools for Manual Testers
Automated tests provide a high degree of repeatability. This works very effectively as a change detector, but it probably will not find bugs that have always been there. For that we need human testers who are continually looking for ways to break the software.
For human testers to be effective, they must be able to focus on the creative task of dreaming up and executing new test scenarios, not the mundane tasks of setting up test environments, comparing output files, or generating or cleansing large amounts of test
data. A lot of these tasks can be made fairly painless through appropriate use of automation.
We can use automated scripts and power tools for the following:
- To set up test environments
- To generate test data
- To compare actual output files or databases with test oracles
- To generate and analyze test reports
- To tear down test environments
If we need a large dataset, we can write a program to generate one with known characteristics. If we need to test how particular transactions behave when the product is stressed, we can write a program that uses up all the memory or disk-space or CPU on command.
These are all examples of power tools that make human testers more effective. For specific examples see a series of Visual Studio Team Test walkthroughs [SterlingOnTestTools].
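As one such power tool, a small seeded generator can produce a large data set with known, reproducible characteristics on demand. The field names and value ranges here are hypothetical:

```python
import random

def generate_customers(n, seed=0):
    """Produce n synthetic customer records with known characteristics."""
    rng = random.Random(seed)  # fixed seed: the same data set every run
    regions = ["north", "south", "east", "west"]
    return [
        {"id": i, "region": rng.choice(regions), "balance": rng.randint(0, 10_000)}
        for i in range(n)
    ]

rows = generate_customers(1000)
assert len(rows) == 1000
assert {r["region"] for r in rows} <= {"north", "south", "east", "west"}
```

Because the generator is seeded, a failing test can be reproduced exactly, and the tester knows precisely what the data should look like when comparing outputs.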
Automated Test Generation
One of the grand objectives of software testing is automated test generation. The industry is still some distance from being able to push one button and have a tool generate and run all the tests we will ever need, but there are some selected situations where automated test generation is practical. One example is combinatorial test optimization. Suppose the module we are testing takes five different parameters, each of which could be any one of four values, and each combination of values causes the module to behave somewhat differently. To test this exhaustively, we would have to test 1,024 (4^5) different combinations, which is not very practical. Instead, we can use a tool that analyzes the five dimensions and generates a minimal set of five-value tuples that will verify each interaction of a particular pair of values at least once (see the all-pairs test generation and stressful input test generation tools in [BachOnTestTools]).
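A naive greedy sketch of pairwise reduction (real all-pairs tools use better-optimized algorithms) shows the scale of the saving for the five-parameter, four-value example:

```python
from itertools import combinations, product

def pairs_of(row):
    # Every (parameter-index, value) pair interaction exercised by one test row.
    return {((i, row[i]), (j, row[j])) for i, j in combinations(range(len(row)), 2)}

def greedy_pairwise(domains):
    """Greedily pick rows until every value pair of every parameter pair
    is covered at least once. A sketch of the idea, not a production tool."""
    rows = list(product(*domains))
    uncovered = set().union(*(pairs_of(r) for r in rows))
    suite = []
    while uncovered:
        best = max(rows, key=lambda r: len(pairs_of(r) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite

domains = [range(4)] * 5                       # five parameters, four values each
suite = greedy_pairwise(domains)
assert len(list(product(*domains))) == 1024    # exhaustive testing
assert len(suite) < 50                         # pairwise needs far fewer tests
```

The reduced suite still exercises every pairwise interaction, which is where the bulk of interaction bugs tend to hide.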
Another form of test generation is Model-Based Testing (MBT). In MBT, we build a model of the salient aspects of the product under test and use it to derive tests. The tests can be derived manually, or we can build a test generation tool that analyzes the model and generates the tests needed to achieve full test coverage. The tool can simply generate test scripts that will be executed by other tools or human testers, or it can execute each test as it is generated.
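A minimal MBT sketch: a finite-state model of a hypothetical document workflow, from which one test per transition is derived and run against a stand-in for the product. All states, actions, and names here are invented:

```python
# State model of a hypothetical document workflow: (state, action) -> next state.
MODEL = {
    ("draft", "submit"): "review",
    ("review", "approve"): "published",
    ("review", "reject"): "draft",
}

def derive_tests(model):
    """Generate one given/when/then test per transition in the model."""
    return [
        {"given": src, "when": action, "then": dst}
        for (src, action), dst in model.items()
    ]

def product_transition(state, action):
    # Stand-in for the real product under test; normally this would drive
    # the actual application through an automation interface.
    table = {
        ("draft", "submit"): "review",
        ("review", "approve"): "published",
        ("review", "reject"): "draft",
    }
    return table.get((state, action), state)

for test in derive_tests(MODEL):
    actual = product_transition(test["given"], test["when"])
    assert actual == test["then"], test
```

Richer models (guards, data, loops) let a generator derive not just single transitions but whole paths through the workflow, up to whatever coverage criterion we choose.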
Readiness vs Acceptance
As described in Chapter 1 - The Acceptance Process and Chapter 2 - Decision Making Model, the acceptance of software can be divided, at least logically, into two separate decisions. The readiness decision is made by the Product Development Team before giving
the software to the Product Owner who makes the acceptance decision. A key decision is determining which tests are run as part of readiness assessment and which are run as part of acceptance testing. In most cases, the readiness assessment is much more extensive
than the acceptance testing. When functional tests are automated, it is likely they run in readiness assessment, and the software is not released to acceptance testing until all the tests pass. This results in a better quality product being presented to the
Product Owner for acceptance testing.
Managing Test Development & Automation
It is useful to have a common understanding of how the development of tests relates to the development of the product. In sequential processes, the development of the tests starts well after product development and doesn’t influence it. We prefer to think of
test development as occurring in parallel with product development specifically so that it can influence what is built. It is also useful to think about how the tests are defined and how that relates to test automation. If automation is focused on entire test
cases, the automation cannot begin until at least some of the test cases are fully defined.
An alternative is to think of test automation at the test step level. Then automation development can begin as soon as some of the steps required by test cases are defined. This also allows the test automation development to be carried out by people with different
skills (test automation) than those doing the test design (testing). The automation work can be carried out in parallel with the test definition and the product development. This makes it possible to have automated tests available as soon as the product software
is available rather than at some later time.
The test steps or actions should be defined using the Ubiquitous Language adopted by everyone in the project. The test scripts defined using these terms can be executed by human testers or if the ROI is sufficient, automated. If all the steps needed by a test
script have automated implementations, the entire test script can be executed automatically. If only some of the steps are automated, a human tester may still be needed but the time required to execute the test may be reduced significantly.
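Step-level automation can be sketched as a registry lookup: steps named in the ubiquitous language run automatically when an implementation exists, and the rest are routed to a human tester. All step names and context keys here are hypothetical:

```python
# Steps with an automated implementation run by machine; the rest are
# collected for a human tester to execute.
AUTOMATED_STEPS = {
    "create account": lambda ctx: ctx.update(account="A-1"),
    "deposit 100": lambda ctx: ctx.update(balance=100),
}

def run_script(steps):
    """Execute automated steps; return leftover steps for manual execution."""
    ctx, manual = {}, []
    for step in steps:
        action = AUTOMATED_STEPS.get(step)
        if action:
            action(ctx)
        else:
            manual.append(step)  # hand off to the human tester
    return ctx, manual

ctx, manual = run_script(["create account", "deposit 100", "verify statement layout"])
assert ctx == {"account": "A-1", "balance": 100}
assert manual == ["verify statement layout"]
```

As more step implementations are added to the registry, the manual portion of each script shrinks without rewriting the scripts themselves.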
Who Will Accept the Product?
Ultimately, the acceptance decision belongs to the Product Owner. In some cases, the Product Owner may not be a single person. In these cases, we may have a Product Owner Committee that makes the acceptance decision using some sort of democratic or consensus-based process. Or, we may have a set of acceptance decision makers, each of whom can veto acceptance (see Chapter 1 - The Acceptance Process in Part I). In other cases, the Product Owner may be unavailable. In these cases, we may need a Product Owner proxy to act as the "goal donor" who both provides requirements and makes the acceptance decision. The proxy may be a delegate selected by the Product Owner, a mediator between a group of customers, or a surrogate who acts on behalf of a large group of anonymous customers. The latter role is often referred to as the product manager. For more information about this process, see the Customer Proxy Selection thumbnail in Volume II.
A related question is who will do the acceptance testing and, by extension, who will do the readiness assessment. This very much depends on the business model and the capabilities and skill set of the parties involved. The sidebar "Decision-Making Model
Stereotypes" enumerates several common scenarios. When either the Product Development Team or the Product Owner feels they need assistance conducting the readiness assessment or acceptance testing, they may resort to a Test Outsourcing model. The third-party
test lab would do the assessment, but the readiness decision and acceptance decision still belong to the Product Development Team and Product Owner respectively.
When Will We Do the Testing?
The test plan needs to address when the testing will be done. Some testing activities will be done by the Product Development Team as part of readiness assessment, while others are the responsibility of the Product Owner who will be deciding whether to accept
the product. The test plan needs to spell this out in enough detail that we understand how much time we need for readiness assessment and acceptance testing and what we will use that time for. There are two main strategies for when
testing is done on a project. The sequential approach to the product development lifecycle places most testing activities in a separate test phase after all development is completed. The agile approach is to test each increment of functionality as it is completed.
One way to plan testing is to define a testing phase of the project after all new functionality development is complete. This is illustrated in Figure 5 Testing Phase in a Sequential Product Development Lifecycle.
Figure 5 Testing Phase in a Sequential Product Development Lifecycle
This testing phase typically consists of several time-boxed test cycles, each of which contains both readiness assessment and acceptance testing activities as illustrated in Figure 6.
Figure 6 Multiple Test Cycles in Accept Phase of Project
Each test cycle would be focused on one release candidate (RC) version of the software. Within the test cycle the Product Development Team makes a readiness decision before involving the Product Owner Team in the acceptance testing. In the early test cycles,
this readiness decision is made by answering the question, "Is it good enough to bother having the Product Owner Team test it?" It is valuable to get Product Owner feedback on the software even if we know there are some defects. The test cycles each
result in concerns that need to be investigated. Any concerns that need software changes are then addressed by the Product Development Team, and a new release candidate is built. This sets the stage for the next test cycle. We repeat this process until the
release candidate is accepted by the acceptance decision maker(s).
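The cycle just described can be modeled as a simple loop over release candidates: a readiness check gates entry to acceptance testing, and acceptance happens only when no must-fix concerns remain. The defect counts and the readiness threshold below are invented purely for illustration.

```python
# Toy model of the accept phase: successive release candidates (RCs) go
# through a readiness check and then acceptance testing until one passes.

def run_accept_phase(gating_defects_per_rc, readiness_bar=5):
    """Each list entry is the gating-defect count found in that RC.
    Returns the 1-based test cycle in which acceptance happens, or None."""
    for cycle, gating_defects in enumerate(gating_defects_per_rc, start=1):
        if gating_defects >= readiness_bar:
            continue   # not "good enough to bother the Product Owner Team"
        if gating_defects == 0:
            return cycle   # no must-fix concerns remain: accepted
        # concerns raised; development fixes them and builds the next RC
    return None   # no candidate was accepted

cycle = run_accept_phase([7, 3, 0])   # accepted on the third test cycle
```

The point of the sketch is the shape of the loop, not the numbers: every cycle either ends in acceptance or produces concerns that feed the next release candidate.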
Within each test cycle, it is likely we will have a predefined set of testing activities; these may be laid out as a PERT chart or a Gantt chart to reflect timing and interdependencies. We may decide to reuse the same plan for each of the test cycles, or we
could define a unique plan for each cycle. In practice, with good automated regression testing in place, we should find fewer and fewer defects each test cycle. Because of this, a risk-based approach to planning the subsequent cycles could result in shorter
cycle times and faster time to market. We may also deliberately choose to defer some kinds of testing activities to later test cycles or to do them in earlier test cycles.
The test resources on a sequential project are typically involved mostly at the back end of the project. Even when they are involved in fleshing out the requirements at the beginning of the project, their involvement while the software is being built tends
to be minimal. The staffing profile of a hypothetical 10 week project is illustrated in Figure 7 – Sequential Test Specialist Staffing Profile.
Figure 7 Sequential Test Specialist Staffing Profile
This project involves ten features and each feature requires one person week of requirements, one person week of test design and one person week of test execution followed by a person week of bug fixing by development and one person week of regression testing.
Note how the bulk of the activity occurs at the back end of the project and requires ten test specialists in weeks 8 and 10 (week 9 primarily involves development resources). This bursty nature of testing is usually accommodated by testers splitting their
time across several projects. This makes it hard for them to keep up to date on what is being built and how the requirements are evolving because multiplexing (task-switching) is a form of waste. (See the sidebar on Multiplexing vs. Undivided Attention). Also,
this requires development to be finished in week 7 to leave time for two 1-week test cycles with a week of bug-fixing in between.
Sidebar: Multiplexing vs. Undivided Attention
Having the bulk of testing activity done at the end of the product development life cycle gives managers the illusion that testers have slack in their workloads at various times and can therefore be assigned to multiple projects, enabling them to use their
time more efficiently and effectively. Not only does this approach place constraints on when developers receive feedback on the product, but also on how much time they have to act on that feedback.
Multiplexing requires context switching. The cost of context switching includes the effort of understanding each project’s status and re-immersing oneself in the other project’s context, as well as finding proper test environments, resetting them, relearning
certain requirements of the product, re-establishing contact with the right people, and often regaining focus and rethinking issues that have already been settled. Tom DeMarco discusses all of these costs
[DeMarcoOnSlack]. Multiplexing may also require catching up on the work done in one’s absence.
Gerald Weinberg estimates that the context-switching cost of assigning an additional project to one person is a massive 20% [WeinbergOnContextSwitching]. For three projects, it’s a 40% drop in productivity!
Joel Spolsky’s testimony indicates that with software engineers the cost of context switching is even greater – 75% [SpolskyOnContextSwitching]. Spolsky argues that software development is “the kind of intellectual activity where we have to keep a lot of things
in our head at once. The more [information] you [can] keep in mind, the more productive you are”. Such information includes variable names, APIs, data structures, requirements, helper functions, test cases, details of stack traces, debugging information, the
repository structure, configuration parameters and where they are set.
Researchers in cognitive psychology and organizational behaviour who specifically studied software engineers also came to the same conclusion: multiplexing is harmful. It’s one of the contributing factors to “time famine”, which is destructive to individuals’
lives and not in the best interest of their employers either [PerlowOnTimeFamine] and [OcasioOnWorkFragmentation].
Ooops… the piece of code above clearly doesn’t belong here. It crawled in from a different project. When I noticed it, I realized what a travesty this was! I was working on a demo of the fluent configuration interface for the Enterprise Library 5.0 project
while also working on the Acceptance Test Engineering Guide. This was such a wonderful illustration of another point – how easily bugs can crawl in while multiplexing and not giving full attention to one project – that I’ve decided to leave it in
here as a case in point!
The bottom line is this: full-time involvement in a single project improves individual performance.
The alternative to leaving all the testing to the end of a project is to do incremental testing as software is developed. This requires that the Product Development Team organizes its work such that there is a steady stream of testable software available starting
early in the project. This is illustrated in Figure 8 – Incremental Product Development Lifecycle.
Figure 8 Testing in an Incremental Product Development Lifecycle
This incremental product development lifecycle allows acceptance testing to be spread out over most of the project duration instead of being jammed into a much shorter testing phase on the critical path near the end of the project. (This is rather different
than the sequential approach used in many organizations; the sidebar What it Takes to do Incremental Acceptance describes some of the challenges and solutions.) As each feature or iteration is finished, it is assessed for readiness immediately by the Product
Development Team and turned over to the Product Owner for acceptance testing. Ideally, any bugs found are fixed immediately, as described in the sidebar Incremental Acceptance and Bug Tracking on Agile Projects in Chapter 8, rather than being stockpiled for the end of the project.
If the same ten features described in the sequential example are built by an agile team using Incremental Acceptance Testing, the testing staff profile looks more like Figure 9 – Agile Test Specialist Staffing Profile.
Figure 9 Agile Test Specialist Staffing Profile
On this project we start out by specifying two features in the first week. In the second week we design the tests for those two features and specify a third feature. In the third week we test the first feature and design the tests for the third feature.
Note how the staffing ramps up to four person-weeks of work and stays there for the rest of the project. Towards the end, as new-feature testing ramps down, the slack is taken up by regression testing of the early features. Because the testers are part of the
development team, they are able to keep abreast of how the requirements and product design are evolving. Since the same people design and execute the tests, the need for detailed test scripts is reduced and more effort can be spent on exploratory
testing and test automation. And since the testing is done soon after the code is developed, the developers can fix the bugs immediately; there is no separate “testing and bug fix” phase at the end, so development can continue closer to the delivery milestone.
Session-Based Test Management
An alternative to using a plan-driven approach within the test cycle is to use a more iterative style known as Session-Based Test Management. We create a prioritized backlog of testing activities that we address in a series of test sessions. As new concerns
are identified in test sessions, we may add additional test activities to the test backlog. The key is to keep the backlog prioritized by the value of the testing. This value is typically based on the expected degree of risk reduction. The depth of the backlog
gives us an idea of how much testing work we have left and whether we are making headway by addressing concerns or losing ground (the backlog is increasing in depth). Session-Based Test Management is commonly used with exploratory testing.
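The backlog just described is essentially a priority queue keyed by expected risk reduction. The sketch below uses Python's `heapq` to pop the highest-value charter for each session; the charter names and values are invented for illustration.

```python
import heapq

backlog = []   # min-heap; values are negated so the highest-value item pops first

def add_charter(backlog, value, name):
    """value: expected risk reduction of exploring this charter."""
    heapq.heappush(backlog, (-value, name))

def next_session(backlog):
    """Start the next session on the most valuable remaining charter."""
    _, name = heapq.heappop(backlog)
    return name

add_charter(backlog, 8, "explore login error handling")
add_charter(backlog, 5, "tour the reporting screens")

first = next_session(backlog)   # highest expected risk reduction runs first
# A concern found mid-session spawns a high-priority follow-up charter:
add_charter(backlog, 9, "follow up on session timeouts")
second = next_session(backlog)  # the follow-up jumps ahead of older work
```

The depth of `backlog` at any moment is the "how much testing work is left" signal the text mentions: if sessions add charters faster than they retire them, we are losing ground.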
Where Will We Do the Testing?
The test plan needs to identify where the testing will be performed. When all the testing will be performed in-house, the primary consideration is which physical (or virtual) environments will be used. This is particularly important when new environments need
to be created or shared environments need to be booked. If we lack physical resources or the skills to do the testing, we may choose to do test outsourcing to a third-party test lab.
We also need to define the criteria for moving the software between the environments. The transition from the readiness assessment environment to the acceptance environment is governed by the readiness assessment criteria. When developers have their own individual
development environments, we also need criteria for when software can be submitted into the team's integration environment where readiness assessment will occur. These criteria are often referred to as the Done-Done Checklist because the definition of
"done" is more stringent than what a developer typically refers to as done.
How Long Will the Testing Take?
Because testing is usually on the critical path to delivery of software-intensive products, project sponsors and project managers usually want to know how long testing will take. The most common answer is "How much time do we have?" Frequently, the
time available is not long enough to gather enough data to make a high confidence readiness or acceptance decision. This answer is not quite as flippant as it sounds because of the nature of testing. We cannot prove software works correctly—we can only disprove
it by finding bugs. Even if we had an infinite amount of time to test, after a point, diminishing returns will kick in and we will not find many more defects. So, we have to determine what is barely sufficient to get enough confidence about the quality level.
This requires at least a minimal level of test estimation to establish the lower bound of the time and effort we need to expend.
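One way to reason about that lower bound is a diminishing-returns model: assume each week of testing finds some fixed fraction of the defects still present, and ask how many weeks it takes to get the expected remaining defects below an acceptable level. The numbers and the constant find-rate assumption are ours, purely for illustration.

```python
def weeks_to_confidence(initial_defects, find_rate, acceptable_remaining):
    """Weeks of testing until expected remaining defects fall below target."""
    weeks = 0
    remaining = initial_defects
    while remaining > acceptable_remaining:
        remaining *= (1 - find_rate)   # e.g. find 50% of what's left each week
        weeks += 1
    return weeks

# With 100 latent defects and a 50% weekly find rate, getting under
# 5 expected remaining defects takes 5 weeks; getting under 1 takes 7.
five_weeks = weeks_to_confidence(100, 0.5, 5)
seven_weeks = weeks_to_confidence(100, 0.5, 1)
```

The model makes the diminishing-returns point visible: halving the acceptable residue does not halve anything, it adds whole extra weeks, and driving the residue toward zero takes unbounded time.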
Some tests may have dependencies that restrict when they can be run. These could be dependencies on other test environments, such as when doing integration testing with other systems, or they may be dependencies on dates.
We also need to know how long we’ll need to wait for the Product Development Team to fix any defects that must be addressed. Will there be “dead” time between test cycles while we wait for fixed software, as illustrated in Figure Y – Alternating Test & Fix?
Figure Y Alternating Test & Fix
The dead periods occur if the acceptance decision is the first point at which the Product Development Team is given the list of “must fix” bugs; only then does the team start characterizing and fixing them. It may take
days or weeks to develop a new release candidate, which must then undergo readiness assessment before being presented to the Product Owner for the next round of acceptance testing.
The duration of the acceptance phase (consisting of several test cycles) can be reduced significantly if the Product Development Team is notified of bugs as soon as they are found. This allows them to fix defects continuously and deliver a new release
candidate on a regular schedule as shown in Figure X – Continuous Test & Fix.
Figure X Continuous Test & Fix
This depends on the Product Owner (or proxy) doing bug triage on all newly found concerns so that the Product Development Team is made aware of all gating bugs (bugs that must be fixed before accepting the software) as soon as practicable.
If we are planning to automate tests, we should have an effort estimate for the automation. In most cases, we want separate estimates for the construction of the automation infrastructure and the preparation of the tests because of the different skills and
knowledge needed to do the two jobs. For more information, see the Test Automation Planning thumbnail.
How Will We Determine Our Test Effectiveness?
A learning organization is one that is constantly striving to improve how it works. This involves understanding how well we are doing today and trying new approaches to see whether they make us more effective. Measuring effectiveness requires Test Metrics.
These metrics measure two key areas of performance:
- They measure how far along we are in executing our test plan. That is, how much work is left before we know enough to make the readiness or acceptance decision? For more information, see the Test Status Reporting thumbnail.
- They measure the effectiveness of our testing. For more information, see the Assessing Test Effectiveness thumbnail.
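The two kinds of metric can be sketched with one formula each: plan progress as a percentage of planned tests executed, and effectiveness as Defect Detection Percentage (the share of all known defects that testing caught before release). DDP as shown here is a common industry formula we are assuming, not one the guide prescribes, and the numbers are invented.

```python
def percent_complete(executed, planned):
    """Status metric: how far along we are in executing the test plan."""
    return 100.0 * executed / planned

def defect_detection_percentage(found_in_test, escaped_to_production):
    """Effectiveness metric: share of known defects caught before release."""
    total = found_in_test + escaped_to_production
    return 100.0 * found_in_test / total

status = percent_complete(executed=120, planned=160)   # 75.0% of plan done
ddp = defect_detection_percentage(45, 5)               # 90.0% caught in test
```

Note that DDP can only be computed retrospectively, once production has had time to reveal escapes, which is why it measures the effectiveness of past testing rather than the status of current testing.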
How Will We Manage Concerns?
The purpose of testing is to identify any concerns that those who matter have with the software. Many of these concerns will require changes to the software, either because something was implemented incorrectly (a bug) or because the Product Owner realized that what
they had requested will not satisfy the business need (an enhancement or change request). The test plan needs to lay out how these concerns will be managed and tracked; it also needs to include the process for deciding what needs to be changed and what is
acceptable as is.
As we conduct the various readiness assessment and acceptance activities, we note any concerns that come up. Investigating these concerns more closely reveals that each concern falls into one of the following categories:
- Bug or defect
- Requirements change
- Project issue
Figure 3 illustrates the lifecycle for each of the major types of concerns.
Figure 3 Concern Resolution Model
Part of the advantage of categorizing these concerns is to help identify how they will be addressed: each category of concern has its own resolution process and must be addressed differently.
Bugs or Defects
Bugs are defects found in the software that require a software change. The bugs need to be understood well enough to make decisions about what to do about them. A common process for doing this is known as Bug Triage, which divides the bugs into three categories
with respect to the next milestone or release: Must Fix, Would Like to Fix (if we have time), and Will Not Fix. Of course, the software must be retested after the fixes are made, which is why there are typically multiple test cycles. Fixes may also need to
be propagated into other branches of the software. If the bug is found to exist in previous versions that are currently in use, a patch may need to be prepared if the bug is serious enough. Security-related bugs are an example of bugs that are typically patched
back into all previous versions still in use. Likewise, a bug fixed in the acceptance test branch may need to be propagated into the current development branch if new development has already started in a separate development branch.
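A Bug Triage pass can be sketched as a function that sorts bugs into the three categories named above. The rule used here (severity weighed against remaining schedule) is invented for illustration; real triage is a human judgment call made by the Product Owner and team, not a formula.

```python
def triage(bugs, weeks_left):
    """Sort (name, severity) pairs into the three triage buckets.
    severity: 1 (cosmetic) .. 5 (blocks acceptance) -- our own scale."""
    buckets = {"Must Fix": [], "Would Like to Fix": [], "Will Not Fix": []}
    for name, severity in bugs:
        if severity >= 4:                        # gates the release
            buckets["Must Fix"].append(name)
        elif severity >= 2 and weeks_left > 1:   # fix if time permits
            buckets["Would Like to Fix"].append(name)
        else:
            buckets["Will Not Fix"].append(name)
    return buckets

result = triage([("crash on save", 5), ("slow report", 3), ("typo", 1)],
                weeks_left=2)
```

The useful property of an explicit pass like this is that every bug lands in exactly one bucket relative to the next milestone, so the "Must Fix" list the Product Development Team receives is unambiguous.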
Requirements Changes
The Product Owner may have realized that even though the Product Development Team delivered what the Product Owner asked for, it will not provide the expected value. The Product Owner should have the right and responsibility to make the business decision
about whether to delay the release to make the change or to continue with the less useful functionality. Once we’ve decided to include a change in this release, we can treat it more or less the same as a bug from a tracking and retest perspective.
Project Issues
Some concerns that are exposed do not require changes to the software. They may be project issues that need to be tracked to resolution, additional things that should be tested, and so on. These typically are not tracked in the bug management system because
most projects have other means of tracking them. Other concerns may be noted but deemed not to be concerns at all.
Summary
This chapter introduced the activities and practices involved in planning a testing effort. A key activity is the definition of a test strategy, because this is what guides us as we strive to maximize the ROI of our efforts. There is a place for both automated
testing and manual testing on most projects because the two approaches are complementary. Automated functional tests are highly effective change detectors that go a long way toward preventing new bugs from being introduced during software maintenance activities.
They are also repeatable. Care has to be taken to use the appropriate functional test automation tools to avoid the slow, fragile tests quagmire. Exploratory testing and other manual testing can be effective and efficient ways of finding bugs, especially unusual ones,
those that are hard to automate, and those that stem from incomplete, misunderstood, or vague requirements. The use of power tools by human testers can significantly increase the effectiveness of manual testing.
In this chapter we have described the practices involved in planning for the acceptance of the product. In the next chapter we will examine the practices we use while executing the acceptance process.
[MarickOnTestingDimensions] Marick, Brian
[CrispinGregoryOnAgileTesting] Crispin, Lisa and Janet Gregory. Agile Testing. Addison-Wesley, 2008.
[MeszarosOnUnitTestPatterns] Meszaros, Gerard. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley Professional, 2007.
[KanerOnExploratoryTesting] Kaner, Cem, Jack Falk, and Hung Q. Nguyen. Testing Computer Software, 2nd Edition. Wiley, 1999.
[BachOnExploratoryTesting] Bach, James. What is Exploratory Testing? http://www.satisfice.com/articles/et-article.pdf
[FewsterGrahamOnTestAutomation] Fewster, M. and Graham, D. Software Test Automation. Addison-Wesley, 1999.
[ItkonenRautiainenOnExploratoryTesting] Juha Itkonen and Kristian Rautiainen. “Exploratory Testing: A Multiple Case Study”, International Symposium on Empirical Software Engineering, ISESE 2005, 17-18 Nov. 2005, pp. 84-94.
[ItkonenEtAlOnDefectDetection] Itkonen, J., Mantyla, M.V., and Lassenius, C. “Defect Detection Efficiency: Test Case Based vs. Exploratory Testing”, in Proc. of the First International Symposium on Empirical Software Engineering and Measurement, ESEM 2007, 20-21 Sept. 2007, pp. 61-70.
[JeffriesMelnikOnTDDEffectiveness] Jeffries, R. and Melnik, G., “TDD: The Art of Fearless Programming”, Guest Editors’ Introduction, IEEE Software Special Issue on Test-Driven Development, May-June 2007, pp.24-30.
[SterlingOnTestTools] Charles Sterling. “Visual Studio Team System 2010 Test Features walk through with screen shots.” Visited October 20, 2009.
[BachOnTestTools] James Bach. Repository of Test Tools. Visited October 20, 2009.
[WeinbergOnContextSwitching] Gerald Weinberg. Quality Software Management, Vol. 1: Systems Thinking. New York, NY: Dorset House Publishing, 1992.
[SpolskyOnContextSwitching] Joel Spolsky. “Human Task Switches Considered Harmful”, 2001. Visited October 20, 2009.
[DeMarcoOnSlack] Tom DeMarco, Slack, Broadway, 2001.
[OcasioOnWorkFragmentation] William Ocasio. “Towards an attention-based view of the firm.” Strategic Management Journal, 18:187-206, 1997.
[PerlowOnTimeFamine] Leslie Perlow. “The time famine: Toward a sociology of work time.” Administrative Science Quarterly, 44(1):57-81, 1999.