
Chapter 18 - Assessing Software

In the previous chapters we have introduced many of the concepts around how we plan the assessment of the product against the Minimum Marketable Functionality (MMF) and minimum quality requirement set (MQR) to which we have agreed. In this chapter we will introduce the various techniques that we use as we do the assessment including test conception, test design and test execution.

Assessment is a generic term we can use to describe the activities, testing or otherwise, that we use to evaluate the system-under-test. Some of these activities are focussed on preparing and executing tests while others may be review activities. Some of the activities are done as part of readiness assessment by the product development team while others may be done by or for the product owner under the banner of acceptance testing. Where the same practice is used in both forms of testing, the mechanics of the practices typically don’t change very much although the emphasis of how much each practice is done and the objectives of applying that practice might vary.
We start the discussion with an overview of the lifecycle of an individual test, something that functional and non-functional tests share, and then move into a discussion of the practices used in each state of that lifecycle, where we introduce the techniques that are unique to specific kinds of requirements.

The Lifecycle of an Individual Test

Every single test, however simple or complex, whether manual or automated, goes through a number of stages during its lifetime. This lifecycle is illustrated in Figure 1.
Figure 1 Individual Test Lifecycle
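The states of the lifecycle are:
  • Test Conception
  • Test Authoring / Test Design
  • Test Scheduling
  • Test Execution
  • Result Assessment
  • Test Reporting
  • Test Actioning
  • Test Maintenance
  • End of Life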
Note that test conception and test authoring are often lumped together under the label of test design. Some forms of testing, often called static testing, involve inspecting artefacts that describe the system-under-test rather than running the actual code. The terminology used for these forms of testing is somewhat different (for example, they are often called reviews rather than tests) but for the purpose of discussing test lifecycle, we shall use the test terminology.
Let’s examine each of these states in a bit more detail.

Test Conception

At some point, someone decides that we need to verify one or more aspects of the system-under-test’s behavior; we call these “things to test” test conditions. At this time the test is just a figment of someone’s imagination. It starts its transition from an implicit requirement to one that is much more explicit when it gets written down or captured in a document. It might appear in a list of test conditions associated with a feature, requirement or user story. Typically, it will just be a phrase or name with no associated detail. Now that the test exists in concept, we can start moving it through its lifecycle.

Test Authoring / Test Design

Test authoring or test design is the transformation of the test or test condition from being just a named item on a list into concrete actions. It may also involve making decisions around how to organize test conditions into test cases which are the sequences of steps we execute to verify them. Note that test authoring/design may happen long before test execution or concurrently with test execution.

Test Scheduling

Once a test case has been identified and authored or a test charter defined, we must decide when it will be executed. The schedule may indicate the one time the test is run or the frequency with which it is run and the triggering mechanism. It may also identify who or what is executing the test and in which test environment(s).

Test Execution

Once authored and scheduled, we need to actually execute the tests. For dynamic tests this involves running the system-under-test; static tests involve inspecting various artefacts that describe the system-under-test but do not involve executing the code. Depending on the kind of test in question, the test may be executed manually by a person, by an automated testing tool, or by a person using tools that improve tester productivity through automation.

Result Assessment

Depending on the tools involved, the pass/fail status of the tests may be determined as the tests are executed or there may be a separate step to determine the test results after the test execution has been completed. We determine whether a test passed or failed by inspecting the actual results observed and determining whether they are acceptable.

Test Reporting

Once a suite of tests has been executed and assessed, we can report on the test results. A good test report helps all the project stakeholders understand where the project stands relative to the release gate. Chapter 1 - The Acceptance Process provides details on what information might affect this decision. Test reporting includes both test status reporting to indicate how much test effort remains and test effectiveness reporting which describes our level of confidence in our tests.

Test Actioning

The purpose of executing tests is to learn about the quality of our product so that we can make intelligent decisions about whether it is ready for use or requires further development or testing. The Acceptance Process describes the process for deciding whether or not to accept the software but before we can make that decision we may need to fix some of the defects we have found. The Bug Triaging process is used to make the “Is it good enough?” decision by determining which bugs need to be fixed before we can release. (See the “Doneness Model” for more details.)

Test Maintenance

Some tests are only run once while others may need to be run many times over long periods of time. Tests that will be run more than once may require maintenance between runs as a result of changes to the parts of the system-under-test that they interact with (for example, the database state). These kinds of tests may warrant more of an upfront investment to ensure that they are repeatable and robust. Tests intended for manual execution will need to be updated whenever the parts of the system being tested undergo significant changes in functionality. Whereas human testers can usually work around minor changes, fully automated tests will typically be impacted by even the smallest changes (with tests that interact with the system-under-test through the GUI being the most fragile) and therefore may require significantly more frequent maintenance.

End of Life

Sooner or later a test may no longer be worth executing. Perhaps the functionality it verifies has been removed from the system-under-test or maybe we have determined that the functionality is covered sufficiently well by other tests so that we no longer get much additional value from running this test. At this point the test has reached its end of life and no longer warrants either execution or maintenance.

Variations in Test Lifecycle Traversal

Some tests spend a lot of time in each state of the test lifecycle while others may pass through the states very quickly. For example, in automated functional testing we might spend weeks preparing a complex test, wait several weeks before we can first execute it, and then run it several times a day for many years. In contrast, during a single one-hour exploratory manual testing session, the tester may conceive of several test conditions, design a test to explore them, learn something about the system-under-test, conceive several more test conditions and design tests to explore them also, all in the space of a few minutes. The automated tests will spend most of their lifetime in the maintenance state while exploratory tests are very ethereal; there isn’t a concrete representation that needs to be maintained.

Highly Compressed Test Lifecycle - Exploratory Testing

Exploratory testing is summarized by Cem Kaner as “Simultaneous test design and execution with an emphasis on learning”. From this description it should be clear that there is no clear separation of the various stages of the test lifecycle when doing exploratory testing. The tester learns about the system-under-test by using it and forming hypotheses about how it should behave. Based on these hypotheses the tester conceives of one or more test conditions to which they might subject the system-under-test. They rapidly design, in their mind’s eye, a test case they could use to achieve this. They exercise the test case and observe the result, thereby forming more hypotheses which in turn lead to more test conditions. No attempt is made to have the test persist beyond the test session unless it revealed a bug. This removes the need to document and maintain the tests (though a good exploratory tester keeps a set of notes or a journal of key points and discoveries during his or her testing); the obvious consequence is that exploratory testing is not intended to be very repeatable.
This process is very lightweight with very little overhead getting in the way of the tester interacting with the software. This makes it possible for exploratory testers to formulate a lot of hypotheses, test them and find a lot of bugs in a very short period of time. The tester takes notes as they go focussed primarily on the following outputs:
  1. What functionality they have tested and their general impressions of what they have seen.
  2. What bugs they found and what they had done to cause them.
  3. What test conditions they had conceived that they were not able to get to. These may be used as the charter for a subsequent exploratory test session.
  4. How much time was spent actually testing versus how much was spent getting ready. This information is useful when deciding what kind of power tools would make the exploratory tester more efficient in the future.
Despite its somewhat chaotic appearance exploratory testing can be quite disciplined and methodical even though it is not very repeatable. Exploratory testing can range from completely unstructured to highly disciplined. The more disciplined forms of exploratory testing use a sequence of time-boxed test sessions to structure the testing activities.
Planning of exploratory testing consists of defining an initial list of high-level test conditions (as in “kinds of things we should test”) for use as test session charters and deciding how many test sessions to budget for executing the charters. Examples of charters might include the following:
  • Pretend that you are a hacker and try breaking into the system (a persona-based charter).
  • Try out variations of the invoicing workflow focussing on rejected items (a scenario-based charter).
  • Try using the user interface with only the keyboard (a device-based or persona-based charter).
  • Try scenarios where several users try accessing the same account at the same time (a scenario-based charter often called “tug of war”).
The test charters are prioritized before being scheduled via assignment to a tester executing a specific test session. Upon completion of the test session, the tester may recommend additional charters be added to the backlog of charters based on what they had learned about the system-under-test, business domain, users’ needs etc. Exploratory testing is often done in an iterative style with many of the test charters for later iterations being discovered during execution of the test sessions in earlier iterations. This allows exploratory testing to focus on the areas of the software that have been found to be the most suspicious rather than providing the same amount of effort for all areas regardless of the quality level actually observed. This, combined with the low overhead nature of the simultaneous test design and execution, is what allows exploratory testing to be such an effective way of finding the bugs we know must lurk in the software.

A Spread Out Test Lifecycle – Scripted Testing

In scripted testing, the average test lifecycle is much longer than in exploratory testing. The tests may be conceived as part of the test planning exercise or in more detailed test design activities. The actual tests are then documented or programmed, and potentially reviewed, often before the software is available for testing. The tests may require maintenance even before they are first executed against the system-under-test if the design of the software has evolved since the tests were designed. Eventually, we determine the schedule for executing the tests and the tests are executed at the appropriate time (which may be weeks or even months later). Any bugs we find are logged and the test results are reported to the stakeholders. The bugs are actioned, often weeks or even months after they were found. If the tests are to be repeated at a later time, usually against a subsequent version of the system-under-test, the tests may require maintenance to track changes in the system they test. Eventually, someone decides that this particular test is no longer adding any value and the test is abandoned.
This cycle could take anywhere from several days or weeks to many years. For example, the test team for Microsoft Office prepares extensive automated test scripts to verify the behavior of features in applications like MS Word. The tests for a specific generation of the product (e.g. Word 2003) have a lifetime of over 10 years because of Microsoft’s commitment of 5 years of mainstream support and a further 5 years of limited extended support for each business and development software product, followed by a minimum of 1 year of self-help online support via the knowledge base [Ref -]. New builds are typically created every week throughout this product lifetime and the tests are run against each new build. In this example the maintenance phase dominates the individual test lifecycle. The Microsoft patterns and practices team uses continuous integration, which potentially produces several builds a day with continuous automated test execution.

Intermediate Test Lifecycles – Hybrid Test Approaches

The two previous sections described the two extremes of test lifecycle duration. In practice, the test lifecycles can fit anywhere between these two extremes. There could also be a mix of test lifecycle durations even in the same testing session. For example, a manual tester could be following a detailed test script that was written months ago. They notice something odd that isn’t specifically related to the script and decide to go “off-script” to explore the oddity. In this off-script excursion they are doing simultaneous test design and execution (in other words, exploratory testing). At some point they may return to the original script after either confirming that the system is working properly or logging the bugs that they have found.

Practices for Assessing Software

In Chapter 16 – Planning for Acceptance we introduced many practices in the context of planning the readiness assessment and acceptance testing activities. Now it is time to look at the practices that we use while designing, executing and actioning the individual tests. At this point we focus on the practices and not on who does them; it really doesn’t matter whether they are done as part of readiness assessment or acceptance testing, as the practices themselves are not changed by when they are done or by who does them. Volume II contains a collection of thumbnails and job aids for each practice described here, with Volume III presenting sample artefacts of applying those practices in a project.

Test Conception Practices

There are quite a few practices for conceiving tests or test conditions. Some are more structured or formal than others. They all share the goal of creating an extensive to-do list for our subsequent test design and execution efforts. Most start with either requirements, whether functional or non-functional, or risks (concerns about something that might go wrong.) Some of the more common techniques include the following:
  • Risk-based test identification
  • Threat modeling
  • Heuristics or checklists
  • Use case based testing – Define tests based on specific use cases of the system-under-test (see the Functional Testing thumbnail.)
  • Business rule testing – Define tests for various combinations of values used as inputs to business rules or business algorithms.
  • Interface-based testing – Define tests based on the characteristics of the user interface (human-computer interface) or computer-computer interface protocol.
  • Scenario-based testing – Using real-world usage scenarios to inspire the design of test cases.
  • Soap-opera testing – Using exaggerated real-world usage scenarios to inspire the design of test cases.
  • Model-based test generation – Building one or more models of key characteristics of the system-under-test and generating tests from the model(s).
  • Group brainstorming
  • Paired/collaborative testing – Working together to design better test cases.

Risk-Based Test Identification

In risk-based test identification we do risk modeling to identify functional or non-functional areas that we are concerned might not be implemented correctly or might have been adversely affected by changes to the functionality. We use this information to identify test conditions we want to ensure are verified by tests. A good example of a kind of test that might be identified through risk analysis is the fault insertion test. Suppose the risk we identified was “The network connection fails.” We ask ourselves “Why would the network connection fail?” and come up with three possible causes: an unplugged cable, a network card failure, and a network card taken out of service because maintenance activity is in progress. These are three test conditions we would want to exercise against our system-under-test. Other forms of risk might relate to potential mistakes during software development, for example “This shipping charge algorithm is very complex.” This might cause us to define a large number of test conditions to verify various aspects of the algorithm based on the kinds of mistakes a developer would be likely to make, such as applying the various surcharges in the wrong order.
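The three causes above can be turned into fault insertion tests by substituting a test double that fails on demand. The following is only a minimal sketch: the `FaultyConnection` class, the `submit_order` function and its "queue for retry" behaviour are all hypothetical, invented here to show the shape of such a test rather than any particular system's design.

```python
class FaultyConnection:
    """Hypothetical test double that always fails with the given fault."""

    def __init__(self, fault):
        self.fault = fault

    def send(self, payload):
        # Simulate the identified risk: the network connection fails.
        raise ConnectionError(self.fault)


def submit_order(connection, order):
    """Hypothetical system behaviour: queue the order if the network is down."""
    try:
        connection.send(order)
        return "sent"
    except ConnectionError:
        return "queued for retry"


# One test condition per cause identified during risk analysis.
FAULTS = [
    "cable unplugged",
    "network card failure",
    "network card out of service for maintenance",
]

outcomes = {
    fault: submit_order(FaultyConnection(fault), {"id": 1}) for fault in FAULTS
}
for fault, outcome in outcomes.items():
    print(f"{fault}: {outcome}")
```

In a real project the fault would more likely be injected at the environment level (for example by disabling a network interface); the test double simply makes the idea cheap to demonstrate.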

Threat Modeling

Another form of risk modeling is threat modeling for security. The potential threats identified by the threat model can lead us to choose from a set of security assurance practices. We might define specific penetration test scenarios to ensure the software repels attempts at penetration. We might conduct security reviews of the code base to ensure safe coding practices have been followed. We can decide to do Fuzz Testing to verify that the software cannot be compromised by injecting specially chosen data values via user input fields.

Use Case Based Test Identification

When we have business requirements defined in the form of use cases we can identify test conditions by enumerating all the possible paths through the use case and determining for each path what input value(s) would cause that path to be executed. Each path constitutes at least one test condition depending on how many distinct combinations of inputs should cause that path to be executed.
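Enumerating the paths through a use case can be sketched as a depth-first walk over a graph of steps. The "place order" use case below, including its step names and branches, is a hypothetical example, and the sketch assumes the use case contains no loops.

```python
# Hypothetical "place order" use case modelled as a directed graph of steps.
# Each key is a step; each value lists the steps that may follow it.
USE_CASE = {
    "start": ["enter_items"],
    "enter_items": ["pay_by_card", "pay_by_invoice"],
    "pay_by_card": ["card_accepted", "card_declined"],
    "pay_by_invoice": ["order_confirmed"],
    "card_accepted": ["order_confirmed"],
    "card_declined": ["order_cancelled"],
    "order_confirmed": [],
    "order_cancelled": [],
}


def enumerate_paths(graph, step="start", prefix=()):
    """Yield every path from `step` to a terminal step (assumes no cycles)."""
    path = prefix + (step,)
    followers = graph[step]
    if not followers:
        yield path
        return
    for next_step in followers:
        yield from enumerate_paths(graph, next_step, path)


paths = list(enumerate_paths(USE_CASE))
for path in paths:
    print(" -> ".join(path))
```

Each printed path is a candidate test condition; the inputs that force each branch (for example, a card number that will be declined) still have to be chosen by the tester.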

Interface-Based Test Identification

Another source of test conditions is the design of the interface through which the use case is exercised, whether it be a user interface used by a human or an application programming interface (API) or messaging protocol (such as a web service) used by a computer. The interface may have very detailed design intricacies in addition to the elements required to exercise the use case. These intricacies are a rich source of test conditions. For example, a user interface may have pull-down lists for some input fields. Test conditions for these pull-down lists would include cases where there are no valid entries (empty list), a single valid entry (list of one item) and many valid entries (long list of items). Each of these test conditions warrants verification.

Business Rules and Algorithms

Business rules and business algorithms are another rich source for test conditions. For a rule that validates a user’s inputs we should identify a test condition for each kind of input that should be rejected. Rules that describe how the system makes decisions about what to do should result in at least one test condition for each possible outcome. Rules that describe how calculations should be done should result in at least one test condition for each form of calculation. For example, when calculating a graduated shipping charge with three different results based on the value of the shipment, we would identify at least one test condition for each of the graduated values.
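As an illustration of the graduated shipping charge example, the sketch below lists one test condition per graduated outcome. The thresholds and charges in `shipping_charge` are assumptions made up for this example; the source does not specify them.

```python
def shipping_charge(shipment_value):
    """Hypothetical graduated rule; the tiers and amounts are assumptions."""
    if shipment_value < 50:
        return 10.00
    if shipment_value < 200:
        return 5.00
    return 0.00


# One test condition per graduated outcome: (description, input, expected).
TEST_CONDITIONS = [
    ("low-value shipment pays the full charge", 20, 10.00),
    ("mid-value shipment pays the reduced charge", 100, 5.00),
    ("high-value shipment ships free", 500, 0.00),
]

results = [
    (description, shipping_charge(value) == expected)
    for description, value, expected in TEST_CONDITIONS
]
for description, passed in results:
    print("PASS" if passed else "FAIL", description)
```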

Scenario-Based Test Identification

Scenario-based testing is the use of real-life scenarios to derive tests. There are various kinds of scenarios. The scenario-based testing thumbnail describes a wide range of scenario types; this introduction touches on only a few of them. A common type of scenario-based test is the business workflow test. These tests exercise the system-under-test by identifying common and not-so-common end-to-end sequences of actions by the various users of the system as a particular work item is passed from person to person as it progresses towards successful completion or rejection. We can examine the workflow definitions looking for points in the workflow where decisions are made, and define sufficient workflow scenarios to ensure that each path out of each decision is covered.
A particularly interesting form of scenario test is the soap opera test in which the tester dreams up a particularly torturous scenario that takes the system-under-test from extreme situation to extreme situation. The name comes from its similarity to a soap opera television program which condenses many days, months and potentially years of extraordinary events in people’s lives into short melodramatic episodes. This form of test identification is good for thinking outside the box.
Scenarios about how software is installed by a purchaser could lead to identification of potential compatibility issues and the need for compatibility testing.

Model-Based Test Generation

In model-based test generation we build a domain or environmental model of the system-under-test’s desired behaviour (expressed in mathematical terms or in some other abstract notation) and use it to generate all the relevant test cases. For example, when testing a function that takes 4 parameters each with 3 possible values, we could generate 81 test conditions (3 × 3 × 3 × 3) by iterating through each value for each parameter. For large and complex systems, the number of such cases will be huge. Various “guiding” or selection techniques are used to reduce the test case space. To fully define the expected result, we would need to have an independent way to determine the expected return value, perhaps a Comparable System Test Oracle or a Heuristic Test Oracle. Generation of the tests from the model may be fully automated or manual.
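The exhaustive generation in the four-parameter example can be sketched with a Cartesian product. The parameter names and values below are hypothetical; only the counts (four parameters, three values each) come from the text. The sketch generates inputs only and does not supply the test oracle.

```python
from itertools import product

# Hypothetical parameter model: four parameters, three possible values each.
PARAMETER_MODEL = {
    "account_type": ["personal", "business", "charity"],
    "currency": ["USD", "EUR", "GBP"],
    "channel": ["web", "mobile", "branch"],
    "amount_band": ["small", "medium", "large"],
}

names = list(PARAMETER_MODEL)
# Iterate through each value for each parameter to build every combination.
test_conditions = [
    dict(zip(names, combination))
    for combination in product(*PARAMETER_MODEL.values())
]

print(len(test_conditions))  # 3 * 3 * 3 * 3 = 81
print(test_conditions[0])
```

Adding a fifth three-valued parameter would triple the count to 243, which is why selection techniques become necessary for larger models.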

Identifying Non-functional Tests

The test cases used to assess compliance with the non-functional requirements can be identified in much the same ways as those for functional requirements, with the main difference being how the requirements are discovered and enumerated.

Other Test Identification Practices

All of these practices can be applied by a single person working alone at their desk. But a single person can be biased or blind to certain kinds of test conditions; therefore, it is beneficial to involve several people in any test identification activities. One way to do this is to review the test conditions with at least one other person, a form of test review. Another practice is paired testing in which two people work together to identify the test conditions (or to write or execute tests.) We can also use group techniques like listing, brainstorming or card storming to involve larger groups of people [TabakaOnCollaboration].
All these practices can benefit from the judicious use of checklists and heuristics. These can be used to trigger thought processes that can identify additional tests or entire categories of tests. They are also useful when doing exploratory testing.

Test Authoring / Test Design Practices

Now that we have a list of things we want to test, our to-do list, we can get down to designing the test cases. A key test design decision is whether we will prepare a separate test case for each test condition or address many test conditions in a single test case. There is no single best way as it depends very much on how the tests will be executed. When human testers will be executing the test cases it makes a lot of sense to avoid excessive test environment setup overhead by testing many test conditions in a single test case. The human tester has the intelligence to analyse the impact of a failed test step and decide whether to continue executing the test script, abandon test execution or work around the failed step. Automated tests are rarely this intelligent; therefore, it is advisable to test fewer test conditions per test case. In the most extreme case, typical when automating unit tests, each test case includes a single test condition.

Tests as Assets

Tests are assets (not liabilities) that need to be protected from loss or corruption. They should be managed with the same level of care and discipline that is used for managing the product code base. This means they should be stored in a version-controlled repository such as a source code management (SCM) system. The lineup of tests that corresponds to a particular lineup of product code needs to be labelled or tagged in the same way the product code is labelled so that the tests that correspond to a specific product code build can be retrieved easily.

Tests as Documentation

Regardless of whether a test case will be executed by a person or a computer, the test case should be written in such a way that a person can understand it easily. This becomes critical for automated tests when the tests need to be maintained either because they have failed or because we are changing the expected behavior of the system-under-test and we need to modify all the tests for the changed functionality.
Many of the practices used for identifying test conditions carry through to test case design but the emphasis changes to enumerating the steps of the test case and determining what the expected outcome should be for each test. For use case tests we must enumerate the user actions that cause the particular path to be exercised. We also need to include steps to verify any observable outcomes as we execute the steps and a way of assessing whether the outcome matches our expectations. (See section 17.3.5 on Result Assessment Practices.)

Picking an Appropriate Level of Detail for Test Scripts

As we define the steps of our tests it is very important to ensure that the level of detail is appropriate for the kind of test we are writing. For example, in a business workflow test, each step of the test case should correspond to an entire use case. If we were to use the same test vocabulary and level of detail as in a use case test, our workflow tests would become exceedingly long and readers of the test would have a hard time understanding the test. This is a classic example of “not being able to see the forest for the trees.” Soap opera tests are written much like a workflow test except that the steps and circumstances are more extreme. Again, each step in the test should correspond to an entire use case.

Business Rule Testing

Business rules tests can be designed much like use case tests by interacting with the system under test via the user interface. If there are a lot of test conditions, it may make testing proceed much more quickly if we use a data-driven test approach wherein we enumerate each test condition as a row in a table where each column represents one of the inputs or outputs. Then we can simply write a parameterized test case that reads the rows from the table one at a time and exercises the system-under-test with those values. An even more effective approach requires more co-operation with the product development team because it involves interfacing the test case that reads the rows of the table directly to the internal component that implements the business rule (we call them “subcutaneous tests”). This approach results in automated tests that execute much faster than tests that exercise the software through the user interface. The tests can usually be run much earlier in the design phase of the project because they don’t even require the user interface to exist. These tests are much more robust because changes to the user interface don’t affect them. These tests are particularly well suited to test-driven development.
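A data-driven subcutaneous test of the kind described above might look like the following sketch. The `discount_rate` function stands in for the internal component that implements the business rule, and both the rule and the table contents are hypothetical. In practice the table would more likely live in a spreadsheet or a separate file than in an inline string.

```python
import csv
import io


def discount_rate(customer_type, order_total):
    """Stand-in for the internal business-rule component (hypothetical rule)."""
    if customer_type == "gold" and order_total >= 100:
        return 0.10
    if customer_type == "gold":
        return 0.05
    return 0.0


# Each row is one test condition; the columns are the inputs and expected output.
TEST_TABLE = """customer_type,order_total,expected_rate
regular,50,0.0
regular,500,0.0
gold,50,0.05
gold,100,0.10
"""


def run_table(table_text):
    """Call the rule once per row and return the rows whose result was wrong."""
    failures = []
    for row in csv.DictReader(io.StringIO(table_text)):
        actual = discount_rate(row["customer_type"], float(row["order_total"]))
        if actual != float(row["expected_rate"]):
            failures.append((row, actual))
    return failures


failures = run_table(TEST_TABLE)
print("failed rows:", failures)
```

Because the parameterized test calls the rule directly, adding a test condition means adding a row to the table, and no user interface is needed for the test to run.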

Model-Based Testing

A more sophisticated way of using models is to generate executable test cases that include input values, sequencing of calls and oracle information to check the results. In order to do that, the model must describe the expected behaviour of the system-under-test. Model building is complex but is the key. Once the model is built, a tool (based on some method or notation) is typically used to generate abstract test cases, which later get transformed into executable test scripts. Many such tools allow the tester to guide the test generation process to control the number of test cases produced or to focus on specific areas of the model.

Usability Testing

The design of usability testing requires an understanding of the goals of users who will be using the system-under-test as well as the goals of the usability testing itself and the practices to be used. The goals of testing will change from test session to test session and the practices will evolve as the project progresses. Early rounds of usability testing may be focused on getting the overall design right and will involve paper prototypes, storyboards, and “Wizard of Oz” testing. Later rounds of testing are more likely to involve testing real software with the purpose being to fine tune the details of the user interface and user interaction. All of these tests, however, should be based on the usage models defined as part of User Modeling and Product Design practices.

Operational Testing

Functional requirements tend to focus on the needs of the end users but there are other stakeholders who have requirements for the system. The needs of the operations department, the people who support the software as it is being used, need to be verified as part of the acceptance process. The specific needs may vary from case to case but common forms of operational acceptance testing include:
  • Testing of installers, uninstallers and software updates.
  • Testing of batch jobs for initial data loading.
  • Testing of data reformatting for updated software.
  • Testing of data repair functionality.
  • Testing of start up scripts and shutdown scripts.
  • Testing of integration with system monitoring frameworks.
  • Testing of administrative functionality such as user management.
  • Testing documentation content.

Reducing the Number of Tests Needed

If we have too many test conditions to be able to test each one, we can use various reduction techniques to reduce the number of test conditions we must verify. When we have a large set of possible inputs to verify, we can reduce the number of test cases we need to execute by grouping the input values into equivalence classes based on the expected behavior of the system-under-test. That is, an equivalence class includes all the inputs for which the system-under-test should exhibit the same or equivalent behavior (including both end state and outputs). We then select a few representative input values from each equivalence class for use in our tests. When the values are numeric and ordered (e.g., integers or reals) we pick the values right at the boundaries between the different behaviors, a technique known as boundary value analysis (BVA). When they are nominal, i.e., they represent classifications of behaviour with no natural ordering (e.g., a finite set of arbitrary strings or enumeration types with no meaningful ordering in the context), we can design a single data-driven test for each equivalence class and run the test for each input value in the class.
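Boundary value selection for an ordered numeric input can be sketched as follows. The "quantity ordered" input and its three equivalence classes are hypothetical; the sketch assumes integer inputs and inclusive class ranges, and it adds the two values just outside the valid range as representatives of the invalid classes.

```python
def boundary_values(classes):
    """Return the lowest and highest value of each (low, high) class."""
    values = []
    for low, high in classes:
        values.append(low)
        if high != low:
            values.append(high)
    return values


# Hypothetical equivalence classes for an integer "quantity ordered" input.
# Each inclusive range is expected to produce one distinct behaviour.
QUANTITY_CLASSES = [
    (1, 9),      # standard price
    (10, 99),    # bulk discount
    (100, 999),  # wholesale discount
]

lowest = QUANTITY_CLASSES[0][0]
highest = QUANTITY_CLASSES[-1][1]

# Values on each boundary, plus the neighbours just outside the valid range.
test_inputs = [lowest - 1] + boundary_values(QUANTITY_CLASSES) + [highest + 1]
print(test_inputs)  # [0, 1, 9, 10, 99, 100, 999, 1000]
```

Eight inputs stand in for the thousand or more values the field could accept, on the assumption that a mistake in the rule is most likely to show up at a boundary.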
If we have several inputs that can each vary and we suspect that the behavior of the system based on the individual inputs is not independent, we can avoid testing all combinations of input values by using combinatorial test optimization to reduce the number of distinct combinations we test. Examples of situations where this applies include:
  • Many independent variations or exception paths in a use case.
  • Many different paths through a state model.
  • Algorithms that take many independent input values that each affect the expected outcome.
  • Many system configurations that should all behave the same way.
The most common variation of combinatorial test optimization is known as Pairwise or All-Pairs testing; it involves picking a small set of combinations (ideally the smallest) which ensures that every value of each input is paired with every value of every other input at least once. This technique is used so frequently that there is a website [PairWiseOnAllPairsTesting] dedicated to listing the many open source and commercial programs that exist to help us pick the values to use. It has been shown that pairwise testing provides better coverage and results in fewer tests than random testing.
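The idea behind all-pairs selection can be sketched with a simple greedy algorithm. This is an illustrative sketch only, not the algorithm of any particular tool listed on the website. The function name all_pairs and the sample parameters are assumptions. It enumerates every combination, so it only suits small input sets, and a greedy choice does not guarantee the smallest possible set.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedily choose combinations until every pair of values taken from
    two different parameters appears together in at least one combination."""
    names = list(parameters)

    # Every pair of (parameter index, value) that must be covered.
    uncovered = set()
    for i, j in combinations(range(len(names)), 2):
        for a, b in product(parameters[names[i]], parameters[names[j]]):
            uncovered.add(((i, a), (j, b)))

    def pairs_in(combo):
        return {((i, combo[i]), (j, combo[j]))
                for i, j in combinations(range(len(combo)), 2)}

    candidates = list(product(*(parameters[name] for name in names)))
    selected = []
    while uncovered:
        # Take the combination that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_in(c) & uncovered))
        uncovered -= pairs_in(best)
        selected.append(dict(zip(names, best)))
    return selected


# Hypothetical configuration space: 3 x 3 x 2 = 18 exhaustive combinations.
PARAMETERS = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "locale": ["en", "fr"],
}
PAIRWISE_TESTS = all_pairs(PARAMETERS)
```

For this configuration space no pairwise set can have fewer than 9 combinations, because the 9 browser and operating system pairs each need their own test. That is still half the 18 needed for exhaustive coverage.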

Test Execution Practices

The details of how the tests are executed vary greatly across the different kinds of tests but several things are common. A key outcome of test execution is the data that will be used as input to the subsequent readiness and acceptance decisions. Though some project contexts will require much more extensive record-keeping than others, it is reasonable to expect a minimum level of record-keeping. This record-keeping consists of the following key pieces of information:
  • What tests have been run, by whom, when and where?
  • What results were observed?
  • How did they compare to what was expected?
  • What bugs have been found and logged?
The comparison with expectations is described in the section Result Assessment Practices.
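One possible way to capture this minimum record is a small data structure with one field per question above. The class ExecutionRecord, its field names and the summarize helper are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ExecutionRecord:
    """One execution of one test (all field names are illustrative)."""
    test_id: str            # what test was run
    executed_by: str        # by whom
    executed_at: datetime   # when
    environment: str        # where
    expected_result: str
    observed_result: str
    bug_ids: list = field(default_factory=list)  # bugs found and logged

    @property
    def as_expected(self) -> bool:
        return self.observed_result == self.expected_result


def summarize(records):
    """Count runs, mismatches and distinct logged bugs as decision input."""
    mismatches = [r for r in records if not r.as_expected]
    bugs = {bug for r in records for bug in r.bug_ids}
    return {"run": len(records),
            "mismatches": len(mismatches),
            "bugs_logged": len(bugs)}
```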

Functional Test Execution Practices

The nature of a test determines how we execute it. Dynamic tests involve running the system-under-test while static tests involve inspecting various artefacts that describe the system. Dynamic tests typically fall into one of the following categories:
  • Automated Functional Test Execution – This involves using computer programs to run the tests without any human intervention. The test automation tool sets up the test environment, runs the tests, assesses the results and reports them. It may even include logging of any bugs detected. The tests may be started by a human or by an automated test scheduler or started automatically when certain conditions are satisfied such as changes to the code base.
  • Manual test execution – This involves a person executing the test. The person may do all steps manually or may use some automation as power tools to make the testing go faster. The human may adjust the nature of the test as they execute it based on observed behavior or they may execute the steps of a test case exactly as described.
  • Exploratory test execution – This is a form of manual test execution that gives the tester much more discretion regarding exactly what steps to carry out while testing. Each test session is usually scoped using a test charter.

Non-functional Test Execution Practices

Non-functional tests are somewhat different from functional tests in several ways. As intimated by their alternative name, para-functional tests, they span the specific functions of the system. Static non-functional tests may involve running tools that analyse the software in question or they may involve reviewers who look at the code or other artefacts to find potential design or coding defects. Dynamic non-functional tests may involve running tools that interact with the system under test to determine its behavior in various circumstances.

Static Non-functional Test Practices

Static analysis is done by examining the code or higher level models of the system to understand certain characteristics. Specific forms of static analysis include the following:
A Design/Architecture Review is used to examine higher-level models of the system to understand certain characteristics. The most common characteristics of interest include capacity/scalability, response time and compliance with standards (internal and external).
A Security Review is a specialized form of Design or Architecture Review that involves examining the architecture and the code looking for ways a malicious user or program might be able to break into the system.
Static Code Analysis involves examining the source code either manually or using tools to ascertain certain characteristics including:
  • Reachability of code segments
  • Correct usage of key language constructs such as type safety
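As a rough illustration of tool-based reachability analysis, the following sketch uses Python's standard ast module to flag statements that directly follow a return, raise, break or continue in the same block. It is a deliberately narrow assumption of what "reachability" means, and real static analysis tools check far more. The function name unreachable_lines and the sample source are hypothetical.

```python
import ast

# Statements after which nothing else in the same block can run.
TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def unreachable_lines(source: str) -> list:
    """Return line numbers of the first unreachable statement in each block."""
    found = []
    for node in ast.walk(ast.parse(source)):
        for block_name in ("body", "orelse", "finalbody"):
            block = getattr(node, block_name, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is a single expression
            for index, statement in enumerate(block[:-1]):
                if isinstance(statement, TERMINATORS):
                    found.append(block[index + 1].lineno)
                    break
    return sorted(found)


SAMPLE = '''
def price(quantity):
    if quantity < 0:
        raise ValueError("negative")
        quantity = 0
    return quantity * 2
    print("done")
'''
```

The analysis never runs the sample code. It inspects only the parsed source, which is what makes it a static rather than a dynamic test.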

Dynamic Non-functional Test Practices

Performance and stress tests are good examples of non-functional tests that require specialized tools. Sometimes we have complex or long-running test procedures that exercise the system-under-test just to see what will happen; there need not be enumerated expectations per se.

Result Assessment Practices

The value in executing a test case is to determine whether or not some requirement has been satisfied. While a single test cannot prove the requirement has been met, a single failing test can certainly prove that it has not been met completely. Therefore, most test cases include some form of assessment or checking of actual results against what we expect. This assessment can happen in real time as the test is executed or it may be done after the fact. This choice is a purely pragmatic decision based on the relative ease of one approach versus the other. There are three basic approaches to verifying whether the actual result is acceptable:
  • Compare the actual result with a predetermined expected result using a comparator which may either be a person or a computerized algorithm.
  • Examine the actual result for certain properties that a valid result must exhibit. This is done using a verifier.
  • Just run the tests and not check the results at all. This may be appropriate when the testing is being conducted expressly to gather data. For example, the purpose of usability testing is to find out what kinds of issues potential users have using the product. We wouldn’t report an individual usability test session as having failed or succeeded. Rather, we aggregate the findings of all the usability test sessions for a specific piece of functionality to determine whether the design of the system-under-test needs to be changed to improve usability.

Using Comparators to Determine Test Results

The most convenient form of assessment is when we can predict what the actual results should be in a highly deterministic fashion. The mechanism that generates the expected result is sometimes called a “true oracle” and there are several ways the expected results can be generated. Tests that have a true oracle are very useful when doing Acceptance Test Driven Development because there is a clear definition of “what done looks like.”
When there is an existing system with similar functionality and we expect the new system to produce exactly the same results, it may be convenient to use the existing system as a Comparable System Test Oracle.
When we have a new system for which no comparable system exists, we often have to define the expected results manually. This is known as a Hand-crafted Test Oracle. Once the system is up and running we may also have the option of comparing subsequent releases of the software with previous versions, an approach we call Previous Result Test Oracle. This is the approach that most “record and playback” or “capture / replay” test tools use. In some cases it may be appropriate to forgo the Hand-Crafted Test Oracle and wait for the system to generate results which are then inspected by a person, a Human Test Oracle, before being used as a Previous Result Test Oracle. This approach forgoes the benefits of Acceptance Test Driven Development.
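A Previous Result Test Oracle can be sketched as a comparator that stores the first result it sees and compares later runs against it. This is a minimal sketch that assumes results can be serialized as JSON. The function name compare_with_previous, the file layout and the invoice data are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def compare_with_previous(name, actual, store: Path) -> str:
    """Compare a result with the one recorded by an earlier run.

    Returns "recorded" when no previous result exists, otherwise
    "pass" or "fail".
    """
    baseline_file = store / f"{name}.json"
    if not baseline_file.exists():
        # Nothing to compare against yet. A person (the Human Test Oracle)
        # should inspect this result before it is trusted as the baseline.
        baseline_file.write_text(json.dumps(actual, sort_keys=True))
        return "recorded"
    expected = json.loads(baseline_file.read_text())
    return "pass" if actual == expected else "fail"


with tempfile.TemporaryDirectory() as folder:
    store = Path(folder)
    invoice = {"total": 120, "currency": "CAD"}
    first_run = compare_with_previous("invoice_42", invoice, store)
    same_release = compare_with_previous("invoice_42", invoice, store)
    next_release = compare_with_previous(
        "invoice_42", {"total": 125, "currency": "CAD"}, store)
```

The comparator only reports that the result changed. Deciding whether the change is a regression or an intended improvement still requires a person.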

Using Verifiers to Determine Test Results

When we cannot predetermine exactly what the expected result should be, we can instead examine the actual result for certain characteristics. Rather than have an oracle describe what the result should be, we ask the oracle to make a judgement as to whether the result is reasonable given the input(s). This approach has the advantage of not requiring that we predetermine the expected result for each potential input. The main disadvantage is that we may accept some results that satisfy the invariant but which are not actually correct.
For example, a Human Test Oracle could examine a generated graphic to determine whether they can recognize it as an acceptable rendering of an underlying model. Or a computer algorithm could verify that when the actual result is fed into another algorithm the original input value is recovered. The program that implements this algorithm is sometimes called a Heuristic Test Oracle. Heuristic Test Oracles may be able to verify some results are exactly correct while for other results they may only be able to verify they are approximately correct.
For example, we could write a test script that steps through all integers to verify that the square root function returns the right value. Rather than inspect the actual values returned by each function call and compare them to a Hand-crafted Test Oracle or a Comparable System Test Oracle, we could instead multiply the result by itself and verify that we get back the original number within a specified tolerance, say, ±0.001. In this example, we should also test against another invariant to ensure that the actual result is not negative, which would clearly be a test failure. A computerized heuristic oracle is sometimes called a verifier.
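The square root example might look like the following sketch. Here math.sqrt stands in for the function under test, and the names verify_square_root and failing_inputs, as well as the upper limit on the integers checked, are assumptions made for illustration.

```python
import math


def verify_square_root(value, result, tolerance=0.001):
    """Heuristic oracle: check properties of the result, not its exact value."""
    if result < 0:
        return False  # invariant: the result must not be negative
    # invariant: squaring the result recovers the input within the tolerance
    return abs(result * result - value) <= tolerance


def failing_inputs(square_root, limit):
    """Step through the integers 0..limit and collect those that fail."""
    return [n for n in range(limit + 1)
            if not verify_square_root(n, square_root(n))]


FAILURES = failing_inputs(math.sqrt, 1000)
```

The verifier also shows the disadvantage noted above: verify_square_root(4, 2.0001) is accepted because it satisfies both invariants, even though 2.0001 is not the exact square root of 4.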

Logging Bugs

One of the key reasons for testing and reviews is to find differences between expectations and reality. When we do find such a difference it is important that it be logged so that it can be further investigated, prioritized and the appropriate action determined. The investigation could reveal it to be a bug, a misunderstanding by testers about how the system was intended to be used, missing documentation, or any of a myriad of other reasons. To avoid presuming the outcome, we prefer to call these concerns. To allow the investigation to be conducted efficiently, it is important to log the key characteristics of each concern. At a minimum, we need to log the following:
  1. The exact steps required to reproduce the problem. This may require rerunning the test a number of times until we can confidently describe exactly what it takes to cause the problem to occur.
  2. What actually occurred.
  3. What we expected to happen. We should provide as much detail as would be necessary for the reader to understand. We should not just refer to a requirement but rather describe exactly what we expected, what happened, and how what actually occurred was different from what we expected.
Every potential bug report should be given a clear title that describes specifically what was tested; this avoids confusion between bugs with very similar titles yet completely different expected and actual behaviours.
Refer to [Kaner et al, Chapter. Reporting and Analyzing Bugs.] for bug reporting guidance and for an additional example.
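The minimum content of a logged concern can be expressed as a small record with a completeness check. The class Concern, its field names and the method missing_information are hypothetical and are not taken from any particular bug-tracking tool.

```python
from dataclasses import dataclass


@dataclass
class Concern:
    """A logged difference between expectation and reality."""
    title: str                 # says specifically what was tested
    steps_to_reproduce: list   # the exact steps, in order
    actual_result: str         # what actually occurred
    expected_result: str       # what we expected, described in full

    def missing_information(self):
        """Return the names of the required fields that are still empty."""
        missing = []
        if not self.title.strip():
            missing.append("title")
        if not self.steps_to_reproduce:
            missing.append("steps_to_reproduce")
        if not self.actual_result.strip():
            missing.append("actual_result")
        if not self.expected_result.strip():
            missing.append("expected_result")
        return missing
```

A check like this could run before a concern is submitted, so that the investigator does not have to return to the tester for the basics.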

Test Maintenance Practices

Some tests are only intended to be run once while some are intended to be run many times over a long period of time. Some kinds of tests hold their value longer than others; some kinds of tests deteriorate very quickly because they are so tightly coupled to the system-under-test that even small changes to it make them obsolete. Tests that are expected to be used more than once may warrant an upfront investment to ensure that they are repeatable and robust.
Useful techniques for making tests more robust include the following:
  • Build maintainability into the tests. Write the tests at an appropriate level of abstraction. Don’t couple the test to any part of the system it isn’t testing. Don’t provide any unnecessary or irrelevant detail in any of the steps of the test. Strive to describe the test steps in business rather than technical terms.
  • Design the system-under-test for testability. Designing for testability is common practice in computer hardware but is too often neglected in software design. Design the system to make it easy to put it into a specific state. Make it easy for test programs to interface with the system through application programming interfaces (API) rather than forcing programs to use interfaces intended for humans. Writing the tests before the system is designed is a good way to influence the design to support testability.
  • Refactor the tests to improve maintainability. Tests should be assets, not liabilities. Tests that are hard to understand are hard to maintain when the system-under-test is modified to meet changing requirements. Automated tests in particular should be refactored to avoid unnecessary duplication and irrelevant information. See the Test Evolution, Refactoring and Maintenance thumbnail.


This chapter introduced the practices we use while defining, executing and maintaining individual tests. While some of these practices are specific to functional testing and others are specific to non-functional testing, the overall lifecycle of a test is consistent for both categories of tests. The practices used for readiness assessment are more or less the same as those used for acceptance testing although the specific tests produced for each will depend on the overall test plan as described in Chapter 16 – Planning for Acceptance.

What’s Next?

The next chapter describes practices related to managing the acceptance process, especially how it relates to monitoring and reporting on test progress and the results and managing the process of deciding which bugs to fix.


[PairWiseOnAllPairsTesting] Pairwise.Org - A website dedicated to cataloging all the tools available for all-pairs testing.
[TabakaOnCollaboration] Tabaka, Jean. “Collaboration Explained.” Addison Wesley, NJ.
[Kaner et al] Kaner, C. et al. Testing Computer Software, 2/e. Wiley, 1999.

Last edited Nov 7, 2009 at 1:33 AM by rburte, version 3

