When is software correct?

Developing software should always go hand in hand with testing it. With tests, we can prove that our software works correctly. There are many ways to prove this, but all of them require us to define what “works correctly” means. In this article, I’ll explore how we can define correct in the context of software testing.

The need for correct

When testing software, we have a bunch of tests that must pass. A single failing test means that we have to change our application. A test fails when one of its assertions does not hold. A test that can’t fail is useless. We take all of this for granted; it follows from the terminology we all accept. However, I’ve noticed that we rarely stop to think about what defines our assertions, even though they are the most important part of a test.

Reading this, you might think: “Correct is defined by requirements, obviously. What are you getting at, you poser?” And you’d be right, but only in a perfect world. In the real world, requirements are never complete enough to define every assertion in your tests. We have to fill in the gaps to get a complete definition of correct.

To create a complete definition of correct, we start with the requirements. But we also draw on our experience, knowledge, and skills.

We use our experience to avoid common issues and to apply our knowledge effectively.

We use our knowledge to find out where the biggest risks are. Bigger risks call for more thorough definitions of correct. For this, we use our knowledge of basic human psychology (e.g. a user might overlook this button), software development processes (e.g. a rushed feature), the business domain (e.g. banking), the organization (e.g. this change breaks someone’s workflow), and other topics.

We use our skills to gather data or knowledge when needed. We also use them to tie everything together into a cohesive, usable whole.

Once we have defined correct, we use the definition to make assertions for our tests.

diagram: requirements, experience, knowledge, and skills make correct. Correct is the input for the assert step in a test.

Implicit definitions of correct

Automated tests force us to write down correct in the form of assertions in code. Every automated test requires one or more assertions. These written assertions are an explicit definition of correct.

It’s less common for testers to write down the definition of correct when testing manually. They often stop at the requirements, short of a full definition of correct. However, this doesn’t mean that no such definition exists. Instead, the tester uses an implicit definition of correct. Implicit definitions should be written down, making them explicit and, more importantly, reusable. That said, I recognize that this is not always practical.

Every test that can fail has a definition of correct, either implicit or explicit. A test that can’t fail is useless.

How to define correct

There are two fundamental ways to define correct: examples and rules. They are fundamental because neither can be described in terms of something else. They are the most basic forms of defining correct that I know of.

The descriptions below are not very detailed but should be enough to give you a rough idea.

Example

An example requires a specific situation (= state and input) and a correct result (= output or side effect) for a specific Subject Under Test (SUT). The correct result applies only to this situation and this SUT. The test fails if the correct results do not equal the actual results. You can prove that the SUT works correctly by doing this multiple times for the same SUT in different situations.

Examples are by far the most widely used approach to defining correct. To write a test with this type of correct, we have to do the following:

  1. Set up an example state
  2. Execute the Subject Under Test with example input
  3. Assert that the actual results equal the correct results

You might recognize these steps as the AAA pattern (Arrange, Act, Assert).

Correct is defined as one or more assertions in the third step and only applies to this example. Another example might have a different definition of correct and thus different assertions.
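To make the three steps concrete, here is a minimal sketch in Python using plain `assert` statements. The function `apply_discount` and its discount codes are hypothetical, invented only to serve as a Subject Under Test. Each test fixes one situation and states its own correct result, so the two tests carry two different definitions of correct.

```python
# Hypothetical Subject Under Test: applies a percentage discount code to a total in cents.
def apply_discount(total_cents: int, code: str) -> int:
    discounts = {"SAVE10": 10, "SAVE50": 50}  # assumed codes, for illustration only
    percent = discounts.get(code, 0)
    return total_cents - total_cents * percent // 100


def test_save10_on_1000_cents() -> None:
    # Arrange: example state and example input
    total, code = 1000, "SAVE10"
    # Act: execute the Subject Under Test
    actual = apply_discount(total, code)
    # Assert: this correct result applies only to this example
    assert actual == 900


def test_unknown_code_leaves_total_unchanged() -> None:
    # A different example with a different definition of correct
    total, code = 1000, "BOGUS"
    actual = apply_discount(total, code)
    assert actual == 1000


if __name__ == "__main__":
    test_save10_on_1000_cents()
    test_unknown_code_leaves_total_unchanged()
    print("all examples pass")
```

The `test_` prefix lets a runner such as pytest collect these functions as-is; the `__main__` block just runs them without one.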

diagram: how examples work. Example state and example input are given to the subject under test which produces the actual results. Actual results and predefined correct results are asserted to be equal. If not equal, the test fails.

Rule

A rule is a reusable bundle of logic, often with a human-readable name. A rule is considered broken if the rule input is not correct according to the rule’s logic. A rule must never be broken regardless of state, input, or Subject Under Test (SUT). A broken rule fails the test. You can prove that the SUT works correctly by applying multiple rules.

Because a rule must never be broken, it will sometimes disable itself, for example when it is not relevant to the given input. An example that disables itself would be unacceptable, but for a rule this is fine.

To write a test with this type of correct, we have to do the following:

  1. Write a rule
  2. Execute the rule with an example situation (= state and input) and the Subject Under Test
  3. Assert that the rule is not broken
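Here is a minimal Python sketch of those three steps. The `Rule` class, the `withdraw` function, and both rules are hypothetical. Each rule returns `True` when it is broken. The second rule only makes sense for rejected withdrawals, so for accepted ones it disables itself by reporting “not broken”.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical SUT signature: (balance, amount) -> (new_balance, accepted)
Withdraw = Callable[[int, int], tuple[int, bool]]


@dataclass(frozen=True)
class Rule:
    """A named, reusable bundle of logic; `is_broken` returns True when the rule is broken."""
    name: str
    is_broken: Callable[[int, int, tuple[int, bool]], bool]


# Hypothetical Subject Under Test.
def withdraw(balance: int, amount: int) -> tuple[int, bool]:
    if 0 < amount <= balance:
        return balance - amount, True
    return balance, False  # rejected: balance stays as it was


# Step 1: write rules.
def rejection_keeps_balance(balance: int, amount: int, result: tuple[int, bool]) -> bool:
    new_balance, accepted = result
    if accepted:
        return False  # not relevant to accepted withdrawals, so the rule disables itself
    return new_balance != balance


RULES = [
    Rule("balance never goes negative", lambda balance, amount, result: result[0] < 0),
    Rule("a rejected withdrawal keeps the balance", rejection_keeps_balance),
]


def broken_rules(sut: Withdraw, balance: int, amount: int) -> list[str]:
    # Step 2: execute the rules with a situation (state and input) and the SUT.
    result = sut(balance, amount)
    return [rule.name for rule in RULES if rule.is_broken(balance, amount, result)]


# Step 3: assert that no rule is broken.
assert broken_rules(withdraw, balance=100, amount=30) == []
assert broken_rules(withdraw, balance=100, amount=500) == []
```

The rules never refer to `withdraw` directly, so the same `RULES` list can check any other implementation with the same signature, including a buggy one.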

An advantage of rules is their reusability. Because of this, we can run step two many times with generated input (e.g. property-based testing). It also allows us to create rulesets that we can share between applications and teams (e.g. linters).
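To illustrate a shareable ruleset, here is a toy linter in Python, where the thing being checked is source text. The rules and the `lint` helper are hypothetical and much simpler than any real linter. The “TODO needs a ticket” rule disables itself on lines without a TODO, and the ticket format (e.g. `ABC-123`) is an assumption.

```python
import re
from typing import Callable

# A hypothetical, shareable ruleset: each rule maps a line of text to True when broken.
LineRule = Callable[[str], bool]

RULESET: dict[str, LineRule] = {
    "no trailing whitespace": lambda line: line != line.rstrip(),
    "max 79 characters": lambda line: len(line) > 79,
    "no tab characters": lambda line: "\t" in line,
    # Disables itself on lines without a TODO; otherwise requires a ticket id like ABC-123.
    "TODO needs a ticket": lambda line: "TODO" in line and re.search(r"[A-Z]+-\d+", line) is None,
}


def lint(text: str, ruleset: dict[str, LineRule] = RULESET) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every broken rule in `text`."""
    return [
        (number, name)
        for number, line in enumerate(text.splitlines(), start=1)
        for name, rule in ruleset.items()
        if rule(line)
    ]


sample = "x = 1  \n# TODO: tidy up\n# TODO ABC-42: tidy up\n"
print(lint(sample))  # [(1, 'no trailing whitespace'), (2, 'TODO needs a ticket')]
```

Because `RULESET` is plain data, another team could import it, extend it, or pass their own dictionary to `lint`.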

diagram: how rules work. Any state and any input are given to the subject under test. The output is sent to a rule. We assert if the rule is broken. If the rule is broken the test fails.

Using correct in different ways

Using the fundamental approaches to correct, we can better understand other approaches to testing. I have yet to come across a testing approach that I can’t describe in terms of examples, rules, or both.

Most non-fundamental approaches combine correct with other aspects of software development. For example, contract-based testing defines correct with rules derived from contracts. On top of that, it adds a structured way for teams to work together and stay in sync.
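As a rough sketch of “rules derived from contracts”, the Python below turns a hypothetical contract (required fields and their types) into one named rule per field and checks a provider response against them. `USER_CONTRACT` and both helpers are invented for illustration. This is not how any particular contract-testing tool works.

```python
from typing import Any, Callable

# Hypothetical contract agreed between a consumer team and a provider team:
# the fields a "user" response must contain and their expected types.
USER_CONTRACT: dict[str, type] = {"id": int, "name": str, "email": str}


def rules_from_contract(contract: dict[str, type]) -> dict[str, Callable[[dict[str, Any]], bool]]:
    """Derive one named rule per contract field; a rule returns True when broken."""
    rules = {}
    for field, expected in contract.items():
        # Bind the loop variables as defaults so each lambda keeps its own field and type.
        rules[f"'{field}' has type {expected.__name__}"] = (
            lambda response, f=field, t=expected: not isinstance(response.get(f), t)
        )
    return rules


def broken_contract_rules(contract: dict[str, type], response: dict[str, Any]) -> list[str]:
    return [name for name, rule in rules_from_contract(contract).items() if rule(response)]


print(broken_contract_rules(USER_CONTRACT, {"id": "1", "name": "Ada"}))
```

A missing field shows up as a broken rule too, because `response.get` returns `None`, which is not an instance of the expected type.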

Other approaches combine both fundamental ways to create a better definition of correct. For example, property-based testing defines correct by wrapping an example in a rule. This results in many rule-generated examples that together create a solid definition.
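A hand-rolled sketch of that idea in Python, using only the standard library. The SUT here is a hypothetical run-length encoder. The rule wraps what would otherwise be a single example (encode a string, decode it, compare with the original) so that it runs against hundreds of generated strings. The names `for_all` and `random_text` are assumptions for illustration. Libraries such as Hypothesis do this properly and also shrink a failing input to a minimal one.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


# Hypothetical Subject Under Test: run-length encoding of a string.
def rle_encode(text: str) -> list[tuple[str, int]]:
    runs: list[tuple[str, int]] = []
    for char in text:
        if runs and runs[-1][0] == char:
            runs[-1] = (char, runs[-1][1] + 1)
        else:
            runs.append((char, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(char * count for char, count in runs)


# The rule: for ANY input, decoding the encoding gives the input back.
def round_trip_holds(text: str) -> bool:
    return rle_decode(rle_encode(text)) == text


def for_all(generate: Callable[[random.Random], T], rule: Callable[[T], bool],
            runs: int = 500, seed: int = 0) -> Optional[T]:
    """Check `rule` on `runs` generated inputs; return the first counterexample, or None."""
    rng = random.Random(seed)  # seeded so that a failure can be reproduced
    for _ in range(runs):
        value = generate(rng)
        if not rule(value):
            return value
    return None


def random_text(rng: random.Random) -> str:
    # A small alphabet makes repeated characters (runs) common.
    return "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))


assert for_all(random_text, round_trip_holds) is None
```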

There are many more approaches to be discovered here. I’m working on an overview of approaches to testing with rules and examples at its center. However, that’ll have to wait until another day and another article.

Conclusion

To prove that our software works correctly, we first define what correct is. Correct is defined based on the requirements, experience, knowledge, and skills. Sometimes we define correct implicitly, but we should strive to make it explicit instead.

We can define correct in two fundamental ways: examples and rules. Examples require a specific situation and a correct result for a Subject Under Test. Rules are reusable and must never be broken, regardless of the situation or Subject Under Test.

We can define many interesting test approaches in terms of examples and rules. In a future article, I will explore this further.