Unit tests are truly amazing! They are usually written by the developers themselves, in parallel with development, and don't necessarily require additional testing personnel (although it is hard to argue against bringing additional people into software testing). Furthermore, they uncover errors very early in development, and in some cases these errors can even be narrowed down to the exact lines of code that caused them. Errors found this way therefore don't have to be hunted down later in time-consuming debugging sessions. The same applies, of course, to static code analysis and code reviews. Compared to code reviews, however, unit tests have the advantage of being automated: they find errors on every code change without manual effort. So if unit tests are really that great, you might think:
“The more lines of code I cover as a developer, the more errors I logically find.”
One would think…
What actually is coverage?
First, some theory: Coverage refers to the degree to which a piece of software is covered by tests, usually expressed as a percentage. There are several ways to calculate coverage. Two of the most common are:
- Statement coverage: The percentage of covered statements in the code.
- Branch coverage: The percentage of branches traversed during the test.
For example: if a switch statement has a default case in which absolutely nothing happens (of course, there should be at least a comment explaining why nothing needs to happen here), then this default case does not need to be executed during the test in order to raise the statement coverage to 100%. After all, it contains no statement. For branch coverage, however, every possible outcome of every decision must be tested. This means that in our switch example, every case, including the empty default, must be executed to achieve 100% branch coverage.

The figure below shows another example: a coverage calculation for the function "testFunction". Lines highlighted in green were executed in at least one test; lines highlighted in red were not executed in any test. For the first if-else statement (lines 14-19), statement coverage is 100%, since all statements were executed, while branch coverage for this area is only 50%. The second if-else statement (lines 21-26) yields a statement and branch coverage of 50% each.

There are also other ways to calculate the test coverage of unit tests, such as "function coverage" or "modified condition/decision coverage". For the sake of simplicity, however, I won't go into these here.

The problem with coverage
I recently worked on a project where the client's processes required 100 % statement coverage. Most of the other developers I told about it were shocked: "What, there's such a thing?" "...but does it make sense?"
Yes, of course there is such a thing! But it doesn't necessarily make sense for the following reasons:
Anyone involved in project planning seems, as a general rule, to allocate too little time for software testing. Why this is the case would certainly fill another blog post, but the consequences are obvious: if the processes prescribe 100% coverage (regardless of which metric is used for the calculation), this means a gigantic mountain of work that the developer can hardly manage. At least not if they want to do their work conscientiously (let's assume at this point that developers write the unit tests for their own code themselves). Fortunately, the processes also provide a way out! All that is required is x% statement coverage and possibly y% branch coverage; how this is to be achieved is generally not prescribed. In an ideal world, a developer would of course still devote themselves conscientiously to each unit test, even for code that is boring and completely linear, such as tests of getter functions, where every error is visible from afar. In practice, however, there is either no time for this, or writing a series of completely useless unit tests dulls the developer to the point where they perceive writing such tests as an extremely annoying chore. High coverage requirements may therefore actually lead to a decline in the quality of the individual test cases. After all, in the end, no one asks whether the test is more than
"3 = 3 ?"
At least not when the automatically generated protocol spits out a nicely high coverage and then even highlights this number in green.
M.Sc. Björn Schmitz, Software Developer. E-mail: schmitz@medtech-ingenieur.de, Phone: +49 9131 691 240
Test coverage is therefore not necessarily a measure of how few errors a piece of software contains. A study conducted by the University of Gothenburg together with Ericsson came to a similar conclusion. It was published under the title "Mythical Unit Test Coverage" and attempted to find a correlation between unit test coverage and error-free code. The authors did not find a clear correlation between test coverage and the number of errors found. However, they did find a correlation between the number of errors found and complexity, the size of the software files (lines of code), and the number of code changes. It could therefore be much more effective to invest in adhering to coding guidelines, well-encapsulated code, and a clear software architecture (which, of course, must also be adhered to) than to engage in excessive unit testing.
Why do coverage metrics persist?
This question is, in principle, easy to answer. The usefulness of tests can only be measured by delving deeply into the source code and understanding it. Comparing a number in the log against the target value is, of course, much easier. If an error occurs later in the field, you can refer to this log and say, "We tested our software to the best of our knowledge and belief. Therefore, we couldn't possibly have anticipated errors." This, of course, provides a safeguard, and in the best case scenario, you achieve a certain "minimal standard" with the enforced test coverage. However, you certainly won't achieve error-free code.
What can be done better?
This is where things get tricky. Important building blocks for (as far as possible) error-free code are the measures already mentioned:
- Low code complexity
- Good encapsulation of software and no oversized software files
- A clear and well-thought-out software architecture that is also adhered to by the developer (and whose compliance is, of course, verified via code reviews)
Furthermore, unit testing, whose benefits described above are of course completely undisputed, should not be neglected. However, my suggestion would be to develop a concept for each project (or better, for all software projects within a company) that defines what should be tested. This could be done, for example, using a checklist:
- Does the function implement risk measures? -> test
- Could the function cause harm to the user if it malfunctions? -> test
During a code review (which usually has to be done anyway), a developer who did NOT write the code checks whether every function that fits the criteria also has a corresponding test. This allows developers to focus on the essentials. For additional assurance, you can review a random sample of the unit tests; if many problems are found, you can expand the search criteria. If you apply this process continuously as you develop, developers will automatically start writing better tests.
Of course, one could imagine that reviewers are cautious about uncovering bugs so as not to upset the developer, or that developers even coordinate which tests end up in the sample. Personally, I consider that highly unlikely; after all, very few developers want to produce faulty code. But it can't be completely ruled out. To allay such concerns, I recommend simply using external resources for review or test writing, such as a developer from another department. If no suitable developer is available within the company, it may also make sense to hire an external service provider. Such an external tester offers, among others, the following advantages:
- The tester does not know the developer personally and can therefore approach code and unit tests with a certain neutrality.
- The tester sees software architecture and coding guidelines for the first time and is not yet accustomed to certain deviations that have become commonplace in the development group.
- An internal developer reviewing another's code might be afraid of uncovering too many bugs, which would make them appear "pedantic." This could, in turn, lead to the author taking revenge the next time and dissecting the tester's code with the same level of scrutiny. An external tester, of course, doesn't have this problem.
- An external developer acting as a tester, who doesn't normally work with the developers, automatically compares development processes and the developed software with the processes and code from other projects. This may identify vulnerabilities that the project's internal developers overlook.
Conclusion
Unit tests are a key pillar of software testing, allowing for early and targeted detection of errors. However, test coverage alone says nothing about the quality of the code. A test concept that relies exclusively on high test coverage can, under certain circumstances, even degrade the quality of individual tests. It is more important to develop a good test concept and define who should test what and who should verify it. An external (project) software tester is always helpful in this regard. However, the following remains true: The best recipe for error-free code is a good software architecture translated into easily readable and not overly complex software modules.
