GnuTLS vulnerability: is unit testing a matter of language culture? (gehrcke.de)
2 points by michaelfeathers on March 21, 2014 | 6 comments


Unit testing is definitely influenced by language culture but there's also the issue of how easy it is to do.

JS doesn't have the unit testing uptake that Java/C#/Ruby/Python have, and I think that's largely because JS is often used in the front-end, where people aren't used to having any sort of abstraction layer.

The issue with C is that it is hard to mock. You can use the preprocessor, the linker, or function pointers. Function pointers are the best option, but your code ends up looking quasi-OO, with structs of function pointers. It's a style people are used to in some OS programming, but it does freak some people out.


I once did a security audit of Fitnesse, perhaps the best known software developed under the red-green-refactor cycle of TDD. I found a number of security vulnerabilities. A path traversal vulnerability meant I could read an arbitrary file on the server, including the password file. The password hash encoding is weak, and given a hash I was able to find a password which would match the hash. Once I had upload access to the server, another path traversal vulnerability meant I could upload to an arbitrary spot on the server, so long as the process had permissions to it. By design, the server can execute Java code. Combined, this means an attacker could run arbitrary code on the server.

While the worst vulnerabilities have since been fixed, the server still contains a CSRF vulnerability.

These all suggest that unit testing/TDD doesn't really add much to security protection.


> These all suggest that unit testing/TDD doesn't really add much to security protection.

I find it hard to believe that anyone would find that surprising. Security is something you either have in mind while you work or you don't.


The article is titled "GnuTLS vulnerability: is unit testing a matter of language culture".

The coupling of vulnerability to unit testing is, IMO, a red herring. I've never seen any evidence that unit testing leads to more secure systems than any other commonly used method, including code review and coverage-based functional tests. My observation is that unit testing/TDD doesn't significantly reduce security failures (i.e., it may be the same, or it may be worse), and I use my Fitnesse pen test as a real-world example. You seem to agree.

The article goes on to say "An automated test suite should have immediately spotted that invalid commit, right." I just downloaded the package. GnuTLS has an automated test suite. There's nearly 22KLOC in the tests/ subdirectory, with 75KLOC in gl/ lib/ and src/ combined.

The rest of the article slags on some perceived lack of testing based on C culture ("did you really, honestly, expect a C code base that reaches back more than a decade to be under surveillance of ideal unit-tests, by modern standards"), and implies that "ideal unit-tests, by modern standards" would have caught the error. This is unjustified.

Moreover, we know what ideal non-unit tests ca. 10+ years ago looked like. For example, SQLite is the exemplar of how a C library without low-level unit tests, coupled with strong code coverage, can produce high-quality code.

(To tie in with your other post: SQLite uses "structs of function pointers" to give users a plug-in filesystem architecture, which also allows emulation of unusual filesystem failures.)

If GnuTLS had the ideal tests of the standard of 10+ years ago, then these bugs would still have been found, unit tests or not. As its authors point out in the README, "Thorough testing is very important and expensive."

So, "unit tests" isn't really the issue, is it?

I would have been happier with an article titled "GnuTLS vulnerability: we are all cheapskates".


> The article goes on to say "An automated test suite should have immediately spotted that invalid commit, right." I just downloaded the package. GnuTLS has an automated test suite. There's nearly 22KLOC in the tests/ subdirectory, with 75KLOC in gl/ lib/ and src/ combined.

Did you see a test for the error? Did you check whether there were tests for other parts of the code that prevented vulnerabilities? I'm sure there are.

First and foremost, people need to be conscious of security to do well. No amount of testing will overcome that. But, if you have security consciousness, I'm sure that TDD can augment it. It's a practice that encourages reflection.

Re the article itself, fair enough, it uses a flawed example but I can back up the assertion. I see less unit testing in C codebases.


I don't understand your question. I think the answer is that I don't have the domain knowledge to evaluate the GnuTLS tests, so I don't know what I'm looking for. I tried to compile the package but I'm missing at least one third-party package, and then decided it wasn't worth my time to dig more into the code.

My point is that the article makes a false claim, and the veracity of that claim is easy to determine. This makes it hard for me to believe what the author is proposing is actually true or meaningful.

You write that you have observed less unit testing in C code bases. It's hard to know what to make of that observation. Is it strictly a function of language culture, as the author suggests? Or is it more a function of age? Perhaps C packages developed in the last 5 years have similar test effectiveness as Ruby packages developed in the last 5 years. Or perhaps something else is the key discriminator? While such an analysis is possible in theory, it will be hard to standardize effectiveness across a large number of packages.

I used the clumsy term "test effectiveness" there instead of "unit tests" for a reason. As you write, unit testing is harder in C because of the difficulty of making mocks. Personally, my views are much more aligned with that of James Coplien. I believe, quoting his recent essay, "most unit testing is waste", and "[y]ou'll probably get better return on your investment by automating integration tests, bug regression tests, and system tests than by automating unit tests."

Which means that I'm not really convinced that a metric based on the number of unit tests is that persuasive an indicator of security, software quality, or other more operationally based goals, when it should really be all tests, of which unit tests are only one part.

As you say, "if you have security consciousness, I'm sure that TDD can augment it". My observation though is that any sort of testing can augment security consciousness, so why specifically promote TDD, when TDD alone is often insufficient and other tests (including some non-TDD unit tests) are needed? (Fuzz tests are an example of a potentially useful non-unit test which can specifically help some aspects of security.)

That is, TDD is a design method, not a testing method. It's a strict subset of unit testing as a whole. TDD, at least in its red-green-refactor formalism, doesn't have a spot for adding tests which are expected to pass. These tests might come from, say, formal boundary-condition checking, algorithm complexity analysis, or security tests, and serve as a validation that the algorithm as implemented can handle the full range of inputs.

Of course, I picked the RGR formalism because it's obviously incomplete in the first place. The list of refactorings includes "Substitute Algorithm". Even if the new algorithm is cleaner, there can be different boundary cases than the previous algorithm, so new tests may be needed in order to fully test the new algorithm. But most, if not all, RGR descriptions say about the refactoring step something like "Now that your tests are passing, you can make changes without worrying about breaking anything." (Emphasis mine.) That's clearly not universally true, which tells me that the RGR formalism can't be generally correct.

Going back to Kent Beck's original Fibonacci walk-through, it's clear that the final stack-based Fibonacci algorithm will take exponential time, and likely never compute Fib(30). Even if memoization is added, to give linear-time performance and avoid a stack overflow, it will silently overflow on output values larger than max int. This example tells me that formal boundary-condition checking is not at the heart of TDD.

It may be an add-on, but later examples, like Robert Martin's prime factorization Kata, also fail to establish that the final algorithm works across the range of acceptable inputs.

As you might infer, I have looked hard for a description of how to incorporate security as part of the TDD process, and failed. There are people who suggest using the unit test framework (post-development) for security tests, but test-after isn't part of TDD.

Do you have any pointers to how to combine security and TDD?

Referring specifically to Fitnesse, I think it shows that "conscious of security" is about as useful as saying "conscious of formal boundary-condition checking." That is, the Fitnesse authors have some security consciousness. They chose a password hashing function specifically for better security. It's just that they didn't know enough to choose a good one, and instead made one which is easily broken.

This is really skill acquisition, which is more difficult than simple consciousness. Security skills aren't important for 99+% of programming. It's just that the <1% of a 100KLOC program still leaves a pretty vulnerable attack surface. (The math doesn't actually work that way, but the idea holds.)




