Test Driven Development without Tears
Every company that I have worked for has its own method of testing, and I've gained a lot of experience in what works and what doesn't. At last, the stack of conflicting confidentiality agreements that I got as a co-op student has now expired, so I can talk about it. (I never signed them anyway.)
Warning: My recollection of events may be different from what actually occurred. Do you remember what you were doing 10 years ago?
Two places I worked at had test systems that were painful to use. They were Microsoft and Soma.

Microsoft's was the worse of the two. The system of my particular team in 2000 was so bad that developers couldn't even use it. Only dedicated Software Test Engineers had access to it and could run the full suite. Setting it up on a new PC could not be done in a day. Eventually, at the end of the summer, I managed to get a copy running, and it found lots of bugs in my module, but by then I had run out of time to fix them. Yes, this was all my fault, because I should have been running the tests earlier. I didn't run them because they were so darn hard to get hold of and set up.
Testing must be done continuously, during development, by developers.
Soma had the right idea. They were working on a software phone that ran on Linux. Before checking in a change, we were required to write a test for it and run the full regression suite without any failures.
The software and tests were all written in Java, so to create a test, you'd write a Java class with a method that called the high-level APIs to start a phone call and then exercised the feature you were working on.
Some of the features needed a complex set of steps to invoke. If you wanted to test the user hitting a #-code to add a participant during a five-way conference call, you had to first set up five fake calls using the Java API. There were functions for dialing a number and such. Whenever an API changed, all the test objects would have to be updated.
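Roughly speaking, such a test might have looked like the sketch below. The class and method names here are invented for illustration, not the real Soma API.

public class AddParticipantDuringConferenceTest {
    public void run(PhoneSystem phone) {
        // All of this setup has to happen before the feature under test is even reached.
        Call[] calls = new Call[5];
        for (int i = 0; i < calls.length; i++) {
            calls[i] = phone.dial("555-010" + i);    // place five fake calls
            calls[i].answer();
        }
        Call conference = phone.conference(calls);   // join them into a five-way call

        conference.sendDigits("#5");                 // the one step we actually wanted to test

        if (conference.participantCount() != 6) {
            throw new AssertionError("participant was not added");
        }
    }
}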
The test system was downright painful to use. Early versions ran in real time, with real timeouts. To test that the dial tone only played for 30 seconds before the off-hook beeps started, you had to sit there and wait while the system did nothing until the timer expired. There was a sped-up mode where timers would expire immediately. But even in that mode, the Java system was so bogged down that it would take several hours to run everything. While I was there, they were in the process of building a cluster of 100 PCs just to run all the tests.
The problem was that all of the tests were system level. Programming in Java eventually leads to a system that I call object soup: more abstraction can seem to be better, but it leads to more classes, and more classes have more references to each other. The system grew so complex that there was no way to reset the state to the middle of a five-way call without actually performing all of the call setup. Each call had to be set up again for every test. Unit testing of an individual class was meaningless, and it was impossible to separate a feature from the rest of the system.
You can't test GUIs

In 1999, Corel's flagship product was its DRAW! vector graphics suite. As I remember it, testing was completely manual. Of course, developers were expected to test their changes as much as possible, but CorelDRAW is a huge program with thousands of features and modes of operation. After we'd done some cursory testing, we'd check a change into Visual SourceSafe and wait for bugs to come in from beta testers. When a bug was submitted, the test specialist for my team, Mona, would reproduce it and create an issue report.
The system was clearly broken. We had about 100,000 open bugs, and even some of the more serious ones (copying text with certain bullet styles resulted in a crash) had gone unfixed for several versions.
The problem was the lack of any regression testing. We relied too much on developers manually going through all the code paths and on beta testers to submit reports. The engines were physically separated into libraries, but they were tightly coupled to the UI, so automated testing was impossible. If you broke something in a little-used feature, it might be years before someone noticed.
Regression testing is only feasible if the process is automated, but to this day, I still can't think of a good way to automatically test GUIs. If your program has a graphical user interface, you are stuck with laborious manual testing. The best thing you can do is to separate your program into two parts: a user interface part, and an engine part that does the real work. The user interface part will have to be tested by moving the mouse and going through all the options. The engine part has to have a well-defined API, with inputs and outputs that you can test automatically.
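As a minimal sketch of that split (the SpellChecker name and its API are made up for illustration), the engine exposes plain inputs and outputs, and an automated test never touches the GUI:

import java.util.Arrays;
import java.util.List;

// The "engine" part: a plain API with inputs and outputs, no GUI involved.
interface SpellChecker {
    List<String> misspelledWords(String text);
}

// The GUI part stays thin: it only collects text from the user and displays
// the results. An automated test skips it entirely and calls the engine directly.
class SpellCheckerTest {
    static void run(SpellChecker checker) {
        List<String> result = checker.misspelledWords("teh quick brown fox");
        if (!result.equals(Arrays.asList("teh"))) {
            throw new AssertionError("expected [teh], got " + result);
        }
    }
}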
Note: Most people stop reading at this point.
Three rules for Test Driven Development

In a modern software company, developers should be running some kind of regression tests with every change they make. If you make it painful, then your developers will be unproductive. If you make it painful and mandatory, then your developers will be unproductive and unhappy. For effective test driven development, you need three things:
First, if the test system is painful to use and create tests for, then developers will not create tests. There has to be some payoff for spending the effort of creating a test, in terms of finishing and going home early. Otherwise, the tests will be put off until later, or they won't get done, or you will have to have a separate team whose only job is to create tests, and then you are no longer doing test driven development.
Second, the test system must support the easy creation of tests. You should be able to take a bug report or some log from the field, run it through a tool, and out pops a test, ready to run, that reproduces the issue. If the tests are written in Java, that is quite hard to do. Ideally, instead of writing code, you will have some other kind of input, like a list of events that occurred since system startup. You can run the events through your system to exactly reproduce its state. These types of tests take no development effort to create, and the best part is they don't depend on function names and classes. You could rewrite your system from scratch, and as long as it takes the same input, your tests will still work, as sketched after these three points.
Finally, the test suite should be fast. You're going to want your automated build system to run the tests after every few changes. If you think about it, a developer might make, on average, one or two changes a day. If you have 100 developers, you will have 100 to 200 changes a day. Developers will need to be able to run the tests at their desk, too. If passing the regression tests is mandatory before committing a change, then the suite should take only a few minutes to run, so you don't have developers leaving for a three-hour lunch, checking their stocks, or having swordfights.
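To make the second point concrete, here is a rough sketch of a replay-style test. It assumes the system can consume one logged event per line and dump its state afterwards; SystemUnderTest, handleEvent, and dumpState are invented names, not a real API.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReplayTest {
    public static void main(String[] args) throws IOException {
        SystemUnderTest system = new SystemUnderTest();       // hypothetical system under test
        // Feed every event from the captured log back into the system, in order.
        for (String event : Files.readAllLines(Paths.get(args[0]))) {
            system.handleEvent(event);
        }
        // The test itself is just comparing this dump against a checked-in copy.
        System.out.println(system.dumpState());
    }
}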
File importer example

Let's say we are responsible for writing file importers for a word processor. We are fixing a bug in the importer for the Microsoft DOC format. The input is a .DOC file, and the output is a series of function calls that modify a document. So when we read some text, we call document.addText(), and when we get a new font, we call document.setFont(), etc.

But instead of maintaining a real document and displaying it on the screen, we have a generic test document. When we call document.addText(), we just record this fact to a text file. So after our importer runs, we might have something like this:

called addText("This document is copyright")
called setFont("Symbol", 10)
called addText("c")
called setFont("Times New Roman", 10)
called addText("1995")
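In code, the generic test document might look something like this sketch. Only addText and setFont come from the example above; the Document interface and the RecordingDocument name are assumptions made for illustration.

import java.io.PrintWriter;
import java.io.Writer;

interface Document {
    void addText(String text);
    void setFont(String name, int size);
}

// Instead of laying out a real document, record every call as a line of text.
class RecordingDocument implements Document {
    private final PrintWriter out;

    RecordingDocument(Writer destination) {
        this.out = new PrintWriter(destination, true);   // autoflush each line
    }

    public void addText(String text) {
        out.println("called addText(\"" + text + "\")");
    }

    public void setFont(String name, int size) {
        out.println("called setFont(\"" + name + "\", " + size + ")");
    }
}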
Suppose we had hundreds of .doc files, taken from actual bug reports by actual users. We put them into a folder called "input" and check them into our source control system. We run them through the test document. This only takes a few seconds, because all it's doing is creating text files. We then take the output .txt files and check them into our source control system as well.
If, one year later, I'm making a change and the output of the test suite changes in any way, then it is either a bug, an improvement, or an irrelevant difference. Depending on which, I'll either revise my change to fix the problem or update the checked-in text files with the new results.
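Put together, the regression run itself can be a small program along these lines. The input folder comes from above; the expected folder name and the Importer.run entry point are assumptions made for the sketch, and RecordingDocument is the recording class sketched earlier.

import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImporterRegressionTest {
    public static void main(String[] args) throws IOException {
        int failures = 0;
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(Paths.get("input"), "*.doc")) {
            for (Path doc : docs) {
                StringWriter actual = new StringWriter();
                Importer.run(doc, new RecordingDocument(actual));  // hypothetical importer entry point

                Path expectedFile = Paths.get("expected", doc.getFileName() + ".txt");
                String expected = new String(Files.readAllBytes(expectedFile));
                if (!expected.equals(actual.toString())) {
                    failures++;
                    System.out.println("MISMATCH: " + doc);        // a bug, an improvement, or irrelevant
                }
            }
        }
        System.out.println(failures == 0 ? "All importer tests passed" : failures + " file(s) changed");
    }
}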
We now have a regression test system that is easy to use and quick to run. The tests are not Java files. They take absolutely no effort to write -- we just save an attachment. In fact, it's easier to fix an issue after creating a test for it, because you can run it over and over again. Developers will naturally create tests as part of doing their job, without even being asked.
Now we can rewrite stuff without fear. We can rewrite the file importers from scratch and make sure they work on all the same documents in exactly the same way. We can also rewrite the rest of the system, as long as the interface to addText() and setFont() still works the same way.
Sure, there are some bad parts. If you change document.setFont() so that it needs a font encoding parameter, you will need to update all the checked-in test outputs. But these changes aren't difficult to manage, and the benefits far outweigh the inconvenience.
In Conclusion

If you are setting up a regression test system, it should be effortless to create a new test, and it should be able to run hundreds of tests in five minutes. Most importantly, the test system should make it easier to fix bugs, so developers will naturally want to create new tests. Don't make testing a pain in the ass.
Comments

We develop web-based software and have a complete set of JUnit system tests (regression tests, functional tests, call them what you will) written using WebDriver, now part of Selenium. These test the system from the UI down, written by developers, test-first. These tests 'drive' the browser to mimic the actions of users: clicking buttons and links, filling in forms, etc., and assert the presence or absence of elements in the page using XPath.
I love these tests. When I run them, the browser opens and I can see the buttons being clicked and the pages navigated. Failures are usually pretty obvious because you can see the state of the page when the failure occurs. This is also a good opportunity to think about the layout/style of each page.
They are pretty simple to write ... the domain language of testing is actually pretty small: click, fill, assert. Even with JavaScript UIs that allow drag and drop, etc., it takes a developer less than a day to learn how to write the tests.
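A stripped-down example of what one of these tests looks like (the URL, ids, and XPath here are invented; the WebDriver calls themselves are the real API):

import static org.junit.Assert.assertFalse;

import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SaveOrderTest {
    @Test
    public void savingAnOrderShowsConfirmation() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://localhost:8080/orders/new");                    // navigate
            driver.findElement(By.id("customer")).sendKeys("Acme Ltd");        // fill
            driver.findElement(By.xpath("//button[@value='Save']")).click();   // click
            // assert: the confirmation element must be present in the page
            assertFalse(driver.findElements(By.xpath("//div[@id='confirmation']")).isEmpty());
        } finally {
            driver.quit();
        }
    }
}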
You need to put some effort into writing your code to be testable, but that was always the case with TDD. What we've found is that interfaces that are testable tend towards being usable. In a way it's obvious: if your tests are very complex, they are hard to write and debug, so developers are directly encouraged to keep the UI simple to make the tests simple. We also have usability testing - TDD of the UI is not a panacea, but it's definitely a step in the right direction.
The downside is that these tests are slow to run because they mimic the actions of real users. The complete suite takes 90 minutes, but the tests for a particular feature or page usually run in a couple of minutes, so we are just more relaxed about test failures in the build. Our team is small; I don't know that this approach would work with 100 developers.
I've often wondered whether other types of UI could be tested in this way. The advantage of web-based UIs is that they are represented as a specification to a layout engine ... so you can always identify the presence or absence of elements, and you're not tied to layout to drive the behaviour (you click the button with id "submit" or value "Save", not the point 988,762). Perhaps if more UIs were built to be testable, TDD of UIs would be more common.
I have successfully used 2 GUI testing libraries (Java Swing) for my 2 open source projects:
- Abbot for HiveBoard (I have almost complete test coverage, in terms of covered features and covered code). It was a bit hard to set up the first time, but adding tests to it was then a breeze.
- FEST for DesignGridLayout: it was straightforward to set up and thanks to its fluent API it is easy to write new test cases.
There is no reason why you should couple UI code to functional logic. You should have a properly separated system with tests written for the logic, and preferably also a test suite that uses the UI. (Let's say you had a subtract program that took two numbers and returned a result. The sub(a,b){return a-b;} logic might work fine, but if the a and b fields are mislabeled or linked incorrectly to the inputs, it isn't going to do all that much good.)
At a previous job I tested a few UI testing suites and settled on QAWizard (I was planning to use it only on web apps, so I've never tested its capabilities with local applications). There seemed to be some better features in other tools, but a major draw for me was how quickly someone could start making tests. There was also a capability to write tests in a scripting-type environment. I moved jobs afterwards, and as far as I know the copies we purchased never saw any real use, so I can't give you any substantial performance claims, though.
Then you need "Kleenex" testers: people who have never seen the program before, brought in to play with it. In this case you are not looking for bugs; you are looking for confusion. You have to take their fumblings and mistakes seriously, because those are exactly the things you have to minimize. Unfortunately, "Kleenex" testers cannot be used as "Kleenex" testers twice, hence the name.