:jasonrudolph => :blog

puts Blog.new(”nonsense”)

Testing Anti-Patterns Potpourri - Quotes, Resources, and Collective Wisdom

Posted by Jason Rudolph on 7th October 2008

While working on the Testing Anti-Patterns series over the past few months, I’ve had the pleasure of reading some great writing on testing, test-driven development, code coverage analysis, and the bigger picture of software quality in general. What follows is a collection of some of my favorite findings: quotes and resources spanning the last ten years and then some.

How Not to Test

Let’s start off with something light. James Carr’s TDD Anti-Pattern Catalogue is a good, fun read that’s downright hilarious at times (but only because we remember writing seeing these tests once or twice). And not only are the anti-patterns that James proposes worth reading, you owe it to yourself to check out the extensive discussion in the comments as well. There you’ll find folks chiming in with many additional gems, including:

  • The Blue Moon - Matt Simner’s aptly-named anti-pattern is one we’ve all hit at least once: “A test that’s specifically dependent on the current date, and fails as a result of things like public holidays, leap years, weekends, 5-week months, etc.” Ouch.
  • Honest Guv - Graham Lenton’s contribution to the list brings back some memories I’d rather forget: “Where the expected outcome is so entropic that the developer simply asserts true with a comment ‘this works, honestly.’” Um. Sure it does.

We’re Still Figuring This Stuff Out

Some tests are just so incredibly rotten that it’s easy to definitively declare a better approach, but it’s not always that black and white. Just as we have to make trade-offs when designing and developing production code, so too do we have to weigh the advantages and disadvantages of the available approaches for testing that code.

In his post on Developer Testing and the Importance of Context, Jay Fields reflects on the catalog of testing patterns he’s documented over the past few years, and he reminds us that “context is still king:”

You’ll find that some of [the] approaches are in direct conflict. This isn’t because one pattern is superior to another in isolation, it’s because one pattern is superior to another in context.

Software development certainly isn’t the only engineering discipline that requires its practitioners to manage trade-offs, so how do we compare with other industries? The often controversial (but always entertaining) Neal Ford argues that “testing is the engineering rigor of software development,” but that we’ve still got a long way to go if software development will ever truly be an engineering discipline:

It may just be that software will always resist traditional engineering kinds of analysis. We’ll know in a few thousand years, when we’ve been building software as long as we’ve been building bridges. We’re currently at the level in software where bridge builders were when they built a bridge, ran a heavy cart across it, and it collapsed. “Well, that wasn’t a very good bridge. Let’s try again.”

Pragmatic Use of Code Coverage Analysis

We’ve talked quite a bit about code coverage in our discussion of testing anti-patterns. Brian Marick’s 1997 paper titled How to Misuse Code Coverage is a must-read, full of pragmatic advice from a veteran tester:

I’ve written four coverage tools … I still find myself looking at a coverage condition, saying “I know how to satisfy that,” and getting an almost physical urge to write a quick-and-dirty test that would make the coverage tool happy. It’s only the certain knowledge that customers don’t care if the coverage tool is happy that restrains me.

The most common coverage mistake is giving into that urge. I warn against it by saying that coverage tools don’t give commands (”make that evaluate true”), they give clues (”you made some mistakes somewhere around there”). If you treat their clues as commands, you’ll end up in the fable of the Sorcerer’s Apprentice: causing a disaster because your tools do something very precisely, very enthusiastically, and with inhuman efficiency - but that something is only what you thought you wanted.

Designing your initial test suite to achieve 100% coverage is an even worse idea. It’s a sure way to create a test suite weak at finding those all-important faults of omission. Further, the coverage tool can no longer tell you where your test suite is weak - because it’s uniformly weak in precisely the way that coverage can’t directly detect. Don’t use code coverage in test design. The return on your testing dollar (in terms of bugs found) is too low.

Now that we know how to misuse code coverage, the Google Testing Blog weighs in on the proper application of coverage analysis:

  1. Make your tests as comprehensive as you can, without coverage in mind. This means writing as many test cases as are necessary, not just the minimum set of test cases to achieve maximum coverage.

  2. Check coverage results from your tests. Find code that’s missed in your testing. Also look for unexpected coverage patterns, which usually indicate bugs.

  3. Add additional test cases to address the missed cases you found in step 2.

  4. Repeat step 2-3 until it’s no longer cost effective. If it is too difficult to test some of the corner cases, you may want to consider refactoring to improve testability.

In Code Complete, Steve McConnell offers sage advice for knowing when you can reasonably move to Step 2. (Hint: Just testing the happy path ain’t gonna get you there.):

Immature testing organizations tend to have about five clean tests for every dirty test. Mature testing organizations tend to have five dirty tests for every clean test. This ratio is not reversed by reducing the clean tests; it’s done by creating 25 times as many dirty tests.

And we should always keep the value of code coverage (or any isolated metric for that matter) in perspective, as Andy Glover reminds us in his post on unambiguously analyzing metrics:

Metrics are more copasetic when combined with other metrics and trended – for instance, complexity alone is somewhat interesting, but pairing complexity with code coverage paints a much more detailed metric that bears understanding. High complexity with low coverage is clearly more risky than the same complexity with high code coverage …

Beyond Developer Testing

Not only should we avoid relying on a single metric, we should also be wary of relying too much on a single form (or a single “layer”) of testing. As Luke Francl argues in his call for more diverse testing, even the most exhaustive set of unit tests is limited in the scope of defects it can find [1]:

200810 Testing Layers Venn Diagram

Don’t put all your eggs in one basket. The most interesting thing about these defect detection techniques is that they tend to find different errors. Unit testing finds certain errors; manual testing others; usability testing and code reviews still others.

In his post on The Ultimate Unit Test Failure, Jeff Atwood comes out swingin’ and begs development teams to invest in interaction design with the same enthusiasm (or perhaps more) that we give to developer testing:

Perfectly executed code coverage doesn’t mean users will use your program. Or that it’s even worth using in the first place.

No Silver Bullet

If there’s any one theme that we can observe across this collection of ideas, it’s the recognition that no recipe, or tool, or one-size-fits-all process offers the One True Way (TM) to develop quality software. As Michael Feathers points out, quality doesn’t come from a “mechanistic” approach to testing:

Quality is a function of thought and reflection - precise thought and reflection. That’s the magic. Techniques which reinforce that discipline invariably increase quality.

It’s only that critical thought that will tell us when it would be more cost-effective to use one flavor of testing over another. It’s only that critical thought that will uncover implied requirements in a user story or prompt us to ask about hidden assumptions that might be lingering in a given use case. It’s only that critical thought that will recognize when the pattern that’s worked for us 95% of the time simply isn’t appropriate for the particular scenario we’ve just come across. To forgo that critical thought in search of a silver bullet, that’s the ultimate testing anti-pattern.

Notes

[1] Luke notes that the specific overlaps in the diagram are admittedly arbitrary. The specific areas of overlap are insignificant for our purposes here. Noteworthy is that while there is some overlap between these forms of testing, each one tends to identify at least some defects that are less likely to be identified by other forms of testing. (Diagram used with permission.)


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Stuart Halloway, Rob Sanheim, and Greg Vaughn for reading drafts of this post.

Tags: , | No Comments »

Audio, Video, Slides: How to Fail With 100% Test Coverage at raleigh.rb

Posted by Jason Rudolph on 9th September 2008

The Testing Anti-Patterns series began as a conference presentation titled How to Fail With 100% Test Coverage, and I recently had the pleasure of presenting that talk at the Raleigh-Area Ruby Brigade (raleigh.rb). Matthew Bass was kind enough to record the audio from the event, and I’ve taken a stab at syncing that audio with the slides for a full-on multimedia extravaganza.§


Download Quicktime video via Vimeo (registration required)

Download MP3 via iTunes

Download MP3 directly

Download slides

Wanna see this talk live? Check the schedule for an event near you.

§ Keynote’s Quicktime recording/exporting has a seemingly diabolical sense of humor. For comedic effect (or perhaps to make sure you’re paying attention), it likes to randomly shuffle the slides every once in a while. So in the video, you’ll notice that unfortunately the slides and the audio don’t always line up just right. If you find it too erratic, then it may be easier to listen to the audio and advance the slides manually. And if you see Tyler Durden make a cameo, well, it was probably just Keynote having a laugh.

Tags: , , , , , | No Comments »

Testing Anti-Patterns: Invisible Code

Posted by Jason Rudolph on 18th August 2008

As we’ve seen over the last several weeks, it’s remarkably easy for code to earn the badge of 100% test coverage without necessarily having a strong test suite. In each of those examples, the coverage analysis tool performed its task flawlessly: it reported exactly which portions of our code were executed as a result of running the tests. The all-green coverage report showed us that the tests indeed touched all of our code, but it was up to us to acknowledge that simply touching a line of code doesn’t mean that you’ve exercised and verified that line of code in a meaningful way. Some folks interpret this acknowledgement to mean that coverage analysis is meaningless, but that unfortunate conclusion overlooks the real benefit of a coverage report: it’s not about getting to 100% test coverage and assuming victory, it’s about highlighting any areas of our codebase that we’ve forgotten to test entirely.

As with any tool, to make effective use of coverage analysis, we need to understand its purpose, its capabilities, and its limitations. In all of the previous examples, we looked at code that was already in the coverage report. In other words, the coverage tool knew about this code and was able to watch the code and assess its coverage upon completion of the test suite. But if we’re using the coverage report to help us find untested code, how do we deal with code that the coverage tool might not be aware of in the first place?

Let’s start with a sample Rails app that represents the beginnings of an online store. The project currently contains the following files (as well as some others that I’ve omitted for the sake of brevity).

[store]$ tree
...
|-- app
|   |-- controllers
|   |   |-- application.rb
|   |   `-- products_controller.rb
|   |-- helpers
|   |   |-- application_helper.rb
|   |   `-- products_helper.rb
|   |-- models
|   |   `-- product.rb
|   `-- views
|       |-- layouts
|       |   `-- products.html.erb
|       `-- products
|           |-- edit.html.erb
|           |-- index.html.erb
|           |-- new.html.erb
|           `-- show.html.erb
|-- config
|   |-- boot.rb
|   |-- database.yml
|   |-- environment.rb
|   |-- environments
|   |   ...
|   |-- initializers
|   |   |-- inflections.rb
|   |   |-- mime_types.rb
|   |   `-- new_rails_defaults.rb
|   `-- routes.rb
|-- db
|   |-- migrate
|   |   `-- 20080810220638_create_products.rb
|   `-- schema.rb
...
|-- lib
|   |-- product_ftp_importer.rb
|   `-- tasks
|   |   |-- data_load.rake
...
|-- test
|   |-- fixtures
|   |   `-- products.yml
|   |-- functional
|   |   `-- products_controller_test.rb
|   |-- integration
|   |-- test_helper.rb
|   `-- unit
|       `-- product_test.rb
...

37 directories, 63 files

After installing the rails_rcov plugin, we can easily produce a coverage report to see where we currently stand.

100% Test Coverage

According to the coverage report, we’re not aware of any code that isn’t touched by at least one test. But is that really the whole story? The number of test-related files sure accounts for a small proportion of the overall app. We can see that we have test/unit/product_test.rb and test/functional/products_controller_test.rb, but do those two files really encompass all the developer testing needed for this application?

Out of Sight, Out of Mind?

What about that mysterious file hanging out in the lib directory?

[store]$ tree
...
|-- lib
|   |-- product_ftp_importer.rb
...

Just judging by the name of the file, that sure sounds like functionality that deserves testing, but for some reason it’s not listed in the coverage report. And if we take another look at the test files listed above, there are no test files that obviously correlate to product_ftp_importer.rb. So, how is it that we have 100% coverage?

In order for code to show up in a coverage report, we need to instruct the coverage tool to assess that file. The approach for doing so tends to vary from tool to tool. With rcov, we first have to tell it which files constitute our test suite (i.e., the files we want rcov to run). But that alone is not sufficient; we also have to ensure that application files (i.e., the files whose test coverage we want to measure) get loaded as part of the test suite. Adding this require statement anywhere in our test suite is enough to shed some light on the elusive code in product_ftp_importer.rb.

  1. require File.expand_path(File.dirname(__FILE__) + "/../lib/product_ftp_importer")


71.7% Test Coverage

It’s hard to feel good about 45 lines of untested FTP-processing voodoo, so how can we unearth this invisible code as soon as it tries to sneak its way into our app?

  1. Ensure that your coverage task knows where to find your tests, always place new tests where the coverage task can find them, and use a consistent naming scheme so that a simple file-matching pattern can distinguish test files from non-test files.
  2. If you’re using rcov, add a quick script that will crawl your project tree and require all application files. Adding individual require statements one-by-one is not a reasonable solution. If someone’s not going to write the test in the first place, they’re sure as heck not gonna take the time to tell the coverage report about that misdeed. So, walk the tree and require any Ruby file that you encounter in places where application code is likely to turn up. At the very least, in a Rails app you should include all subdirectories under app (being aggressive enough to catch any new directories that might get added there) and the lib directory.

UPDATE (2008-08-21) At the “How to Fail with 100% Test Coverage” talk earlier this week, a few folks asked if I’d provide an example script that would perform this task in a Rails app. Adding this line to test_helper.rb should get you started.

  1. Dir["app/**/*.rb", "lib/**/*.rb"].each { |f| require File.expand_path(f) }


(Not So) Scenic View Ahead

View templates have long been a favorite dumping ground for misplaced application logic. This problem can often go undetected, because view templates fly under the radar of the coverage report. Most developers know they should minimize the application logic included in the view, but when a deadline’s looming, the lure of throwing some code in the view “just this once” is often hard to resist. For example, what’s so wrong with having the view decide whether to display a particular product in the list?

  1. <% for product in @products %>
  2.   <% if product.quantity_in_stock > 0 && product.quantity_in_stock > product.pending_backorder_count %>
  3.     <!– display purchasable product here –>
  4.     <!– … –>
  5.   <% end %>
  6. <% end %>


Well, how exactly will we verify that the view is indeed displaying the right products and suppressing the others? We could manually test each scenario by visually inspecting the resulting UI, and that might be good enough for us to have confidence that the app is doing the right thing as of this moment. But code has a life of its own, and it will grow and change over time, and we want automated tests to make sure that this page continues to display the correct data even after those inevitable changes.

So we decide to write tests to verify that we’re displaying only the right products. And since this logic is inside our view template, we need to write tests that will render our view template and then dissect the resulting HTML to verify that it contains the products that should be present and that it does not contain the products that should not be present. But in order to render the HTML, we need to invoke some controller action. And because that lone if statement needs at least four different test cases to check the various conditions, we get the joy of doing all that setup and dissection at least four times. That’s a big enough pain that it quickly becomes very tempting to let this bit of logic remain untested, remain out of sight of the coverage report, and remain “good enough.”

We can do better than that. When something’s too hard to test, we should refactor it until it’s easy to test. [1]

We’re going to need this logic outside of the view anyway, so the sooner we get it into the model the better. Sure, the view will only display those products that are available for purchase, but we need that logic for server-side validation as well. Before we process an order, we need to make sure that we still have the product in stock. If we leave the logic in the view as is, then we’ll be forced to duplicate that logic elsewhere inside our order-processing code. There’s clearly no justification at all for leaving this logic in the view.

  1. <% for product in @products %>
  2.   <% if product.available_for_purchase? %>
  3.     <!– display purchasable product here –>
  4.     <!– … –>
  5.   <% end %>
  6. <% end %>


When we encapsulate this logic in the Product class itself, we can test that logic in isolation, without any dependencies on controllers, and without the need for fragile HTML-parsing to verify the result. Once we perform this refactoring, unit testing the #available_for_purchase? method becomes trivial, and we can refer to that method wherever necessary without unnecessary duplication.

Better still, if we know that we only want to display the products that are available for purchase, we can ensure that our controller provides only those products to the view in the first place. With this approach, our view then enjoys the pleasant simplicity of just displaying the list of products.

  1. <% for product in @products %>
  2.   <!– display purchasable product here –>
  3.   <!– … –>
  4. <% end %>


The coverage report isn’t going to alert us to business logic lurking in our view templates. It’s up to us to keep our views from becoming too smart for their own good, and it’s up to peer code reviews to keep us honest.

Raking for Buried Treasure

While it’s tempting to let our views acquire too much business logic, there’s usually an obvious place to move that logic once we realize the error of our ways. (In Rails, you’ll typically relocate that logic to a model class or to a helper, either of which are easily tested in isolation.) But what about the other parts of our application where untested code tends to hide out and germinate?

Perhaps we have some code that only needs to run at application start-up. In Rails, we’re talking about code in environment.rb or config/initializers. In Grails, BootStrap.groovy is home to this logic. In either case (or in most any other framework), we’re not likely to see those start-up “scripts” included in the coverage report, nor is there a natural and obvious place for testing any complex code that we may need to include in the start-up process. We’re used to testing models and controllers and helpers and mailers, but where does this start-up logic fit into the mix?

Data migration suffers from a similar problem. Rails migrations are great for creating and dropping tables, adding and removing columns, etc., but sometimes we need to do more than just alter the schema; sometimes we want to push data around as well. Schema transformations are essentially declarative code, and really don’t warrant anything beyond visual verification of the results. But when it comes time to migrate 10 million records from some legacy database into our hip new application, chances are we’re not just talking about simple declarations anymore. What’s the worst that could happen though? This code only has to run once. And who wants to write a bunch of tests for code that we’re only gonna run once and then throw away? And once again, there’s no obvious place for us to add tests for this kind of data conversion functionality in the first place. Surely a simple Rake task will suffice.

  1. namespace :db do
  2.   namespace :load do
  3.     desc ‘Load products from csv’
  4.     task :products do
  5.       require ‘csv’
  6.       require ‘environment’
  7.       CSV.open("#{RAILS_ROOT}/db/input/csv/product-catalog/products.csv", ‘r’).each_with_index do |row, idx|
  8.         next if row[0] == "Product"
  9.         p = Product.find_or_create_by_name(row[0])
  10.         p.description = e.purpose = row[5]
  11.         p.sku = row[3]
  12.         p.price = row[4]
  13.         p.save
  14.         shipping_options = row[1].split("|")
  15.         shipping_options.each do |o|
  16.           p.shipping_options << ShippingOption.find_by_name(o)
  17.         end
  18.        
  19.         vendors = row[2].split("|")
  20.         vendors.each do |v|
  21.           p.vendors << Vendor.find_by_number(v) unless v.downcase == ‘none’
  22.         end
  23.       end
  24.     end
  25.   end
  26. end


Indeed, a simple Rake task will suffice, but that’s certainly not what we’re looking at above. While we could write tests for this logic in its current state, doing so is unnecessarily difficult. We’d be restricted to solely black box tests. To test each individual decision point, we’re forced to also construct a new file holding the appropriate dataset, run the Rake task, and then inspect the state of the data in the database. For every decision point.

Scripts Can Be Classy Too

We shouldn’t have to also test the ability to read a file (i.e., line 7) just so that we can test the ability to populate a vendor based on a given vendor number (i.e., line 21). For sure, we want one good end-to-end test to verify that all the cogs are working together correctly. But if that’s our sole testing strategy, then we’ve made testing just painful enough that it probably won’t happen at all.

Whether we’re talking about hard-to-test code in start-up scripts, hard-to-test code in migration scripts, or hard-to-test code hiding out in the handful of other custom scripts that an application tends to accumulate over time, the answer’s the same in each case. Just because the coverage report doesn’t see this hidden code doesn’t mean that it’s not worth testing. And just because our framework-of-choice might not provide a convention for testing this logic, that doesn’t mean that we should just punt.

When something’s too hard to test, we should refactor it until it’s easy to test.

  1. namespace :db do
  2.   namespace :load do
  3.     desc ‘Load products from csv’
  4.     task :products do
  5.       require ‘environment’
  6.       importer = ProductCsvImporter.new("#{RAILS_ROOT}/db/input/csv/product-catalog/products.csv")
  7.       importer.run
  8.     end
  9.   end
  10. end


In the case of this Rake task, and in each of the cases discussed above, by simply moving the logic out of the script and into a proper class (or module), the testing strategy goes from clumsy at best to downright obvious. We no longer need to invoke the whole script in order to verify the particular unit of functionality that we want to test. Instead, we test that functionality in isolation, and allow the script to resume its trivial role of merely calling our well-tested class.

Use It Wisely

In order to make effective use of coverage analysis, it’s important for us to understand what a coverage report is telling us and what it’s incapable of telling us. Tools are imperfect, but we can adopt strategies to make sure we’re reaping the maximum benefit from the tools we choose to employ. With good naming conventions and an agreed-upon application structure, we can easily configure an intelligent solution that allows the coverage tool to automatically pick up any new source files that we want included in the report. With a commitment to testing all application logic - regardless of whether it’s needed in a model, a view, a script, etc. - we’ll extract the code that would otherwise be buried in a dark corner of our app. We’ll benefit from the ability to test it in isolation, and we’ll allow the coverage tool to assess that code, giving us a more realistic and complete view of our codebase.

Invisible code is hidden technical debt, but the sooner you expose it, the sooner you can start to pay it down.

Notes

[1] In past posts in this series, I’ve advocated test-driven development (TDD) as means for combatting the various testing anti-patterns. Invisible code is no exception. While this post is geared more toward uncovering invisible code so that we can give it the testing it deserves, developing test-first is the best bet for preventing invisible code in the first place.


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Muness Alrubaie, Justin Gehtland, and Greg Vaughn for reading drafts of this post.

Tags: , , , | No Comments »

Testing Anti-Patterns: The Ugly Mirror

Posted by Jason Rudolph on 30th July 2008

When you’re able to write a test, nay, a spec, that not only verifies your code’s functionality, but also clearly communicates its intent, you’ve got a real win on your hands. It’s a win when you’re first writing out your test cases as you’re TDD’ing your way to the solution. It’s a win to the person providing the peer review of your code. It’s a win for you (again) when you have to revisit (and relearn) the code six weeks from now. And it’s a win for the guy that has to touch this code yet again when you’re long gone. [1] But when you occasionally find yourself staring at a spec that looks exactly like the code under test, there’s surprisingly little win left to enjoy.

Consider, for example, the following Ruby Struct tasked with building a friendly string representation of a user’s name and e-mail address.

  1. User = Struct.new(:first_name, :last_name, :email) do
  2.   def to_s
  3.     "#{last_name}, #{first_name} <#{email}>"
  4.   end
  5. end


I ran across code similar to this example during a recent code review, and when I got to the corresponding test …

  1. require "test/unit"
  2.  
  3. class UserTest < Test::Unit::TestCase
  4.   def test_to_s_includes_name_and_email
  5.     user = User.new("John", "Smith", "jsmith@example.com")
  6.     assert_equal "#{user.last_name}, #{user.first_name} <#{user.email}>", user.to_s
  7.   end  
  8. end


… I couldn’t help but feel like I was staring right back at the very code that was supposedly being tested.

Ahhhhhhhh! [2]

This ugly mirror of the production code leaves much to be desired. Sure, thanks to the name of the test, we get the general idea that to_s will output some combination of the user’s name and e-mail address. But when we look at the assertion on line 6, we’re forced to mentally reverse engineer the code in order to see through to the underlying requirement. When it comes to quickly and clearly communicating the intent of the underlying code, this test falls far short of its potential. As it’s implemented above, we’re probably better off reviewing the production code itself than bothering to look at the test at all.

Of course, it doesn’t have to be this way, but it’s not unusual to stumble across this kind of test. In fact, it’s remarkably common to see this kind of test when the test is written after the production code is implemented. It’s as if once a developer has written the code to perform a task, the guts of the code can’t be unseen, and as a consequence, the tests often end up reflecting those inner workings instead of the desired end result.

No Mental Juggling Necessary

On the other hand, when we’re developing test-first, we start out with the end-user requirements in mind, and it’s easy to make sure our tests communicate those requirements.

  1. require "test/unit"
  2.  
  3. class UserTest < Test::Unit::TestCase
  4.   def test_to_s_includes_name_and_email
  5.     user = User.new("John", "Smith", "jsmith@example.com")
  6.     assert_equal "Smith, John <jsmith@example.com>", user.to_s
  7.   end  
  8. end


When we improve the test to focus on the end result, we can look at the test and instantly see the requirements (and so can all the people that will live with our code long into the future). Instead of reflecting the ingredients, the test now reflects the end product. And this allows us to have greater confidence that our code is doing the right thing as well.

Whether it’s strings, dates, timestamps, or numeric calculations, any time we can assert on a literal value, our test will be better off because of it. [3] The less logic that’s in our assertion, the fewer chances we have for that logic to be wrong, and the less logic we have to dig through to grok what’s being tested in the first place.

Notes

[1] And if you’re writing a library or framework, the tests may be a win for the users (or potential users) of your code as well. My colleague Muness Alrubaie is often seen skipping right over the docs and heading straight for the tests when he wants to check out some new open source code. (After all, assuming they pass, the tests don’t lie.) And if there are no tests, or if the tests fail to cleanly express the underlying functionality, you can rest assured that he won’t be looking at that project for long.

[2] Image courtesy of Pablo Baslini (flickr.com/98621082@N00)


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Greg Vaughn for reading drafts of this post.

Tags: , | 1 Comment »

Testing Anti-Patterns: Underspecification

Posted by Jason Rudolph on 8th July 2008

Last week we discussed the perils of overspecification, and while we saw that it’s clearly possible for a test suite to do too much, it’s far more common for it to do too little.

Green Architecture

Suppose we’re building an application for an online retailer, and they decide that they want to provide free shipping on all orders with a minimum price of $25.00. (Where do they come up with this stuff?!) Armed with these requirements, we set off to develop the logic to determine whether a given order qualifies for this new offer. (We’ll pick on some half-baked Ruby code for this example, but we could certainly extrapolate the same ideas to any language we might choose.)

It’s a straightforward request, so we’ll crank out a quick method to encapsulate the logic …

  1. MIN_FREE_SHIPPING_PRICE = 25.0  
  2.  
  3. def free_shipping?(total_order_price)
  4.   total_order_price > MIN_FREE_SHIPPING_PRICE
  5. end


… and add a few simple tests to make sure we’re getting the desired results (all the while being careful to avoid any overspecification, of course).

  1. def test_free_shipping_returns_true_for_order_above_min_price
  2.   assert free_shipping?(MIN_FREE_SHIPPING_PRICE + 1)
  3. end
  4.  
  5. def test_free_shipping_returns_false_for_order_below_min_price
  6.   assert !free_shipping?(MIN_FREE_SHIPPING_PRICE - 1)
  7. end


And just like that, we have a passing test suite …

  1. $ ruby underspecification_test.rb
  2. Loaded suite underspecification_test
  3. Started
  4. ..
  5. Finished in 0.00029 seconds.
  6.  
  7. 2 tests, 2 assertions, 0 failures, 0 errors


… and 100% code coverage …

100% Test Coverage

… so there’s certainly the temptation to declare victory.

Mission (Not Exactly) Accomplished

But what about the customer that just spent the last 20 minutes concocting the perfect order? You know, the one that’s exactly $25.00 and not a penny more. (Sure, he needs professional help, but do you really want to get on the wrong side of someone with that kind of determination and time on their hands?) If we’re looking for our test suite to tell us how the application should respond to a $25.00 order, we’re out of luck. This underspecification means that our test suite not only fails to communicate how the application should behave in this scenario, our test suite also fails to give us any confidence that the application will do the right thing (whatever that may be).

Edge Cases Matter

Since we’re calling it a “minimum” price, it sure sounds like a $25.00 order should qualify for free shipping, so we should add a test to specify that behavior.

  1. def test_free_shipping_returns_true_for_order_equal_to_min_price
  2.   assert free_shipping?(MIN_FREE_SHIPPING_PRICE)
  3. end


And when we run this test, we should get some good insight into whether we’re likely to have that rather obsessive customer hunting us down in his copious free time.

  1. $ ruby underspecification_test.rb
  2. Loaded suite underspecification_test
  3. Started
  4. ..F
  5. Finished in 0.007163 seconds.
  6.  
  7.   1) Failure:
  8. test_free_shipping_returns_true_for_order_equal_to_min_price(UnderspecificationTest) [underspecification_test.rb:16]:
  9. <false> is not true.
  10.  
  11. 3 tests, 3 assertions, 1 failures, 0 errors


Ouch! As is often the case, the code fails on a boundary condition. Of course, the problem (once identified) is easily fixed by replacing the incorrect operator (>) with the correct one (>=). It’s a trivial change, but under what circumstances would we identify this problem in the first place? Why not call it a day as soon as the first two tests are passing? We already had 100% test coverage [1], and we certainly don’t get any extra credit from our coverage analysis tool for adding this new test case. Geoffrey Wiseman sums up this situation nicely in his commentary on making effective use of coverage analysis:

Coverage [reports] are great at telling you when you don’t have enough tests. They’re terrible at telling you that you have enough, that they’re good enough, etc.

The standard test-driven development approach serves us well: write a test, watch it fail, and then write the code to make it pass. But there’s a next step as well: we need to ask which tests are still missing. We need to apply that critical thought. As soon as we finish writing the first two tests above, we should immediately start looking for the “abusive” tests (as Eric Sink lovingly refers to them) that we can write to make our code fail. Where are the edge cases? Where are the exception cases? What should happen if we pass in a negative value? A nil value? Our test suite should be specific in its demands of our code.

Use It Wisely

A green test suite and high code coverage only means that we’ve satisfied the functionality that’s currently specified by our tests. We can (and should) rely on the coverage report to spot the code that isn’t yet exercised by our test suite, but we can’t stop there.

In Waltzing with Bears, Tom DeMarco and Timothy Lister remind us that “while it’s possible to specify a product ambiguously, it is not possible to build a product ambiguously.” The applications we build are always doing something in the corner cases. In the exception cases. But are they doing the right thing. And without proper tests (i.e., executable specifications), how can we be sure?

Notes

[1] If you recall from our recent discussion on code coverage types, this is the part where you ask, “Exactly what kind of coverage are we talking about here?” Well, despite the fact that rcov only reports line coverage analysis, we can still deduce fairly quickly that this particular code has 100% line coverage, 100% branch coverage, and 100% path coverage.


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Aaron Bedra and Greg Vaughn for reading drafts of this post.

Tags: , , | 2 Comments »

Testing Anti-Patterns: Overspecification

Posted by Jason Rudolph on 1st July 2008

Tests increasingly serve multiple roles in today’s projects. They help us design APIs through test-driven development. They provide confidence that new changes aren’t breaking existing functionality. They offer an executable specification of the application. But can we ever get to a point where we have too much testing?

Enough is Enough

Consider the following test that you might come across in an application with a less-than-ideal test suite. (We’ll pick on some badly-written Rails code for this example, but the ideas we’ll discuss are certainly not unique to just Rails or Ruby or even just to dynamic languages. Unfortunately, these problems are quite universal.)

  1. require File.dirname(__FILE__) + ‘/../test_helper’
  2.  
  3. class ProductsControllerTest < ActionController::TestCase
  4.   def test_something
  5.     product = Product.create(:name => "Frisbee", :price => 5.00)
  6.     get :show, :id => product.id
  7.     assert_response :success
  8.     product = assigns(:product)
  9.     assert_not_nil product
  10.     assert product.valid?
  11.     assert product.name == "Frisbee"
  12.     assert product.price == 5.00
  13.   end
  14. end


Whoa! That sure is a lot going on in a single test case. What exactly is it that we’re trying to test here? Let’s take a step back and see if we can figure it out. [1]

  • Line 5 - We start off by creating a new product. We give the product a name and a price and call #create. Since we’re in a Rails app, without looking elsewhere in the code, we can make an educated guess that Product is an ActiveRecord subclass and that calling #create will persist the product to the database.
  • Line 6 - We send an HTTP GET request to a #show method and pass along the ID of the product we just created. Since we’re in ProductsControllerTest, we can safely infer that we’re calling the #show action in the ProductsController class.
  • Line 7 - We assert that the controller action responded with HTTP status code 200 (i.e., success).
  • Lines 8-9 - We look for an object stored in the assigns hash (i.e., the hash of variables that will be available to the view template) with a key of :product and we assert that it’s not nil.
  • Line 10 - We verify that the product object satisfies all the product validation rules (though we can’t know what those rules are without taking a look at the Product class).
  • Lines 11-12 - We assert that the attribute values we provided when creating the product (in line 5) match the attribute values stored in the object we fetched from the assigns hash.

For a class called ProductsControllerTest, it sure feels like we’re doing a fair bit more than just testing the code we’d expect to find in ProductsController (assuming ProductsController conforms to the traditional responsibilities of a controller). Let’s take a look at the controller code to better understand exactly what it is that we’re testing.

  1. class ProductsController < ApplicationController
  2.   def show
  3.     @product = Product.find(params[:id])
  4.   end
  5. end


It turns out that our controller code is notably simpler than our test case would have led us to believe. The #show action does indeed stick to the expected behavior of a controller, and in this case, it’s able to provide that behavior in a single line of code (despite the 8 lines of test code currently employed above). That single line satisfies all of the expectations of the #show action: look up the product with the given ID and place the product object in an instance variable for use by the view.

If the #show action isn’t responsible for knowing how to successfully save a new product record into the database, why does our test case attempt to provide the data for creating a new product? What will happen to our test case when someone adds a new validation rule requiring that all products also include a quantity attribute? Unfortunately, the test as it’s currently written will break. And while a failing test is often a welcome warning that our changes have inadvertently broken a part of our application, it’s far from desirable for a model-level rule related to creating new records to cause a failure in a controller test related simply to displaying existing records. In short, we’ve given our test case too much responsibility, and in doing so, we’ve made it fragile.

We can see from the #show action above that not only is the controller not responsible for knowing how to create a valid product, it’s also not responsible for ensuring that a product’s attributes are properly populated when the record is read from the database. However, if we take another look at the last few lines of the test case, we might think otherwise. That brings us to the second problem with this particular test: it’s a rotten source of documentation for the code being tested. As we increasingly move toward tests as specifications of our application’s behavior, it’s vital that those specifications clearly communicate the expected behavior. As it’s currently implemented, the overspecification in this test case leaves the reader having to do way too much work to figure out the true expectations of the code being tested.

Communicate Essence

Let’s take another pass at writing a test for the #show action, this time with an eye toward removing the fragility of the previous test case and increasing the value of the test as a specification.

  1. require File.dirname(__FILE__) + ‘/../test_helper’
  2.  
  3. class ProductsControllerTest < ActionController::TestCase
  4.   def test_should_show_product
  5.     product = create_product
  6.     get :show, :id => product.id
  7.     assert_response :success
  8.     assert_equal product, assigns(:product)
  9.   end
  10. end


In this implementation, we’ve abstracted away the logic for creating a new product in line 5. We’ve defined a helper method for use by any and all tests in our application that have a need to create a new product. If and when the rules for successfully creating a new product change, we’ll update the #create_product method, and we won’t have to touch the code in ProductsController or ProductsControllerTest at all. [2]

As for the four assertions that appeared at the end of our first attempt at this test case, we’ve replaced those assertions with a single (stronger) assertion. Where we previously asserted that the assigns hash held a non-nil product, that the product was valid, and that its attributes matched the attributes used at the beginning of the test, we now simply verify that the product object in the assigns hash is equal to the product object that we created at the beginning of the test. That single line more accurately and more concisely expresses our expectation: that the product whose ID we provide in the request to the #show action is the same object that’s made available to the view.

By focusing our test case on the essence of the code under test, we’ve ended up with less test code to maintain. And with the extraneous setup and assertions removed, the remaining test code provides a clearer and less fragile specification of the behavior being tested. [3]

May I Take Your Order, Please?

In the previous example, we might have detected the overspecification by the significant mismatch between the lines of test code and the lines of production code, or seeing model-specific assertions in a controller-specific test may have caught our eye. But, overspecification comes in more subtle forms as well. Consider the following method provided by ActiveRecord::Base for fetching the list of columns that hold domain-specific content from a model class in Rails.

  1. # Returns an array of column objects where the primary id, all columns ending in "_id" or "_count",
  2. # and columns used for single table inheritance have been removed.
  3. def content_columns
  4.   @content_columns ||= columns.reject { |c| c.primary || c.name =~ /(_id|_count)$/ || c.name == inheritance_column }
  5. end


To test this method, we’ll need a sample ActiveRecord model class. Let’s use the Topic model, which is backed by the following table.

  1. create_table :topics, :force => true do |t|
  2.   t.string   :title
  3.   t.string   :author_name
  4.   t.string   :author_email_address
  5.   t.datetime :written_on
  6.   t.time     :bonus_time
  7.   t.date     :last_read
  8.   t.text     :content
  9.   t.boolean  :approved, :default => true
  10.   t.integer  :replies_count, :default => 0
  11.   t.integer  :parent_id
  12.   t.string   :type
  13. end


In our first attempt at testing the #content_columns method, we might come up something similar to this:

  1. # (Naive) First Attempt
  2. def test_content_columns
  3.   content_columns      = Topic.content_columns
  4.   content_column_names = content_columns.map {|column| column.name}
  5.   assert_equal %w(title author_name author_email_address written_on bonus_time last_read content approved), content_column_names
  6. end


Can you spot the overspecification? To give you a hint, compare that test to the actual test employed in the ActiveRecord test suite:

  1. # The Real Deal
  2. def test_content_columns
  3.   content_columns        = Topic.content_columns
  4.   content_column_names   = content_columns.map {|column| column.name}
  5.   assert_equal 8, content_columns.length
  6.   assert_equal %w(title author_name author_email_address written_on bonus_time last_read content approved).sort, content_column_names.sort
  7. end


If you’ve ever written a test for something that returns an ordered list of items where the actual order either doesn’t matter or isn’t guaranteed, you’ve probably been bitten by this breed of overspecification, and there’s a good chance your solution matched the solution used in the ActiveRecord test code above. Because the #content_columns method doesn’t guarantee that the columns will be returned in any particular order, it’s inappropriate (and fragile) for our test to specify a certain order.

While the actual ActiveRecord solution above works, it too could be better. First, since we know that the two arrays in line 6 won’t be equal unless they have the same length, we can safely remove the extraneous length assertion in line 5. Second, when we see the calls to #sort on line 6, we’re forced to pause (if even for just a moment) to think about why it might be important to sort the two arrays. We don’t really want to sort the arrays; we’re just using that technique to get around the fact that order doesn’t matter to us, but it does matter when Ruby compares two arrays for equality. Since we really only want to assert that the arrays have the same elements, why not do exactly that?

  1. # Don’t Make Me Think
  2. def test_content_columns
  3.   content_columns        = Topic.content_columns
  4.   content_column_names   = content_columns.map {|column| column.name}
  5.   assert_same_elements %w(title author_name author_email_address written_on bonus_time last_read content approved), content_column_names
  6. end


By using an assertion like Shoulda’s assert_same_elements method, our test can clearly and concisely express the expected behavior.

Use It Wisely

Overspecification comes in many flavors, and the examples above in no way represent a comprehensive list. As developers, we frequently strive to write as little code as possible to accomplish the task at hand. When it comes to writing tests, we should very much keep that goal in mind as well. Good tests communicate the essence of the scenario being tested and nothing more. While I doubt a project will ever suffer from too much testing, it can certainly suffer from tests that specify too much.

Notes

[1] If this test seems absurd to you, good! Your ability to detect this type of overspecification will serve you well.

[2] The actual implementation of the #create_product method isn’t particularly pertinent in this discussion. (In the Rails space, there’s certainly no shortage of options.)

[3] With all the excessive thoroughness that went into overspecifying the the current behavior of the #show method, is it possible that we’ve been too distracted to identify other functionality that’s still missing from that method? In the next post, we’ll take a look at overspecification’s more lethargic cousin: underspecification.


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Justin Gehtland, Muness Alrubaie, and Glenn Vanderburg for reading drafts of this post.

Tags: , | 2 Comments »

Testing Anti-Patterns: Incidental Coverage

Posted by Jason Rudolph on 17th June 2008

So you’ve taken your project to 100% code coverage, you’ve configured your continuous integration system to fail the build if that coverage ever drops below 100%, and you’re ready to enjoy the fearless refactoring and the rock solid regression testing suite that your software engineering rigor has now earned you. But are you really covered? What does 100% code coverage mean in your project? Is it enough to know that your test suite encounters every line of code? Or don’t you want to be sure that it exercises every line? If you simply encounter the line without asserting that it produces the correct results, are you any better off? [1]

Failure to Assert

Just achieving 100% code coverage is the easy part. Making it mean something: that’s where the real value kicks in. Consider the ease with which we can get to 100% line coverage on the following code (generated using Rails 2.1 scaffolding).

  1. class ProductsController < ApplicationController
  2.   # GET /products
  3.   # GET /products.xml
  4.   def index
  5.     @products = Product.find(:all)
  6.  
  7.     respond_to do |format|
  8.       format.html # index.html.erb
  9.       format.xml  { render :xml => @products }
  10.     end
  11.   end
  12.  
  13.   # … remaining methods omitted
  14. end


In order to ensure that the #index method is performing all its proper duties, we’ll define the following “test case.”

  1. require File.dirname(__FILE__) + ‘/../test_helper’
  2.  
  3. class ProductsControllerTest < ActionController::TestCase
  4.   def test_should_get_index
  5.     get :index
  6.   end
  7.  
  8.   # … remaining tests omitted
  9. end


We’ll use rcov to assess the results.

Example 1

And just like that, we have 100% code coverage for the #index method. [2] In this case though, that clearly means nothing more than the fact that we encountered 100% of the lines in the method. When code coverage is that easily achieved, it hardly seems cause for celebration. To get any real value from the test, we need to actually assert that we’re getting the expected results. Until we do so, we have nothing more than incidental coverage. [3]

The test code provided by the Rails scaffolding gets us closer to where we want to be.

  1. require File.dirname(__FILE__) + ‘/../test_helper’
  2.  
  3. class ProductsControllerTest < ActionController::TestCase
  4.   def test_should_get_index
  5.     get :index
  6.     assert_response :success
  7.     assert_not_nil assigns(:products)
  8.   end
  9.  
  10.   # … remaining tests omitted
  11. end


The tests pass, and our code coverage is still at 100%. At this point we’ve significantly increased the value of that particular test. No longer does the code have to crash spectacularly in order to yield a test error. Instead, exiting the method with anything other than a success response code will result in a test failure. Similarly, failure to set the @products instance variable will cause the test case to flunk. [4]

100% Covered. 50% Tested.

But while we’ve improved on the initial test case, at best we’re really only halfway to where we want to be. What would our test suite tell us if we were to alter line 9 as follows.

  1. class ProductsController < ApplicationController
  2.   # GET /products
  3.   # GET /products.xml
  4.   def index
  5.     @products = Product.find(:all)
  6.  
  7.     respond_to do |format|
  8.       format.html # index.html.erb
  9.       format.xml  { raise :fatal_error }
  10.     end
  11.   end
  12.  
  13.   # … remaining methods omitted
  14. end


  1. $ ruby products_controller_test.rb
  2. Loaded suite products_controller_test
  3. Started
  4. ……..
  5. Finished in 0.17716 seconds.
  6.  
  7. 8 tests, 15 assertions, 0 failures, 0 errors


Unfortunately, our test suite still blindly gives a thumbs up, despite the fact that any attempt to access the XML-formatted output would yield an exception. While our test suite includes assertions for how the application should prepare an HTML-bound response, our coverage of the XML-specific logic in line 9 remains merely incidental.

If we want our automated test suite to ensure that the #index method remains capable of producing an XML-formatted response as our codebase changes over time, we’d need to add a test case to exercise that code.

  1. class ProductsControllerTest < ActionController::TestCase
  2.   def test_should_get_index_formatted_for_html
  3.     get :index
  4.     assert_response :success
  5.     assert_not_nil assigns(:products)
  6.   end
  7.  
  8.   def test_should_get_index_formatted_for_xml
  9.     @request.env[‘HTTP_ACCEPT’] = ‘application/xml’
  10.     get :index
  11.     assert_response :success
  12.     assert_not_nil assigns(:products)
  13.   end
  14.  
  15.   # … remaining tests omitted
  16. end


rcov doesn’t reward us with any extra credit for writing this test case, but achieving 100% coverage is not the primary goal of a good test suite. [5] The primary goal is to ensure that the code satisfies the requirements. If the application really is required to provide an XML-formatted list of products, then we should seriously consider defining a test for that functionality in our test suite. [6]

Size Matters

In the examples above, we started off with a 5-line method that had 100% line coverage but lacked any assertions. That scenario provides some insight into other places where we might find a high percentage of incidental coverage. While long methods are a bad idea in general, the simple act of invoking a method may be enough to register it as having 100% coverage. If our test suite results in the execution of a 25-line method with no branches, all 25 lines are considered to be covered, regardless of whether we perform any assertions to verify the results of those 25 lines. Even if we refactor the code into smaller methods, if we don’t unit test those new methods, we’re still left with nothing more than incidental coverage.

Use It Wisely

If your goal is to achieve 100% coverage, then incidental coverage is your friend, but your test suite will fall far short of its full potential. You’ll run the risk of acquiring a false sense of security, and the ability to safely and fearlessly refactor is out the window. Changes or enhancements to your application will occur without the safety net of a full regression suite.

If your goal is instead to develop a comprehensive test suite to A) validate your application’s functionality and B) ensure that you build just enough software, then test-driven development (TDD) is your friend (as are peer reviews and/or pair programming). And subsequently, you’ll enjoy the nice side effect of 100% code coverage, because you’ll only build the code necessary to satisfy your tests.

Notes

[1] Sure, you know that the line executes without causing the application to crash (at least for this set of inputs), but does a lack of crashing really constitute success? Not likely.

[2] While we have 100% line coverage for the #index method, we don’t have 100% coverage for the ProductsController class as a whole. As the full coverage report shows, the tests provided with the Rails scaffolding do not cover the exception cases in the #create and #update methods.

[3] Incidental coverage is just as valuable as that guy that puts in “face time” at the office. He’s physically present at least 8 hours every day. People see him there. He’s at the office. He must be doing something. Right?

[4] At this point we’ve gone from covering nothing to covering something. That’s good, but we should ask ourselves whether we’re performing the most appropriate assertions for this test case. For example, is it enough to assert that assigns(:products) is not nil, or should we be making a stronger assertion about its value? Do we want to ensure that we’re rendering the intended view templates? What about verifying the content of the HTML (or XML) that gets rendered? Tackling issues specific to testing Rails controllers would take away from the focus of this post, but the strength of assertions and the layers at which we test is surely fodder for future posts.

[5] However, with anything less than 100% coverage, fearless refactoring and full regression testing simply isn’t something your test suite can provide. In Software Testing Techniques (1996), Boris Beizer takes the hardline on anyone that would consider developing a codebase without at least 100% line coverage: “[Line coverage] is the weakest measure in the family [of structural coverage criteria]: testing less than this for new software is unconscionable …”

[6] You could argue that this new test case is too similar to the original test case, that it needs stronger assertions, or that the current functionality is too simple for this new test case to offer sufficient benefit, but it’s up to you to assess how important automated testing is of that code. If you opt to not test your application’s ability to provide an XML-formatted list of products, you must not allow your 100% line coverage to give you a false sense of security about your code. That code may be covered, but it’s not tested.


This series is taken from the How To Fail With 100% Test Coverage talk. Check the schedule for a talk near you.

Thanks to Stuart Halloway for reading drafts of this post.

Tags: , , | 4 Comments »