Engineering well means gaining confidence in quality by any means available

Writing code is cheap, but getting it right is expensive. Every means of ensuring code correctness has significant drawbacks:

  • Automated unit tests are expensive to write and don't address many problems. They're also a pinprick in the domain: a few invocations of a function or class that accepts myriad possible inputs.
  • Integration tests are usually even more expensive than unit tests, and they dead-end quickly. We cannot exactly recreate production environments, so we're left approximating them in development, but the production systems are always a little different. And support for this is rarely baked directly into SDKs, languages, or third-party services, which means we typically have to roll our own systems for recreating production-like environments in test.
  • Static analysis (like type systems) doesn't deal with integrations well, and still has significant limitations. Strongly typed languages must deal with invalid external data via runtime errors, but this kicks us back into the realm of integration testing. Type systems can take us far within big monoliths, but most applications need to talk to the outside world at some point.
  • Human review is very general but also very fallible.
  • Manual testing often makes integration testing easier, but increases time spent on test setup and repetition.
  • Documentation requires that people actually read it, and it also gets stale since there's nothing linking it directly to the code. Also, when applied to novel problems, documentation is just human review again.
  • VCS, feature flags, API and release versioning, rollbacks, and other similar approaches help mitigate exposure to unstable code, but still require that code becomes stable by other means.
  • User feedback and real-world usage come at the cost of pissing off your users when stuff is broken and, potentially, causing other real harm.
  • Each of these approaches relies on a fallible human to implement it. We can never really get around our own ability to make mistakes.

Many argue that one of these approaches (in particular, some form of testing or type system) is the superior way. And, indeed, you can stretch type systems and automated testing pretty far if you're willing to pour a bunch of hours into those approaches, but it's really unnecessary, and things tend to move much more quickly if you do your own thinking and settle on an approach that suits the problem. I tend to favor these rough guidelines:

  • Database interactions are very frequent in web applications, test databases are easy to spin up, and interactions with the database can be rather involved: SQL statements carry their own internal logic and have performance-relevant runtime behaviors. For these reasons, I tend to create a real database for the test environment and automate tests for database interactions when I'm not confident that the queries will work based on the evidence provided by the type system and a single manual test (a sketch of this setup appears after this list).

    However, I can often get what I need from the type system along with manually testing the UI. E.g., for a TODO list app, Drizzle plus a manual test in the UI gives me plenty of confidence that everything is working, and I wouldn't save time by automating tests because that confidence is warranted (the manual tests don't usually fail).

    On the other hand, sometimes database behaviors are genuinely fiddly, and in those cases it's nice to have automated tests so that we can isolate, iterate, and rerun. The point is that we don't need the automated tests until we need them, and they don't come for free.

  • I don't write automated tests for most internal APIs (where we own the caller), because most API calls can be sufficiently exercised via a single manual test of the endpoint through the UI, and the handlers are easy enough to write or fix that we don't end up getting value from rerunning tests. However, sometimes these behaviors are more nuanced or more difficult to exercise, and in those cases I'll automate tests. These tests also tend to be fairly easy to write given the correct test setup, so if I'm on the fence, I'll go ahead and write the test.

  • I test functions making calls to third-party services manually on the command line (against the third party's sandbox). These functions are kept deliberately simple, doing precisely one thing (call the service), so that they almost never need to be rewritten. In this way, we can omit automated test coverage (which is usually not feasible anyway because of sandbox rate limits and an inability to clean up the third-party environment) and rely on the type system to make sure that we're passing valid parameters to our manually tested function and interpreting the result correctly. Then, if other code that relies on the service wants automated testing, we mock out the functions that call the service (a sketch of this wrapper-plus-mock setup appears after this list). This works well except when the third-party service has sophisticated behaviors defined by the payload we send it (think an API for a database like Fauna). I tend to avoid those cases outright, because the next best thing is integration testing against multiple running sandbox services, which gets very expensive, or kicking it out to production and using feature flags to test manually, which is both scary and time-consuming. Often, complexity can be partially offloaded to our database layer instead (where things are easier to test), simplifying calls to third parties.

  • Things that run asynchronously (cron jobs) receive the current time as a parameter so that they can be unit tested with time as an ordinary input, rather than by mocking out the clock in the test (sketched after this list).

  • For the rest, I rely mostly on the type system and review, except in cases where the types are very broad or the type system doesn't yield enough confidence. Consider something like a parser that just operates on strings: obviously we'd want actual unit tests against it, because compiling without errors is not particularly compelling evidence (a small example appears after this list).

  • Everything gets a thorough review with a chunk of time between when I write the code and when it goes out (letting your mind clear helps you catch more mistakes). Whenever possible, I get other eyes on the code as well. Doing review well is definitely its own skill, but that's another post.

  • Everything depends on context. Sometimes the stakes are high enough that it makes sense to use crisscrossing strategies even when they're kind of inefficient.
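
To make the database bullet concrete, here is a minimal sketch of the kind of automated test I mean. It assumes Postgres, Drizzle, and Vitest; the todos schema, the inline table creation, and the TEST_DATABASE_URL environment variable are hypothetical stand-ins for whatever the project actually uses:

    // todos.test.ts: runs against a real, throwaway Postgres database
    import { Pool } from 'pg';
    import { drizzle } from 'drizzle-orm/node-postgres';
    import { pgTable, serial, text, boolean } from 'drizzle-orm/pg-core';
    import { eq } from 'drizzle-orm';
    import { beforeAll, afterAll, test, expect } from 'vitest';

    // Hypothetical schema; in a real project this lives with the rest of the Drizzle schema.
    const todos = pgTable('todos', {
      id: serial('id').primaryKey(),
      title: text('title').notNull(),
      done: boolean('done').notNull().default(false),
    });

    // TEST_DATABASE_URL points at a local throwaway Postgres (e.g. one started via Docker).
    const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });
    const db = drizzle(pool);

    beforeAll(async () => {
      // A real project would run migrations; creating the table inline keeps the sketch self-contained.
      await pool.query(
        'CREATE TABLE IF NOT EXISTS todos (id serial PRIMARY KEY, title text NOT NULL, done boolean NOT NULL DEFAULT false)'
      );
    });

    afterAll(async () => {
      await pool.query('DROP TABLE IF EXISTS todos');
      await pool.end();
    });

    test('completing a todo only touches the targeted row', async () => {
      const [a] = await db.insert(todos).values({ title: 'write post' }).returning();
      const [b] = await db.insert(todos).values({ title: 'edit post' }).returning();

      await db.update(todos).set({ done: true }).where(eq(todos.id, a.id));

      const rows = await db.select().from(todos);
      expect(rows.find((r) => r.id === a.id)?.done).toBe(true);
      expect(rows.find((r) => r.id === b.id)?.done).toBe(false);
    });

Because the test talks to a real database, it exercises the actual SQL rather than a stand-in for it, which is exactly the confidence the type system can't provide here.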
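For the third-party bullet, here is a rough sketch of the shape I'm describing. The email provider, its URL, and all of the module and function names are hypothetical; the wrapper gets exercised by hand against the sandbox, and automated tests for anything that depends on it mock the wrapper instead of hitting the service:

    // email.ts: a deliberately thin wrapper that does exactly one thing (call the service).
    // The provider URL and payload shape are made up for illustration.
    export async function sendReceiptEmail(to: string, orderId: string): Promise<void> {
      const res = await fetch('https://api.example-mailer.com/v1/send', {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${process.env.MAILER_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ to, template: 'receipt', data: { orderId } }),
      });
      if (!res.ok) throw new Error(`mailer responded with ${res.status}`);
    }

    // checkout.ts: hypothetical caller that relies on the wrapper.
    import { sendReceiptEmail } from './email';

    export async function completeCheckout(order: { orderId: string; email: string }): Promise<void> {
      // ...persist the order, charge the card, etc....
      await sendReceiptEmail(order.email, order.orderId);
    }

    // checkout.test.ts: the wrapper is mocked, so this test never touches the real service.
    import { test, expect, vi } from 'vitest';
    import { sendReceiptEmail } from './email';
    import { completeCheckout } from './checkout';

    vi.mock('./email', () => ({
      sendReceiptEmail: vi.fn().mockResolvedValue(undefined),
    }));

    test('completing checkout sends a receipt', async () => {
      await completeCheckout({ orderId: 'order_123', email: 'a@example.com' });
      expect(sendReceiptEmail).toHaveBeenCalledWith('a@example.com', 'order_123');
    });

The manual test is just invoking the wrapper once from the command line against the sandbox; after that, the wrapper's type signature does most of the ongoing work.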
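The time-as-a-parameter point looks something like this in practice (the invite-expiry job and its names are made up for illustration):

    // expire-invites.ts: the job logic takes `now` as an argument instead of calling new Date() itself.
    export interface Invite {
      id: string;
      expiresAt: Date;
    }

    export function selectExpiredInvites(invites: Invite[], now: Date): Invite[] {
      return invites.filter((invite) => invite.expiresAt.getTime() <= now.getTime());
    }

    // The cron entry point is the only place that touches the real clock.
    export function runExpireInvitesJob(invites: Invite[]): Invite[] {
      return selectExpiredInvites(invites, new Date());
    }

    // expire-invites.test.ts: time is just another input, so no clock mocking is needed.
    import { test, expect } from 'vitest';
    import { selectExpiredInvites } from './expire-invites';

    test('invites expiring at or before now are selected', () => {
      const now = new Date('2024-06-01T00:00:00Z');
      const invites = [
        { id: 'a', expiresAt: new Date('2024-05-31T23:59:59Z') },
        { id: 'b', expiresAt: now },
        { id: 'c', expiresAt: new Date('2024-06-01T00:00:01Z') },
      ];
      expect(selectExpiredInvites(invites, now).map((i) => i.id)).toEqual(['a', 'b']);
    });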
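And for the broad-types case, a parser is the classic example: the signature is just string in, number out, so compiling proves very little. A hypothetical duration parser and a couple of unit tests:

    // parse-duration.ts: hypothetical parser; the type signature alone tells us almost nothing.
    export function parseDuration(input: string): number {
      // Accepts strings like "1h30m" or "45m" and returns milliseconds.
      const match = /^(?:(\d+)h)?(?:(\d+)m)?$/.exec(input.trim());
      if (!match || (match[1] === undefined && match[2] === undefined)) {
        throw new Error(`unparseable duration: ${input}`);
      }
      const hours = Number(match[1] ?? '0');
      const minutes = Number(match[2] ?? '0');
      return (hours * 60 + minutes) * 60 * 1000;
    }

    // parse-duration.test.ts
    import { test, expect } from 'vitest';
    import { parseDuration } from './parse-duration';

    test('parses hours and minutes', () => {
      expect(parseDuration('1h30m')).toBe(90 * 60 * 1000);
      expect(parseDuration('45m')).toBe(45 * 60 * 1000);
    });

    test('rejects garbage', () => {
      expect(() => parseDuration('soon')).toThrow();
    });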

In general the theme is: Do the simplest possible thing that gives you confidence that the thing works. Do not spend extra time on activities that do not yield additional confidence.
