Test writing: how many and which, **when using a modern, strongly typed language**?
Posted by dogweather@reddit | ExperiencedDevs | 42 comments
A lot of the conventional wisdom about "how many tests to write, and what kinds" seems dated.
The heuristics were formulated in a world of languages with fairly basic, somewhat weak static typing (like old Java) or completely dynamic, rather weak typing (e.g. JS, older Python, Ruby, Perl).
Now, though, truly powerful type systems are reaching the mainstream in languages like Rust, TypeScript, Python with type hints, and Ruby with Sorbet, and even Haskell has gotten more mainstream.
Personally, I always prefer to get type errors while writing code vs. test failures:
- Type errors seem to always point to the exact bug, whereas failing tests only give you a clue at best (see the sketch after this list).
- Type errors also only require you to write declarative code, whereas test writing is a whole other skill to master.
- Tests become an entire codebase unto themselves, with their own idioms and coding practices such as mocking (or not mocking), stubbing (or not), VCR replay or not, etc.
- There usually aren't tests for your tests, unless your team also does mutation testing.
- Finally, I'm currently working with a large codebase that has 100% test coverage and a large number of test factories and mocks, but very dynamic, weakly typed source code. The result: a very high bug rate, much higher than another codebase I work with that uses strong typing (not necessarily entirely static; mostly inferred).
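To make the first bullet concrete, here's a minimal TypeScript sketch (the names are mine, purely illustrative): with strictNullChecks, forgetting to handle a missing value is a compile error on the exact line, where a dynamic codebase would need a test, or a production crash, to surface it.

```typescript
interface User { id: string; email: string }

function findUser(users: User[], id: string): User | undefined {
  return users.find(u => u.id === id);
}

function emailOf(users: User[], id: string): string {
  const user = findUser(users, id);
  // return user.email;             // compile error: 'user' is possibly 'undefined'
  return user?.email ?? "unknown";  // the type forces an explicit decision here
}
```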
Any strategies to share?
BanaTibor@reddit
A strongly typed language itself prevents a huge amount of bugs.
I am a big TDD fan, so I say write tests before the production code, write enough to drive the feature development, and no more than needed.
editor_of_the_beast@reddit
My experience is that no single practice has a meaningful effect on bug rate. I’m a fan of static types for other reasons, mostly around making a codebase easier to navigate and read for someone who isn’t already familiar with it.
I sense that you’re implying that types lead to fewer bugs, and that’s surely not true. It’s true in a way, in that they prevent certain bugs. But those aren’t very interesting. Types can only get you so far. There are mathematical reasons for that if you’re interested to hear more / don’t believe it.
Slight_Art_6121@reddit
I would like to hear more
casualfinderbot@reddit
this is a good example of why 100% test coverage is retarded
teerre@reddit
This is just the push left discussion. Of course if you can catch things at compile time, that's better. But for your average software, that's literally impossible.
However, the suggestion that high test coverage doesn't pay for itself seems suspicious to me. Tests are the proof that your code works. Hell, they're the proof of your work. Countless times I've submitted a PR that is a one-line code change and a hundred lines of tests. That's the whole work right there; if I just committed the one line, nobody would know how, why, or when that was changed. Tests are documentation. Living documentation.
jneira@reddit
Re: tests are the proof that your code works:
"Program testing can be used to show the presence of bugs, but never to show their absence!" (Dijkstra)
NormalUserThirty@reddit
I'm not sure about you, but personally I generally have a sense of when something needs tests, vs. would benefit from tests, vs. likely doesn't need tests.
Sometimes I write code and think, "yeah, that's pretty complicated and probably doesn't work," so then I write tests because it's faster than manually trying things out.
The bigger the module, the more contributors it has, the more open it is (private vs. cross-company vs. public), and the more critical it is, the more I want tests.
That's basically it.
nizzlemeshizzle@reddit
The only answer you are going to get is it depends. Fundamentally there are two reasons to test:
a) to ensure correctness b) to ensure that new development does not break old behaviour/state
A strongly and statically typed (optionally functional) language that forces you to make invalid states unrepresentable, and thus guarantees behaviour to be correct at compile time, can eliminate almost all tests for reason a). For case b), however, the protections on offer are much weaker.
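As a minimal sketch of the "invalid states unrepresentable" idea, assuming TypeScript and invented names: the impossible combination can't even be constructed, so there is nothing left to test for reason a).

```typescript
// A connection is exactly one of these shapes; "connected but no sessionId"
// is not a value that can exist, so no test has to guard against it.
type Connection =
  | { state: "disconnected" }
  | { state: "connecting"; startedAt: Date }
  | { state: "connected"; sessionId: string };

function describe(conn: Connection): string {
  switch (conn.state) {
    case "disconnected": return "offline";
    case "connecting":   return `dialing since ${conn.startedAt.toISOString()}`;
    case "connected":    return `session ${conn.sessionId}`;
  }
}
```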
The TDD lot who love wasting their time writing meaningless tests for everything will be up in arms about this, but for a sufficiently unimportant subsystem, or one where the cost of failure is next to nil, testing is pointless.
roger_ducky@reddit
I write tests to document behavior, not to test behavior. It’s effectively declaring behavior and ensuring it doesn’t change. If you’re writing tests for coverage only, that’d easily explode into 10 to 20 times more test cases than absolutely necessary.
fluffy_in_california@reddit
Which is WONDERFUL when maintaining code.
Refactoring for performance? The tests make sure you probably didn't break stuff by accident!
Adding, changing, or deleting functionality? The tests make sure you didn't accidentally break anything you didn't actually mean to!
Want to try using ChatGPT to refactor some code? The tests do a good job of keeping you from borking yourself!
Tests aren't just for new code - they are for maintaining code.
doberdevil@reddit
That's a test.
roger_ducky@reddit
That’s what we use tests for, but I consider it a declaration of invariants in mathematics.
doberdevil@reddit
Nice. I just call it a "test" so nobody gets confused about the purpose.
c0deButcher@reddit
Test cases help you eliminate dead code during implementation. They also help verify that a new code change doesn't break previous code.
hippydipster@reddit
Static typing is like test-first development. There are a lot of tests effectively prewritten for every codebase.
I wonder if, for a particular planned codebase, more specific tests could also be prewritten to assert that some properties of the system you're going to write will always be true?
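That's roughly what property-based testing offers. A hedged sketch, assuming the fast-check library and its usual fc.assert / fc.property API, with a made-up function under test:

```typescript
import fc from "fast-check";

// Hypothetical function under test.
function dedupeSorted(xs: number[]): number[] {
  return [...new Set(xs)].sort((a, b) => a - b);
}

// Properties stated up front, independent of any concrete example input:
fc.assert(
  fc.property(fc.array(fc.integer()), (xs) => {
    const out = dedupeSorted(xs);
    // 1. Output is sorted ascending.
    for (let i = 1; i < out.length; i++) {
      if (out[i - 1] > out[i]) return false;
    }
    // 2. Output contains exactly the distinct elements of the input.
    return new Set(xs).size === out.length && out.every((x) => xs.includes(x));
  })
);
```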
ventilazer@reddit
Dynamic languages start failing rather quickly as the project grows. They have a higher bug rate per line of code. For small projects you'll have 15 bugs you don't know about, but if dozens of developers work on it for years, you're looking at hundreds of bugs you don't know about, and as new features keep getting added, prod starts failing more often and you're constantly fire-fighting.
nawap@reddit
If the codebase has 100% coverage and there are bugs then it doesn't have 100% coverage. Line coverage is not the same as branch coverage and in a dynamically typed language each variable access is a potential branch.
Are the bugs mostly type related? I work in a dynamically typed code base as well as a statically typed codebase. The bugs are rarely type related because we program defensively in the layers where data transformation needs to happen (e.g. API or form controllers).
As to your original question: your tests should exercise the business logic, including error handling. If the errors can be produced because of types, that has to be asserted as well. Even in a statically typed codebase you'd want to assert on the transformation layers to ensure that your json parsing logic is correct as per business rules.
miredalto@reddit
100% line coverage, including 100% of branches, still only covers a minute fraction of paths through the code, i.e. combinations of branches. That's an exponentially larger number, and exhaustively covering it runs up against the same fundamental limits as the Halting Problem.
So even if your spec is fully complete, i.e. doesn't itself contain bugs or omissions, and your tests are 'perfect' in some local sense, they still can't rule out bugs.
Which actually doesn't quite refute your statement, since if you do find a bug you probably want to add a regression test when you fix it, but I don't think that's what you meant...
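To make the branch-vs-path point concrete, an illustrative sketch (the pricing rule and numbers are invented): two tests reach 100% line and branch coverage yet never execute the one path combination that is wrong.

```typescript
function shippingCost(weightKg: number, express: boolean): number {
  let cost = 5;
  if (weightKg > 20) cost += 10;  // heavy-parcel surcharge
  if (express) cost *= 2;         // express doubles the cost
  // Hypothetical bug: suppose the business rule caps the express surcharge at +15;
  // the heavy + express path returns 30, which violates that cap.
  return cost;
}

// shippingCost(25, false) === 15  // exercises the heavy branch
// shippingCost(5, true)   === 10  // exercises the express branch
// Both assertions pass, every line and branch is covered, and the only buggy
// path (heavy AND express) is never executed.
```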
doberdevil@reddit
You can have 100% code coverage, but if the code is missing functionality that would handle bugs, there you are.
Say your product is an airplane. You write the tests, and feel confident because you have 100% code coverage. Everyone yells ship it and pats each other on the back. Plane takes off, everyone goes home. Next morning they wake up to find out that even though they had 100% code coverage, they forgot to write the function that lowers the landing gear.
nawap@reddit
Of course. Tests don't provide any information about stuff that should be there but isn't.
Hillel Wayne has written about the utility of tests multiple times, e.g. in https://www.hillelwayne.com/post/uncle-bob/
Basically no one type of test is enough. If you can only have one, almost any other kind of test (property based, integration, mutation) is better than unit tests.
doberdevil@reddit
I was referring to your statement about 100% coverage with bugs not really meaning 100% coverage.
Code Coverage can be a warm fuzzy metric, but it's often misunderstood and abused. It'll turn around and ruin your day when you aren't looking.
nawap@reddit
I think we are in agreement.
koreth@reddit
That doesn't seem true to me. The tests can have bugs that don't reduce coverage. Or they can have an incomplete set of assertions such that a buggy piece of code is fully exercised by the tests but the erroneous result or unintended side effect doesn't cause a test failure.
When I've worked in complex domains, a recurring problem even on teams with strong testing cultures is that the domain experts don't have a complete understanding of the desired behavior or aren't able to describe it exhaustively. The engineering team rigorously tests that the code performs exactly as specified, but the specifications were missing some constraints on the output, and the software does something that surprises the end user.
And even in non-complex domains, specifications can be wrong in ways that are invisible to engineers who aren't domain experts. The end user doesn't care if the software is misbehaving because the engineers got it wrong and didn't write a test to catch the problem or the product owner got it wrong and the tests followed suit; to them it's a bug either way.
nawap@reddit
Yes, but we have to start with some axioms to make useful statements and I am starting with the axiom that tests themselves aren't buggy.
But this is describing less than 100% coverage. Line/statement coverage can be misleading. Branch coverage is much better but is hard to get perfect. People are generally only dealing with line coverage, unfortunately.
I agree with the rest of your points.
edgmnt_net@reddit
Not sure how to interpret this, but if you mean testing simple branches that pop out with an error, that's quite meaningless IMO. For one thing, it's right there in the code; what does testing give you over reviewing the code? The possibly less obvious danger here is that not only does it take a lot of effort to cover that, it also couples code and tests and provides what I call assurance by mere duplication.
Finally, there are obviously things you can never test to any meaningful degree, and that includes a lot of interactions with external systems or complex yet straightforward mappings (as is often the case when people write extraneous layers). Relying on any kind of test to tell you that you're not operating a database in a race-prone manner is wishful thinking. Those are real bugs and they'll bite you when the planets align and you lose all your data. Defensive programming and strong reviews are way more likely to yield benefits there, both of which are aided by avoiding extensive layering, mocking and transformations.
There's also the question of whether you're making quality assertions and not just exploring the state space. That becomes really hard for some of the situations I mentioned. It's very easy to end up asserting "the code updates this, then updates that" but the tests won't be robust enough and it's still dependent on your understanding of what constitutes a good interaction with the external component.
worriedjacket@reddit
Ideally, this is done using meta-programming. Most languages have some way of doing that automatically.
ab5717@reddit
I love this topic of discussion and, as reflected in the comments, there are differing opinions as well as general consensus on a few points. I emphatically agree with the emphasis on: it depends on a number of factors including type system, domain complexity, and application type.
One thing I think that I've improved on over the years is carefully considering what are the most valuable assertions I can make. Like many here, I don't find it necessary to obtain 100% coverage, or juke the test coverage stats with meaningless tests.
I try to consider what a given test is actually testing. I'm sure I'm not the only one who has come across a legacy-ish code base with tests that are (in some places) essentially testing the standard library of the language being used. I don't think a table-driven test that makes repeated assertions on the results of strings.Contains(s, substr string) is the best use of our time.
One thing most people seem to agree on is that there is no single silver bullet. A diverse set of strategies is needed. Unit, integration, end-to-end, benchmark, property-based/fuzzing, acceptance, and even (time and knowledge allowing) chaos testing all have their time and place.
Some of my favorite points made so far (or how I understood them) in the comments are these:
- Tests can serve as excellent documentation, especially if they focus on the public API of a module, are acceptance-like tests for business logic, and are not overly focused on implementation details (which can cause a painful level of coupling between the tests and the code under test).
- Strong and static type systems are awesome (so are dependent types) and have multiple benefits, but can lead to a false sense of security. I literally encountered a Rust codebase with no tests because someone believed (and said to me) "bc of Rust's type system, if the code compiles, then you can be reasonably sure that everything works" 🤦♂️🤦♂️🤦♂️
- Interacting with external systems is difficult to test in a meaningful way or with high degrees of certainty, and one has to use discretion/caution when it comes to stubs, doubles, mocks, spies, etc.
- To quote one commenter (@koreth, bc I couldn't articulate this any better): "When I've worked in complex domains, a recurring problem even on teams with strong testing cultures is that the domain experts don't have a complete understanding of the desired behavior or aren't able to describe it exhaustively. The engineering team rigorously tests that the code performs exactly as specified, but the specifications were missing some constraints on the output, and the software does something that surprises the end user."
- I recently worked at a utility company developing SCADA and HMI applications for electrical and power system engineers. Since no one on my team (including myself) has a formal electrical engineering background, this was a highly complex and alien domain for all of us. These domain experts knew what they wanted from a 30,000-foot perspective, but breaking it down beyond that, we had a profoundly difficult time nailing down and mutually understanding specifications and acceptance criteria.
Something I've become an increasingly enthusiastic fan of is formal specification with model checking (using TLA+, Alloy 6, and friends; BTW shout-out to @eraserhd for the Idris mention). Understanding this is outside the scope of this thread, but the industrial-use section of Leslie Lamport's website, or the practical formal methods repo, should help one see the use-cases for these tools. These tools can more effectively handle the state-space explosion, serve as documentation, and help validate the logical soundness, safety, liveness, and fairness properties of a distributed system or concurrent algorithm.
This is just a loosely related side note, but one thing I stumbled across a few years ago that I think is kinda novel and cool is Learn Go with Tests. I really like the idea of practical TDD serving as a vehicle for learning a programming language. It's not just trivial examples, either. It covers general testing concepts as well as a lot of actual scenarios a Go developer will encounter. It has continued to grow over the years to capture many practical scenarios and applications. I would have loved to have an analogous resource when first learning Rust.
Comprehensive-Pea812@reddit
ROI for each line of test code is different.
For example, testing string concatenation vs. a regex function is different; the latter gives more ROI due to the code complexity.
Test the complex parts first: those you can't easily test through the UI or manually.
For example, email validation would take a lot of effort to verify manually on every regression, while a unit test can cover it in less than a second.
100% coverage doesn't mean zero bugs, since the test cases can be lacking. 60% coverage can mean all the use cases are covered but the model setters/getters aren't tested.
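As a hedged illustration of the email-validation point (the helper name and regex are purely illustrative, not a recommended validator):

```typescript
import assert from "node:assert";

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function isValidEmail(input: string): boolean {
  return EMAIL_RE.test(input);
}

// A table of cases that would be tedious to re-check by hand every regression.
const cases: Array<[string, boolean]> = [
  ["alice@example.com", true],
  ["alice@example", false],
  ["@example.com", false],
  ["bob smith@example.com", false],
];

for (const [input, expected] of cases) {
  assert.strictEqual(isValidEmail(input), expected, `isValidEmail(${input})`);
}
```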
behusbwj@reddit
TypeScript was a bad example. Its compiler lets you express powerful types, but as soon as you leave the boundary of your local project and have to guess the type of data coming over networks and protocols, it becomes extremely flimsy and prone to error (because TypeScript will not validate your contracts and types at runtime, only at compile time, with the assumptions you baked in).
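A minimal sketch of what that means in practice (names invented): the static type says nothing about what actually arrives over the wire, so you narrow it yourself, or with a schema library, before trusting it.

```typescript
interface UserDto { id: string; age: number }

// Hand-written runtime check; TypeScript only narrows the type because we tell it to.
function isUserDto(value: unknown): value is UserDto {
  return (
    typeof value === "object" && value !== null &&
    typeof (value as Record<string, unknown>).id === "string" &&
    typeof (value as Record<string, unknown>).age === "number"
  );
}

async function fetchUser(url: string): Promise<UserDto> {
  const res = await fetch(url);
  const body: unknown = await res.json();  // no compile-time guarantee about this shape
  if (!isUserDto(body)) {
    throw new Error("Response did not match the UserDto contract");
  }
  return body;                             // safely narrowed to UserDto
}
```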
68696c6c@reddit
I don’t see a good type system as a replacement for tests in any way. Rather, a good type system frees you from needing to write the kind of frivolous, additional tests you often end up writing in dynamic languages.
Basically, the compiler and linters make sure your code runs without errors. Tests are for making sure your business logic functions as expected. With that in mind, I focus my efforts on unit testing the services that encapsulate my business logic. Unit testing utility functions is also low-hanging fruit with a high impact. Once those are covered I’ll write handler tests to cover stuff like access and permissions and make sure everything works together as expected. E2E integration tests are more of a QA responsibility to me, if I have the luxury of having a QA team at all. They serve a separate purpose and aren’t a replacement for other kinds of tests, but they’re usually the last thing I’ll do just because they take more work to set up and don’t give as precise feedback as unit tests.
doberdevil@reddit
The problem with this is you may not know how many times a co-worker found a bug in their code because running tests locally caught it. A bug caught during development, never merged, and never shipped doesn't get attributed as tests paying for themselves.
edgmnt_net@reddit
I pretty much agree with you. The typical unit tests are usually and particularly a waste of time, IMO. People often neglect to mention that it's entirely possible to manually test stuff to a pretty good degree and that automation can actually decrease code quality by adding layers and indirection. Or that much of what tests are claimed to do can be achieved more reliably through stricter code reviews, decent abstractions and more thorough research.
I'd say the prevalence of testing tends to be justified by lack of language safety in general terms. Lack of null safety or dynamic typing has a cost that ends up being "paid with interest" by aiming for extensive coverage. Said coverage doesn't even have to consist of good-quality, meaningful assertions; all the tests need to do is trigger code to run, so it can stumble upon the landmines.
This seems extremely prevalent in ecosystems which use less safe languages, including JS or ~~old and crusty~~ enterprise-grade Java, along with a general disregard for other best practices. Unfortunately these large niches live in their own social bubbles and also tend to consider what they do as best practices, despite plenty of evidence to the contrary and these projects being nowhere near state of the art.
What one needs to do to achieve full coverage often also plays right into tendencies to attempt to make components fully replaceable, which is wishful thinking in many cases. The latter becomes one supporting justification, because then we'll totally be effortlessly able and willing to swap MongoDB for PostgreSQL, for instance.
I have a hunch it may also be that these testing-related practices also work as brakes to slow down an otherwise rabid pace of unchecked development and that some partial benefits, where noticed, may be indirectly but solely attributed to that.
I'd personally go for some higher-level sanity testing first and foremost. I feel that's usually good enough to catch mistakes. I sometimes write unit tests, but I usually do so for things that are particularly testable, which is usually stuff like algorithms or pieces of business logic that are amenable to breaking out and testing in isolation, or if there's some value that can be otherwise justified explicitly.
eraserhd@reddit
I'm a big fan of strong types and have even contributed some code to Idris. Not the big brain stuff, but a bunch of gruntwork.
I'm also familiar with the feeling that good types make testing unnecessary. You certainly don't need to test some classes of things -- that is the class of things that you can enforce by types.
But the thing you want to test is business logic. Can you encode business logic in the type system? IF you have a system with dependent types, AND an excruciating amount of time, OR you have a very simple domain, then you can.
If not, then you are missing something that you need to test.
Studies have not been able to show that type systems (at least the ones we have today) are generally able to reduce bug density.
I think there's an important reason that it feels like it does, and that is because we use types to model the problem abstractly, then the code (to some degree) must follow. But I think it's important to notice that some degree of this feeling is illusory, because there are bugs in the modeling as well as the implementation.
Tubthumper8@reddit
This is where statically typed languages have a massive difference between those that have sum types and those that don't. Most business logic is "do this or that" or "do one of these 4 things" which is what sum types are for.
You don't have to encode every little thing but main decision points of the business logic absolutely can and should be encoded in the type system.
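A small, hedged TypeScript sketch of that idea (the domain and names are invented): the decision point is a sum type, and with strict compiler settings every caller is forced to handle all of its variants.

```typescript
type CheckoutResult =
  | { kind: "paid"; receiptId: string }
  | { kind: "declined"; reason: string }
  | { kind: "needs_review"; ticketId: string };

function notify(result: CheckoutResult): string {
  switch (result.kind) {
    case "paid":         return `Receipt ${result.receiptId} sent`;
    case "declined":     return `Payment declined: ${result.reason}`;
    case "needs_review": return `Escalated as ticket ${result.ticketId}`;
    // No default: under strict settings, adding a fourth variant makes this
    // function fail to compile until the new case is handled.
  }
}
```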
alien3d@reddit
Even integration tests sometimes pass while there are still bugs in real life. If you deal with a database, use transactions and proper error handling. But some QA mostly don't write a description of the flow of what they did and when they found the bugs.
fluffy_in_california@reddit
Strong typing just verifies you aren't mis-using variables as 'the wrong type of thing'. This is extremely useful as there are many bugs that come down to "I used a variable of type A as if it were a variable of type B".
It does not verify your code is actually doing the right thing. It has no CONCEPT of what 'doing the right thing' even is.
It only verifies that a particular class of bad thing isn't being done.
Switching tracks to test coverage: 100% test coverage ALSO does not guarantee your code is doing the right thing.
Not even if it includes 100% coverage of all branches and conditions. It only proves that for the cases you tested, it does the right thing.
I say this as a person who makes a habit of shipping code to public repos that has 100% or as close to 100% code coverage as I can manage.
I regard that code as needing a strong assurance to the people using it that I have made every practical effort to make sure it does the right thing.
And bugs still sometimes slip through. Such is life.
xsdgdsx@reddit
OP, it sounds like you're complaining about a bad codebase, not about testing strategy. As others have already commented, there are good tests and bad tests, and either one will get you line coverage.
The answer for how many tests you need: as many as it takes to get the job done. As others already wrote, it depends. Also consider what's in scope for unit testing versus integration testing.
Simple example: you write a state machine to run a traffic light system with an emergency vehicle override. Pretty basic patterns, but also, it's super easy to make it buggy even with perfect type safety and 100% line coverage. Like, what if firetrucks from different directions both say "turn my light green"?
The number and type of tests to write depends on the intended functionality. That's kind of it. If your test base replicates all of the relevant program states, then you're guaranteed to be done. But the number of relevant program states becomes approximately infinite pretty quickly, so you usually have to make an informed judgment call, and it takes experience to do that well.
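A hedged sketch of that traffic-light scenario (all names and the "first request wins" policy are invented): the types are fine either way; the information is in the test of the conflicting-override interaction.

```typescript
import assert from "node:assert";

type Direction = "north_south" | "east_west";
interface LightState { green: Direction }

class Controller {
  private state: LightState = { green: "north_south" };
  private override: Direction | null = null;

  requestOverride(dir: Direction): void {
    // Policy: the first emergency request wins; later conflicting ones are ignored.
    if (this.override === null) this.override = dir;
  }

  tick(): LightState {
    if (this.override !== null) this.state = { green: this.override };
    return this.state;
  }
}

// The interesting test is the conflict, not the happy path:
const c = new Controller();
c.requestOverride("east_west");
c.requestOverride("north_south");  // second firetruck, opposite direction
assert.deepStrictEqual(c.tick(), { green: "east_west" });  // only one can be green
```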
Golandia@reddit
Typing isn’t really a testing concern. It just gives you some extra guardrails. Or not if you do lots of reflection.
Coverage is usually a good measure but you can artificially inflate it with junk tests. E.g. a Golang project I worked on had nearly 100% coverage and plenty of bugs. They used a lot of codegen to make tests that pass and increase coverage.
I’d say you want high coverage with high quality tests including unit, integration and end to end tests. Test quality usually improves with data quality. You can do things like use copies of live data. And test scenarios matter. Such as a happy path, negative paths, higher complexity happy paths, etc with strict validation. Also validate mocked dependencies are called correctly.
A contrived example is testing a broken binary search. Let’s say it just searches everything but the rightmost entry (you can create this bug easily). If you give it a big array and pick the second element to find, well, that’s a passing test with 100% coverage regardless of language.
So what can you do? How can you be certain it works? I’d say for some large array, try finding every value, try find nonexistent values above and below, try empty arrays, single sized, etc. Your tests should prove it works as expected.
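A sketch of what that suite might look like (assuming TypeScript and Node's assert; the implementation shown is the correct one, with the buggy variant noted in a comment):

```typescript
import assert from "node:assert";

function binarySearch(sorted: number[], target: number): number {
  let lo = 0;
  let hi = sorted.length - 1;  // the broken variant described above would stop one short
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

const arr = Array.from({ length: 100 }, (_, i) => i * 2);  // 0, 2, ..., 198

// Find every value that is actually present.
for (let i = 0; i < arr.length; i++) {
  assert.strictEqual(binarySearch(arr, arr[i]), i);
}
// Values below, between, and above the elements are reported absent.
assert.strictEqual(binarySearch(arr, -1), -1);
assert.strictEqual(binarySearch(arr, 3), -1);
assert.strictEqual(binarySearch(arr, 999), -1);
// Edge cases: empty and single-element arrays.
assert.strictEqual(binarySearch([], 5), -1);
assert.strictEqual(binarySearch([5], 5), 0);
```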
maria_la_guerta@reddit
My strategy is to be more like the second codebase than the first. I adopt strong types any chance I get. IMO it's almost a requirement of a large scale codebase that will have multiple contributors.
If I were you I would quantify the dev hours that go into your bugs and 100% coverage against the cumulative dev hours of incrementally rolling out types in your codebase and rolling back on testing. Present those numbers and your math here ^ to your leadership and you'll likely get buy in to start working towards something better.
I'm a fan of coverage, fwiw, which is a spicy topic on its own, but 100% coverage encourages bad testing in every codebase. 75% max, but with types I'm happy with even 50%.
i_exaggerated@reddit
I think there might be too many differences in the test methodology between your two projects to really attribute it all to typing. A lot of mocks lead to weak tests (I think), because you're faking most of the behavior. It sounds like it's testing a lot of implementation and not much function. Whereas your second project is actually executing the code under test.
Gammusbert@reddit
Acceptance testing, i.e. given the possible ranges of inputs and outputs, what happens?
worriedjacket@reddit
Type systems create mathematical proofs about your program's behavior at compile time. They existed before computers, even.
They're pretty cool stuff