Hacker News | hitchstory's comments

A shit test written before writing the code is still a shit test. Mimetic tests aren't any better written after the code either.

If I had to choose between 1) always writing specification-linked tests that make as few architectural assumptions as possible and 2) TDD, sure, I'd pick 1 every time.

1 and 2 together is still better, though.


I've had this experience with team-specific vocab, where certain terms organically end up with two or more conflicting meanings, and it was horrendous. It led to all sorts of bugs, misunderstandings and even arguments.

Even worse, most people didn't realize there was a problem because they always knew what they meant.

The only time I managed to work past it was by convincing everyone to never use that term again - burning it to the ground - and agreeing to replace it with two or more new, unambiguous terms.

I'd love to burn "unit test" and "integration test" to the ground but nobody outside my team listens to me :)

I'd probably replace them with:

* code coupled

* interface coupled

* high level

* low level

* xUnit

* faked infrastructural

* deployed infrastructural

* hermetic / non-hermetic

* declarative / non-declarative


I've done this too. The exercise wasn't arrays (I'm militant about only setting very realistic tasks). My task required modifying existing production-like code and tests.

My hope was always that the candidates would do TDD where it seemed simple and obvious to do so. It was actually pretty rare, but the candidates who defaulted to doing that always ended up being better in my opinion. They always received offers elsewhere higher than my company could afford (so I guess in others' opinions too).

In this thread https://news.ycombinator.com/item?id=43060636 I pondered why most people don't default to TDD for production code, and the answer invariably seemed to be "we didn't think TDD was a thing you could do with integration/e2e tests".


I think having tests for all your diffs at the level of published commits/change lists/etc is totally reasonable for software you really care about. What's counterproductive is practicing TDD at the level of individual editor operations.

If I'm fixing a bug, I start by writing a test that reproduces the bug. If I can't do that, I fix the test harness until I can. Then I implement the change, making mental notes of each intermediate bug I think about along the way - things like "I should be careful to name this distinctly so that it's not confused with this other value in scope that has the same type". After that, I cull down that list until it's reasonable and not totally paranoid, and write tests covering those cases. Same thing for any bugs in in-progress code caught by manual testing, fuzzers, etc.
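As a hedged sketch of that reproduce-first loop (the `parse_price` function and its truncation bug are invented purely for illustration, not from any real codebase):

```python
def parse_price_buggy(text: str) -> float:
    # the original bug: int() truncates, so fractional cents get dropped
    return int(float(text) * 100) / 100

def parse_price(text: str) -> float:
    # the fix, written only after the test below reproduced the bug
    return round(float(text), 2)

def test_reproduces_the_bug():
    # Step 1: a test that reproduces the bug. It fails against
    # parse_price_buggy and passes once the fix is in place.
    assert parse_price("19.999") == 20.0
    assert parse_price_buggy("19.999") == 19.99  # documents the old behavior
```

The test survives as a regression guard after the fix, which is the point of writing it first against the broken code.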

If you have discipline and use version control, you don't need to write tests before you write the actual code to get the same level of coverage as TDD and you waste a lot less time. I've often figured out late in the game how to make something a compile time failure rather than a runtime one - time to delete all those tests written along the way? Encode them all as negative compilation tests? Fundamentally the goal of testing is to describe what behaviors of the software are intentional rather than incidental, and to detect bugs that might be introduced by future changes to the software - TDD mixes both concerns and doesn't put any emphasis on preventing future bugs specifically.

Maybe other people work on different types of things and TDD is great for them, but I write primarily infrastructure code where correctness is critical and I have the luxury of time, and TDD doesn't produce better results for me. This is a case TDD feels like it should work well for, but in my experience it doesn't improve correctness, maintainability, or speed of delivery - at least compared to the alternative I described. I'm sure there's a universe of teams with sloppy practices out there that TDD would be an improvement for, but it's not helpful for me.


>I've often figured out late in the game how to make something a compile time failure rather than a runtime one

This is actually a good (albeit somewhat niche) reason to not write a test scenario at all, but it's still not a great reason to write a test after instead of before.

>Fundamentally the goal of testing is to describe what behaviors of the software are intentional rather than incidental

Yup. A test scenario which is of no interest to at least some stakeholders probably shouldn't be written at all.

This is again about whether to write a test at all, though, not whether to write it first.

>TDD mixes both concerns

I don't think writing the test after helps unmix those concerns any better.

In fact it's probably a bit easier to link intentional behavior to a test while you have the spec in front of you and before the code is written.

I find people who write tests after tend (not always, but there's a strong tendency) to fit the test to the code rather than the requirement. This is really bad.

>Maybe other people work on different types of things and TDD is great for them, but I write primarily infrastructure code where correctness is critical and I have the luxury of time

Assuming I'm understanding you correctly (you're building something like terraform?), integration tests which run scenarios matching real features against fake infra would seem pretty useful to me.

So... why won't you write tests with that harness before the code? I'm still unsure.

The only thing "special" about that type of code that I can see (which isn't even all that special) is that unit tests would often be useless. But so what?


>This is actually a good (albeit somewhat niche) reason to not write a test scenario at all, but it's still not a great reason to write a test after instead of before.

But the before-test is strictly negative - it's a waste of time (deleted code, never submitted) and it possibly slowed down development (had to update the test as I messed with APIs).

>Yup. A test scenario which is of no interest to at least some stakeholders probably shouldnt be written at all.

And yet I see TDD practitioners as the primary source of such tests - if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem. Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies about the order you make changes in.

>In fact it's probably a bit easier to link intentional behavior to a test while you have the spec in front of you and before the code is written.

When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.

>I find people who write test after tend to (not always, but strong tendency) fit the test to the code rather than the requirement. This is really bad.

I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.

Another issue is that fault injection tests basically require coupling to the implementation - "make the Nth allocation fail" etc. The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation, so you can ensure your fuzz tester is actually exercising the stuff you want it to.

>Assuming I'm understanding you correctly (you're building something like terraform?),

I write library code for mobile phones, mostly in Java/Kotlin. I recently did some open source work (warning: I am not actually very proficient with C, any good results are from enormous time spent and my code reviewers, constructive criticism very much welcome). Here's a few somewhat small, contained changes of mine, so we can talk about something concrete:

https://github.com/protocolbuffers/protobuf/pull/19893/files

This change alters a lock-free data structure to add a monotonicity invariant, when the space allocated is queried on an already-fused arena while racing with another fuse. I didn't add tests for this - I spent a fair bit of time thinking about how to do it, and decided that the type of test I would have to write to reliably reproduce this was not going to be net better at preventing a future bug, given its cost, than a comment in the implementation code and markdown documentation of the data structure. I don't know how I would really have made this change with a TDD methodology.

https://github.com/protocolbuffers/protobuf/pull/19933/files

This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.

https://github.com/protocolbuffers/protobuf/pull/19885/files

This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"? So in this case, I followed TDD's order - not because I was following TDD, but because the code change was trivial and all the hard work was thinking about the data structures and memory model.

https://github.com/protocolbuffers/protobuf/pull/19688/files

All the tests were submitted before the implementation change here, but not because of TDD - in this case, I was trying to optimize performance, and wrote the whole implementation before any new tests - because changing the implementation required changing the API to no longer expose contiguous memory. But I did not want to churn all the users of the public API unless I knew my implementation was actually going to deliver a performance improvement - so I didn't write any tests for the API until I had the implementation pretty well in hand. Good thing too, because I actually had to alter the new API's behavior a few times to enable the performance I wanted, and if I had written all the tests as I went along, I'd have to go and rewrite them over and over. So in this case I wrote the implementation, got it how I wanted it, wrote and submitted the new API (implemented at first on the old implementation) and added tests, updated all callers to the new API, and then submitted the new implementation.

I don't think TDD would have led to better results in these cases, but you sound like a TDD believer and I'm always interested to hear anything that would make my engineering better.


>But the before-test is strictly negative

I actually did this the other day on a piece of code. I was feeling a bit lazy. I didn't write the test and I figured that making the type checker catch it was enough. I still didn't write a test after either though.

Anecdotally I've always found that tests which cover real-life bugs are in the class of test with the highest chance of catching future regressions. So even if such a case does exist, I'm still mildly skeptical of the idea that tests for bugs a compiler has been provoked into also catching are strictly negative.

>And yet I see TDD practitioners as the primary source of such tests

I find the precise opposite to be true. TDD practitioners are more likely to tie requirements to tests because they write them directly after getting requirements. Test-after practitioners are more likely to tie the test to the implementation.

It's always possible to write a shit implementation-tied test with TDD, but the person who writes a shit test with TDD will write a shit implementation-tied test after, too. What did TDD have to do with that? Nothing.

>if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem.

I find that this only really happens when you practice TDD with very loose typing. If you practice strict typing, the tests will invariably be narrowed down to ones which address the specific needs of the problem.

Again - even without TDD, writing the test after with loose typing is still a shit show. So I see this as another issue that's separate from TDD.

>Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies

I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before". I've never thought the former, but the examples you've listed here look to me only like examples of where TDD didn't save somebody from making a mistake that was about a separate issue (types, poor quality test). None of them are examples of "actually writing the test after would have been better".

>When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.

Why on earth would you do that? If I write even a single line of production code I have specified what that line of code is going to do. I have watched juniors flail around like this when given vague specs, but seniors generally try to nail down a user story tight, with a combination of code investigation, spiking and dialog with stakeholders, before writing code that they would otherwise have to toss in the trash can if it wasn't fit for purpose.

To me this isn't related to TDD either. Whether or not I practice TDD, I don't fuck around writing or changing production code if I don't know precisely what result it is I want to achieve. Ever.

Future requirements will probably remain vague but never the ones I'm implementing right now.

>I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.

Only if the spec is changing too. This sometimes happens if I discover some issue by looking at the code, but in general my test remains relatively static while the code underneath it iterates.

This obviously wouldn't happen if you wrote implementation-tied tests rather than specification-tied tests but... maybe just don't do that?

>Another issue is that fault injection tests basically require coupling to the implementation

All tests require coupling to the implementation in some way. The goal is to couple as loosely as possible while maximizing speed, ease of use, etc. I'm not really sure why fault injection should be treated as special. If you need to refactor the test harness to allow it, that's probably a really good idea.

>The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing

Fuzz testing is great and having preferences is fine, but once again fuzz testing says little about the efficacy of TDD (fuzz tests can be written both before and after), and a preference for test-after, I find, tends to mean little more than "I like my old habits".

>Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation

In my experience you can (I've done TDD with property tests and, well, I see fuzz testing as simply a subset of that). I also don't see any particular reason why you can't.
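As a minimal sketch of what TDD with a property test can look like (run-length encoding is an invented example; the round-trip property is written first and drives the implementation):

```python
import random

def rle_encode(s: str) -> list:
    # implementation written to satisfy the property below
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return runs

def rle_decode(runs: list) -> str:
    return "".join(ch * n for ch, n in runs)

def roundtrip_property(trials: int = 500) -> None:
    # written FIRST: decode(encode(s)) == s for arbitrary inputs;
    # fixed seed keeps failures reproducible
    rng = random.Random(0)
    for _ in range(trials):
        s = "".join(rng.choice("abc") for _ in range(rng.randrange(0, 20)))
        assert rle_decode(rle_encode(s)) == s
```

The property stays red until both functions exist and agree, so the red-green-refactor loop works unchanged; the only difference from an example-based test is that the inputs are generated.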

These methodologies I find do provide bang for the buck if you're writing a very specific kind of code. I will know if I'm writing that type of code in advance.

>This change alters a lock-free data structure to add a monotonicity invariant,

If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

>This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.

I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug? If so, quite possibly a test would help. I've rarely been terribly sympathetic to the view that "writing a test to replicate this bug is too hard" is a good reason for not doing it. Programming is hard. I find that A) bugs often cluster in scenarios that the testing infrastructure is ill-equipped to reproduce and B) once you upgrade the testing infrastructure to handle those scenario types, those bugs often, poof... stop recurring.

>This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"

I would file that under "tightening up typing" again and also file it under "the decision to write a test at all is distinct from the decision to write a test first".

>I don't think TDD would have led to better results in these cases

Again, I see no examples where writing a test after would have been better. There are just a few where you could argue that not writing a test at all is the correct course of action.


>Anecdotally I've always found that tests which cover real life bugs are in the class of test with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that tests that catch bugs that compilers have been provoked into also catching are strictly negative.

Maybe this is a static typing thing - if the test won't build because you've made the bug inexpressible in the type system, what test can you even write?
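To illustrate with a hedged Python sketch (all names invented): once the mix-up is a distinct-type error, there is no runtime test left to write for it.

```python
from dataclasses import dataclass

# Two IDs share a runtime representation (str) but get distinct wrapper
# types, so confusing one for the other becomes a type-checker error
# rather than a runtime bug.

@dataclass(frozen=True)
class UserId:
    value: str

@dataclass(frozen=True)
class OrderId:
    value: str

def cancel_order(order: OrderId) -> str:
    return f"cancelled {order.value}"

# cancel_order(UserId("u-1"))  # rejected by mypy/pyright before it ever runs
```

The commented-out call is the "test" here: it simply won't type-check, so the bug is inexpressible rather than merely caught.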

>If I write even a single line of production code I have specified what that line of code is going to do.

>combination of code investigation, spiking and dialog with stakeholders

Does "spiking" in this context mean writing code without TDD?

>I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before".

That's a weaker formulation of TDD than I've seen espoused and practiced, which is usually something more like "before you make any behavioral changes to the code under test, you must write a test that fails; then make your edit so the test passes, and repeat". The problem with your approach at least for me is that until I've messed around a bit in the code seeing what's the best approach is to solving a problem, I don't know what the best structure for a final test is, or whether I can make the change in a way that leverages existing tests to detect the bug.

> I'm not really sure why fault injection should be treated as special.

Suppose you write a test reproducing a bug that reacts to allocation failing. One common way to do that is to inject an allocator that fails on the specific allocation call you had a bug with. But how do you target that specific call? One way is to make the Nth allocation in a test fail - but even slight changes to the production code will make this test start testing something completely different. The solution is to have a fuzz test that injects failed allocations randomly per run - now you can be reasonably confident that even if the prod code changes over time, your fuzz test will still hit all the allocation sites, preventing regression. I don't see why it's beneficial to write this first or last. Under your model the instrumented allocator setup is a replacement for the single test case that reproduces the original bug.
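A minimal sketch of that randomized-injection idea (the allocator, `build_message` and its invariant are all invented stand-ins for illustration):

```python
import random

class InjectedAllocError(Exception):
    pass

class FlakyAllocator:
    """Fails a random subset of allocations per run, so the test is not
    pinned to 'the Nth call' of any particular implementation."""
    def __init__(self, fail_rate: float, seed: int):
        self._rng = random.Random(seed)
        self._fail_rate = fail_rate

    def alloc(self, size: int) -> bytearray:
        if self._rng.random() < self._fail_rate:
            raise InjectedAllocError(size)
        return bytearray(size)

def build_message(alloc: FlakyAllocator, chunk_sizes):
    """Code under test: must either finish completely or fail cleanly."""
    chunks = []
    for size in chunk_sizes:
        try:
            chunks.append(alloc.alloc(size))
        except InjectedAllocError:
            return None  # invariant: no partial result ever escapes
    return b"".join(chunks)

def fuzz(runs: int = 1000) -> None:
    # many seeds stand in for "randomly per run": whichever allocation
    # fails, the all-or-nothing invariant must hold
    for seed in range(runs):
        result = build_message(FlakyAllocator(0.3, seed), [8, 16, 32])
        assert result is None or len(result) == 56
```

The invariant assertion over many seeds is what replaces the single Nth-call repro, so the test keeps exercising every allocation site as the production code changes.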

>If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

No, the original "bug" is that if you have two calls to SpaceAllocated and the second one races with a Fuse call on another thread, you can see a smaller value result than before. This behavior wasn't guaranteed by the public API (so not a bug) but it's desirable to add. The fix is replacing a singly linked list with a doubly linked list, using tagged pointers to avoid adding storage cost for the second set of links. A test could be written for this; I could inspect the allocated addresses of three arenas and fuse them in a specific order, but I could not actually reproduce the ordering required without a test harness that allows full control of thread execution interleaving - which would be a huge amount of work and in my opinion not actually prevent future bugs, since that interleaving only meaningfully exists in the previous implementation. The full set of possible interleavings is way too large to productively explore. If I had started by writing the test, I would have spent a bunch of time messing around before giving up; because I did the implementation first, I had a much more informed idea of what would be required to test it, and changed my plan for preventing future bugs to documentation.

>I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug?

In this case the "bug" is not really a bug - if we are allocating an arena, we put the overhead at the start rather than end of the first memory block. The reason we're doing this is that while it's totally valid to use the provided memory however we want, we can avoid faulting an extra page if we store the overhead near where we're about to allocate from. So the test would have to test "on a virtual memory platform, do we write to memory in a way that maximizes spatial locality". This is possible to do but it's a total mess of a test, and I have no idea how to do it on Windows. More importantly, anyone who changes this code is going to be doing it on purpose, and we don't guarantee to callers where we'll be placing any pointers we return.

>Again, I see no examples where writing a test after would have been better.

If TDD produces better results for you, that's great. I think I made a case with real examples where using TDD would have been at best neutral, but in practice would have made me spend more time to get the same results.


I still find the skepticism around TDD weird. Except for a few pretty niche scenarios (e.g. it's experimental code, or manual testing is cheaper for some obscure reason) I don't really see the point of not doing it.

I especially don't see what is gained by writing the test after.


"I still find the skepticism around TDD weird."

A small community of programmers, with a disproportionately large audience, foretold that practicing test-driven development would produce great benefits; over twenty-five years the audience has found that not to be the case.

Compare with "continuous integration" - here, the immediate returns of trying the proposed discipline were so good that pretty much everybody who tried the experiment got positive returns, and leaned into it, and now CI (and later CD) are _everywhere_.

As for what is gained, try this spelling: test driven development adds load to your interfaces at a time when you know the least about the problem you are trying to solve, which is to say the period where having your interfaces be flexible is valuable.

And thus, the technique gets criticism from both ends -- that design work that should have been done up front is deferred (making the design more difficult to change, therefore introducing costs/delays), and that the investment is being made in testing before you have a clear understanding for which tests are going to be sensitive to the actual errors that you introduce creating the code (thereby both increasing the amount of "waste" in the test suite, in addition to increasing the risk of needing test rewrites).

The situation is further not improved by (a) the fact that most TDD demonstrations are problems that are small, stable problems that you can solve in about an hour with any technique at all and (b) the designs produced in support of the TDD practice aren't clearly an improvement on "just doing it", and in some notable cases have been much much worse.

So if it is working for you: GREAT, keep it up; no reason for you not to reap the benefits if your local conditions are such that TDD gives you the best positive return on your investment.


>As for what is gained, try this spelling: test driven development adds load to your interfaces at a time when you know the least about the problem you are trying to solve

If I'm writing a single line of production code I should know as much as possible about the requirements problem I'm actually trying to solve with it first, no?

This actually dovetails with a benefit of writing the test first. If you flesh out a user story scenario in the form of an executable test, it can provoke new questions ("hm, actually I'd need the user ID on this new endpoint to satisfy this requirement...") and you can more quickly go back to stakeholders ("can you send me a user ID in this API call?") and "fix" your "requirements bugs" before making more expensive lower-level changes to the code.

This outside-in "flipping between one layer and the layer directly beneath it" is very effective at properly refining requirements, tests and architecture.
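A rough sketch of what such an executable user story can look like (the endpoint, payload and `FakeApi` are all invented; in the real flow the test is written first and stays red until the endpoint exists):

```python
class FakeApi:
    """In-memory stand-in for the real stack, only here so the sketch runs.
    The shape of these calls is exactly what surfaces requirements
    questions like 'does the caller even have the project ID here?'"""
    def __init__(self):
        self.projects = {42: {"name": "untitled"}}
        self.user = None

    def login(self, user):
        self.user = user

    def post(self, path, payload):
        project_id = int(path.split("/")[2])  # e.g. "/projects/42/rename"
        self.projects[project_id]["name"] = payload["name"]

    def get(self, path):
        return self.projects[int(path.split("/")[2])]

def test_user_can_rename_their_project():
    # the user story, written before any endpoint exists
    api = FakeApi()
    api.login("alice")
    api.post("/projects/42/rename", {"name": "Q3 launch"})
    assert api.get("/projects/42")["name"] == "Q3 launch"
```

Writing the story at this level makes no commitment to how the rename is implemented underneath, which is what leaves the architecture free to change.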

>And thus, the technique gets criticism from both ends -- that design work that should have been done up front is deferred

I don't think "design work" should be done up front if you can help it. I've always felt that the very best architecture emerges from aggressive refactoring done within the confines of a complete set of tests that make as few architectural assumptions as possible. Why? Because we're all bad at predicting the future and it's better if we don't try.

This is a mostly separate issue from TDD though.


Coding is not religion for me, I have no patience for fundamentalism.

I will write as many tests as I need to feel confident, which depends on context.

And integration tests give me a lot more confidence than mocked unit tests.


I hate coding fundamentalism with a passion too. The only thing I get really religious about in coding is the importance of trade offs.

The cost/benefit of writing a test before just consistently exceeded doing it after for me.

Same for integration, e2e or unit tests (there's never been a rule that says you can only TDD with a unit test).

The cost/benefit trade-off for tests with mocks vs. database is a different topic - orthogonal to the practice of red/green/refactor, and one where IMO the trade-offs are much less obvious.


I honestly can't see what writing integration tests before the code would even look like in practice; that puzzle usually isn't even close to finished at that point in time.

It sometimes makes sense for unit tests; I'll occasionally do that when I'm unsure about the API of the code I'm writing since it allows me to spend some time in the user's shoes.

But like I said, I don't do fundamentalism.


> I especially don't see what is gained by writing the test after.

I assume you mean versus writing it first, rather than versus not writing it at all.

I've found that TDD works well for bottom-up coding, but not so well for top-down.

With bottom up, I can write the test for a piece at the bottom, write the code to pass the test, and move on. With top-down, if I write the test first, it might be a long while before I have that top-level working, because the bottom bits don't exist yet.

When I feel it's better to write things top-down, I'll often use TDD for the bottom bits I need to write, but for the bits above that, I'll write the tests "on my way back up".


"I especially dont see what is gained by writing the test after."

The greatest value in tests is that they help prevent future changes from breaking existing functionality. Writing the test after you write the implementation is equally useful for achieving that as writing the test before you write the implementation.


Not the only value, though. Red-green-refactor also provides live feedback about whether your code is behaving correctly as you write it.

Requiring the test before writing the code also ensures you don't forget to write a test matching the scenario.

So what is gained by test-after... is that it is almost as good?

I still don't get it.


I need something to work with before I can write the test. So my order tends to be: get the code working first with the simplest case, and by using it I know that simple case is working, then use that to write the first couple of tests. Only then would I expand the tests to the cases not written yet and to a TDD style.

This order also helps verify I didn't typo something in the test itself and end up TDD-ing myself into broken code.


I usually start with a basic e2e test that covers the most minimal happy path possible. It makes no assumptions about architecture or anything else.

You don't need something to work with in order to write it. You can, by definition, write an e2e test against an app that doesn't exist.

This test isn't special as far as TDD is concerned - red-green-refactor works the same way.
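As a minimal sketch (everything here is invented): the happy-path test assumes only that some entry point exists, and `main` is only filled in afterwards to go green.

```python
def main(argv, stdout):
    # minimal implementation, written AFTER the test below went red;
    # when the test was first written, this function did not exist at all
    if argv[:1] == ["add"]:
        stdout.append(f"added: {argv[1]}")
        return 0
    return 1

def test_happy_path():
    # makes no architectural assumptions beyond "there is an entry point
    # and it produces output"
    stdout = []
    assert main(["add", "buy milk"], stdout) == 0
    assert stdout == ["added: buy milk"]
```

"Red" here initially means a NameError rather than an assertion failure, which is fine; the loop is the same.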

I'm sensing a pattern in the answers to my question, though. I keep getting "well, if you assume TDD is only done with low-level unit tests..."


> I'm sensing a pattern in the answers to my question, though. I keep getting "well, if you assume TDD is only done with low-level unit tests..."

Completely wrong.

Even with your example, there's an initial exploratory stage where you're still figuring out the interface that the tests would use. I, personally, am not capable of using something that doesn't exist. I have to make that initial version first before I can use it in a test.

Quick edit aside: This is also why I rarely work top-down or bottom-up, I work mostly throughline - following the data flow and jumping up and down the abstraction stack as needed.


I'm not quite sure why you feel you always need to write code before sussing out what an API or UI should look like, but it seems like a very expensive habit to me.

What happens when you then show it to stakeholders (e.g. other teams consuming your API, customers or UX people) and they tell you to change it again?

Rewrite everything again?

That's gonna be reaaaaaaaalllly labor intensive and could damage your code base too.

I'm equally perplexed about why people don't try to build top down. It's one of those few things in programming that always makes sense regardless of circumstance.


> What happens when you then show it to stakeholders (e.g. other teams consuming your API, customers or UX people) and they tell you to change it again?

> Rewrite everything again?

> That's gonna be reaaaaaaaalllly labor intensive and could damage your code base too.

Why would I do that? Only the thing they have issue with would need to be changed, it wouldn't take any longer than another way of doing it.

You seem to have forgotten what I said, something needs to exist for me to work with. Well, in this "stakeholders want something changed", something exists. It's not a rewrite from scratch.


>Why would I do that?

If you change the spec (e.g. changing the contract on a REST API), you will probably need to consult the other teams to make sure it aligns with everybody's expectations. Does the team calling it even have the customer ID you've just decided to require on, say, this new endpoint?

>You seem to have forgotten what I said, something needs to exist for me to work with.

No. I'm assuming here that a code base exists and that you are mostly (if not 100%) familiar with it.


So what's the problem? You seem to have gone back on the part I quoted in my previous comment.


If Izkata is anything like me they write code as part of their exploratory design process with no intention of showing it to anyone else until they've iterated their way to a design that they like.


Sometimes you need the unit before you can unit test.

So you end up writing the unit's boilerplate, which doesn't yet do anything but needs to compile/run so a suite can test it, throwing an ad hoc error of the NotImplemented sort or whatever; then you break flow to set up the test suite and check the reds (which aren't telling you anything useful, because of course it's red); then you actually write the code.

I find it flow-breaking and cumbersome for little to no gain.

For additions to existing code I find TDD more useful, but even then it's not unusual to move logic around until settling on a final structure, so the units you end up with might only solidify later. Writing tests for scrapped units is a waste of time.
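For concreteness, the flow described above can be sketched in a few lines of Python (all names here are hypothetical): the stub exists only so the suite can run, and the test stays red until the body is actually written.

```python
# "Red" stage: a stub that exists only so the test suite can run.
class PriceCalculator:
    def total(self, items):
        raise NotImplementedError  # boilerplate, does nothing useful yet

def run_test():
    try:
        assert PriceCalculator().total([2, 3]) == 5
        return "green"
    except (NotImplementedError, AssertionError):
        return "red"

first = run_test()
print(first)  # red: tells you nothing useful, because of course it's red

# "Green" stage: fill in the actual implementation.
PriceCalculator.total = lambda self, items: sum(items)
second = run_test()
print(second)
```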


>Sometimes you need the unit before you can unit test.

Right. In those situations I TDD with an e2e or integration test.

I dont get why youd restrict yourself to doing TDD with just low level unit tests.


I don't, but you agree that in that case the unit test comes after? That was the point I was arguing.


Not necessarily. On plenty of projects I have done 100% TDD and never written a single low level unit test.

The type of test is, in my mind, a completely different topic to red-green-refactor and for the decision about which one to write I follow a set of rules which is also unconnected.

TDD is just red-green-refactor. It works with any test.


If you value red green refactoring then you should write the tests first.

I only use that technique for pieces of code that really fit that well - usually functions that have a very strong relationship between their input and output - so I'll write tests first for those, but not for most of my other stuff.


Well ok...but then what kind of code doesnt it fit well?

Almost every user story I follow in production code follows the form of given/when/then scenario which can always be transformed into a test of some kind (e2e, integration, sometimes even unit).

Where it's something like "do x, y and z and then a graph appears" I find TDD with a snapshot test with, say, playwright works best.
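As a sketch of that transformation (the story, names and numbers are hypothetical), a given/when/then scenario maps directly onto the phases of a test:

```python
# Hypothetical user story: "Given a cart with a discount code,
# when the customer checks out, then the discounted total is charged."
def test_checkout_applies_discount():
    # Given: a cart containing two items and a 10% discount code
    cart = {"items": [10.0, 5.0], "discount": 0.10}
    # When: the customer checks out
    total = sum(cart["items"]) * (1 - cart["discount"])
    # Then: the discounted total is charged
    assert total == 13.5
    return total

result = test_checkout_applies_discount()
print(result)
```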


I'm talking about strict test-first development here, where you write the tests before you write the implementation.

If you're using snapshot tests (a technique I really like) surely you can't write the tests before the implementation, because you need the implementation in order to generate the snapshot?

(This is what I hate about the term TDD: sometimes it means test-first, sometimes it doesn't - which leads to frustrating conversations where people are talking past each other.)


You need the final implementation before taking the final snapshot but you can write the entire test up front (given/when). The snapshot artefact is generated not written (often in a different file entirely), so Id argue it still fits the definition cleanly.
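A minimal sketch of that mechanic (names hypothetical, plain files instead of playwright): the given/when parts are written up front, while the then-artefact is generated on the first run and compared on subsequent ones.

```python
import json
import pathlib
import tempfile

def render_report(data):               # stand-in for the code under test
    return json.dumps(data, indent=2, sort_keys=True)

def snapshot_test(name, actual, snapshot_dir):
    path = pathlib.Path(snapshot_dir) / f"{name}.snap"
    if not path.exists():
        path.write_text(actual)        # first run: generate the artefact
        return "snapshot created"
    return "pass" if path.read_text() == actual else "fail: snapshot differs"

with tempfile.TemporaryDirectory() as d:
    out = render_report({"users": 3})
    first = snapshot_test("report", out, d)   # generates the snapshot
    second = snapshot_test("report", out, d)  # compares against it
print(first)
print(second)
```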

I agree that "unit test"/"integration test" as a definition sucks horribly and leads to people talking past each other, but I think with TDD the main issue is that lots of people have developed a fixed and narrow idea of the kind of test you are "supposed" to write with it which makes the process miserable if the type of code doesnt fit that type of test.

The whole idea of a unit test being "the" kind of "default" test and being "tests a class/method as a unit" definitely needs to die.


> Red-green-refactor can also provide live feedback about whether your code is behaving correctly as you write it.

No, it provides live feedback about whether your code is passing your tests

If you have written your tests poorly then set out to make the tests pass, then your tests become the target rather than the correct behavior

If you are continuously updating your tests while your code evolves because you missed test cases or your understanding of the behavior has improved, then writing the tests first didn't actually give you any value. In fact it just wasted a lot of your time

Write the code. Manually test to verify correctness and to identify the test cases you have to write. THEN write tests to protect against regressions.


>my understanding of the problem only really forms through writing code and seeing what approaches work.

Unless you are working with a new/untested technology or approach (i.e. you need a spike), the same kind of understanding should form while writing the test scenario.

>Maybe this should be a separate prototype phase

I always either spike (in which case I never TDD) or write production code (in which case I always do). I can't see under what circumstances anybody would want to convert spike code you bashed out as quickly as possible to prove a point into production code.


Depends upon the ORM. Like all frameworks, a really good one is a significant productivity boost while a bad one is far worse than none at all.


And not just the ORM, but the way it's used. If you ensure that lazy-loading is turned off from day 1 and stays off, you might be okay. But if you don't pay attention to this and write a bunch of code for N years until all the "select N+1"s you've been unwittingly doing finally force your DB to a crawl... now you're in trouble.
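A sketch of the trap with plain sqlite3 instead of an ORM (the schema is hypothetical): lazy loading issues one query per parent row, where eager loading issues a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO book VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 2, 'C');
""")

# Lazy-loading style: 1 query for the authors, then N more for their books.
lazy_queries = 1
authors = conn.execute("SELECT id FROM author").fetchall()
for (author_id,) in authors:
    conn.execute("SELECT title FROM book WHERE author_id = ?",
                 (author_id,)).fetchall()
    lazy_queries += 1
print("lazy:", lazy_queries, "queries")   # grows linearly with the data

# Eager style: one join fetches everything at once.
rows = conn.execute("""
    SELECT author.name, book.title
    FROM author JOIN book ON book.author_id = author.id
""").fetchall()
print("eager: 1 query,", len(rows), "rows")
```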


If you're working with a big ball of mud, I find that the best approach is to immediately start doing TDD with hermetic end to end tests.

Hermetic = could run just fine on their own if run on a freshly installed OS that is cut off from the internet.

The first tests you build this way will be extraordinarily expensive (faking databases & http calls is fiddly), but they pay enormous dividends.

Once you have a large enough body of these and youve refactored some clean interfaces underneath, you can start writing future tests against those.
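The fiddly part is usually the fakes themselves. As a sketch (endpoint and payload are hypothetical), an external HTTP dependency can be replaced with a local stdlib server so the whole test runs with no network access:

```python
import http.server
import json
import threading
import urllib.request

class FakePaymentsAPI(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for a third-party HTTP service."""
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep test output quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), FakePaymentsAPI)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The code under test would be pointed at this URL instead of the real API.
url = f"http://127.0.0.1:{server.server_address[1]}/payments/health"
with urllib.request.urlopen(url) as resp:
    status = json.loads(resp.read())["status"]
server.shutdown()
print(status)
```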


I'm firmly in the camp that you shouldn't mock the DB and have a local instance for testing but for external resources that you can't reproduce locally I think that's fine.

Otherwise I agree with you.


By faking the DB I meant either running a local, prefilled fake DB server for every test or faking the interface to the DB.

Which one you should do depends on how complex your interactions with the DB are.

Some apps (e.g. CRUD) have half of their business logic encoded in DB queries in which case faking the calls is a bad idea.

Others only do, like, 2 simple queries. In this case there's no point running an actual database outside of a couple of E2E tests.
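Both flavours, sketched with hypothetical names: an in-memory prefilled database per test for query-heavy code, or a faked interface when the app only does a couple of simple queries.

```python
import sqlite3

# (1) A local, prefilled database per test: real SQL, nothing shared.
def make_test_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users VALUES (1, 'Ann')")
    return db

# (2) Faking the interface: fine when there are only a couple of queries.
class FakeUserStore:
    def get_name(self, user_id):
        return {1: "Ann"}.get(user_id)

name_from_db = make_test_db().execute(
    "SELECT name FROM users WHERE id = 1").fetchone()[0]
name_from_fake = FakeUserStore().get_name(1)
print(name_from_db, name_from_fake)
```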


Yup then I'm with you


The other extreme of this is:

* Bad abstractions which just stick around forever. There are some examples of this in UNIX which would never be invented in the way they are today but nonetheless aren't going anywhere (e.g. signal handling). This isn't good.

* Invent all of your own wheels. This isn't good either.

There's a balance that needs to be struck between all of these 3 extremes.


I know it's just an example, but if you're on linux there's signalfd() which makes signals into IO so you can handle it in an epoll()-loop or whatever way you like doing IO

We can't remove the old way of course, as that would break things, but that doesn't stop improvements
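Python's stdlib exposes the same signals-as-I/O idea via `signal.set_wakeup_fd` (not `signalfd()` itself, but the analogous pattern): signal delivery becomes a readable byte that a select/epoll loop can multiplex with other I/O. A POSIX-only sketch:

```python
import os
import select
import signal
import socket

rsock, wsock = socket.socketpair()
wsock.setblocking(False)               # set_wakeup_fd requires non-blocking

signal.signal(signal.SIGUSR1, lambda signum, frame: None)  # handler must exist
signal.set_wakeup_fd(wsock.fileno())

os.kill(os.getpid(), signal.SIGUSR1)   # simulate an incoming signal

# The signal now shows up as ordinary readable I/O.
ready, _, _ = select.select([rsock], [], [], 1.0)
data = rsock.recv(16)
signal.set_wakeup_fd(-1)               # restore default behaviour
print(data == bytes([signal.SIGUSR1])) # the signal number arrives as a byte
```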


Sometimes removing the old way is the improvement though. E.g. adding an alternative to symlinks doesn't help if symlinks are still allowed.


Why?

That way you break a lot of things.


You have to start somewhere. Strongly specialized programs that f.ex. never access disk and only access network are a good candidate to be tested in either restricted containers or brand new OSes that carry no legacy baggage.

It's doable, but nobody wants to put in the money, time and energy into pioneering it.


>However, very few developers follow this approach religiously

I do it pretty religiously. There are 3 exceptions:

1) I'm doing a spike (i.e. what author calls exploratory code) in which case, probably this code is getting disposed of. This is the one main exception.

2) I'm just tweaking a config value/printed message/something else surface level.

3) The cost of building test infrastructure is prohibitive (if it's a long running project I will aim to keep building that infra until it is possible though...).

That's it. As far as I can tell there arent other scenarios where it isnt a good idea.


I think the premise is correct and I think you are disagreeing with it.

Yes, the pyramid was set out as a goal in its original incarnation. That was deeply wrong. The shape ought to be emergent and determined by the nature of the app being tested (I went into detail on what should determine that here https://news.ycombinator.com/item?id=42709404).

Some of the most useful tests Ive worked with HAVE had a large GUI tip. The GUI behavior was the most stable surface whose behavior was clearly defined and which everybody agreed upon. All the code got tested. GUI tests provided the greatest freedom to refactor, covered the most bugs and provided the most value by far on that project.

GUI tests are not inherently fragile or inherently too slow either. This is just a tendency that is highly context specific, and as the "pyramid" demonstrates - if you build a rule out of a tendency that is context specific it's going to be a shit rule.


> Some of the most useful tests Ive worked with HAVE had a large GUI tip. The GUI behavior was the most stable surface whose behavior was clearly defined which everybody agreed upon.

This might be true, but that might also have said something about the layers below that and actually be a symptom for a larger issue within the development organisation.

> GUI tests are not inherently fragile or inherently too slow either.

Compared to testing APIs or Unit tests they are though. Not only do you need to navigate an interface with a machine that is actually intended for humans, you need to also deal with the additional overhead.


>something about the layers below

Absolutely. They were tests over a big ball of mud in a company I had joined recently.

This is, I think, the only good way to work with what is probably (unfortunately) the most common type of real world code architecture.

If your testing approach cant deal with big fragile balls of mud then it is bad. This is why I dont have a lot of respect for the crowd that thinks you must DI first "in order to be able to test". Such architectures are fragile and will break under attempts to introduce dependency inversion.

>Compared to testing APIs or Unit tests they are though.

In the above example there probably wasnt a single code interface or API under the hood that was any good. Coupling to any of those interfaces was fragile with a capital F if you actually expected to refactor any of them (which I did).

Even for decent quality code, the freedom to refactor interfaces is wildly underrated and it is curtailed by coupling a test to it.

