To add to what aphyr says, you generally need three components for generative testing of distributed systems:
1. Some sort of environment, which can run the system. The simplest environment is to spin up a real cluster of machines, but ideally you want something fancier, to improve performance, control over responses of external APIs, determinism, reproducibility, etc.
2. Some sort of load generator, which makes the system in the environment do interesting things.
3. Some sort of auditor, which observes the behavior of the system under load and decides whether the system behaves according to the specification.
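To make the three pieces concrete, here is a toy sketch in Python. Everything in it is hypothetical and for illustration only: `ToyLedger`, `generate`, `Auditor`, and `run_simulation` are made-up names, and the planted bug has nothing to do with TigerBeetle, Jepsen, or Antithesis. The environment is just a seeded, single-threaded loop, the generator picks random transfers, and the auditor compares the system against a simple model.

```python
import random


class ToyLedger:
    """Hypothetical system under test: accounts with integer balances."""

    def __init__(self, accounts, buggy=False):
        self.balances = dict.fromkeys(accounts, 100)
        self.buggy = buggy

    def transfer(self, debit, credit, amount):
        if self.balances[debit] < amount:
            return False  # reject overdrafts
        self.balances[debit] -= amount
        # Planted bug (assumption for the demo): self-transfers credit twice.
        if self.buggy and debit == credit:
            amount *= 2
        self.balances[credit] += amount
        return True


def generate(rng, accounts, allow_self):
    """Load generator: picks a random transfer (debit, credit, amount)."""
    debit = rng.choice(accounts)
    if allow_self:
        credit = rng.choice(accounts)
    else:
        credit = rng.choice([a for a in accounts if a != debit])
    return debit, credit, rng.randint(1, 50)


class Auditor:
    """Auditor: replays each operation on a model and compares results."""

    def __init__(self, accounts):
        self.model = dict.fromkeys(accounts, 100)

    def check(self, op, ok, system):
        debit, credit, amount = op
        expected_ok = self.model[debit] >= amount
        if expected_ok:
            self.model[debit] -= amount
            self.model[credit] += amount
        return ok == expected_ok and self.model == system.balances


def run_simulation(seed, steps=200, buggy=False, allow_self=False):
    """Environment: deterministic, single-threaded, driven by one seed.

    Returns the index of the first step the auditor rejects, or None.
    """
    accounts = ["a", "b", "c"]
    rng = random.Random(seed)
    system = ToyLedger(accounts, buggy)
    auditor = Auditor(accounts)
    for step in range(steps):
        op = generate(rng, accounts, allow_self)
        ok = system.transfer(*op)
        if not auditor.check(op, ok, system):
            return step
    return None
```

Note that with `allow_self=False` the generator never produces a self-transfer, so the planted bug stays invisible no matter how good the environment and auditor are. Flipping `allow_self=True` lets the same auditor catch it, and re-running with the same seed reproduces the same failing step.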
Antithesis mostly tackles problem #1, providing a deterministic simulation environment as a virtual machine. The same problem is tackled by Jepsen (by using real machines, but injecting faults at the OS level), and by TigerBeetle's own VOPR (which is co-designed with the database, and for that reason can run the whole cluster on just a single thread). These three approaches are complementary and are good at different things.
For this bug, the critical parts were #2 and #3: writing a workload generator and an auditor that can actually trigger and detect the bug. Here, it was aphyr's 1600 lines of TigerBeetle-specific Clojure code that triggered and detected the bug (and then we patched _our_ equivalent to also trigger it). Really, what's buggy here is not the database, but the VOPR. A database having bugs is par for the course; you can't just avoid bugs through sheer force of will. So you need a testing strategy that can trigger most bugs, and any bug that slips through points to a deficiency in the workload generator.
And honestly--designing a generator for a system like this is hard. Really hard. I struggled for weeks to get something that didn't just fail 99% of requests trivially, and it's an (ahem) giant pile of probabilistic hacks. So I wouldn't be too hard on the various TB test generators here!