I have a potential project that would involve testing somewhere in the range of 500 Cassandra nodes. What are the best tools to use in load testing such installations? Is everything pretty much custom?
Without more information, it's very difficult to give advice.
If you're not load testing with your actual application code and a realistic application load, I wouldn't even bother testing. The numbers will be so misleading that the exercise is mostly pointless.
What does it matter if your Cassandra install can do 500,000 writes per second if your real app has lock contention issues that bring that number down to 5,000 per second, or latency issues that bring it down to 50,000?
Since you should be performance testing with real application code and load, you'll need to add two things to your code:
1) Code to record the load (logs can work great for this)
2) Code to play back the load at a multiple of the recorded rate
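To make those two pieces concrete, here is a minimal sketch of what I mean. It assumes "at a multiple" means compressing the recorded inter-arrival times (a multiplier of 2 replays the same operations in half the time), and it assumes a JSON-lines log. The names record_op, load_records, scaled_offsets and replay are made up for illustration, and execute is a placeholder for whatever function in your app actually issues the request.

```python
import json
import time


def record_op(log, op, payload, now=time.time):
    """Append one operation to the load log as a JSON line."""
    log.write(json.dumps({"ts": now(), "op": op, "payload": payload}) + "\n")


def load_records(log):
    """Read the JSON-lines log back, ordered by timestamp."""
    records = [json.loads(line) for line in log if line.strip()]
    return sorted(records, key=lambda r: r["ts"])


def scaled_offsets(records, multiplier):
    """Seconds from replay start for each record, compressed by `multiplier`."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    if not records:
        return []
    start = records[0]["ts"]
    return [(r["ts"] - start) / multiplier for r in records]


def replay(records, multiplier, execute, sleep=time.sleep, clock=time.monotonic):
    """Re-issue each recorded operation at its compressed offset.

    `execute(op, payload)` is a hypothetical hook: in a real test it would
    call the same application code path that produced the record.
    """
    began = clock()
    for offset, rec in zip(scaled_offsets(records, multiplier), records):
        delay = offset - (clock() - began)
        if delay > 0:
            sleep(delay)
        execute(rec["op"], rec["payload"])
```

This single-threaded loop only shows the timing logic. At real volumes you'd want many replay workers on several client machines, otherwise the replayer becomes the bottleneck long before the cluster does.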
Then I'd add the parameters you want to tune to your testing code and use a genetic algorithm to tune them for your Cassandra install.
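A rough sketch of the genetic-algorithm part, under some loud assumptions: the two parameters and their ranges are only examples, the function names are made up, and toy_fitness is a stand-in. In a real run the fitness function would apply the candidate settings to the cluster, replay your recorded load, and return the measured throughput (or negative latency).

```python
import random

# Illustrative search space: (low, high) integer bounds per parameter.
# The ranges are assumptions for the example, not recommendations.
SEARCH_SPACE = {
    "concurrent_writes": (8, 256),
    "concurrent_reads": (8, 256),
}


def random_candidate(rng):
    """Pick every parameter uniformly at random within its bounds."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Build a child by taking each parameter from one of the two parents."""
    return {k: rng.choice((a[k], b[k])) for k in SEARCH_SPACE}


def mutate(cand, rng, rate=0.2):
    """With probability `rate` per parameter, replace it with a fresh value."""
    out = dict(cand)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def tune(fitness, generations=30, pop_size=20, elite=4, seed=0):
    """Keep the best `elite` candidates each generation and breed the rest."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(cand):
    """Stand-in score with an arbitrary peak; replace with a real load replay."""
    return -abs(cand["concurrent_writes"] - 96) - abs(cand["concurrent_reads"] - 32)
```

Calling tune(toy_fitness) returns the best candidate dict found. Each real fitness evaluation is a full load test, so in practice you'd cache scores per candidate and keep the population and generation counts small.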
So, yes, real load testing always involves custom code. If you're just looking for numbers to impress management, then use whatever tool you like, because the results won't correlate to anything.