The real question is, "what process model do you use for multiple CPUs?" Assuming a Unixy OS, you can use threads or multiple processes.
If you choose multiple processes, you have to intelligently use IPC to coordinate them. But you can happily use pretty much any language.
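To make the multi-process option concrete, here's a minimal sketch of process coordination over queues, one simple form of IPC. The names (`worker`, `square_all`) are illustrative, not anything from the thread, and this assumes a Unixy fork-capable platform:

```python
# Coordinating worker processes over Queues -- one simple form of IPC.
from multiprocessing import Process, Queue

def worker(tasks: Queue, results: Queue) -> None:
    # Pull work items until a None sentinel arrives.
    while True:
        item = tasks.get()
        if item is None:
            return
        results.put(item * item)

def square_all(values, nworkers=2):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(nworkers)]
    for p in procs:
        p.start()
    for v in values:
        tasks.put(v)
    for _ in procs:
        tasks.put(None)          # one stop sentinel per worker
    out = sorted(results.get() for _ in values)  # drain before joining
    for p in procs:
        p.join()
    return out
```

Each worker runs in its own address space, so there's no shared state to lock; all coordination flows through the queues.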
If you choose threading, scripting languages are problematic. For example, Python has a Global Interpreter Lock. In that case, I think the only "safe" choices are C++ and Java. Java's safe because its "synchronized methods" enforce locking, often in a wasteful manner. C++ leaves the entire synchronization problem in your lap, giving you infinite room for performance and for subtle and maddening bugs.
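For the threaded option, here's a sketch of the kind of manual synchronization C++ (and Java, minus the enforced `synchronized` sugar) leaves in your lap, written in Python purely for brevity; `Counter` and `hammer` are made-up names. Without the lock, the read-modify-write on `count` can interleave and lose updates, even under the GIL:

```python
import threading

class Counter:
    def __init__(self) -> None:
        self.count = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:            # critical section: protect read-modify-write
            self.count += 1

def hammer(counter: Counter, n: int) -> None:
    for _ in range(n):
        counter.increment()

def run(threads: int = 4, per_thread: int = 10_000) -> int:
    c = Counter()
    ts = [threading.Thread(target=hammer, args=(c, per_thread)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return c.count
```

Forget the lock in one place and you get exactly the subtle, maddening bugs described above.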
Multi-process is generally a better approach than threading. I've seen too much time spent debugging threaded apps to ever recommend that approach. Every hour poured into debugging a deadlock, or worse yet, a mysterious cross-thread heap trampling is an hour taken away from improving the product.
Haskell is excellent for parallel processing -- probably as good as Erlang if you're just talking about multiple cores on one system and not distributed processing.
I just want to second what several others have implied about Erlang. It makes the actor model feel so natural that the fact that your program can run on multiple cores/CPUs/computers seems incidental.
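For readers unfamiliar with the actor model: an actor owns its state privately and interacts with the world only through messages. A rough approximation in Python (not Erlang; `adder` and the message shapes are invented for illustration):

```python
from multiprocessing import Process, Queue

def adder(inbox: Queue, outbox: Queue) -> None:
    # An "actor": private state, reacts to messages, shares nothing.
    total = 0
    while True:
        msg = inbox.get()
        if msg == "stop":
            outbox.put(total)
            return
        total += msg

def run() -> int:
    inbox, outbox = Queue(), Queue()
    Process(target=adder, args=(inbox, outbox), daemon=True).start()
    for n in (1, 2, 3):
        inbox.put(n)
    inbox.put("stop")
    return outbox.get()
```

Because the actor's state is never shared, there is nothing to lock; Erlang builds the whole language around this discipline instead of bolting it on.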
Ruby/Rails/Mongrel/Pound does a great job of utilizing all those CPUs that I have in a production environment. There's a bunch of overhead, but it is easier than having to deal with locking.
It really depends on your application, though. If it isn't a web application, and state matters, then you're going to have to go elsewhere.
I was hoping that Python threads split across processors, but it seems that they don't. Is there a language that does it "automagically" without having to explicitly code it?
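For context on why the threads don't split: CPython's GIL keeps CPU-bound threads on one core. The stock workaround is `multiprocessing.Pool`, which is about as "automagic" as Python gets; a sketch, with `cpu_bound` as a stand-in workload:

```python
from multiprocessing import Pool

def cpu_bound(n: int) -> int:
    # Stand-in for real CPU-heavy work.
    return sum(i * i for i in range(n))

def run() -> list:
    # Pool() defaults to one worker process per core and
    # Pool.map fans the work out across them.
    with Pool() as pool:
        return pool.map(cpu_bound, [10, 100, 1000])
```

It's still explicit in the sense that you must phrase the work as independent tasks, but the scheduling across cores is handled for you.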
Have a look at Erlang. Actually, have a look at functional languages in general: being (mostly) side-effect free, they often have primitives for parallel processing.
Another pretty exciting language you might want to look at is F#, presently running under Mono on my MacBook.
No, the standard Python interpreter kind of sucks at this. You might want to look at Stackless Python though, it has all sorts of neat concurrency primitives. http://www.stackless.com
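Stackless's tasklets and channels have their own API; the cooperative flavor can be sketched in plain CPython with generators (an analogy only, not Stackless's actual interface; `scheduler` and `job` are invented names):

```python
from collections import deque

def scheduler(tasks):
    # Round-robin over cooperative tasks, tasklet-style.
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))
            queue.append(task)        # yielded cooperatively; requeue it
        except StopIteration:
            pass                      # task finished; drop it
    return trace

def job(name, steps):
    for i in range(steps):
        yield f"{name}{i}"
```

Running `scheduler([job("a", 2), job("b", 2)])` interleaves the two jobs, which is the basic trick Stackless builds its concurrency primitives on (real tasklets are far cheaper than OS threads, but note this is still cooperative multitasking on one core, not multi-CPU parallelism).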
Your OS process scheduler should do it to some degree at runtime, but beyond that there is not a lot that can be achieved without explicitly handling synchronization and locking of shared resources. I know there are some startups working on parallel processing optimizations, though, so there could be some new developments in the future. Google has been acquiring a few of these startups recently.
I love Python and use it for many of my prototyping needs, but if you really need that extra juice from your extra processors, you are probably using the wrong language for the problem.
In my experience, the average speedup gained from porting to C/C++ (even without using multiple processors) is about 60x to 100x.