My understanding is that /dev/urandom is perfectly fine for almost all cases, including session keys and the like, but (if only for the sake of paranoia) for long-lived keys it is worth the potential extra time spent waiting for /dev/random to serve what you need.
The key to remember is that when the pool has sufficient entropy there is no difference between /dev/random and /dev/urandom, and if the pool is low then there is practically no difference between /dev/random and /dev/urandom - the quality of the PRNG means it is practically impossible to tell the difference between the two outputs (take a few thousand bits from each at a time and see if any statistical analysis can reliably tell the difference).
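As a rough way to try the experiment suggested above, here is a minimal Python sketch that runs a simple monobit (bit-frequency) test on a few thousand bits from each device. The helper names `monobit_z` and `read_device` are made up for illustration, and a single frequency test is far weaker than a real statistical suite, so treat it as a starting point only.

```python
import math
import os


def monobit_z(data: bytes) -> float:
    """Z-score of the number of 1 bits against a fair-coin expectation."""
    n_bits = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    # If the bits are fair, ones ~ Binomial(n_bits, 0.5):
    # mean n_bits / 2, standard deviation sqrt(n_bits) / 2.
    return (ones - n_bits / 2) / (math.sqrt(n_bits) / 2)


def read_device(path: str, nbytes: int) -> bytes:
    """Read exactly nbytes from a device file (hypothetical helper)."""
    chunks = []
    remaining = nbytes
    # Unbuffered, so we never pull more from the device than we asked for.
    with open(path, "rb", buffering=0) as dev:
        while remaining > 0:
            chunk = dev.read(remaining)
            if not chunk:
                raise EOFError(f"unexpected end of data from {path}")
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # 512 bytes = 4096 bits from each device.
    # Note: on older kernels the /dev/random read may block for a while.
    for path in ("/dev/urandom", "/dev/random"):
        print(path, round(monobit_z(read_device(path, 512)), 3))
```

Both z-scores should usually land within a few units of zero, and nothing in them lets you say which device a sample came from.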
It is increasingly common for CPUs and/or related chipsets to have a built-in TRNG, so keeping the entropy pool "topped up" is getting easier: rng-tools can feed the pool from those sources. The SoC that Raspberry Pis are based around has an RNG that pushes out more than 500 kbit/s, for instance.
Your understanding is common, is stated explicitly in the manpage, and is unfortunately incorrect.
/dev/random and /dev/urandom even use the same CSPRNG behind the scenes. The former tries to maintain a count of the estimated entropy, but this is a meaningless distinction. CSPRNGs can't run out of entropy (for instance, a stream cipher is essentially a non-reseeded CSPRNG that works by generating an arbitrarily long sequence of computationally random bits that can be XORed against a plaintext).
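To make the stream cipher comparison concrete, here is a toy Python sketch: a fixed seed is expanded into an arbitrarily long keystream by hashing the seed together with a counter, and the keystream is XORed against a plaintext. The SHA-256 counter construction and the names `keystream` and `xor_bytes` are assumptions chosen for illustration. This is not the algorithm the kernel uses, and it is not something to encrypt real data with.

```python
import hashlib


def keystream(seed: bytes, nbytes: int) -> bytes:
    """Expand a fixed seed into nbytes of output, never reseeding."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        # Each block is the hash of seed || counter; the seed never changes.
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    seed = b"example seed, illustration only"
    plaintext = b"attack at dawn"
    stream = keystream(seed, len(plaintext))
    ciphertext = xor_bytes(plaintext, stream)
    # XORing with the same keystream again recovers the plaintext.
    print(xor_bytes(ciphertext, stream))
```

The generator can keep producing output for as long as you ask without any new entropy going in, which is the sense in which a CSPRNG does not "run out".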
There might be a meaningful distinction if /dev/random provided "true" randomness (and could therefore be used for something like an OTP). But it doesn't. Both use the same CSPRNG algorithm.
I understand that both use the same CSPRNG and seed source(s) for entropy; the difference is that one will block if those sources have not output enough information recently (the "pool count" is too low).
There is some genuine randomness there, as the entropy sources are not (unlike the PRNG) deterministic: they take whitened fractional values from I/O timings (time between key presses and mouse signals, and some aspects of physical drive I/O; the low bits of such timings are essentially random noise if the timer is granular enough).
/dev/urandom uses the CSPRNG in whatever state it is in, whereas /dev/random waits until it considers the CSPRNG to have been sufficiently randomly reseeded. In cases where the current situation is considered random enough (the pool count is high, so /dev/random will not block) you will get output of the same quality from either /dev/random or /dev/urandom.
That assumes it has been seeded with enough entropy. If you have just booted and haven't gathered/seeded with entropy yet, then /dev/urandom can potentially give you predictable values, whereas /dev/random would be safer because you'd wait until it has enough entropy.
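A rough Python sketch of the idea of harvesting the low bits of timings and whitening them. It samples the lowest bit of successive `time.perf_counter_ns()` deltas and applies von Neumann debiasing, which is a stand-in technique chosen here for simplicity and is not how the kernel mixes its inputs. A tight loop is also a much poorer source than real key press or disk timings. The function names are made up for illustration.

```python
import time
from typing import List


def timing_low_bits(n: int) -> List[int]:
    """Collect the lowest bit of n successive timer deltas."""
    bits = []
    prev = time.perf_counter_ns()
    for _ in range(n):
        now = time.perf_counter_ns()
        bits.append((now - prev) & 1)
        prev = now
    return bits


def von_neumann(bits: List[int]) -> List[int]:
    """Debias a bit sequence: pair 01 -> 0, pair 10 -> 1, drop 00 and 11."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out


if __name__ == "__main__":
    raw = timing_low_bits(2000)
    whitened = von_neumann(raw)
    print(len(raw), "raw bits ->", len(whitened), "whitened bits")
```

Whitening like this removes bias but throws away a lot of the input, and it does nothing about correlation between samples, so the count of output bits is not an entropy estimate.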
Too bad there isn't a way to tell whether the CSPRNG has been seeded or not.
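For what it's worth, Linux kernels from 3.17 onward have a getrandom() system call, and asking it for data with the GRND_NONBLOCK flag fails instead of blocking when the pool has not been initialised yet. A minimal Python sketch of using that as a "has it been seeded?" probe, assuming Linux and Python 3.6 or later; the function name `csprng_is_seeded` is made up for illustration:

```python
import os
from typing import Optional


def csprng_is_seeded() -> Optional[bool]:
    """True/False if the kernel can tell us, None if getrandom is unavailable."""
    if not hasattr(os, "getrandom"):
        return None  # Not Linux, or a Python build without os.getrandom
    try:
        # With GRND_NONBLOCK the call raises instead of waiting
        # when the kernel's pool has not been initialised yet.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    except OSError:
        return None  # e.g. the syscall is missing or blocked
    return True


if __name__ == "__main__":
    print("CSPRNG seeded:", csprng_is_seeded())
```

Without the flag, the same call simply waits until the pool has been initialised once and never blocks after that.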
If you are being paranoid you might prefer to wait forever for a good random value instead of accepting something you are even fractionally less sure of.
Though practically speaking, that would probably not be acceptable in most (if not all) circumstances.
If you are that paranoid then there are inexpensive true RNGs out there (free in fact, if your CPU or other chipsets have one that is easily accessible) which can provide enough bits for all but the larger bulk requirements (e.g. generating many keys in a short space of time). You can either use one of them specifically for the process(es) that definitely want absolutely true randomness, or feed its output into the standard entropy pool.