Compute (and lots of it) is absolutely needed for generation: tens of billions of FLOPs per token even for the smaller models (7B), with the compute for larger models scaling proportionally with parameter count.
Each token requires a forward pass through all transformer layers, involving large matrix multiplications at every step, followed by a final projection to the vocabulary.
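A common rule of thumb (an approximation, not a figure from this post) is that a dense transformer spends roughly 2 FLOPs per parameter per generated token, one multiply and one add per weight. A quick sketch of that estimate:

```python
# Rough FLOPs-per-token estimate for a dense decoder-only transformer.
# Rule of thumb (assumption): ~2 FLOPs per parameter per generated token,
# i.e. one multiply-accumulate per weight in the forward pass.

def flops_per_token(num_params: float) -> float:
    """Approximate FLOPs to generate one token with a dense model."""
    return 2.0 * num_params

for name, params in [("7B", 7e9), ("104B", 104e9)]:
    print(f"{name}: ~{flops_per_token(params):.1e} FLOPs/token")
```

For a 7B model that comes out to about 1.4e13... no, 1.4e10 FLOPs per token, consistent with the "tens of billions" figure above, and it scales linearly with parameter count.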
Obviously I don't mean literally zero compute. The amount of compute needed scales with the number of parameters, but I have yet to use a model with so many parameters that token generation becomes compute bound. (I've gone up to 104B for dense models.) During token generation most of the time is spent idle, waiting for weights to transfer from memory. The processor is bored out of its mind waiting for more data. Memory bandwidth is the bottleneck.
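The memory-bandwidth bound is easy to estimate: every generated token has to stream all the weights through the processor at least once, so tokens/sec can never exceed bandwidth divided by model size in bytes. A minimal sketch, with illustrative (assumed) hardware numbers:

```python
# Upper bound on generation speed when memory bandwidth is the bottleneck.
# Every token streams all weights from memory once, so:
#   tokens/sec <= bandwidth / model_bytes
# The bandwidth figure below is an assumption for illustration.

def max_tokens_per_sec(model_bytes: float, bandwidth_bytes_per_sec: float) -> float:
    """Bandwidth-limited ceiling on tokens generated per second."""
    return bandwidth_bytes_per_sec / model_bytes

# 7B parameters in fp16 = ~14 GB of weights; assume ~1 TB/s memory bandwidth.
model_bytes = 7e9 * 2
bandwidth = 1e12
print(f"~{max_tokens_per_sec(model_bytes, bandwidth):.0f} tokens/sec ceiling")
```

At ~1.4e10 FLOPs per token, even a modest accelerator finishes the arithmetic for each token far faster than the ~14 ms it takes to stream 14 GB of weights at 1 TB/s, which is why the processor sits idle.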