> The audio fakes are too "clean" (too defined breaks between words and no background noise) and sound like caricatures.
The lack of background noise alone was a pretty instant tell for me. And adding realistic background noise and appropriate echo/reverb to match the room they are speaking in (even roughly, e.g. a one-on-one conversation vs. a podium at a press briefing) isn't even something you need a deepfake for; it would be simple audio post-processing.
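To give a rough idea of how little post-processing that takes, here is a minimal sketch in Python with NumPy. It is only an illustration, not what the study did or should have done: the function names `add_noise` and `add_reverb`, the 20 dB SNR, and the 0.3 s decay are all my own assumptions, and the "room" is faked with exponentially decaying noise rather than a measured impulse response.

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at roughly the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, speech.shape)  # repeat or trim the noise to match length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power == 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

def add_reverb(speech, sample_rate, decay_s=0.3, wet=0.3, seed=0):
    """Convolve with a synthetic impulse response (assumed, not a measured room)."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * decay_s)
    t = np.arange(n) / sample_rate
    # Noise burst whose envelope drops by about 60 dB over decay_s seconds.
    ir = rng.standard_normal(n) * np.exp(-6.9 * t / decay_s)
    ir /= np.sqrt(np.sum(ir ** 2))  # normalize to unit energy
    tail = np.convolve(speech, ir)[: len(speech)]
    return (1 - wet) * speech + wet * tail

# Usage with stand-in signals: a 1 s tone as "speech", white noise as "room tone".
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * time)
room_tone = np.random.default_rng(1).standard_normal(sample_rate // 2)
processed = add_noise(add_reverb(clean, sample_rate), room_tone, snr_db=20)
```

With real recordings you would swap in an actual room-tone clip and, ideally, a recorded impulse response for the kind of room in question (small office vs. briefing room), but the mixing itself stays about this simple.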
The impersonators were so bad that I assumed they were intentionally added as a control. Anyone who rated those as real should probably be excluded from the final results.