My guess is that they don't want to distort the results. Any kind of scoring would create an incentive to cheat (e.g. by not ticking the "I've seen this before" box), whether consciously or subconsciously.
From the samples shown, I'd wager that one goal of the study is to quantify the influence of the medium (text vs. audio vs. video), hence the four conditions: text-only, audio-only, video-only, and video with subtitles.
Text limits the decision-making process to preconceived notions about the person in question and prior background knowledge, whereas video and audio may offer clear cues from human perception alone.