Unleashing Curiosity, Igniting Discovery - The Science Fusion

Can You Tell the Difference Between AI-Generated Audio and Real Speech?

AI-generated speech, also known as deepfake voice, is becoming increasingly sophisticated and difficult to detect. Even when people are aware that they may be listening to AI-generated speech, it is still challenging for them to reliably identify deepfakes. This poses a potential risk as billions of people who speak the most commonly used languages could be exposed to deepfake scams and misinformation.

Researchers at University College London conducted a study involving over 500 participants to assess their ability to distinguish between real and AI-generated speech. The participants listened to multiple audio clips, some of which contained the authentic voice of a female speaker reading generic sentences in English or Mandarin, while others were deepfakes generated by AI models trained on female voices.

The participants were divided into two groups. The first group listened to 20 voice samples in their native language and had to decide whether each clip was real or fake. They were able to correctly distinguish between deepfakes and authentic voices about 70% of the time for both English and Mandarin samples. This suggests that in real-life scenarios, where people are not aware they may be listening to AI-generated speech, the detection accuracy would likely be lower.

The second group was presented with 20 pairs of audio clips, each pair featuring the same sentence spoken once by a human and once by a deepfake, and was asked to identify the fake clip. This setup improved detection accuracy to over 85%. However, the researchers acknowledged that this scenario gave listeners an unrealistic advantage, since they knew one clip in each pair was fake.

The study did not address whether listeners could tell if a deepfake sounded like the specific person it was imitating. In real-life situations, that ability matters: scammers have cloned the voices of business leaders to deceive employees into transferring money, and misinformation campaigns have spread deepfakes of well-known politicians on social media.

The research does provide valuable insight into how convincingly AI-generated deepfakes can now mimic human voices, but it also underscores the need for automated deepfake detection systems. Attempts to train participants to improve their detection skills proved unsuccessful, further emphasizing the importance of developing reliable AI solutions for this purpose. The researchers plan to explore whether large language models capable of processing speech data can detect deepfakes.

In conclusion, the study reveals the challenges in identifying AI-generated speech and highlights the necessity of robust deepfake detection technology to combat the risks associated with deepfake scams and misinformation.

