Abstract

Audio deepfakes are becoming increasingly prevalent. They pose a threat both to companies, which may fall victim to CEO fraud, and to individuals, who may be targeted by voice scams. As a result, the need for audio deepfake detectors is growing. Although academic research on such detectors exists, the papers can be difficult for non-experts to understand. This article therefore provides an introduction to various state-of-the-art detectors for non-experts. In addition to background information on audio and deep learning, it covers evaluation metrics and commonly used training datasets such as ASVspoof. Detectors that use handcrafted features, as well as end-to-end models such as RawNet2 and AASIST, are presented in a way that beginners can follow. Additionally, the article introduces models based on self-supervised learning, which currently achieve the best results on in-the-wild data.