CoNSIST: Consist of New Methodologies on AASIST for Audio Deepfake Detection (CoNSIST: A Study on a New Graph Attention-Based Modeling Methodology for Audio Deepfake Detection)

  • Jae Hoon Ha;Joo Won Mun;Sang Yup Lee
    • The Transactions of the Korea Information Processing Society / v.13 no.10 / pp.513-519 / 2024
  • Advancements in artificial intelligence (AI) have significantly improved deep learning-based audio deepfake technology, which has been exploited for criminal activities. To detect audio deepfakes, we propose CoNSIST, an advanced audio deepfake detection model. CoNSIST builds on AASIST, a graph-based end-to-end model, by integrating three key components: Squeeze and Excitation, Positional Encoding, and Reformulated HS-GAL. These additions aim to enhance feature extraction, eliminate unnecessary operations, and incorporate diverse information. Our experimental results demonstrate that CoNSIST significantly outperforms existing models in detecting audio deepfakes, offering a more robust solution to combat the misuse of this technology.
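
As a rough illustration of the first component named in the abstract, the following is a minimal sketch of a generic Squeeze-and-Excitation gate applied to a feature map. It is not the CoNSIST implementation: the abstract does not say where the block sits in AASIST or how it is configured, so the function name `squeeze_excite`, the `(channels, freq, time)` layout, the reduction ratio `r`, and the random weights are all assumptions made for this example.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Generic SE gate (illustrative only, not the paper's code).

    x  : feature map of shape (C, F, T)  -- assumed layout
    w1 : bottleneck weights of shape (C // r, C)
    w2 : expansion weights of shape (C, C // r)
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    h = np.maximum(w1 @ z, 0.0)                  # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # shape (C,), values in (0, 1)
    # Scale: reweight every channel of the input by its gate.
    return x * gates[:, None, None], gates

# Hypothetical sizes and random weights, chosen only to exercise the function.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 5, 7))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y, gates = squeeze_excite(x, w1, w2)
```

The output has the same shape as the input; only the relative weight of each channel changes, which is the sense in which such a block can sharpen feature extraction.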

CoNSIST: Consist of New methodologies on AASIST, leveraging Squeeze-and-Excitation, Positional Encoding, and Re-formulated HS-GAL

  • Jae-Hoon Ha;Joo-Won Mun;Sang-Yup Lee
    • Annual Conference of KIPS / 2024.05a / pp.692-695 / 2024
  • With the recent advancements in artificial intelligence (AI), the performance of deep learning-based audio deepfake technology has significantly improved. This technology has been exploited for criminal activities, leading to various cases of victimization. To prevent such harm, this paper proposes CoNSIST, an improved deep learning-based audio deepfake detection model that incorporates three additional components into the graph-based end-to-end model AASIST: (i) Squeeze and Excitation, (ii) Positional Encoding, and (iii) Reformulated HS-GAL. These additions are expected to enable more effective feature extraction, eliminate unnecessary operations, and take more diverse information into account, thereby improving on the performance of the original AASIST. The results of multiple experiments indicate that CoNSIST improves audio deepfake detection performance compared to existing models.
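
To make the Positional Encoding component more concrete, here is a small sketch of the widely used sinusoidal encoding. The abstract does not state which encoding scheme CoNSIST adopts, so the sinusoidal form, the function name `sinusoidal_positional_encoding`, and the table sizes are assumptions for illustration only.

```python
import math

def sinusoidal_positional_encoding(num_positions, dim):
    """Return a num_positions x dim table of sinusoidal encodings.

    Illustrative only; dim is assumed to be even.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            # Each sin/cos pair uses a wavelength that grows with the index i.
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))   # even index
            row.append(math.cos(angle))   # odd index
        table.append(row)
    return table

# Hypothetical sizes: 4 positions (e.g. time frames), 6-dimensional features.
pe = sinusoidal_positional_encoding(num_positions=4, dim=6)
```

Such a table is typically added element-wise to the feature sequence so that later layers can distinguish positions that would otherwise look identical.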