Voice Segment Reduction using Perceiver Model

Choi, Yeon-Ung;Lee, Jae-Jun;Han, Hyeon-Taek;Lee, Hae-Yeoun;

doi:10.3745/PKIPS.y2022m05a.491

Proceedings of the Korea Information Processing Society Conference (한국정보처리학회:학술대회논문집)

2022.05a / Pages 491-493 / 2022 / 2005-0011 (pISSN) / 2671-7298 (eISSN)

Korea Information Processing Society (한국정보처리학회)

Korean title: Perceiver 모델을 이용한 사용자 음성 구간 축약 (User Voice Segment Reduction Using the Perceiver Model)

Choi, Yeon-Ung (최연웅) (Dept. of Computer Software Engineering, Kumoh National Institute of Technology) ;
Lee, Jae-Jun (이재준) (Dept. of Computer Software Engineering, Kumoh National Institute of Technology) ;
Han, Hyeon-Taek (한현택) (Dept. of Computer Software Engineering, Kumoh National Institute of Technology) ;
Lee, Hae-Yeoun (이해연) (Dept. of Computer Software Engineering, Kumoh National Institute of Technology)

Published : 2022.05.17

https://doi.org/10.3745/PKIPS.y2022m05a.491

Abstract

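As more applications on smart devices make use of audio data, the need for techniques that locate segments of interest within audio data is growing. This paper proposes a method that uses the Perceiver model to detect human voice segments in audio data and reduce the audio to those segments. The Perceiver model extracts features from complex input data using self-attention while feeding previously extracted features back in as the next input for further learning, so it can be applied efficiently to audio, which is continuous data. Experiments were conducted on speech and non-speech datasets, both externally sourced and collected by the authors, and the method achieved a detection accuracy of 92.4% on 10-second segments.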
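The reduction step described in the abstract can be illustrated with a minimal sketch: cut the audio into 10-second segments, label each segment as voice or non-voice, and keep only the voice segments. This is not the authors' implementation. The function names, the tiny demo sample rate, and `energy_classifier` are all assumptions made for illustration; in particular, `energy_classifier` is a simple energy threshold standing in for the trained Perceiver model, which the abstract does not specify in detail.

```python
from typing import Callable, List, Sequence

SEGMENT_SECONDS = 10  # segment length reported in the abstract


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        seconds: int = SEGMENT_SECONDS) -> List[List[float]]:
    """Cut a mono signal into consecutive fixed-length segments.

    The final segment may be shorter than the others.
    """
    size = sample_rate * seconds
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


def energy_classifier(segment: Sequence[float], threshold: float = 0.01) -> bool:
    """Hypothetical stand-in for the trained Perceiver classifier.

    Labels a segment as voice when its mean energy exceeds a threshold.
    """
    if not segment:
        return False
    return sum(x * x for x in segment) / len(segment) > threshold


def reduce_to_voice(samples: Sequence[float], sample_rate: int,
                    is_voice: Callable[[Sequence[float]], bool]) -> List[float]:
    """Concatenate only the segments that the classifier labels as voice."""
    kept: List[float] = []
    for segment in split_into_segments(samples, sample_rate):
        if is_voice(segment):
            kept.extend(segment)
    return kept


# Toy signal at an artificially low sample rate (4 Hz) to keep the demo small:
# 10 s of silence, 10 s of "voice", then 5 s of silence.
DEMO_RATE = 4
audio = [0.0] * 40 + [0.5] * 40 + [0.0] * 20
reduced = reduce_to_voice(audio, DEMO_RATE, energy_classifier)
print(len(audio), len(reduced))
```

Passing the classifier as a callable keeps the segmentation and concatenation logic independent of the model, so a trained per-segment classifier could replace the placeholder without changing the rest of the sketch.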

Acknowledgement

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1F1A1057742).
