• Title/Summary/Keyword: Korean voice-dataset

Search results: 16

Study on the classification system of identification of the enemy in the military border area (군 경계지역에서 피아식별 분류 시스템 연구)

  • Junhyeong Lee;Hyun Kwon
    • Convergence Security Journal / v.24 no.3 / pp.203-208 / 2024
  • The identification and classification of friend or foe in the military border area is an important issue. The personnel who may appear in the military border area comprise North Korean soldiers, U.S. soldiers, South Korean soldiers, and the general public, and they are currently identified through CCTV. Images were classified into these four categories and a model was trained through transfer learning. The PyTorch machine learning library was used, and the dataset was built by crawling images corresponding to each category shared on Google. The experimental results show that each category is classified with an accuracy of 98.75%. Future research will explore ways to distinguish personnel more systematically and specifically by going beyond still images and adding video or voice recognition.
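As a rough illustration of the transfer-learning setup this abstract describes (PyTorch, four personnel categories), the sketch below fine-tunes a pretrained ResNet-18 head. The backbone choice, learning rate, and dummy batch are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # North Korean soldier, U.S. soldier, South Korean soldier, civilian

# Load an ImageNet-pretrained backbone and freeze it; train only a new head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable classifier

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of crawled images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```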

Development of a Mobile Application for Disease Prediction Using Speech Data of Korean Patients with Dysarthria (한국인 구음장애 환자의 발화 데이터 기반 질병 예측을 위한 모바일 애플리케이션 개발)

  • Changjin Ha;Taesik Go
    • Journal of Biomedical Engineering Research / v.45 no.1 / pp.1-9 / 2024
  • Communication with others plays an important role in human social interaction and information exchange in modern society. However, some individuals have difficulty communicating due to dysarthria, so effective diagnostic techniques are needed for its early treatment. In the present study, we propose a mobile device-based methodology that automatically classifies the type of dysarthria. A lightweight CNN model was trained on an open audio dataset of Korean patients with dysarthria. The trained CNN model successfully classifies dysarthria into related disease subtypes with 78.8%~96.6% accuracy. In addition, a user-friendly mobile application was developed based on the trained CNN model. Users can easily record their voices according to the selected inspection type (e.g. word, sentence, paragraph, and semi-free speech) and evaluate the recorded voice data through their mobile device and the developed application. The proposed technique would be helpful for personal management of dysarthria and clinical decision making.
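A minimal sketch of the kind of lightweight audio-CNN pipeline the abstract outlines: a mel-spectrogram computed with torchaudio feeds a small two-layer CNN. The architecture, feature settings, and four-class output are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn
import torchaudio

# Toy 1-second waveform at 16 kHz standing in for a recorded utterance.
waveform = torch.randn(1, 16000)
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)(waveform)
x = mel.unsqueeze(0)  # (batch, channel, n_mels, time)

class LightweightCNN(nn.Module):
    """Small CNN over mel-spectrograms; num_classes is a placeholder."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

probs = LightweightCNN()(x).softmax(dim=1)  # per-subtype probabilities
```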

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.15 no.2 / pp.53-59 / 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets due to their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and the large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for distinguishing the speech of children with ASD from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive speech attributes of ASD. The speech dataset used for the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.
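A hedged sketch of the feature-then-SVM pipeline the abstract describes. The specific features below (F0 statistics, an RMS energy proxy, spectral centroid) are generic stand-ins for the paper's KDSFs, and the data is dummy noise; only the overall shape of the approach follows the abstract.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def knowledge_driven_features(y, sr):
    """Illustrative stand-ins for KDSFs: frequency, voice quality, spectrum."""
    f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)          # F0 contour
    rms = librosa.feature.rms(y=y)[0]                       # energy proxy
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return np.array([f0.mean(), f0.std(), rms.mean(), centroid.mean()])

# Dummy corpus: 20 random 1-second clips with alternating ASD/TD labels.
sr = 16000
X = np.stack([knowledge_driven_features(np.random.randn(sr), sr)
              for _ in range(20)])
y = np.array([0, 1] * 10)

clf = SVC(kernel="rbf").fit(X, y)  # SVM classifier, as in the paper
print(clf.predict(X[:3]))
```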

Analyzing performance of time series classification using STFT and time series imaging algorithms

  • Sung-Kyu Hong;Sang-Chul Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.1-11 / 2023
  • In this paper, instead of using recurrent neural networks, we compare the classification performance of time-series imaging algorithms combined with a convolutional neural network (CNN). Traditional algorithms for imaging time-series data, such as GAF (Gramian Angular Field), MTF (Markov Transition Field), and RP (Recurrence Plot), exist in the TSC (Time Series Classification) community. In addition, we compare the STFT (Short-Time Fourier Transform), which produces spectrograms that visualize features of voice data. We evaluate the CNN's performance while adjusting the hyperparameters of the imaging algorithms. When evaluated on the GunPoint dataset from the UCR archive, STFT achieves higher accuracy than the other algorithms. GAF also reaches 98~99% accuracy, but has the disadvantage that the resulting images are very large.
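The sketch below shows how a single time series could be converted into each of the four image representations the abstract compares, assuming the pyts library for GAF/MTF/RP and SciPy for the STFT; the series itself and all sizes are placeholders, not the paper's settings.

```python
import numpy as np
from scipy.signal import stft
from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

# Dummy univariate time series standing in for a GunPoint sample.
x = np.sin(np.linspace(0, 8 * np.pi, 150)) + 0.1 * np.random.randn(150)

gaf = GramianAngularField(image_size=64).fit_transform(x[None, :])[0]
mtf = MarkovTransitionField(image_size=64).fit_transform(x[None, :])[0]
rp = RecurrencePlot().fit_transform(x[None, :])[0]
_, _, Z = stft(x, nperseg=32)   # spectrogram via STFT
spec = np.abs(Z)

# Each 2-D array can then be fed to a CNN as an image.
print(gaf.shape, mtf.shape, rp.shape, spec.shape)
```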

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation techniques are effectively used to mitigate model overfitting by allowing the training dataset to be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based augmentation technique can be applied after converting a 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for the fake audio detection problem. Using data from the ASVspoof2017 and ASVspoof2019 competitions, held to detect fake audio, datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment were used to train an LCNN model. All three augmentation techniques generally improved model performance. Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
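A minimal sketch of SpecAugment-style occlusion on a spectrogram, using torchaudio's frequency and time masking transforms; the spectrogram and mask parameters are placeholders, and the loop illustrates the abstract's point about applying multiple masks.

```python
import torch
import torchaudio

spec = torch.randn(1, 80, 200).abs()  # dummy (channel, freq, time) spectrogram

freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=15)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=30)

# Apply several frequency/time masks, as the abstract suggests helps.
augmented = spec
for _ in range(2):
    augmented = freq_mask(augmented)  # zero out a random band of mel bins
    augmented = time_mask(augmented)  # zero out a random span of frames
```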

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring a narrowband signal (NB) that has been degraded or lost in the encoding and decoding process, due to limited channel capacity or the characteristics of the codec installed in a mobile communication device, by converting it into a wideband signal (WB). Bandwidth extension research has mainly focused on voice signals, converting the high band in the frequency domain with techniques such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling), and restoring missing or damaged high bands through complex feature-extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using the residual connections of one-dimensional convolutional neural networks (CNNs), the bandwidth is extended by inputting a time-domain signal of a fixed length without complicated pre-processing. In addition, it was confirmed that the damaged high band can be restored even by training on a dataset containing various types of sound sources, including music, not limited to speech.
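A hedged sketch of the architectural idea the abstract names: a 1-D convolutional autoencoder with residual connections that maps a fixed-length raw waveform to an output of the same length. Layer counts, kernel sizes, and channel widths are invented for illustration, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock1d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=15, padding=7),
        )

    def forward(self, x):
        return torch.relu(x + self.conv(x))  # residual (skip) connection

class BWEAutoencoder(nn.Module):
    """Encoder-decoder over raw waveforms; output matches the input length."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=2, padding=7), nn.ReLU())
        self.body = nn.Sequential(ResidualBlock1d(32), ResidualBlock1d(32))
        self.decoder = nn.ConvTranspose1d(32, 1, kernel_size=16, stride=2,
                                          padding=7)

    def forward(self, x):
        return self.decoder(self.body(self.encoder(x)))

nb = torch.randn(1, 1, 8192)   # fixed-length narrowband time-domain input
wb = BWEAutoencoder()(nb)      # bandwidth-extended estimate, same length
print(wb.shape)                # torch.Size([1, 1, 8192])
```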