• Title/Summary/Keyword: Speech/non-speech classification

Search Result 37, Processing Time 0.021 seconds

AI Advisor for Response of Disaster Safety in Risk Society (위험사회 재난 안전 분야 대응을 위한 AI 조력자)

  • Lee, Yong-Hak;Kang, Yunhee;Lee, Min-Ho;Park, Seong-Ho;Kang, Myung-Ju
    • Journal of Platform Technology
    • /
    • v.8 no.3
    • /
    • pp.22-29
    • /
    • 2020
  • The 4th industrial revolution is progressing by country as a mega trend that leads various technological convergence directions in the social and economic fields from the initial simple manufacturing innovation. The epidemic of infectious diseases such as COVID-19 is shifting digital-centered non-face-to-face business from economic operation, and the use of AI and big data technology for personalized services is essential to spread online. In this paper, we analyze cases focusing on the application of artificial intelligence technology, which is a key technology for the effective implementation of the digital new deal promoted by the government, as well as the major technological characteristics of the 4th industrial revolution and describe the use cases in the field of disaster response. As a disaster response use case, AI assistants suggest appropriate countermeasures according to the status of the reporter in an emergency call. To this end, AI assistants provide speech recognition data-based analysis and disaster classification of converted text for adaptive response.

  • PDF

Dialogue Act Classification for Non-Task-Oriented Korean Dialogues (도메인에 비종속적인 대화에서의 화행 분류)

  • Kim, Min-Jeong;Han, Kyoung-Soo;Park, Jae-Hyun;Song, Young-In;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology
    • /
    • 2006.10e
    • /
    • pp.246-253
    • /
    • 2006
  • 대화 에이전트와 관련된 지금까지의 연구는 대개 대상 도메인을 한정하고, 특정 목적을 달성하기 위해 사용자와 대화할 수 있는 에이전트에 관한 연구가 많았다. 본 연구에서는 도메인이 한정되지 않은 일반 도메인 대화에서 화행(speech act)정보를 수동으로 부착시켜 구축한 말뭉치에 대해 소개하고 이 말뭉치를 토대로 자동으로 화행을 분류할 수 있는 유용한 자질들을 선보인다. 그리고 도메인이 한정된 말뭉치와 도메인이 한정되지 않은 말뭉치를 자동으로 화행분류해 본 실험한 결과를 비교하였다.

  • PDF

Voice Activity Detection Based on Discriminative Weight Training with Feedback (궤환구조를 가지는 변별적 가중치 학습에 기반한 음성검출기)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.8
    • /
    • pp.443-449
    • /
    • 2008
  • One of the key issues in practical speech processing is to achieve robust Voice Activity Deteciton (VAD) against the background noise. Most of the statistical model-based approaches have tried to employ equally weighted likelihood ratios (LRs), which, however, deviates from the real observation. Furthermore voice activities in the adjacent frames have strong correlation. In other words, the current frame is highly correlated with previous frame. In this paper, we propose the effective VAD approach based on a minimum classification error (MCE) method which is different from the previous works in that different weights are assigned to both the likelihood ratio on the current frame and the decision statistics of the previous frame.

Extracting Rules from Neural Networks with Continuous Attributes (연속형 속성을 갖는 인공 신경망의 규칙 추출)

  • Jagvaral, Batselem;Lee, Wan-Gon;Jeon, Myung-joong;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.45 no.1
    • /
    • pp.22-29
    • /
    • 2018
  • Over the decades, neural networks have been successfully used in numerous applications from speech recognition to image classification. However, these neural networks cannot explain their results and one needs to know how and why a specific conclusion was drawn. Most studies focus on extracting binary rules from neural networks, which is often impractical to do, since data sets used for machine learning applications contain continuous values. To fill the gap, this paper presents an algorithm to extract logic rules from a trained neural network for data with continuous attributes. It uses hyperplane-based linear classifiers to extract rules with numeric values from trained weights between input and hidden layers and then combines these classifiers with binary rules learned from hidden and output layers to form non-linear classification rules. Experiments with different datasets show that the proposed approach can accurately extract logical rules for data with nonlinear continuous attributes.

Drone Location Tracking with Circular Microphone Array by HMM (HMM에 의한 원형 마이크로폰 어레이 적용 드론 위치 추적)

  • Jeong, HyoungChan;Lim, WonHo;Guo, Junfeng;Ahmad, Isitiaq;Chang, KyungHi
    • Journal of Advanced Navigation Technology
    • /
    • v.24 no.5
    • /
    • pp.393-407
    • /
    • 2020
  • In order to reduce the threat by illegal unmanned aerial vehicles, a tracking system based on sound was implemented. There are three main points to the drone acoustic tracking method. First, it scans the space through variable beam formation to find a sound source and records the sound using a microphone array. Second, it classifies it into a hidden Markov model (HMM) to find out whether the sound source exists or not, and finally, the sound source is In the case of a drone, a sound source recorded and stored as a tracking reference signal based on an adaptive beam pattern is used. The simulation was performed in both the ideal condition without background noise and interference sound and the non-ideal condition with background noise and interference sound, and evaluated the tracking performance of illegal drones. The drone tracking system designed the criteria for determining the presence or absence of a drone according to the improvement of the search distance performance according to the microphone array performance and the degree of sound pattern matching, and reflected in the design of the speech reading circuit.

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.3
    • /
    • pp.226-234
    • /
    • 2007
  • In this paper, we develop a speech recognition system using a throat microphone. The use of this kind of microphone minimizes the impact of environmental noise. However, because of the absence of high frequencies and the partially loss of formant frequencies, previous systems developed with those devices have shown a lower recognition rate than systems which use standard microphone signals. This problem has led to researchers using throat microphone signals as supplementary data sources supporting standard microphone signals. In this paper, we present a high performance ASR system which we developed using only a throat microphone by taking advantage of Korean Phonological Feature Theory and a detailed throat signal analysis. Analyzing the spectrum and the result of FFT of the throat microphone signal, we find that the conventional MFCC feature vector that uses a critical pass filter does not characterize the throat microphone signals well. We also describe the conditions of the feature extraction algorithm which make it best suited for throat microphone signal analysis. The conditions involve (1) a sensitive band-pass filter and (2) use of feature vector which is suitable for voice/non-voice classification. We experimentally show that the ZCPA algorithm designed to meet these conditions improves the recognizer's performance by approximately 16%. And we find that an additional noise-canceling algorithm such as RAST A results in 2% more performance improvement.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.