• Title/Summary/Keyword: Speaker characteristics


Age classification of emergency callers based on behavioral speech utterance characteristics (발화행태 특징을 활용한 응급상황 신고자 연령분류)

  • Son, Guiyoung;Kwon, Soonil;Baik, Sungwook
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.6 / pp.96-105 / 2017
  • In this paper, we investigate speaker age classification by analyzing voice calls to an emergency center. We classify callers as adult or elderly using behavioral speech utterance features and a Support Vector Machine (SVM), a machine learning classifier. Through analysis of the emergency center call data, we selected two behavioral features: silent pause and turn-taking latency. The criteria for age classification were first derived from these features, and statistical analysis confirmed that they were significant (p < 0.05). We analyzed 200 data samples (adult: 100, elderly: 100) with 5-fold cross-validation using the SVM classifier. Using both behavioral features, we achieved 70% accuracy, which is higher than the accuracy obtained with a single feature. These results suggest behavioral speech utterance features as a new method for age classification. In future work, we will combine acoustic information (MFCC) with additional behavioral features from real voice data. Furthermore, this work will contribute to the development of an emergency situation judgment system that takes age classification into account.
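
The two behavioral features named in this abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so the definitions here are assumptions: a silent pause is taken to be a gap between two consecutive caller segments, and turn-taking latency is the gap between the end of an operator turn and the start of the caller's reply. The segment format and the name `behavioral_features` are hypothetical.

```python
from statistics import mean

def behavioral_features(segments, caller="caller"):
    """Compute two per-call features from (speaker, start_sec, end_sec) segments.

    Assumed definitions (not taken from the paper):
      - silent pause: gap between two consecutive segments by the caller
      - turn-taking latency: gap between an operator segment and the
        caller segment that follows it
    """
    pauses, latencies = [], []
    for prev, cur in zip(segments, segments[1:]):
        gap = cur[1] - prev[2]
        if gap <= 0 or cur[0] != caller:
            continue  # skip overlapping speech and operator turns
        if prev[0] == caller:
            pauses.append(gap)
        else:
            latencies.append(gap)
    return {
        "mean_silent_pause": mean(pauses) if pauses else 0.0,
        "mean_turn_latency": mean(latencies) if latencies else 0.0,
    }

# Hypothetical call transcript with timestamps in seconds.
example_call = [
    ("operator", 0.0, 2.0),
    ("caller", 2.5, 4.0),
    ("caller", 5.0, 6.0),
    ("operator", 6.2, 7.0),
    ("caller", 8.0, 9.0),
]
features = behavioral_features(example_call)
```

A two-element vector like this, one per call, is the kind of input a binary classifier such as an SVM could be trained on with cross-validation.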

Automatic Recognition of Pitch Accent Using Distributed Time-Delay Recursive Neural Network (분산 시간지연 회귀신경망을 이용한 피치 악센트 자동 인식)

  • Kim Sung-Suk
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.277-281 / 2006
  • This paper presents a method for the automatic recognition of pitch accents over syllables. The proposed method is based on the time-delay recursive neural network (TDRNN), a neural network classifier with two different representations of dynamic context: the delayed input nodes allow the representation of an explicit trajectory F0(t) along time, while the recursive nodes provide long-term context information that reflects the characteristics of pitch accentuation in spoken English. We apply the TDRNN to pitch accent recognition in two forms: in the normal TDRNN, all of the prosodic features (pitch, energy, duration) are used as an entire set in a single TDRNN, while in the distributed TDRNN, the network consists of several TDRNNs, each taking a single prosodic feature as the input. The final output of the distributed TDRNN is a weighted sum of the outputs of the individual TDRNNs. We used the Boston Radio News Corpus (BRNC) for the experiments on speaker-independent pitch accent recognition. The experimental results show that the distributed TDRNN exhibits an average recognition accuracy of 83.64% over both pitch events and non-events.
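
The combination step of the distributed TDRNN, a weighted sum over per-feature network outputs, can be sketched as follows. Only the weighted sum comes from the abstract. The weights, the example scores, the 0.5 decision threshold, and the function names are hypothetical, and the individual TDRNNs are replaced by fixed numbers.

```python
def combine_outputs(outputs, weights):
    """Weighted sum of per-feature network outputs for one syllable."""
    return sum(weights[name] * score for name, score in outputs.items())

def is_pitch_accent(outputs, weights, threshold=0.5):
    """Decide accent vs. non-accent; the threshold is an assumed value."""
    return combine_outputs(outputs, weights) >= threshold

# Stand-ins for the outputs of three single-feature networks.
syllable_outputs = {"pitch": 0.9, "energy": 0.4, "duration": 0.6}
# Assumed weights; the abstract does not say how they are chosen.
feature_weights = {"pitch": 0.5, "energy": 0.25, "duration": 0.25}

combined = combine_outputs(syllable_outputs, feature_weights)
accented = is_pitch_accent(syllable_outputs, feature_weights)
```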

The Measurement Algorithm for Microphone's Frequency Character Response Using OATSP (OATSP를 이용한 마이크로폰의 주파수 특성 응답 측정 알고리즘)

  • Park, Byoung-Uk;Kim, Hack-Yoon
    • The Journal of the Acoustical Society of Korea / v.26 no.2 / pp.61-68 / 2007
  • The frequency response of a microphone, which indicates the frequency range that a microphone can output within the approved level, is one of the most significant standards used to measure the characteristics of a microphone. At present, conventional methods of measuring the frequency response are complicated and involve the use of expensive equipment. To address these disadvantages, this paper suggests a new algorithm that can measure the frequency response of a microphone in a simple manner. The suggested algorithm generates the Optimized Aoshima's Time Stretched Pulse (OATSP) signal on a computer, plays it through a standard speaker, and measures the impulse response of the microphone under test by convolving the inverse OATSP signal with the signal received by that microphone. The frequency response of the microphone is then calculated from these signals. The algorithm was evaluated by comparing the microphone's frequency response data with the frequency response measured by the algorithm. The results showed that the algorithm is suitable for measuring the frequency response of a microphone and that, although there were a few errors, all of them were within the error tolerance.
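
The measurement idea described above (play a time-stretched pulse, then convolve the recording with the inverse pulse to recover an impulse response) can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: the quadratic-phase spectrum in `tsp_spectrum` is the form commonly quoted for time-stretched pulses and is assumed here, the microphone is simulated by a short made-up impulse response, the loudspeaker is ignored, and circular convolution stands in for a real recording. All function names and the values of `N` and `M` are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (enough for a small demo)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def tsp_spectrum(n, m):
    """Assumed all-pass, quadratic-phase spectrum of a time-stretched pulse."""
    half = n // 2
    spectrum = [0j] * n
    for k in range(half + 1):
        spectrum[k] = cmath.exp(4j * m * math.pi * k * k / (n * n))
    for k in range(half + 1, n):
        spectrum[k] = spectrum[n - k].conjugate()  # keeps the time signal real
    return spectrum

def circular_convolve(a, b):
    """Circular convolution computed through the frequency domain."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]

def measure_impulse_response(received, n, m):
    """Convolve the received signal with the inverse pulse."""
    inverse_spectrum = [h.conjugate() for h in tsp_spectrum(n, m)]
    inverse_signal = [v.real for v in dft(inverse_spectrum, inverse=True)]
    return circular_convolve(received, inverse_signal)

N, M = 64, 8  # demo length and stretch parameter (arbitrary choices)
tsp = [v.real for v in dft(tsp_spectrum(N, M), inverse=True)]
true_h = [1.0, 0.5, 0.25] + [0.0] * (N - 3)   # made-up microphone response
received = circular_convolve(tsp, true_h)      # simulated recording
estimated_h = measure_impulse_response(received, N, M)
frequency_response = [abs(v) for v in dft(estimated_h)]
```

Because the assumed spectrum has unit magnitude at every bin, multiplying by its conjugate cancels the pulse exactly, which is why the simulated impulse response is recovered. A real measurement would also have to account for the loudspeaker, noise, and linear rather than circular convolution.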

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answers to queries from various types of unstructured documents collected from multiple web sources in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the suitable documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence score. 3) Based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system achieved higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology extracted information from various types of unstructured documents more effectively than the baseline model. Previous research has the limitation that performance is poor when extracting information from document types that differ from the training data. 
In addition, this study prevents unnecessary information extraction attempts on documents that do not contain the answer by predicting the suitability of documents and sentences for information extraction before the extraction step. It is meaningful in that it provides a method by which precision can be maintained even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a given document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain information extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first concerns data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open source KoNLPy Python package, and information extraction can fail when the morphological analysis is not performed properly. To improve information extraction results, it is necessary to develop a more advanced morphological analyzer. The second concerns entity ambiguity. The information extraction system of this study cannot distinguish between different entities that share the same name. If several people with the same name appear in the news, the system may not extract information about the entity the query intended. 
In future research, it is necessary to take measures to distinguish people with the same name. The third concerns the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We developed an evaluation data set using 800 documents (400 questions * 7 articles per question (1 Wikipedia, 3 Naver encyclopedia, 3 Naver news)) by judging whether each contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to determine the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction from multi-source web documents so that results can be evaluated more objectively.
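
The three-step flow in this abstract (filter documents, filter sentences, then extract with a confidence) can be sketched as a gated pipeline. Everything concrete here is an assumption made for illustration: the scoring functions are passed in as plain callables and replaced by toy keyword rules instead of the paper's classifiers and bi-directional LSTM-CRF tagger, the 0.5 thresholds are arbitrary, and the overall confidence is taken to be the product of the three scores because the abstract does not say how it is derived.

```python
def extract_answer(documents, doc_score, sent_score, extract,
                   doc_threshold=0.5, sent_threshold=0.5):
    """Return (answer, confidence) for the best candidate, or None.

    documents: list of documents, each a list of sentence strings.
    doc_score / sent_score: callables returning a suitability in [0, 1].
    extract: callable returning (answer or None, extraction confidence).
    """
    best = None
    for doc in documents:
        d = doc_score(doc)
        if d < doc_threshold:
            continue  # skip documents unlikely to contain the answer
        for sentence in doc:
            s = sent_score(sentence)
            if s < sent_threshold:
                continue  # skip sentences unsuitable for extraction
            answer, e = extract(sentence)
            if answer is None:
                continue
            confidence = d * s * e  # assumed combination rule
            if best is None or confidence > best[1]:
                best = (answer, confidence)
    return best

# Toy stand-ins for the learned models (hypothetical keyword rules).
def toy_doc_score(doc):
    return 1.0 if any("born" in s for s in doc) else 0.0

def toy_sent_score(sentence):
    return 1.0 if "born" in sentence else 0.0

def toy_extract(sentence):
    return sentence.rstrip(".").split()[-1], 0.9

docs = [
    ["The weather is nice."],
    ["Ada was a mathematician.", "Ada was born in 1815."],
]
result = extract_answer(docs, toy_doc_score, toy_sent_score, toy_extract)
no_result = extract_answer(docs[:1], toy_doc_score, toy_sent_score, toy_extract)
```

Returning `None` when no document passes the gates mirrors the point made in the abstract: declining to extract from unsuitable documents is what protects precision.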

An Empirical Study Upon How Social Comparative Learning of Forum Participants Affects Learning Effects with Emphasis on Participants' Characteristic (포럼 참가자의 사회적 비교학습이 학습효과에 미치는 영향에 대한 실증분석: 참가자 특성을 중심으로)

  • Choi, Eunsoo;Kim, Chulwon
    • Knowledge Management Research / v.17 no.2 / pp.131-163 / 2016
  • The purpose of this study is to empirically analyze how social comparative learning of forum participants affects learning effects, with an emphasis on participants' characteristics. As today's society changes at a fast pace, the desire for new knowledge and information has grown accordingly. To quench this thirst for knowledge and information, seminars, symposiums, conferences, forums, conventions, exhibitions, and other knowledge sharing events are taking place across the world. The increased need for knowledge and information exchange has also led to the development and growth of the convention industry and the Meetings, Incentives, Conferences, and Events (Exhibitions) (MICE) industry. In particular, a forum is a type of event that invites professionals and specialists to discuss diverse topics and share their knowledge and experience with the audience. Participants use it as an opportunity to get close to information providers and enjoy the pleasure of knowledge exchange. However, there have been few empirical analyses of who the participants are, why they attend forums, how they pick up and learn new information and knowledge, and what kinds of learning effects they achieve after the event. This paper analyzes how social comparative learning of forum participants influences learning effects, based on Albert Bandura's Social Learning Theory (1977, 1997, 1982, 2001) and Leon Festinger's Social Comparative Theory (1950, 1954). By dividing the participants into two groups, one with a high level of self-efficacy and the other with a low level of self-efficacy, we examined the differences in learning effects between the two groups using them as moderating variables. This study was conducted at 'MBN Y Forum 2016,' one of the most representative knowledge exchange forums in South Korea. An online survey was distributed, and 1,307 (39.2%) of the 3,338 total participants completed it. 
The survey included questions about whether the participants gained positive or negative motivation by comparing themselves to the speakers (upward comparison learning) and to other participants (lateral comparison learning). The results showed that the quality of the messages that the speakers present as knowledge providers is the most significant factor affecting learning effects. In particular, the participants had higher levels of self-efficacy and self-esteem than average people. They had a clear goal to learn from the speakers (upward comparison) and received positive motivation from them; in other words, no negative learning effects were found. This presents a managerial implication that having a qualified speaker is necessary for a forum to be successful. On the other hand, the results from the comparison with the other participants (lateral comparison) were different. The participants were likely to compare themselves to the other participants through observational learning; they could compare listening attitudes, language skills, or the ability to ask a question. The results showed that the participants received positive motivation from the lateral group but at the same time were jealous of the abilities of others. When the quality of a question asked by a participant is not good enough, it can have a negative influence on the participants' learning effects. The first group, with high levels of self-efficacy and self-esteem, showed no correlation with negative learning effects from the speakers; rather, they had a strong desire to learn from the speakers. In contrast, the participants perceived the lateral group as a learning subset and a competitor. The second group, with low levels of self-efficacy and self-esteem, saw the quasi-group as a rival. This indicates that individual learning effects can differ depending on the participants' characteristics.

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes Linear Discriminant Analysis and common vector extraction for speech recognition. A voice signal contains psychological and physiological properties of the speaker as well as dialect differences, acoustical environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different. This property of speech signals makes it very difficult to extract common properties within the same speech class (word or phoneme). Linear algebra methods such as the KLT (Karhunen-Loeve Transformation) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction suggested by M. Bilginer et al. The method of M. Bilginer et al. extracts the optimized common vector from the speech signals used for training, and it has 100% recognition accuracy on the training data used for common vector extraction. Despite these characteristics, the method has some drawbacks: we cannot use a large number of speech signals for training, and the discriminant information among common vectors is not defined. This paper suggests an advanced method that can reduce the error rate by maximizing the discriminant information among common vectors. A novel method to normalize the size of the common vector is also added. The results show improved algorithm performance and recognition accuracy 2% better than the conventional method.
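
The common vector idea referred to in this abstract can be illustrated with a small sketch. It follows the usual textbook formulation, stated here as an assumption rather than as the paper's exact procedure: the differences between the training vectors of one class span a "difference subspace", and the common vector is what remains of any training vector after its projection onto that subspace is removed. The helper names and the toy feature vectors are hypothetical, and the paper's discriminant maximization and size normalization steps are not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(u, c):
    return [a * c for a in u]

def difference_basis(vectors, tol=1e-12):
    """Orthonormal basis (Gram-Schmidt) of the span of v_i - v_0."""
    reference = vectors[0]
    basis = []
    for v in vectors[1:]:
        d = sub(v, reference)
        for b in basis:
            d = sub(d, scale(b, dot(d, b)))  # remove components already covered
        norm = dot(d, d) ** 0.5
        if norm > tol:
            basis.append(scale(d, 1.0 / norm))
    return basis

def common_vector(vectors, index=0):
    """Remove the difference-subspace component from vectors[index]."""
    c = vectors[index]
    for b in difference_basis(vectors):
        c = sub(c, scale(b, dot(c, b)))
    return c

# Toy feature vectors for one word class (made-up numbers).
training = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 3.5, 4.0],
    [0.0, 2.5, 3.0, 5.0],
]
cv_from_first = common_vector(training, index=0)
cv_from_last = common_vector(training, index=2)
```

Starting from any training vector gives the same result, which is what makes the vector "common" to the class. The sketch also shows why the number of training vectors is limited: once the differences span the whole feature space, nothing is left after the projection is removed.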


Derivation of Constraint Factors Affecting Passenger's In-Vehicle Activity of Urban Air Mobility's Personal Air Vehicle and Design Criteria According to the Level of Human Impact (도심항공모빌리티 비행체 PAV 탑승자 실내행위에 영향을 미치는 제약 요소 도출 및 인체 영향 수준에 따른 설계 기준)

  • Jin, Seok-Jun;Oh, Young-Hoon;Ju, Da Young
    • Science of Emotion and Sensibility / v.25 no.1 / pp.3-20 / 2022
  • Recently, ahead of the commercialization of urban air mobility (UAM), the importance of R&D for air transportation-related industries in urban areas has significantly increased. To create a UAM environment, research is being conducted on personal air vehicles (PAVs). PAVs are a key means of air transportation, but research on the physical factors affecting their passengers is still insufficient. In particular, because the PAV is expected to be used as a living space for the passengers, research on the effects of the physical elements generated in the PAV on the human body is essential to design an interior space that supports the in-vehicle activities of the passengers. Therefore, the purpose of this study is to derive the constraint factors that affect the human body due to the air navigation characteristics of the PAV and to understand the impact of these constraint factors on the bodies of the passengers performing in-vehicle activities. The results of this study indicate that when the PAV was operated at less than 4,000 ft, which is the operating standard, the constraint factors were noise, vibration, and motion sickness caused by low-frequency motion. These constraint factors affect in-vehicle activity; thus, the in-vehicle activities that can be performed in a PAV were derived using autonomous cars, airplanes, and PAV concept cases. Furthermore, considering the impact of the constraint factors and their levels on the human body, recommended constraint factor criteria to support in-vehicle activities were established. To reduce the level of impact of the constraint factors on the human body and to support in-vehicle activity, the study recommended attention to the shape and built-in functions of the seat (vibration reduction function, temperature control, LED lighting, etc.) and external noise reduction using a directional speaker for each individual seat. 
Moreover, it was suggested that interior materials for noise and vibration reduction should be used in the design of the interior space. The contributions of this study are the determination of the constraint factors affecting the in-vehicle PAV activity and the confirmation of the level of impact of the factors on the human body; in the future, these findings can be used as basic data for suitable PAV interior design.