Integrated Search | Korea Science

Japanese Speech Based Fuzzy Man-Machine Interface of Manipulators

Izumi, Kiyotaka;Watanabe, Keigo;Tamano, Yuya;Kiguchi, Kazuo
- Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
- /
- ICROS 2003 ICCAS
- /
- pp.603-608
- /
- 2003
Recently, personal and home robots have been under development by many companies and research groups. Speech is considered a generally effective interface for users of such robots. This paper discusses a Japanese speech-based man-machine interface system that uses fuzzy reasoning to reflect the fuzziness of natural language in robot commands. The system consists of a part that derives an action command and a part that modifies the derived command. In particular, a problem unique to Japanese is solved by applying the morphological analyzer ChaSen. The proposed system is applied to the motion control of a robot manipulator. Experimental results show that the proposed system can readily map the same voice command to different actual command levels, according to the current state of the robot.

FPGA Implementation of Speech Processor for Cochlear Implant

박석준;홍민석;신중인;박상희
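- Korean Society of Medical and Biological Engineering: Conference Proceedings
- /
- Korean Society of Medical and Biological Engineering, 1998 Fall Conference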
- /
- pp.163-164
- /
- 1998
In this paper, the digital speech processing part of a cochlear implant for patients with sensorineural hearing disorders is implemented and simulated. The speech processing part is divided into three smaller parts: Filterbank, Pitch Detect, and Bandmapping. From the results, we conclude that the digital speech processing algorithm is fully implemented on an FPGA. This means that a cochlear implant can be made very small.

Implementation of Extracting Specific Information by Sniffing Voice Packet in VoIP

Lee, Dong-Geon;Choi, WoongChul
- International journal of advanced smart convergence
- /
- Vol. 9, No. 4
- /
- pp.209-214
- /
- 2020
VoIP technology has been widely used for exchanging voice or image data over IP networks. VoIP, often called Internet telephony, sends and receives voice data over RTP during a session. However, voice data in VoIP is at risk of exposure, because RTP does not specify encryption of the original data. We implement programs that can extract meaningful information from a user's dialogue, where meaningful information is the information that the program user wants to obtain. Our implementation has two parts: a client part, which takes as input the keyword for the information the user wants to obtain, and a server part, which sniffs packets and performs speech recognition. For speech recognition we use the Google Speech API from Google Cloud, which uses machine learning. Finally, we discuss the usability and limitations of the implementation with an example.
https://doi.org/10.7236/IJASC.2020.9.4.209
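
The abstract above notes that RTP carries voice data without specifying encryption. As a rough illustration of why such traffic is readable once captured, the following sketch parses the fixed 12-byte RTP header as laid out in RFC 3550. It is not the authors' program: the function name `parse_rtp_header` is hypothetical, and the payload offset assumes the packet has no header extension.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550) of one packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC).
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    # Assumption: no header extension, so the payload starts right
    # after the optional list of 32-bit CSRC identifiers.
    header_len = 12 + 4 * csrc_count
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }
```

In a pipeline like the one described, the payload bytes would then be decoded according to the payload type and passed to a speech recognizer; that step is omitted here.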

Determining the Relative Differences of Emotional Speech Using Vocal Tract Ratio

Wang, Jianglin;Jo, Cheol-Woo
- Speech Sciences
- /
- Vol. 13, No. 1
- /
- pp.109-116
- /
- 2006
This study focuses on the differences among emotional speech in three vocal tract sections. The vocal tract area was computed from the area function of the emotional speech. The total vocal tract was divided into three sections (vocal fold section, middle section, and lip section) to obtain the differences in each section. The experimental data include speech in six emotions from three male and three female speakers. The six emotions are neutral, happiness, anger, sadness, fear, and boredom. The difference is measured as a ratio by comparing each emotional speech with the normal speech. The experimental results show no remarkable difference at the lip section, but fear and sadness show a large change at the vocal fold section.

An Analysis of English Listening Items on the TOEFL

차경환;유윤희
- Speech Sciences
- /
- Vol. 7, No. 2
- /
- pp.157-175
- /
- 2000
The aim of this paper was to diagnose Korean college students' listening skills on the TOEFL. The researchers identified which of the TOEFL listening Parts A, B, and C is most easily teachable and improvable over the period of a semester. First, the results show that Korean students tend to score lower in Part A than in Part B or Part C. This indicates that the short informal conversation does not give students sufficient clues and that they do not have enough time to infer the answer. Second, the results revealed that students showed the least progress in Part B after studying TOEFL listening items and essential idioms for the listening section for 13 weeks. Because students did not have much experience learning informal conversation, as opposed to formal conversation, in English, it is harder to achieve an improved grade in Part B, which consists of informal conversation. After a semester-long listening course, however, the average score on the TOEFL listening sections increased.

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

Modi, Deepa;Nain, Neeta;Nehra, Maninder
- Journal of Multimedia Information System
- /
- Vol. 5, No. 3
- /
- pp.147-154
- /
- 2018
Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and alter text written in natural languages. Different tasks can be performed on natural languages by analyzing them through annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of the particular language. The focus of this work is part-of-speech (POS) tagging for the Hindi language. POS tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These categories can be noun, verb, time, date, number, etc. Hindi is the most widely used language and the official language of India. It is also among the five most spoken languages in the world. For English and other languages, a diverse range of POS taggers is available, but these taggers cannot be applied to Hindi, as Hindi is one of the most morphologically rich languages. Furthermore, there is a significant difference between the morphological structures of these languages. Thus, this work presents a POS tagger for Hindi based on a hybrid approach that combines probability-based and rule-based approaches. For tagging known words, a unigram model of the probability class is used, whereas for tagging unknown words various lexical and contextual features are used. Various finite state machine automata are constructed to demonstrate the different rules, and regular expressions are then used to implement these rules. A tagset is also prepared for this task, which contains 29 standard part-of-speech tags. The tagset also includes two unique tags, a date tag and a time tag, which support all possible formats. Regular expressions are used to implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. The hybrid approach uses a probability-based model to increase automatic tagging and a rule-based model to limit the need for an already trained corpus. The approach is based on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%. It also yields a best accuracy of 91.39% and an average accuracy of 88.15%.
https://doi.org/10.9717/JMIS.2018.5.3.147
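
The hybrid idea in the abstract above (a unigram model for known words, regular expressions for pattern-based tags) can be sketched roughly as follows. This is a simplified illustration, not the authors' tagger: the tag names, the three regular expressions, and the default tag for unknown words are all assumptions, and the lexical and contextual features the paper uses for unknown words are replaced here by a single fallback tag.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern-based rules, tried in order for unknown tokens.
PATTERN_RULES = [
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("NUM", re.compile(r"\d+([.,]\d+)*")),
]


def train_unigram(tagged_sentences):
    """Map each known word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, unigram, default_tag="NN"):
    """Tag known words with the unigram model, unknown words with the rules."""
    tagged = []
    for token in tokens:
        if token in unigram:
            tagged.append((token, unigram[token]))
            continue
        for name, pattern in PATTERN_RULES:
            if pattern.fullmatch(token):
                tagged.append((token, name))
                break
        else:
            # No rule matched: fall back to an assumed default tag.
            tagged.append((token, default_tag))
    return tagged
```

Because the rules only fire for words missing from the unigram table, a small labeled corpus can cover frequent words while open-ended patterns such as dates and numbers need no training examples at all.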

Improving transformer-based speech recognition performance using data augmentation by local frame rate changes

임성수;강병옥;권오욱
- The Journal of the Acoustical Society of Korea
- /
- Vol. 41, No. 2
- /
- pp.122-129
- /
- 2022
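This paper proposes a method for improving the performance of a transformer-based speech recognizer using data augmentation that locally adjusts the frame rate. First, the start time and length of the part to be augmented are selected at random from the original speech data. Next, the frame rate of the selected part is changed to a new frame rate by linear interpolation. Experiments on the Wall Street Journal and LibriSpeech speech data showed that convergence takes longer than for the baseline, but recognition accuracy improves in most cases. To further improve performance, various parameters such as the length and rate of the changed part were optimized. The proposed method showed relative performance improvements of 11.8% and 14.9% over the baseline on the Wall Street Journal and LibriSpeech speech data, respectively.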
https://doi.org/10.7776/ASK.2022.41.2.122
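
The two augmentation steps described in the abstract above (pick a random segment, then change its frame rate by linear interpolation) might be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the function names, the maximum segment length `max_len`, and the rate range `rates` are hypothetical, and frames are represented as plain lists of floats.

```python
import random


def resample_segment(frames, rate):
    """Linearly interpolate a segment so it plays `rate` times faster."""
    n = len(frames)
    new_len = max(1, round(n / rate))
    out = []
    for i in range(new_len):
        # Position of output frame i on the original frame axis.
        pos = i * (n - 1) / (new_len - 1) if new_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def local_frame_rate_augment(frames, max_len=40, rates=(0.8, 1.2), rng=random):
    """Change the frame rate of one randomly chosen segment of `frames`."""
    seg_len = rng.randint(1, min(max_len, len(frames)))
    start = rng.randint(0, len(frames) - seg_len)
    rate = rng.uniform(*rates)
    segment = resample_segment(frames[start:start + seg_len], rate)
    # Frames outside the selected segment are left untouched.
    return frames[:start] + segment + frames[start + seg_len:]
```

A rate below 1 stretches the segment (more frames) and a rate above 1 compresses it, so the utterance length changes slightly while the transcript stays the same.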

Channel Coder Implementation and Performance Analysis for Speech Coding: Considering bit Importance of Speech Information-part III

강법주;김선영;김상천;김영식
- Journal of the Korean Institute of Telematics and Electronics
- /
- Vol. 27, No. 4
- /
- pp.484-490
- /
- 1990
In speech coding, information bits differ in their sensitivity to channel errors, so a channel coder combined with a speech coder should use a variable coding rate that considers the importance of the speech information bits. To realize a 4 kbps channel coder for 12 kbps speech, this paper chose the channel coding method by analyzing the hard-decision post-decoding error rate of RCPC (Rate Compatible Punctured Convolutional) codes and the bit error sensitivity of 12 kbps speech. Under coherent QPSK and a Rayleigh fading channel, the performance analysis showed that 4-level unequal error protection obtained a 10 dB gain in speech SEGSNR compared with the case of no channel coding at 7 dB channel SNR.
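
To make the idea of unequal error protection with rate-compatible puncturing more concrete, the sketch below encodes bits with a rate-1/2 convolutional code and then punctures each position according to an importance class. Everything specific here is an assumption for illustration, not taken from the paper: the (7, 5) octal generators, the period-4 puncturing patterns, and the three protection classes (the paper uses four levels).

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_len=3):
    """Rate-1/2 convolutional encoder; returns one [c1, c2] pair per bit."""
    state = 0
    mask = (1 << constraint_len) - 1
    coded = []
    for bit in bits:
        state = ((state << 1) | bit) & mask
        # Each output is the parity of the register bits tapped by a generator.
        coded.append([bin(state & g).count("1") % 2 for g in generators])
    return coded


# Hypothetical rate-compatible family with period 4: every position kept by
# a higher-rate pattern is also kept by all lower-rate patterns.
PATTERNS = [
    [[1, 1, 1, 1], [1, 1, 1, 1]],  # class 0: rate 1/2, strongest protection
    [[1, 1, 1, 1], [1, 0, 1, 0]],  # class 1: rate 2/3
    [[1, 1, 1, 1], [1, 0, 0, 0]],  # class 2: rate 4/5, weakest protection
]


def uep_puncture(coded, classes, patterns=PATTERNS):
    """Keep or drop each coded bit using the pattern of its bit's class."""
    period = len(patterns[0][0])
    out = []
    for t, (pair, cls) in enumerate(zip(coded, classes)):
        pattern = patterns[cls]
        out.extend(c for j, c in enumerate(pair) if pattern[j][t % period])
    return out
```

Bits judged more sensitive to errors would be assigned class 0 and keep all of their coded bits, while less sensitive bits give up redundancy to fit the channel bit budget.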

An analysis of Speech Acts for Korean Using Support Vector Machines

은종민;이성욱;서정연
- The KIPS Transactions: Part B
- /
- Vol. 12B, No. 3
- /
- pp.365-368
- /
- 2005
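This study proposes a method for analyzing the speech acts of Korean dialogue using Support Vector Machines. We use the lexical items, parts of speech, and part-of-speech pairs of an utterance as sentence features, and the context of the previous utterance as the context feature. Appropriate features were selected using the chi-square statistic, and a Support Vector Machine was trained on the selected features. The trained Support Vector Machine classifier was then used to analyze the speech act of each utterance. In experiments on a corpus from the hotel reservation domain, the proposed system achieved an accuracy of about 90.54%.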
https://doi.org/10.3745/KIPSTB.2005.12B.3.365
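
The chi-square feature selection step mentioned in the abstract above can be illustrated with the standard 2x2 contingency form of the statistic. This is a sketch under assumptions, not the authors' implementation: the function names are hypothetical, features are treated as binary (present or absent in an utterance), each feature is scored by its maximum chi-square value over the speech act classes, and the Support Vector Machine training itself is omitted.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table: a = feature and class, b = feature and
    other classes, c = class without feature, d = neither."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(utterances, top_k):
    """utterances: list of (set_of_features, speech_act_label) pairs."""
    n = len(utterances)
    label_count = Counter(label for _, label in utterances)
    feat_count = Counter(f for feats, _ in utterances for f in feats)
    joint = Counter((f, label) for feats, label in utterances for f in feats)
    scores = {}
    for f, fc in feat_count.items():
        best = 0.0
        for label, lc in label_count.items():
            a = joint[(f, label)]
            b = fc - a
            c = lc - a
            d = n - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[f] = best
    # Keep the features most strongly associated with some speech act.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Features that occur evenly across speech acts score near zero and are dropped, which shrinks the feature space handed to the classifier.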

A knowledge-based pronunciation generation system for French

김선희
- Phonetics and Speech Sciences
- /
- Vol. 10, No. 1
- /
- pp.49-55
- /
- 2018
This paper aims to describe a knowledge-based pronunciation generation system for French. It has been reported that a rule-based pronunciation generation system outperforms most of the data-driven ones for French; however, only a few related studies are available due to existing language barriers. We provide basic information about the French language from the point of view of the relationship between orthography and pronunciation, and then describe our knowledge-based pronunciation generation system, which consists of morphological analysis, Part-of-Speech (POS) tagging, grapheme-to-phoneme generation, and phone-to-phone generation. The evaluation results show that the word error rate of POS tagging, based on a sample of 1,000 sentences, is 10.70% and that of phoneme generation, using 130,883 entries, is 2.70%. This study is expected to contribute to the development and evaluation of speech synthesis or speech recognition systems for French.
https://doi.org/10.13064/KSSS.2018.10.1.049

Search results: 433 items; processing time: 0.03 s
