• Title/Summary/Keyword: Speech Database

Design and Implementation of Vocal Interface-Inventory Management System (음성 인터페이스 기반의 재고 관리 시스템의 설계 및 구현)

  • Park Se Jin;Kwon Chul Hong
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.119-122
    • /
    • 2002
  • This paper builds a database of commercial stock using XML syntax and examines a system that combines XML and XSLT to provide voice access to client-server databases. XSLT has several advantages; most importantly, it can transform data from one format into others. A vocal interface eases the space and time constraints on users who are off the premises and need an instant connection to their database, so they can check stock lists without those constraints. PCs, PDAs, and cellular phones are examples of devices for mobile connection. Vocal applications are created with VoiceXML. In VoiceXML services, users gain immediate access to data through voice input and the DTMF signals of the telephone.

A Study on Speaker Recognition Using MFCC Parameter Space (파마메터 공간을 이용한 화자인식에 관한 연구)

  • Lee Yong-woo;Lim dong-Chol;Lee Haing Sea
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.57-60
    • /
    • 2001
  • This paper reports on context-independent speaker recognition within the field of speech recognition. Because speaker recognition identifies which speaker in the database is talking, it is important to select parameters that reflect the characteristics of each individual. We use Mel Frequency Cepstrum Coefficients (MFCC) and Vector Quantization for identification. In particular, unlike known methods, the speaker's characteristic vector is selected in the MFCC parameter space. The paper also compares the recognition rate as a function of codebook size on this database, and the computation time, with those of the existing method. The recognition rate improves by $3\sim4\%$ over the established Vector Quantization algorithm.
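
For orientation, a minimal sketch of the generic vector quantization baseline the abstract compares against, not the authors' method of selecting vectors in the MFCC parameter space. It assumes MFCC frames have already been extracted and are given as plain lists of floats; the function names (`train_codebook`, `identify`) and the k-means training loop are illustrative choices, not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_codebook(vectors, size, iters=20, seed=0):
    """Build a per-speaker codebook with plain k-means (assumed training rule)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)  # requires size <= len(vectors)
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster ends up empty.
        codebook = [
            mean_vector(members) if members else centre
            for members, centre in zip(clusters, codebook)
        ]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))
```

The `size` argument is the codebook size that the abstract varies when comparing recognition rate and computation time.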

Singing Voice Synthesis Using HMM Based TTS and MusicXML (HMM 기반 TTS와 MusicXML을 이용한 노래음 합성)

  • Khan, Najeeb Ullah;Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.5
    • /
    • pp.53-63
    • /
    • 2015
  • Singing voice synthesis is the generation of a song by computer from its lyrics and musical notes. Hidden Markov models (HMMs) have proven to be the models of choice for text-to-speech synthesis. HMMs have also been used in singing voice synthesis research; however, training HMMs for singing voice synthesis requires a huge database. In addition, commercially available singing voice synthesis systems, which use piano roll music notation, need to adopt easy-to-read standard music notation to be suitable for singing learning applications. To overcome these problems, we use a speech database to train context-dependent HMMs for singing voice synthesis. Pitch and duration control methods have been devised to modify the parameters of the HMMs trained on speech so that they can serve as the synthesis units for the singing voice. This work describes a singing voice synthesis system that uses a MusicXML-based music score editor as the front-end interface for entering the notes and lyrics to be synthesized, and an HMM-based text-to-speech synthesis system as the back-end synthesizer. A perceptual test shows the feasibility of the proposed system.
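
As a rough illustration of the front-end step only, the sketch below reads pitched notes from a MusicXML fragment and converts each to a target fundamental frequency. It assumes twelve-tone equal temperament with A4 = 440 Hz; the helper names and the sample fragment are hypothetical, and the paper's actual pitch and duration control of the HMM parameters is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Semitone offset of each note name above C within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(step, octave, alter=0.0, a4=440.0):
    """Equal-temperament frequency for a MusicXML step/octave/alter triple."""
    midi = 12 * (octave + 1) + SEMITONES[step] + alter
    return a4 * 2 ** ((midi - 69) / 12)


def parse_notes(xml_text):
    """Return (lyric, f0_hz, duration_in_divisions) for each pitched note."""
    root = ET.fromstring(xml_text)
    notes = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:  # rests carry no <pitch> element
            continue
        step = pitch.findtext("step")
        octave = int(pitch.findtext("octave"))
        alter = float(pitch.findtext("alter", default="0"))
        lyric = note.findtext("lyric/text", default="")
        duration = int(note.findtext("duration", default="0"))
        notes.append((lyric, note_to_hz(step, octave, alter), duration))
    return notes


# Hypothetical fragment: two sung notes separated by a rest.
SAMPLE = """<measure>
  <note><pitch><step>A</step><octave>4</octave></pitch><duration>2</duration>
        <lyric><text>la</text></lyric></note>
  <note><rest/><duration>2</duration></note>
  <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration>
        <lyric><text>do</text></lyric></note>
</measure>"""

for lyric, f0, duration in parse_notes(SAMPLE):
    print(f"{lyric}: {f0:.1f} Hz for {duration} divisions")
```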

Improvement of Reliability based Information Integration in Audio-visual Person Identification (시청각 화자식별에서 신뢰성 기반 정보 통합 방법의 성능 향상)

  • Tariquzzaman, Md.;Kim, Jin-Young;Hong, Joon-Hee
    • MALSORI
    • /
    • no.62
    • /
    • pp.149-161
    • /
    • 2007
  • In this paper we propose a modified reliability function for improving bimodal speaker identification (BSI) performance. The conventional reliability function, used by N. Fox [1], is extended by introducing an optimization factor. We evaluated the proposed method in the BSI domain. A BSI system based on GMMs was implemented and tested on the VidTIMIT database. Speaker identification experiments verified the usefulness of the proposed method, showing a 39% reduction in error rate.

Integrating Pronunciation into a Classroom and on the Web Courseware

  • Kim, He-Kyung
    • MALSORI
    • /
    • no.41
    • /
    • pp.49-59
    • /
    • 2001
  • The aim of this paper is to suggest possible methods of integrating the teaching of pronunciation into a typical communicative classroom and, consequently, into English learning web courseware. It is hoped that this paper will prompt teachers to understand the current role of pronunciation in communicative English programs, and to see that web technology can help students improve their pronunciation, which will develop their speaking and listening skills. It also suggests the need for a database of visualized communicative expressions.

Study on the song title query by humming melody information (허밍 운율정보를 이용한 곡목 검색 기술)

  • Lee Ji-Yeoun;Hahn Min-Soo
    • MALSORI
    • /
    • no.44
    • /
    • pp.131-143
    • /
    • 2002
  • Music query by humming is a challenging problem since the humming signal inevitably contains much variation and inaccuracy. In this paper, we suggest an algorithm for querying a desired song from a music database by humming its melody. To accommodate people's inaccurate humming, a new melody representation technique is proposed. Our algorithm is based on pitch and duration information and performs fairly well: a correct query rate of 85% is achieved for the top 3 matches when tested with 20 songs.
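
The abstract does not specify the proposed melody representation, so the sketch below swaps in a well-known stand-in: a coarse up/down/same pitch contour compared by edit distance, which tolerates off-key and transposed humming. It uses pitch only (no duration), assumes pitches are given in semitones, and all names are illustrative.

```python
def contour(pitches, tol=0.5):
    """Encode successive pitch changes (in semitones) as U(p), D(own) or S(ame)."""
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        diff = cur - prev
        symbols.append("S" if abs(diff) <= tol else ("U" if diff > 0 else "D"))
    return "".join(symbols)


def edit_distance(a, b):
    """Levenshtein distance between two strings, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_songs(hummed, database, top=3):
    """Return the `top` song titles whose contour is closest to the hummed one."""
    query = contour(hummed)
    ranked = sorted(database, key=lambda title: edit_distance(query, contour(database[title])))
    return ranked[:top]
```

Returning the best `top` titles mirrors the top-3 evaluation reported in the abstract.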

Text-dependent Speaker Verification System Over Telephone Lines (전화망을 위한 어구 종속 화자 확인 시스템)

  • 김유진;정재호
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.663-667
    • /
    • 1999
  • In this paper, we review the conventional speaker verification algorithm and present a text-dependent speaker verification system for use over telephone lines, together with its experimental results. To train speaker models effectively with limited enrollment data, we apply a blind-segmentation algorithm, which segments speech into sub-word units without linguistic information. A world model created from the PBW DB is used for score normalization. Experiments on the implemented system, using a database constructed to simulate a field test, show a 3.3% EER.
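
The equal error rate (EER) quoted above is the operating point where the false acceptance and false rejection rates coincide. A minimal sketch of how it could be estimated from verification scores follows; the score lists are made up, the accept rule (score at or above the threshold) is an assumption, and the function name is illustrative.

```python
def equal_error_rate(genuine, impostor):
    """Sweep thresholds over the observed scores and return (eer, threshold).

    A trial is accepted when its score is >= threshold. The EER is taken
    at the threshold where |FAR - FRR| is smallest, as the mean of the two.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        frr = sum(s < threshold for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Made-up normalized scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2%} at threshold {threshold}")
```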

International Cooperative Works Preparing for Speech Database (음성DB 구축을 위한 국제간 활동현황 (COCOSDA '94에서 발표된 내용을 중심으로))

  • 조철우
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.261-264
    • /
    • 1995
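  • Refining and advancing speech processing technology has recently come to require large and varied databases. In addition, as applications that require international links, such as automatic interpreting telephony, have been developed, databases of speech in languages other than one's own have become necessary. It has therefore become necessary to discuss matters of common interest, such as the specifications and types of the databases required, and workshops and similar gatherings have been formed for this purpose. This article describes the activities of COCOSDA, a representative example of such gatherings, and lists the areas related to speech databases that deserve attention in Korea.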

Spectral Feature Transformation for Compensation of Microphone Mismatches

  • Jeong, So-Young;Oh, Sang-Hoon;Lee, Soo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.4E
    • /
    • pp.150-154
    • /
    • 2003
  • The distortion effects of microphones are analyzed and compensated in the mel-frequency feature domain. Unlike popular bias removal algorithms, a linear transformation of the mel-frequency spectrum is incorporated. Although a diagonal matrix transformation is sufficient for medium-quality microphones, a full-matrix transform is required for low-quality microphones with severe nonlinearity. The proposed compensation algorithms are tested on the HTIMIT database, yielding about a 5 percent improvement in recognition rate over the conventional CMS algorithm.
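
To make the contrast concrete, the sketch below places a bias-removal baseline next to a per-frame linear transform. CMS is read here as cepstral mean subtraction, the usual expansion of the acronym. The abstract does not say how the transformation matrices are estimated, so `matrix` and `bias` are placeholder inputs and the function names are illustrative.

```python
def cms(frames):
    """Bias removal: subtract the per-utterance mean of each coefficient."""
    count = len(frames)
    mean = [sum(col) / count for col in zip(*frames)]
    return [[x - m for x, m in zip(frame, mean)] for frame in frames]


def linear_transform(frames, matrix, bias):
    """Apply y = A x + b to every feature frame.

    A diagonal A rescales each band independently; a full A also mixes
    neighbouring bands, which a pure bias term cannot do.
    """
    return [
        [sum(a * x for a, x in zip(row, frame)) + b for row, b in zip(matrix, bias)]
        for frame in frames
    ]


frames = [[1.0, 2.0], [3.0, 6.0]]           # two toy frames, two bands each
diagonal = [[2.0, 0.0], [0.0, 0.5]]         # placeholder diagonal transform
full = [[1.0, 1.0], [0.0, 1.0]]             # placeholder full-matrix transform

print(cms(frames))
print(linear_transform(frames, diagonal, [0.0, 0.0]))
print(linear_transform(frames, full, [1.0, 0.0]))
```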

An Extraction of the Prosody and Duration Information for Speech Synthesis in Korean (한국어 음성 합성을 위한 운율 및 길이 정보의 추출)

  • 양진석;박광철;양세라;김재범;이정현
    • Proceedings of the Korea Database Society Conference
    • /
    • 1995.12a
    • /
    • pp.187-190
    • /
    • 1995
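  • Natural-sounding speech synthesis requires prosody and vowel-length processing to be carried out first. This paper proposes a method that improves the naturalness of synthesized speech by using phonetic experiments and intonation rules to extract prosody control information and vowel duration information from Korean sentences, and then applying that information to speech synthesis. The information was obtained by applying a series of prosodic rules after sentence analysis and quantifying the results through repeated experiments. In the experiments, the system with prosody and duration processing applied produced speech with improved naturalness.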
