• Title/Summary/Keyword: 화자 패턴 (speaker pattern)

Search Result 111

A Study on the Algorithm Development for Speech Recognition of Korean and Japanese (한국어와 일본어의 음성 인식을 위한 알고리즘 개발에 관한 연구)

  • Lee, Sung-Hwa;Kim, Hyung-Lae
    • Journal of IKEEE / v.2 no.1 s.2 / pp.61-67 / 1998
  • In this study, speaker recognition experiments were performed using a multilayer feedforward neural network (MFNN) model on Korean and Japanese digits. Five adult males and five adult females pronounced the digits 0 to 9 in Korean and Japanese seven times each. Feature coefficients were then extracted through a pitch deletion algorithm, LPC analysis, and LPC cepstral analysis to generate the input patterns for the MFNN. Five of the seven repetitions were used to train the neural network, and the remaining two were used to measure its performance. For both Korean and Japanese, the pitch coefficients performed about 4% better than the LPC or LPC cepstral coefficients.


A Study on the Recognition of Korean 4 Connected Digits Considering Co-articulation (조음결합을 고려한 4연 숫자음 인식에 관한 연구)

  • 이종진;이광석;허강인;김명기;고시영
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.1 / pp.20-28 / 1992
  • Co-articulation is one of the major factors that make connected word recognition difficult. This study considers the fact that the head part of the following word is changed by the preceding word at a connection point, by applying a co-articulation model and adjusting the following word. We choose a critically damped second-order linear system as the co-articulation model, combine a one-stage DP matching recognition algorithm with this model, and investigate the effects. The recognition experiment is carried out on 35 Korean four-connected-digit strings spoken by 5 male speakers, and the recognition rate is improved by 4.7 percent.


The Characteristics and Significance of 'Nim' Texts in the Late Choson Period: Focused on Saseol-sijo and Chap-ga (조선후기 '님' 담론의 특성과 그 의미 : 사설시조와 잡가를 중심으로)

  • Shin Eun-Kyung
    • Sijohaknonchong / v.20 / pp.113-139 / 2004
  • This article intends to illuminate how men, the leading agents in Saseol-sijo (musical performers, lyricists, patrons, composers, compilers of Sijo anthologies, audiences, etc.) in the Late Choson period, viewed or recognized women and how their understanding of women was reflected in the texts. Working with texts on the theme of 'Love,' this article starts by categorizing two types of love: the first type, 'lovelorn heart,' focusing on unilateral pining for a single lover who is now absent, and the second type, 'physical love,' concentrating on bilateral sexual intercourse. In addition to the types of love, the gender of poetic speakers, as distinct from real poets, is vital to characterizing the discourse of love. According to these two factors, the texts in question fall into four groups: texts in which a female speaker displays her lovelorn heart ('Type 1'), those where she speaks about her sexual experiences ('Type 2'), those where a male speaker sings of his lovelorn heart ('Type 3'), and those where he describes his sexual experiences ('Type 4'). Of these, 'Type 2' and 'Type 3' are key to understanding the men's view of women. With respect to the configuration of the theme of 'Love,' it should be noted that in Korean literary history, the nim, or 'sweetheart,' had signified the totality of value or a perfect entity that makes one's life meaningful, and that 'Type 1,' the pattern in which a female subject expresses her love toward a male nim, had constituted the traditional way to convey the theme of 'Love.' In terms of this connotation of nim, the remarkable increase of 'Type 3,' implying an increase in male speakers, reveals the extent to which women, the male speakers' nim, accomplished their entry into a 'sacred area' (the position of nim) that only men had occupied; females are focused on and centralized. This article considers this phenomenon an exhibition of the increased significance and weight of women in Late Choson society and an index of 'modernity.'
Meanwhile, given that most of the Saseol-sijo poets are men, the emergence of 'Type 2' texts, in which male poets have female speakers disclose their sexual experiences, is a representative example of women being degraded to a means of men's pleasure, for this situation gives men more pleasure than when male speakers reveal their sexual experiences. Not only 'Type 2' but also the group of texts that basically belongs to 'Type 1' and conveys the theme of 'Loyalty' through the female voice, by substituting the ruler-subject relation for the man-woman relation, falls under the same case, for men employ the female voice as a poetic device in order to stress the theme of 'Loyalty.' This article regards this phenomenon as an index of 'pre-modernity,' in the sense that in a pre-modern society, specifically in Early Choson, a male-oriented value system dominates, thereby alienating women. As is well known, the Late Choson was a transitional period from a pre-modern society to a modern society. Therefore, the ambivalence of the premodern and the modern can be found mixed in every segment of the society. The dual aspects of the masculine view of women in Saseol-sijo constitute one example. The significance of the Saseol-sijo in Korean literary history can be found in this phenomenon.


A Study on the Improvement of Isolated Word Recognition for Telephone Speech (전화음성의 격리단어인식 개선에 관한 연구)

  • Do, Sam-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.9 no.4 / pp.66-76 / 1990
  • In this work, the effect of telephone channel noise and distortion on speech recognition is studied, and methods to improve the recognition rate are proposed. Computer simulation is done using 100-word test data, which were made by pronouncing 100 phonetically balanced Korean isolated words ten times in a speaker-dependent mode. First, a spectral subtraction method is suggested to improve noisy speech recognition. Then, the effect of bandwidth limiting and channel distortion is studied. It has been found that bandwidth limiting and amplitude distortion lower the recognition rate significantly, but phase distortion has little effect. To reduce the channel effect, we modify the reference pattern according to some training data. When both channel noise and distortion exist, the recognition rate without the proposed method is merely 7.7~26.4%, but with the proposed method it increases drastically to 76.2~92.3%.

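The spectral subtraction step named in the abstract above can be illustrated with a minimal sketch. The abstract does not say which variant the authors used, so this shows only the textbook magnitude-domain form: estimate a noise spectrum from speech-free frames, then subtract it bin by bin and clamp at a floor. The function names, the `floor` parameter, and the toy spectra are assumptions made for illustration.

```python
def average_spectrum(frames):
    """Estimate a noise spectrum as the per-bin mean over speech-free frames."""
    n_frames = len(frames)
    return [sum(bin_values) / n_frames for bin_values in zip(*frames)]


def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Subtract an estimated noise magnitude spectrum from a noisy one, bin by bin."""
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    # Clamp at `floor` so that no bin ends up with a negative magnitude.
    return [max(s - n, floor) for s, n in zip(noisy_mag, noise_mag)]


# Toy magnitude spectra with three frequency bins (illustrative values only).
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise_estimate = average_spectrum(noise_frames)
cleaned = spectral_subtract([5.0, 1.5, 4.0], noise_estimate)
print(cleaned)  # [3.0, 0.0, 3.0]
```

The second bin shows why the floor matters: the noise estimate exceeds the observed magnitude there, so the result is clamped rather than left negative.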

Implementation of the Speech Emotion Recognition System in the ARM Platform (ARM 플랫폼 기반의 음성 감성인식 시스템 구현)

  • Oh, Sang-Heon;Park, Kyu-Sik
    • Journal of Korea Multimedia Society / v.10 no.11 / pp.1530-1537 / 2007
  • In this paper, we implemented a speech emotion recognition system that can distinguish human emotional states from recorded speech captured by a single microphone and classify them into four categories: neutrality, happiness, sadness, and anger. In general, speech recorded with a microphone contains background noise due to the speaker's environment and the microphone characteristics, which can result in serious degradation of system performance. In order to minimize the effect of this noise and improve system performance, an MA (Moving Average) filter with a relatively simple structure and low computational complexity was adopted. Then an SFS (Sequential Forward Selection) feature optimization method was implemented to further improve and stabilize system performance. For speech emotion classification, an SVM pattern classifier is used. The experimental results indicate emotion classification performance of around 65% in computer simulation and 62% on the ARM platform.

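As a rough illustration of the MA (Moving Average) filtering mentioned in the abstract above, the sketch below smooths a single feature track with a causal moving average. The window length, the function name, and the sample values are assumptions; the abstract does not give the filter order or say which features were smoothed.

```python
def moving_average(values, window=3):
    """Smooth a sequence with a causal moving average over the last `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # Drop the sample that has just left the window.
            running_sum -= values[i - window]
        # Early samples are averaged over however many values are available.
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# A toy feature track with one noisy spike (illustrative values only).
feature_track = [1.0, 2.0, 9.0, 2.0, 1.0]
smoothed_track = moving_average(feature_track, window=3)
print(smoothed_track)
```

Keeping a running sum makes each output cost a constant amount of work regardless of the window length, which fits the low computational complexity the abstract cites as the reason for choosing this filter.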

Growth Pattern and Phenology of Mankyua chejuense B.Y. Sun, M.H. Kim & C.H. Kim (제주고사리삼(Mankyua chejuense B.-Y. Sun, M.H. Kim & C.H. Kim)의 생장 형태 및 식물계절학적 특성)

  • Hyun, Hwa-Ja;Moon, Myung-Ok;Choi, Hyung-Soon;Kim, Chan-Soo
    • Korean Journal of Plant Resources / v.27 no.1 / pp.95-101 / 2014
  • This study was conducted to analyze the growth pattern and phenological characteristics of Mankyua chejuense B.-Y. Sun, M.H. Kim & C.H. Kim, which belongs to the family Ophioglossaceae. M. chejuense reproduced asexually by rhizomes; a clone of the species grew up to 52 cm based on root growth, and new leaves were produced from the rhizomes. Leaf development was divided into four stages: leaf emergence, leaf separation and growth, sporophyll maturation, and senescence. Most leaves emerged in July and August, when temperatures and precipitation were high, reached full expansion during September-October, and died during April-May of the following year. The leaf life span was 10 months, from July to April, and the start of leaf senescence varied depending on the habitat environment, which might result from microenvironmental differences among the habitats.

The usefulness of the depth images in image-based speech synthesis (영상 기반 음성합성에서 심도 영상의 유용성)

  • Ki-Seung Lee
    • The Journal of the Acoustical Society of Korea / v.42 no.1 / pp.67-74 / 2023
  • Images acquired from a speaker's mouth region reveal unique patterns that correspond to the voices being produced. Based on this principle, several methods have been proposed in which speech signals are recognized or synthesized from images acquired at the speaker's lower face. In this study, an image-based speech synthesis method is proposed in which depth images are used cooperatively. Since depth images yield depth information that cannot be acquired from optical images, they can be used to supplement flat optical images. This paper evaluates the usefulness of depth images from the perspective of speech synthesis. A validation experiment carried out on 60 Korean isolated words confirmed that performance, in terms of both subjective and objective evaluation, was comparable to that of the optical image-based method. When the two image types were used in combination, performance improved compared with using either one alone.

A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment (이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구)

  • Cho Youn-Ho;Park Kyu-Sik
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.269-276 / 2006
  • In this paper, we propose an emotion recognition system that can classify a human emotional state as neutral or angry in real time from speech captured by a cellular phone. In general, speech transmitted through a mobile network contains environmental noise and network noise, which can cause serious system performance degradation by distorting the emotional features of the query speech. In order to minimize the effect of this noise and improve system performance, we adopt an MA (Moving Average) filter, which has a relatively simple structure and low computational complexity, to alleviate the distortion in the emotional feature vector. Then an SFS (Sequential Forward Selection) feature optimization method is implemented to further improve and stabilize system performance. Two pattern recognition methods, k-NN and SVM, are compared for emotional state classification. The experimental results indicate that the proposed method provides stable emotion classification performance of 86.5%, so it should be useful in application areas such as customer call centers.
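The SFS (Sequential Forward Selection) step named in the abstract above can be sketched as a greedy loop: start from an empty feature set and repeatedly add whichever unused feature most improves a score. In the paper the score would presumably come from classifier accuracy; here `toy_score` is a hypothetical stand-in, and all names and numbers are assumptions for illustration.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily pick k feature indices, adding the best-scoring one each round."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each unused feature together with the current subset.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset):
    """Hypothetical scoring function standing in for classifier accuracy."""
    gain = {0: 0.10, 1: 0.30, 2: 0.25, 3: 0.05}
    total = sum(gain[f] for f in subset)
    if 1 in subset and 2 in subset:
        total -= 0.20  # features 1 and 2 are treated as redundant
    return total


chosen = sequential_forward_selection(4, toy_score, k=2)
print(chosen)  # [1, 0]
```

Feature 2 has the second-highest individual gain but is passed over in the second round because it is redundant with feature 1, which is the behavior that makes subset-aware selection preferable to ranking features one at a time.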

Korean Phoneme Recognition Using Self-Organizing Feature Map (SOFM 신경회로망을 이용한 한국어 음소 인식)

  • Jeon, Yong-Koo;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.14 no.2 / pp.101-112 / 1995
  • In order to construct a feature map-based phoneme classification system for speech recognition, two procedures are usually required: clustering and labeling. In this paper, we present a phoneme classification system that uses Kohonen's Self-Organizing Feature Map (SOFM) as both clusterer and labeler. The SOFM is known to perform a self-organizing process that produces an optimal local topographical mapping of the signal space and yields reasonably high accuracy in recognition tasks. Consequently, the SOFM can be applied effectively to the recognition of phonemes. In addition, to improve the performance of the phoneme classification system, we propose a learning algorithm combined with the classical K-means clustering algorithm in the fine-tuning stage. To evaluate the performance of the proposed phoneme classification algorithm, we first use a total of 43 phonemes, which make up six intra-class feature maps for six different phoneme classes. From speaker-dependent phoneme classification tests using these six feature maps, we obtain a recognition rate of 87.2% and confirm that the proposed algorithm is an efficient method for improving recognition performance and convergence speed.


Improvement of Character-net via Detection of Conversation Participant (대화 참여자 결정을 통한 Character-net의 개선)

  • Kim, Won-Taek;Park, Seung-Bo;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.241-249 / 2009
  • Recently, a number of studies on video annotation and representation have been proposed to analyze video for searching and abstraction. In this paper, we present a method that provides the picture elements of conversational participants in video and an enhanced representation of the characters using those elements, collectively called Character-net. Because conversational participants are determined as the characters detected during a script holding time, the previous Character-net suffers from the serious limitation that some listeners cannot be detected as participants. The participants who complete the story in a video are a very important factor in understanding the context of the conversation. The picture elements for detecting the conversational participants consist of the following six: subtitle, scene, order of appearance, characters' eyes, patterns, and lip motion. We show how to use these elements to detect conversational participants and how to improve the representation of the Character-net. Conversational participants can be detected accurately when the proposed elements are combined and satisfy the special conditions. The experimental evaluation shows that the proposed method brings significant advantages both in improving the detection of conversational participants and in enhancing the representation of Character-net.