• Title/Abstract/Keywords: Voice signal

431 search results (processing time: 0.027 s)

Flattening Techniques for Pitch Detection

  • 김종국;조왕래;배명진
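    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2002 Summer Conference (4)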
    • /
    • pp.381-384
    • /
    • 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, detecting pitch from a speech signal is difficult because of the effects of formants and transition amplitudes. In this paper, we therefore propose a pitch detection method based on spectrum flattening techniques, which eliminate the formant and transition-amplitude effects. In the time domain, positive center clipping is applied to emphasize the pitch period using the glottal component, with the vocal tract characteristics removed. In the frequency domain, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal. As a result, a flattened harmonic waveform is obtained from the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, the accuracy of pitch detection improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
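
The time-domain step in this abstract can be illustrated with a minimal sketch: positive center clipping followed by a plain autocorrelation peak search. This is not the authors' algorithm; the spectral flattening step is not reproduced, and the clipping ratio, the search range, and the function names `positive_center_clip` and `autocorr_pitch` are assumptions made for illustration.

```python
import math

def positive_center_clip(x, ratio=0.3):
    """Keep only the part of each sample above a positive threshold.

    The threshold is ratio * max(x); the ratio value is an assumption.
    """
    threshold = ratio * max(x)
    return [s - threshold if s > threshold else 0.0 for s in x]

def autocorr_pitch(x, fs, fmin=60.0, fmax=400.0):
    """Return the pitch in Hz at the autocorrelation peak within [fmin, fmax]."""
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

# Synthetic voiced frame: 125 Hz fundamental plus a second harmonic.
fs = 8000
f0 = 125.0
frame = [
    math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs + 0.3)
    for n in range(480)
]
clipped = positive_center_clip(frame)
pitch = autocorr_pitch(clipped, fs)
print(f"estimated pitch: {pitch:.1f} Hz")
```

Clipping leaves one positive pulse per pitch period, so the autocorrelation peak falls near the true period instead of near a harmonic.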


Signal Analysis and Performance Evaluation of the PCMA System based on the QFT

  • Kim, Jeng-Sik;Min, Seung-Gi;Jang, Jun-Hwan;Jeng, Hae-Young;Yoon, Dal-Hwan
    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • The Institute of Electronics Engineers of Korea, 2000 ITC-CSCC -1
    • /
    • pp.84-87
    • /
    • 2000
  • The system acquires the PCM signal of the preferred channel from the subhighway (SHW), which connects a universal signal transceiver unit and a time switch unit. It then classifies the signal type (R2MFC, DTMF, CCT, or VOICE) and finally discriminates the digit. This paper describes the spectral analysis of the PCM acquisition system using the quick Fourier transform (QFT) and discusses the algorithm for signal analysis and discrimination.


Laryngeal Cancer Screening using Cepstral Parameters

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • Vol. 14, No. 2
    • /
    • pp.110-116
    • /
    • 2003
  • Background and Objectives: Laryngeal cancer discrimination using voice signals is a non-invasive method that allows rapid and simple examination without discomfort to patients. If appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications, including telemedicine. This study examines voice analysis parameters used for laryngeal disease discrimination, to help discriminate laryngeal diseases by voice signal analysis. The study also estimates the laryngeal cancer discrimination performance of a Gaussian mixture model (GMM) classifier based on statistical modelling of voice analysis parameters. Materials and Methods: The Multi-Dimensional Voice Program (MDVP) parameters, which have been widely used to analyze the voices of laryngeal cancer patients, sometimes fail when the periodicity of the patient's voice is seriously damaged. Accordingly, a new method is needed that enables highly reliable analysis of voice signals that cannot be analyzed by the MDVP. For the laryngeal cancer discrimination experiments, the authors used three types of voices collected at the Department of Otorhinolaryngology, Pusan National University Hospital: 50 voices of normal males, 50 voices of males with benign laryngeal diseases, and 105 voices of males with laryngeal cancer. In addition, the experiment included 11 voices of males with laryngeal cancer that cannot be analyzed by the MDVP. Only the monosyllabic vowel /a/ was used as voice data. Since there were only 11 voices of laryngeal cancer patients that cannot be analyzed by the MDVP, those voices were used only for discrimination. This study examined the linear predictive cepstral coefficients (LPCC) and the mel-frequency cepstral coefficients (MFCC), the two major cepstrum analysis methods in acoustic recognition. Results: The mel-frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, the linear frequency cepstral coefficients (LFCC), which omit the mel-frequency scaling from the MFCC, were introduced. The LFCC showed better discrimination performance than the MFCC in predicting laryngeal cancer. Conclusion: The parameters applied in this study could accurately discriminate even terminal laryngeal cancer in which periodicity is disturbed. Future studies on various classification algorithms and on parameters representing the pathophysiology of the vocal cords are expected to make it possible to discriminate benign laryngeal diseases as well as laryngeal cancer.
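
The difference between the MFCC and LFCC front ends comes down to how the filterbank center frequencies are spaced. The sketch below compares the two spacings. It assumes one common mel formula, mel = 2595 log10(1 + f/700), which the abstract does not specify. The filter count, band edges, and function names are likewise illustrative assumptions.

```python
import math

def hz_to_mel(f):
    """One common mel-scale mapping (an assumption; the abstract gives none)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def center_frequencies(n_filters, f_low, f_high, scale="mel"):
    """Return filter center frequencies in Hz on a mel or linear scale."""
    if scale == "mel":
        lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
        step = (hi - lo) / (n_filters + 1)
        return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
    step = (f_high - f_low) / (n_filters + 1)
    return [f_low + step * (i + 1) for i in range(n_filters)]

mel_centers = center_frequencies(10, 0.0, 4000.0, "mel")     # MFCC-style spacing
lin_centers = center_frequencies(10, 0.0, 4000.0, "linear")  # LFCC-style spacing

for m, l in zip(mel_centers, lin_centers):
    print(f"mel: {m:8.1f} Hz   linear: {l:8.1f} Hz")
```

The mel spacing crowds filters into the low frequencies, while the linear spacing gives every band the same resolution. The rest of the cepstral pipeline (log filterbank energies followed by a DCT) is the same in both cases.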


The Effect of An Increase of Closed Quotient on Improvement of Voice Quality after Type I Thyroplasty in Patients with Unilateral Vocal Cord Paralysis

  • 김한수;최성희;임재열;최홍식
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • Vol. 15, No. 1
    • /
    • pp.16-20
    • /
    • 2004
  • Purpose: To assess perceptual, acoustic, and aerodynamic measures of voice quality in patients with unilateral vocal cord paralysis before and after type I thyroplasty. Methods: The clinical records of patients who underwent type I thyroplasty in the Department of Otorhinolaryngology, Yongdong Severance Hospital, from November 2001 to November 2003 were reviewed. All patients underwent a vocal function evaluation, including perceptual, acoustic, and aerodynamic measures of voice, preoperatively and on the 60th postoperative day. The perceptual and acoustic measures were obtained from recordings of patients reading the 'Sanchak' passage. The perceptual evaluation was performed by two speech pathologists using a 4-point rating scale. Acoustic parameters (voice range profile low (RAL), voice range profile high (RAH), average fundamental frequency (AFx), closed quotient, harmonic-to-noise ratio, jitter, and shimmer) were measured with Lx Speech Studio. Mean flow rate (MFR), subglottic pressure (Psub), and intensity were measured using the Phonatory Function Analyzer. The maximum phonation time (MPT) was also measured. The data were statistically analyzed: a paired t-test (p<0.1) was used to compare preoperative and postoperative results, and a multiple regression test was used to find which parameter was most correlated with improvement of postoperative voice quality. Results: Among the aerodynamic parameters, Psub (88.11 mmH2O → 58.7 mmH2O), MPT (7.87 sec → 12.53 sec), and MFR (359.8 ml/sec → 161.06 ml/sec) showed statistically significant improvement. AFx (205.5 Hz → 163.27 Hz), AQx (23.9% → 48.3%), RAL, RAH, jitter, and shimmer also improved. In the multiple regression test, AFx and AQx were the two parameters most correlated with improvement of postoperative breathiness, whereas the general grade of voice quality was more correlated with Psub and shimmer. Conclusion: Vocal fold medialization procedures effectively reduce the glottic gap. The increased contact area between the vocal folds improved the aerodynamic parameters and stabilized vocal fold vibration. This effect results in improvements in acoustic parameters (shimmer, jitter, signal-to-noise ratio, voice range profile) and in voice quality.


A Study on ACFBD-MPC in 8kbps

  • 이시우
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • Vol. 17, No. 7
    • /
    • pp.49-53
    • /
    • 2016
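  • Recently, signal compression methods have been used increasingly to improve the efficiency of wireless networks. In particular, MPC systems use pitch extraction together with voiced and unvoiced excitation sources to reduce the bit rate. In general, in an MPC system that uses voiced and unvoiced sources, the reproduced speech waveform is distorted when a vowel and an unvoiced consonant occur in the same frame. This is caused by normalization of the reproduced speech waveform when the multi-pulses of the representative interval are restored for each pitch interval. To control this distortion, this paper proposes ACFBD-MPC (Amplitude Compensation Frequency Band Division-Multi Pulse Coding), which compensates the multi-pulse amplitudes for each pitch interval and uses specific frequencies. The experiments used 16 sentences each of male and female speech, and the speech signals were A/D converted at 10 kHz with 12 bits. The ACFBD-MPC system was implemented under an 8 kbps coding condition, and its SNR was evaluated. The SNR of ACFBD-MPC was 14.2 dB for male speech and 13.6 dB for female speech, an improvement over conventional MPC of 1 dB for male speech and 0.9 dB for female speech. This method is expected to be applicable to schemes that encode speech signals using low-bit-rate excitation sources, such as cellular phones and smartphones.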
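
The figures in this abstract can be put in context with a little arithmetic: the raw bit rate of 10 kHz, 12-bit PCM against the 8 kbps coding condition, and the usual waveform SNR in dB. The sketch assumes the standard SNR definition (signal energy over error energy), which the abstract does not spell out. The helper names and the toy waveform are hypothetical.

```python
import math

def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample

def snr_db(original, reconstructed):
    """Waveform SNR in dB: signal energy over reconstruction-error energy."""
    signal = sum(s * s for s in original)
    noise = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return 10.0 * math.log10(signal / noise)

raw_rate = pcm_bit_rate(10_000, 12)    # 10 kHz, 12-bit A/D conversion
compression_ratio = raw_rate / 8_000   # against the 8 kbps coding condition
print(raw_rate, compression_ratio)

# Toy waveform (hypothetical): a 10% amplitude error gives roughly 20 dB.
original = [1.0, -1.0, 1.0, -1.0]
reconstructed = [0.9, -0.9, 0.9, -0.9]
toy_snr = snr_db(original, reconstructed)
print(f"{toy_snr:.1f} dB")
```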

A Conversion Protocol for 2W Telephone Signal over Ethernet in a Private PSTN

  • 신진범;조길석;이동관;김태현
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • Vol. 24, No. 6
    • /
    • pp.645-654
    • /
    • 2021
  • In this paper, we propose a protocol for converting 2W telephone analog signals to Ethernet data in a private PSTN 2W tactical voice system. Several operational problems arise in a tactical telephone network where 2W telephone copper lines are installed hundreds of meters away from the PBX at a headquarters site: it is difficult to install and maintain the 2W copper cable in severe operational fields and to meet the safety and stability requirements of the telephone line under lightning and electromagnetic environments. To address these demands, we propose an efficient method in which the 2W analog interface signals between a private PBX system and a 2W telephone are converted to Ethernet messages over the optical Ethernet data communication network already deployed in the tactical weapon system. It is therefore unnecessary to install an additional optical cable for the Ethernet telephone line or to maintain the private PSTN 2W telephone network. The method also provides safe and secure telecommunication under lightning and electromagnetic environments. This paper presents the conversion protocol for 2W telephone signals over the Ethernet interface between PBX systems and 2W telephones, the mutual exchange protocol for Ethernet messages between two converters, and the rules for processing the analog signal interface. Finally, we demonstrate that the proposed technique is a feasible solution for the tactical weapon system by analyzing its performance and experimental results, including the bandwidth of the 2W telephone Ethernet network, the transmission latency of the voice signal, and the stability of the optical Ethernet voice network alongside the Ethernet data network.

The Digital Redundancy Design for Back-up Mode Operation of Aviation Intercom

  • 정성재;조경학;김동혁;이성우
    • Journal of Advanced Navigation Technology
    • /
    • Vol. 26, No. 5
    • /
    • pp.358-364
    • /
    • 2022
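  • An aviation intercom system handles all voice signals within an aircraft: internal calls between the pilot and co-pilot and between pilots and crew; external calls through communication equipment such as U/VHF radios; monitoring of audio signals from navigation and mission equipment such as the VHF omnidirectional range/instrument landing system (VOR/ILS) and the tactical air navigation system (TACAN); output of audio signals for voice recording to the flight data recorder (FDR) and the data transfer system (DTS); and generation of audio warning tones and voice warnings about aircraft status and threats. Because analog audio signals are sensitive to noise, such an intercom needs a redundant design that can protect audio signals from electromagnetic noise inside and outside the aircraft so that pilots and crew can carry out their missions. This paper describes the normal and back-up operation modes for redundancy of an aviation digital intercom, a digital redundancy design approach, and the fabrication and verification results.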

A Speech Coder using the Simplified Multi-mode Method

  • 강홍구
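    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • Proceedings of the Acoustical Society of Korea 12th Workshop on Speech Communication and Signal Processing, 1995 (SCAS Vol. 12, No. 1)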
    • /
    • pp.146-149
    • /
    • 1995
  • This paper proposes an SM-CELP speech coder that applies different excitation signals according to the characteristics of the speech segment at bit rates below 4 kbps. The speech signal is classified into two modes, stationary voiced speech and everything else, using the average energy of the short-time speech and of the residual signal after long-term prediction. A structured multi-pulse method is used for the excitation in mode A, and a Gaussian or pulse-like codebook is used in mode B. The 4.8 kbps DoD-CELP coder is used to evaluate the performance of the proposed coder. The proposed method shows a 1 to 2 dB higher segmental signal-to-noise ratio and better subjective quality without increasing the computational load.
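
Segmental SNR, the metric quoted here, averages per-frame SNR values in dB instead of computing one ratio over the whole utterance. The sketch below shows one common formulation. The frame length and the clamping limits are assumptions (the abstract gives none), and `segmental_snr` is a hypothetical helper name.

```python
import math

def segmental_snr(reference, decoded, frame_len=160,
                  floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [floor_db, ceil_db].

    frame_len and the clamping limits are illustrative assumptions.
    """
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0:
            continue  # skip silent frames, which have no defined SNR
        snr = ceil_db if noise == 0.0 else 10.0 * math.log10(signal / noise)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no non-silent frames to evaluate")
    return sum(frame_snrs) / len(frame_snrs)

# Toy check: a decoded signal scaled by 0.9 has a 10% error in every frame.
fs = 8000
reference = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(800)]
decoded = [0.9 * s for s in reference]
seg_snr = segmental_snr(reference, decoded)
print(f"segmental SNR: {seg_snr:.1f} dB")
```

Averaging in the dB domain keeps loud frames from dominating the score, which is why the measure is often preferred over global SNR for speech coders.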


Characteristics of the Required Signal Power for Multimedia Traffic in CDMA Systems

  • 강창순
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • Vol. 27, No. 6B
    • /
    • pp.593-600
    • /
    • 2002
  • The reverse link signal power required for multimedia traffic in multipath-faded single-code (SC-CDMA) and multi-code CDMA (MC-CDMA) systems is investigated. The effect of orthogonality loss among multiple spreading code channels is characterized by the orthogonality factor. The required signal power in both CDMA systems is then analyzed in terms of the relative required signal power ratio of data to voice traffic. The effects of varying system parameters, including the spreading bandwidth, the orthogonality factor, and the number of spreading codes, are examined. Analytical results show that MC-CDMA users transmitting only a single traffic type require significantly more power than SC-CDMA users with only a single traffic type. On the other hand, MC-CDMA users transmitting multimedia traffic require power levels approximately identical to those of SC-CDMA users with multimedia traffic. The results can be used in the design of radio resource management schemes (e.g., power allocation) for wireless multimedia services.

Speech Signal Processing for Analysis of Chaos Pattern

  • 김태식
    • Speech Sciences
    • /
    • Vol. 8, No. 3
    • /
    • pp.149-157
    • /
    • 2001
  • Based on chaos theory, this paper presents a new method of representing speech signals. The method can be used for pattern matching tasks such as speaker recognition. Attractors are represented very well by logistic maps, which exhibit chaotic phenomena. In speaker recognition, a speaker's vocal habits can be a very important matching parameter. The attractor configuration, built from the change values of the speech signal, can be used to analyze how voice undulations at one point on the vocal loudness scale influence the next point. The attractors arranged by this method could be used in speech recognition research because they also contain information unique to each speaker.
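
The two ingredients this abstract names, the logistic map and an attractor built from successive change values, can be sketched as follows. The abstract does not give the paper's exact construction, so the pairing of consecutive differences below is an assumption, and `logistic_map`, `delay_pairs`, and `difference_pairs` are hypothetical helper names.

```python
def logistic_map(r, x0, n):
    """Iterate x[k+1] = r * x[k] * (1 - x[k]) and return the first n values."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def delay_pairs(series, delay=1):
    """Pair each value with the one `delay` steps later (a 2-D return map)."""
    return list(zip(series[:-delay], series[delay:]))

def difference_pairs(series):
    """Return-map points built from successive changes of the series.

    Assumed reading of "change value of speech signal"; the paper's exact
    construction is not given in the abstract.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return delay_pairs(diffs)

chaotic = logistic_map(4.0, 0.2, 50)   # r = 4 lies in the chaotic regime
settled = logistic_map(2.5, 0.2, 200)  # r = 2.5 converges to 1 - 1/r = 0.6
points = delay_pairs(chaotic)          # these lie on y = 4x(1 - x)
change_points = difference_pairs(chaotic)
print(points[:3])
print(round(settled[-1], 6))
```

Plotting `points` for a chaotic series traces out the parabola of the map, and applying `difference_pairs` to speech samples instead of a synthetic series gives a comparable picture of how each change in amplitude relates to the next.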
