Search | Korea Science

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

김민정;석수영;김광수;정현열
- Journal of the Institute of Convergence Signal Processing / v.3 no.1 / pp.8-14 / 2002
In this paper, we implement a real-time text-independent speaker recognition system based on the Gaussian mixture model (GMM) and apply frame-level likelihood normalization, a method that has proven effective in speaker verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal are applied to account for variations in a speaker's speech. In training, GMMs model each speaker's acoustic features, and maximum likelihood estimation is used to optimize the GMM parameters. In recognition, likelihood scores are computed at the frame level from the speaker models and the test data. Text-independent sentences are used as test sentences. The ETRI 445 and KLE 452 databases are used for training and testing, with cepstral coefficients and regression coefficients as feature parameters. The experimental results show that the frame-level likelihood normalization method achieves a higher recognition rate than the conventional method, regardless of the number of registered speakers.
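The frame-level scoring described in this abstract can be sketched as follows. The abstract does not state the normalization formula, so this sketch assumes one common form: at each frame, a speaker's log-likelihood is reduced by the log of the mean likelihood of all competing speakers, and the normalized values are summed over frames. The names `logsumexp` and `identify`, and the input layout, are hypothetical and not taken from the paper.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of log-domain values.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def identify(frame_loglik):
    """frame_loglik[s][t] is the log-likelihood of frame t under speaker model s.

    Assumed normalization (not confirmed by the abstract): subtract the log of
    the mean likelihood of the other speakers at the same frame.
    Requires at least two speakers. Returns (best_speaker_index, scores).
    """
    n_speakers = len(frame_loglik)
    n_frames = len(frame_loglik[0])
    scores = [0.0] * n_speakers
    for t in range(n_frames):
        column = [frame_loglik[s][t] for s in range(n_speakers)]
        for s in range(n_speakers):
            others = column[:s] + column[s + 1:]
            background = logsumexp(others) - math.log(len(others))
            scores[s] += column[s] - background
    best = max(range(n_speakers), key=scores.__getitem__)
    return best, scores

# Toy log-likelihoods for 3 speakers and 3 frames (made-up numbers).
toy = [
    [-1.0, -2.0, -1.5],
    [-3.0, -2.5, -4.0],
    [-2.0, -3.0, -3.5],
]
best, scores = identify(toy)
print(best, scores)
```

In a real system the per-frame log-likelihoods would come from the trained GMMs; here they are supplied directly so that the normalization step stays in focus.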

The Study on the Factors for Detection of Renal Stone on Ultrasound (초음파 검사에서 신장 결석의 검출 요인에 관한 연구)

Sim, Hyun-Sun;Jung, Hong-Ryang;Lim, Cheong-Hwan
- Journal of radiological science and technology / v.29 no.1 / pp.1-6 / 2006
Purpose: Renal stones are common and typically arise within the collecting system. The renal sinus contains the collecting system, the renal vessels, lymphatics, fat, and fibrous tissue. Because signal processing compresses all large echoes, the echo from a renal stone generally cannot be distinguished from the large echoes emanating from normal structures of the renal sinus. It has therefore been difficult to use ultrasonography to detect small renal stones without posterior shadowing, or to assess the chemical composition of a stone. The aim of this study was to measure the posterior acoustic shadowing of a stone under various scan parameters and to examine whether this helps in diagnosing renal stones. Materials & Methods: A stone was placed on a sponge and examined in a water bath with a 3.5 MHz or 7.5 MHz transducer (LOGIQ 400, USA). First, a range of gain settings was tested. Second, a range of dynamic range settings was tested. Third, a range of focal zone settings was tested. Fourth, the echo level was measured at low and high frequencies as a function of depth. Results: 1) The average echo level was 98 at low total gain (10 dB) and 142 at high total gain (40 dB). The posterior acoustic shadowing of the renal stone was clear at low gain. 2) The average echo level was 129 at low dynamic range (42 dB) and 101 at high dynamic range (72 dB). The posterior acoustic shadowing of the renal stone was clear at high dynamic range. 3) When the stone is in the focal zone of the transducer, a definite posterior acoustic shadow is identified. 4) The stone appeared more clearly at high frequency (7.5 MHz) than at low frequency (3.5 MHz), and it was not distorted. Conclusion: The demonstration of a posterior acoustic shadow of a renal stone depends on several technical factors such as gain, dynamic range, focus, and frequency. Attention to these factors helps in diagnosing renal stones.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

Kim, Sang-hun;Park, Jun;Lee, Young-jik
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small-sized speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should preserve the natural prosody of the source speech and contain multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods from laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are assigned according to pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequence is selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain synthetic speech of high enough quality for commercial use, we introduce a domain-specific database. Adding a domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In a subjective evaluation, the new corpus-based Korean TTS system shows better naturalness than the conventional demisyllable-based one.
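The Viterbi selection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each target position has a list of candidate units, that each candidate is summarized by a single feature vector, and that the only cost is the Euclidean distance between consecutive units. The function name `select_units` and the toy vectors are hypothetical; the paper's actual features and any additional costs are not given in the abstract.

```python
import math

def select_units(candidates):
    """candidates[i] is a list of feature vectors (tuples) for target position i.

    Returns (total_cost, path), where path[i] is the index of the candidate
    chosen at position i and total_cost is the accumulated Euclidean distance
    between consecutive chosen units.
    """
    cost = [0.0] * len(candidates[0])   # best accumulated cost ending at each candidate
    back = []                           # back-pointers, one list per transition
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for v in cur:
            options = [cost[j] + math.dist(u, v) for j, u in enumerate(prev)]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j_best])
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, path

# Three target positions with two candidate units each (made-up 2-D features).
toy = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 0.0), (5.0, 6.0)],
    [(1.0, 1.0), (9.0, 9.0)],
]
print(select_units(toy))
```

Dynamic programming keeps the search linear in the number of positions, rather than enumerating every combination of candidates.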

An Implementation of Automatic Genre Classification System for Korean Traditional Music (한국 전통음악 (국악)에 대한 자동 장르 분류 시스템 구현)

Lee Kang-Kyu;Yoon Won-Jung;Park Kyu-Sik
- The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.29-37 / 2005
This paper proposes an automatic genre classification system for Korean traditional music. The proposed system accepts a queried input and classifies it, based on its musical content, into one of six genres: Royal Shrine Music, Classical Chamber Music, Folk Song, Folk Music, Buddhist Music, and Shamanist Music. In general, content-based music genre classification consists of two stages: feature vector extraction and pattern classification. For feature extraction, the system extracts 58-dimensional feature vectors that include STFT-based features such as spectral centroid, spectral rolloff, and spectral flux as well as coefficient-domain features such as LPC and MFCC; these features are then further optimized using the SFS method. For pattern (genre) classification, k-NN, Gaussian, GMM, and SVM algorithms are considered. In addition, the proposed system adopts the MFC method to resolve the uncertainty in system performance caused by different query patterns (or portions). The experimental results verify successful genre classification, with accuracy over 97% for both the k-NN and SVM classifiers; however, the SVM classifier classifies almost three times faster than the k-NN classifier.

Surficial Sediment Classification using Backscattered Amplitude Imagery of Multibeam Echo Sounder(300 kHz) (다중빔 음향 탐사시스템(300 kHz)의 후방산란 자료를 이용한 해저면 퇴적상 분류에 관한 연구)

Park, Yo-Sup;Lee, Sin-Je;Seo, Won-Jin;Gong, Gee-Soo;Han, Hyuk-Soo;Park, Soo-Chul
- Economic and Environmental Geology / v.41 no.6 / pp.747-761 / 2008
To test acoustic remote classification of seabed sediment, we acquired ground-truth data (video, grab samples, etc.) and developed a post-processing procedure for automatic classification based on 300 kHz multibeam echo sounder (MBES) backscattering data, acquired with a Kongsberg Simrad EM3000 at Sock-Cho Port, East Sea of South Korea. The sonar signal and its classification performance were verified against geo-referenced video imagery with the aid of a GIS (Geographic Information System). The depth of the research site ranged from 5 m to 22.7 m, and the backscattering amplitude ranged from -36 dB to -15 dB. The mean grain size of sediment from equidistant sampling sites (50 m interval) varied from 2.86φ to 0.88φ. To identify the main features for seabed classification from the MBES backscattering amplitude, we evaluated the correlation between the backscattering amplitude and the properties of the sediment samples. The performance of the proposed remote seabed classification was evaluated by comparing human expert segmentation with the results of the automatic algorithm. The cross-model perception error ratio of the automatic classification algorithm is 8.95% at rocky bottoms and 2.06% in the area of low mean grain size.

Analysis of statistical characteristics of bistatic reverberation in the east sea (동해 해역에서 양상태 잔향음 통계적 특징 분석)

Yeom, Su-Hyeon;Yoon, Seunghyun;Yang, Haesang;Seong, Woojae
- The Journal of the Acoustical Society of Korea / v.41 no.4 / pp.435-445 / 2022
In this study, the reverberation of a bistatic sonar operated off the southeastern coast of the East Sea in July 2020 was analyzed. The reverberation data were collected using an LFM sound source towed by a research vessel and a horizontal line array receiver located 1 km to 5 km from the source. After signal processing, the data were analyzed by various methods, including geo-plots. The analysis confirmed that the angle of the path from the sound source via the scatterer to the receiver has a dominant influence on the distribution of the reverberation, and that the probability distribution characteristics of bistatic sonar reverberation vary from beam to beam. In addition, the parameters of the K distribution and the Rayleigh distribution were estimated from the samples by the method of moments. The distribution of the data was then analyzed using the Kolmogorov-Smirnov test at a significance level of 0.05. The reverberation was observed to follow a Rayleigh probability distribution, which was presumed to result from a low reverberation-to-noise ratio.
https://doi.org/10.7776/ASK.2022.41.4.435
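The moment-method estimation and Kolmogorov-Smirnov test named in the abstract above can be sketched for the Rayleigh case only. The sketch runs on synthetic samples, not the paper's data, and uses the asymptotic critical value 1.358/sqrt(n) for a 0.05 significance level. That value is only approximate when the parameter is estimated from the same samples. All function names are hypothetical, and the K distribution fit is omitted.

```python
import math
import random

def rayleigh_sigma_by_moments(samples):
    # For a Rayleigh variable, E[x^2] = 2 * sigma^2, so match the second moment.
    mean_square = sum(x * x for x in samples) / len(samples)
    return math.sqrt(mean_square / 2.0)

def rayleigh_cdf(x, sigma):
    return 1.0 - math.exp(-(x * x) / (2.0 * sigma * sigma))

def ks_statistic(samples, cdf):
    # Largest gap between the empirical CDF and the model CDF.
    ordered = sorted(samples)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Synthetic Rayleigh samples via inverse-transform sampling (assumed sigma = 2).
rng = random.Random(0)
true_sigma = 2.0
samples = [true_sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
           for _ in range(2000)]

sigma_hat = rayleigh_sigma_by_moments(samples)
d_stat = ks_statistic(samples, lambda x: rayleigh_cdf(x, sigma_hat))
critical = 1.358 / math.sqrt(len(samples))  # asymptotic value for alpha = 0.05
print(sigma_hat, d_stat, critical, d_stat < critical)
```

If the statistic falls below the critical value, the Rayleigh hypothesis is not rejected at that significance level.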

Comprehensive analysis of deep learning-based target classifiers in small and imbalanced active sonar datasets (소량 및 불균형 능동소나 데이터세트에 대한 딥러닝 기반 표적식별기의 종합적인 분석)

Geunhwan Kim;Youngsang Hwang;Sungjin Shin;Juho Kim;Soobok Hwang;Youngmin Choo
- The Journal of the Acoustical Society of Korea / v.42 no.4 / pp.329-344 / 2023
In this study, we comprehensively analyze the generalization performance of various deep learning-based active sonar target classifiers when applied to small and imbalanced active sonar datasets. To generate the datasets, we use data from two oceanic experiments conducted at different times and in different ocean areas. Each sample is a time-frequency domain image extracted from the audio signal of a contact after the detection process. For the analysis, we use 22 convolutional neural network (CNN) models. The two datasets are used alternately as the train/validation dataset and the test dataset. To calculate the variance in the output of the target classifiers, the train/validation/test procedure is repeated 10 times. Training hyperparameters are tuned using Bayesian optimization. The results demonstrate that shallow CNN models show superior robustness and generalization performance compared to most deep CNN models. These results can serve as a valuable reference for future research on deep learning-based active sonar target classification.
https://doi.org/10.7776/ASK.2023.42.4.329

Implementation of Non-Stringed Guitar Based on Physical Modeling Synthesis (물리적 모델링 합성법에 기반을 둔 줄 없는 기타 구현)

Kang, Myeong-Su;Cho, Sang-Jin;Chong, Ui-Pil
- The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.119-126 / 2009
This paper describes a non-stringed guitar composed of laser strings, frets, a sound synthesis algorithm, and a processor. The laser strings, which can represent strokes and arpeggios, consist of laser modules and photodiodes. The frets are implemented with a voltage divider. The guitar body does not need to be implemented physically because commuted waveguide synthesis is used. The proposed frets enable players to play all chords with the chord glove as well as guitar solos. Sliding, hammer-on, and pull-off sounds are synthesized using parameters from the voltage divider. Because pitch shifting corresponds to a time-varying propagation speed in the digital waveguide model, the proposed model can synthesize vibrato as well. After the signals from the laser strings and frets are converted into parameters for the synthesis algorithm, the digital signal processor, a TMS320F2812, runs the synthesis algorithm in real time and communicates with the DAC. A demonstration video clip available on the Internet shows a performer playing the song 'Arirang', synthesized in real time by the proposed algorithm and interfaces. We therefore conclude that the proposed synthesis algorithm is efficient for guitar solos and that the non-stringed guitar can be played in real time without problems.
https://doi.org/10.7776/ASK.2009.28.2.119

Low-Power Implementation of A Multichannel Hearing Aid Using A General-purpose DSP Chip (범용 DSP 칩을 이용한 다중 채널 보청기의 저전력 구현)

Kim, Bum-Jun;Byun, Joon;Park, Young-Cheol
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.1 / pp.18-25 / 2018
In this paper, we present a low-power implementation of a multichannel hearing aid system using a general-purpose DSP chip. The system includes an acoustic amplification algorithm based on Wide Dynamic Range Compression (WDRC), an adaptive howling canceller, and a single-channel noise reduction algorithm. To achieve a low-power implementation, each algorithm is restructured as an integer program, and the integer program is converted to an assembly program using BelaSigna(R) 250 instructions. Experiments with the implemented system confirmed the real-time performance of each processing algorithm. The system clock was also measured, confirming that all signal processing blocks can run in real time at a system clock of about 7.02 MHz.
https://doi.org/10.17661/jkiiect.2018.11.1.18

Effectiveness of the Multimedia Contentware (멀티미디어 콘텐트웨어의 효과연구-교감적 의미전달을 위한 정보설계를 중심으로-)

안상혁
- Archives of design research / v.11 no.3 / pp.165-174 / 1998
The development of digital media through media convergence is creating various content businesses. Basically, contentware can be described as something to watch, listen to, or enjoy. It has developed by tying together moving images, sound, and games, which formerly belonged to the entertainment business. A good way to gain insight into the effectiveness of multimedia contentware is to understand how communication style changes in the digital medium. This paper therefore addresses effectiveness at the semantic level rather than accuracy and efficiency at the syntactic level in communication mediated by digital media. As media consumption trends toward more engaging forms, it is essential to build common ground, in terms of sympathy, between contentware and its user. This paper thus introduces the concept of common fields of experience, drawing on the fundamentals of phatic communication, as a way to architect multimedia content effectively.
