Search | Korea Science

A User-friendly Remote Speech Input Method in Spontaneous Speech Recognition System

Suh, Young-Joo;Park, Jun;Lee, Young-Jik
- The Journal of the Acoustical Society of Korea
- v.17 no.2E
- pp.38-46
- 1998
In this paper, we propose a remote speech input device as a new, user-friendly method of speech input for spontaneous speech recognition systems. We define user-friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our method adopts two algorithms: automatic speech detection and microphone-array delay-and-sum beamforming (DSBF)-based speech enhancement. The automatic speech detection algorithm is composed of two stages: detection of a candidate speech portion, followed by classification of that candidate as speech or nonspeech using pitch information. The DSBF algorithm uses the time-domain cross-correlation method for time delay estimation. In the performance evaluation, the speech detection algorithm shows within-200 ms start point accuracy of 93%, 99% under 15 dB, 20 dB, and 25 dB signal-to-noise ratio (SNR) environments, respectively, and the end point accuracies for the corresponding environments are 72%, 89%, and 93%. The classification of speech and nonspeech for the region of the input signal where a start point was detected is performed by the pitch information-based method. The percentages of correct classification for speech and nonspeech input are 99% and 90%, respectively. The eight-microphone array-based speech enhancement using the DSBF algorithm shows a maximum SNR gain of 6 dB over a single microphone and an error reduction of more than 15% in the spontaneous speech recognition domain.
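As an illustration of the enhancement stage described in this abstract, the following minimal sketch estimates inter-channel delays by time-domain cross-correlation and then applies delay-and-sum. It is not the authors' implementation: the integer-sample delays, the function names (`estimate_delay`, `delay_and_sum`, `shift`), and the `max_lag` search range are all assumptions made for illustration.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) that maximizes the time-domain
    cross-correlation of ref with sig. A positive lag means sig lags ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag):
    """Align every channel to channel 0 and average the aligned samples.

    Returns (enhanced_signal, estimated_delays)."""
    ref = channels[0]
    delays = [estimate_delay(ref, ch, max_lag) for ch in channels]
    out = []
    for n in range(len(ref)):
        total, count = 0.0, 0
        for ch, d in zip(channels, delays):
            m = n + d  # undo the estimated delay of this channel
            if 0 <= m < len(ch):
                total += ch[m]
                count += 1
        out.append(total / count)  # channel 0 always contributes, so count >= 1
    return out, delays


def shift(signal, d):
    """Hypothetical helper for experiments: delay a signal by d samples (zero-padded)."""
    n = len(signal)
    return [signal[m - d] if 0 <= m - d < n else 0.0 for m in range(n)]
```

For example, building four channels with `shift(src, d)` for `d` in `(0, 3, -2, 5)` and calling `delay_and_sum(channels, 8)` recovers those delays when `src` is a single short pulse. In a real array, uncorrelated noise on each channel is averaged down while the aligned speech adds coherently, which is the source of the SNR gain.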

Automatic Real-time Identification of Fingerprint Images Using Block-FFT (블럭 FFT를 이용한 실시간 지문 인식 알고리즘)

안도성;김학일
- Journal of the Korean Institute of Telematics and Electronics B
- v.32B no.6
- pp.909-921
- 1995
The objective of this paper is to develop an algorithm for a real-time automatic fingerprint recognition system. The algorithm employs the Fast Fourier Transform (FFT) to determine the directions of ridges in fingerprint images, and uses statistical information to recognize the fingerprints. The information used in fingerprint recognition is based on the directions along ridge curves and on characteristic points such as core points and delta points. To find ridge directions, the algorithm applies the FFT to a small block of 8x8 pixels and decides the direction by interpreting the resulting Fourier spectrum. By using the FFT, the algorithm does not require conventional preprocessing procedures such as smoothing, binarization, thinning, and restoration. Finally, in matching two fingerprint images, the algorithm searches for and compares two kinds of feature blocks: blocks where the direction cannot be defined from the Fourier spectrum, and blocks where the direction changes abruptly. The proposed algorithm has been implemented on a SunSparc-2 workstation under the Open Window environment. In the experiment, the proposed algorithm was applied to a set of fingerprint images obtained by a prism system. The results show that while the rate of Type II error (incorrect recognition of two different fingerprints as identical) is held at 0.0%, the rate of Type I error (incorrect recognition of two identical fingerprints as different) is 2.2%.
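To make the block-FFT idea concrete, the sketch below estimates the ridge orientation of one 8x8 block from the dominant non-DC peak of its 2-D Fourier spectrum. The abstract does not say how the spectrum is interpreted, so the peak-picking rule, the `None` return for blocks without a clear peak, and the names `dft2` and `ridge_direction` are assumptions rather than the paper's procedure. A naive DFT is used for self-containment; an FFT would produce the same values.

```python
import cmath
import math


def dft2(block):
    """Naive 2-D DFT of a square block indexed as block[y][x]."""
    n = len(block)
    out = [[0j] * n for _ in range(n)]
    for u in range(n):          # frequency along y
        for v in range(n):      # frequency along x
            acc = 0j
            for y in range(n):
                for x in range(n):
                    acc += block[y][x] * cmath.exp(-2j * math.pi * (u * y + v * x) / n)
            out[u][v] = acc
    return out


def ridge_direction(block):
    """Return the ridge orientation in degrees within [0, 180), or None when
    no non-DC spectral component stands out (direction undefined)."""
    n = len(block)
    spec = dft2(block)
    best, best_mag = None, 0.0
    for u in range(n):
        for v in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean gray level)
            mag = abs(spec[u][v])
            if mag > best_mag + 1e-9:
                best, best_mag = (u, v), mag
    if best is None or best_mag < 1e-6:
        return None
    # Map DFT indices to signed frequencies.
    fy = best[0] if best[0] <= n // 2 else best[0] - n
    fx = best[1] if best[1] <= n // 2 else best[1] - n
    # The spectral peak points across the ridges; ridges run perpendicular to it.
    wave_angle = math.degrees(math.atan2(fy, fx))
    return (wave_angle + 90.0) % 180.0
```

Angles are measured in image coordinates (x to the right, y downward). A block of vertical stripes yields an orientation of about 90 degrees, and a flat block yields `None`, which corresponds loosely to the abstract's "blocks where the direction cannot be defined."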

Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

Oh, Yoo Rhee;Park, Kiyoung;Park, Jeon Gue
- ETRI Journal
- v.44 no.3
- pp.476-490
- 2022
With recent advances in technology, automatic speech recognition (ASR) has been widely used in real-world applications. Converting large amounts of speech into text accurately and efficiently with limited resources has become more vital than ever. In this study, we propose a method to rapidly recognize a large speech database via a transformer-based end-to-end model. Transformers have improved the state-of-the-art performance in many fields. However, they are not easy to use for long sequences. We therefore propose and test various techniques to accelerate the recognition of real-world speech, including decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments are conducted with the Librispeech dataset and real-world Korean ASR tasks to verify the proposed methods. In the experiments, the proposed system converts 8 h of speech recorded at real-world meetings into text in less than 3 min with a 10.73% character error rate, a 27.1% relative reduction compared with conventional systems.
https://doi.org/10.4218/etrij.2021-0106 KSCI
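The accuracy figures in this abstract rest on two simple quantities: character error rate (CER) and relative error reduction. The sketch below shows how they are conventionally computed. The function names are illustrative, and the assumption that "conventional systems" refers to a single baseline CER is ours, not something the abstract states.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`."""
    return (baseline - proposed) / baseline


def implied_baseline(proposed, reduction):
    """Baseline error rate implied by a proposed rate and a relative reduction."""
    return proposed / (1.0 - reduction)
```

Under the single-baseline reading, `implied_baseline(10.73, 0.271)` gives a baseline CER of roughly 14.7%. That value is derived arithmetic only and is not a figure reported in the abstract.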

Real-time multiple face recognition system based on one-shot panoramic scanning (원샷 파노라믹 스캐닝 기반 실시간 다수 얼굴 인식 시스템)

Kim, Daehwan
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2022.10a
- pp.553-555
- 2022
This paper is about a real-time automatic face recognition system based on one-shot panoramic scanning. It detects multiple faces in real time through a single panoramic scanning process and recognizes pre-registered faces. Instead of recognizing multiple faces within a single panoramic image, multiple faces are recognized using multiple images obtained in the scanning process. This reduces the panorama image creation time and stitching error, and at the same time can improve the face recognition performance by using the accumulated information of multiple images. It is expected that it can be used in various applications such as a multi-person smart attendance system with only a simple image acquisition device.

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

Jungwon Chang;Hosung Nam
- Phonetics and Speech Sciences
- v.15 no.3
- pp.83-88
- 2023
This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.
https://doi.org/10.13064/KSSS.2023.15.3.083

Multi-resolution DenseNet based acoustic models for reverberant speech recognition (잔향 환경 음성인식을 위한 다중 해상도 DenseNet 기반 음향 모델)

Park, Sunchan;Jeong, Yongwon;Kim, Hyung Soon
- Phonetics and Speech Sciences
- v.10 no.1
- pp.33-38
- 2018
Although deep neural network-based acoustic models have greatly improved the performance of automatic speech recognition (ASR), reverberation still degrades the performance of distant speech recognition in indoor environments. In this paper, we adopt the DenseNet, which has shown great performance results in image classification tasks, to improve the performance of reverberant speech recognition. The DenseNet enables the deep convolutional neural network (CNN) to be effectively trained by concatenating feature maps in each convolutional layer. In addition, we extend the concept of multi-resolution CNN to multi-resolution DenseNet for robust speech recognition in reverberant environments. We evaluate the performance of reverberant speech recognition on the single-channel ASR task in reverberant voice enhancement and recognition benchmark (REVERB) challenge 2014. According to the experimental results, the DenseNet-based acoustic models show better performance than do the conventional CNN-based ones, and the multi-resolution DenseNet provides additional performance improvement.
https://doi.org/10.13064/KSSS.2018.10.1.033 KSCI

Automatic Target Recognition by selecting similarity-transform-invariant local and global features (유사변환에 불변인 국부적 특징과 광역적 특징 선택에 의한 자동 표적인식)

Sun, Sun-Gu;Park, Hyun-Wook
- Journal of the Institute of Electronics Engineers of Korea SP
- v.39 no.4
- pp.370-380
- 2002
This paper proposes an ATR (Automatic Target Recognition) algorithm for identifying non-occluded and occluded military vehicles in natural FLIR (Forward Looking InfraRed) images. After a target is segmented, a radial function is defined from the target boundary to extract global shape features. In addition, to extract local shape features of the upper region of a target, a distance function is defined from boundary points and a line between two extreme points. From the two functions and the target contour, four global and four local shape features are proposed. They are much more invariant to translation, rotation, and scale transforms than traditional feature sets. In the experiments, we show that the proposed feature set is superior to the traditional feature sets with respect to similarity-transform invariance and recognition performance.
KSCI

Automatic Recognition of Direction Information in Road Sign Image Using OpenCV (OpenCV를 이용한 도로표지 영상에서의 방향정보 자동인식)

Kim, Gihong;Chong, Kyusoo;Youn, Junhee
- Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
- v.31 no.4
- pp.293-300
- 2013
Road signs are important infrastructure for safe and smooth traffic because they provide useful information to drivers. A road sign database (DB) is needed to manage road signs systematically. Such a DB can be built by manually detecting and recognizing signs in imagery, but this is time-consuming and costly. In this study, we propose algorithms for the automatic recognition of direction information in road sign images. We also implemented the algorithms using the OpenCV library and applied them to road sign images. To automatically detect and recognize direction information, we developed a program composed of various modules such as image enhancement, image binarization, arrow region extraction, interest point extraction, and template image matching. The results confirm the feasibility of automatic recognition of direction information in road sign images.
https://doi.org/10.7848/ksgpc.2013.31.4.293 KSCI

Fault Diagnosis Method for Automatic Machine Using Artificial Neural Network Based on DWT Power Spectral Density (인공신경망을 이용한 DWT 전력스펙트럼 밀도 기반 자동화 기계 고장 진단 기법)

Kang, Kyung-Won
- Journal of the Institute of Convergence Signal Processing
- v.20 no.2
- pp.78-83
- 2019
Sound-based machine fault diagnosis covers all studies that aim to automatically detect abnormal sounds in machines using the acoustic emissions of those machines. Conventional methods that use mathematical models have been found inaccurate because of the complexity of industrial machinery systems and the presence of nonlinear factors such as noise. Therefore, a fault diagnosis problem can be treated as a pattern recognition problem. We propose an automatic fault diagnosis method for hand drills using the discrete wavelet transform (DWT) and pattern recognition techniques such as artificial neural networks (ANN). We first conduct a filtering analysis based on the DWT. The power spectral density (PSD) is computed on the wavelet subbands, except for the highest and lowest frequency subbands. The PSDs of the wavelet coefficients are extracted as features for the ANN-based classifier, which forms the pattern recognition part. The results show that the proposed method can be used effectively not only to detect defects but also in various sound-based automatic diagnosis systems.
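The feature-extraction pipeline in this abstract (DWT subbands, then a PSD per retained subband) can be sketched as follows. The abstract does not name the wavelet, the number of levels, or the PSD estimator, so the Haar wavelet, the four-level decomposition, and the plain periodogram used here are assumptions, as are all function names. The ANN classifier that would consume these features is omitted.

```python
import cmath
import math


def haar_step(x):
    """One Haar DWT level: returns (approximation, detail), each of half length.
    Assumes len(x) is even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def dwt_subbands(x, levels):
    """Subbands ordered from highest to lowest frequency: [D1, D2, ..., DL, AL]."""
    bands = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def periodogram(x):
    """One-sided periodogram |X[k]|^2 / N for k = 0 .. N // 2."""
    n = len(x)
    psd = []
    for k in range(n // 2 + 1):
        xk = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(xk) ** 2 / n)
    return psd


def psd_features(x, levels=4):
    """PSD of every subband except the highest (D1) and lowest (AL) frequency bands."""
    bands = dwt_subbands(x, levels)
    kept = bands[1:-1]
    return [periodogram(b) for b in kept]
```

For a 64-sample frame and four levels, `psd_features` returns three PSD vectors (for D2, D3, and D4). In practice these would typically be flattened or summarized into one feature vector per frame before being passed to a classifier.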
KSCI

Automatic Recognition of Analog and Digital Modulation Signals (아날로그 및 디지털 변조 신호의 자동 인식)

Seo Seunghan;Yoon Yeojong;Jin Younghwan;Seo Yongju;Lim Sunmin;Ahn Jaemin;Eun Chang-Soo;Jang Won;Nah Sunphil
- The Journal of Korean Institute of Communications and Information Sciences
- v.30 no.1C
- pp.73-81
- 2005
We propose an automatic modulation recognition scheme that extracts pre-defined key features from the received signal and then applies an equal gain combining method to determine the modulation in use. We also compare the performance of the proposed algorithm with that of a decision-theoretic algorithm. Our scheme extracts five pre-defined key features from each data segment (the data unit for key feature extraction); these are then averaged over all segments to recognize the modulation according to the decision procedure. We evaluate the proposed algorithm through computer simulations for analog modulations such as AM, FM, and SSB and for digital modulations such as FSK2, FSK4, PSK2, and PSK4, measuring the recognition success rate while varying the SNR and data collection time. The results show that the performance of the proposed scheme is comparable to that of the decision-theoretic algorithm, with less complexity.
KSCI

Search Result 1,072, Processing Time 0.025 seconds
