• Title/Summary/Keyword: Multi-Model Training

Search Results: 352

Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation (잡음음성 음향모델 적응에 기반한 잡음에 강인한 음성인식)

  • Chung, Yongjoo
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.29-34
    • /
    • 2014
  • In Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMM) are usually trained with clean speech. However, better performance is expected when the HMMs are trained with noisy speech. In a previous study, we found that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain produced improved recognition results, but since that algorithm operated in the log-spectrum domain, it could not be used directly for HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain, and the means and covariances of the noisy-speech HMM trained with Multi-condition TRaining (MTR) are adapted accordingly. In noisy speech recognition experiments on the Aurora 2 database, the proposed method produced a relative improvement of 10.6% in Word Error Rate (WER) over the MTR method, while the previous MMSE estimation of the training noisy speech produced a relative improvement of 4.3%, which shows the superiority of the proposed method.
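
As a concrete illustration of the kind of model-domain adaptation this entry discusses, the sketch below applies the standard first-order VTS mismatch function to shift a Gaussian mean from the cepstrum domain through the log-spectrum domain and back. It is a generic sketch, not the paper's derivation relating test and training noisy speech; the function name `adapt_mean` and the use of an orthonormal DCT are assumptions.

```python
import numpy as np
from scipy.fftpack import dct, idct

def adapt_mean(mu_train_ceps, mu_noise_ceps):
    """Shift a cepstral mean of the training noisy-speech model toward the
    test noise condition with the first-order VTS mismatch function."""
    # map both means back to the log-spectrum domain (inverse DCT)
    log_speech = idct(mu_train_ceps, norm='ortho')
    log_noise = idct(mu_noise_ceps, norm='ortho')
    # VTS mismatch: y = x + log(1 + exp(n - x)) in the log-spectrum domain
    log_adapted = log_speech + np.log1p(np.exp(log_noise - log_speech))
    # forward DCT returns the adapted mean to the cepstrum domain
    return dct(log_adapted, norm='ortho')
```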

Approach to diagnosing multiple abnormal events with single-event training data

  • Ji Hyeon Shin;Seung Gyu Cho;Seo Ryong Koo;Seung Jun Lee
    • Nuclear Engineering and Technology
    • /
    • v.56 no.2
    • /
    • pp.558-567
    • /
    • 2024
  • Diagnostic support systems are being researched to assist operators in identifying and responding to abnormal events in a nuclear power plant. Most studies to date have considered single abnormal events only, for which it is relatively straightforward to obtain data to train the deep learning model of the diagnostic support system. However, cases in which multiple abnormal events occur must also be considered, for which obtaining training data becomes difficult due to the large number of combinations of possible abnormal events. This study proposes an approach to maintain diagnostic performance for multiple abnormal events by training a deep learning model with data on single abnormal events only. The proposed approach is applied to an existing algorithm that can perform feature selection and multi-label classification. We choose an extremely randomized trees classifier to select dedicated monitoring parameters for target abnormal events. In diagnosing each event occurrence independently, two-channel convolutional neural networks are employed as sub-models. The algorithm was tested in a case study with various scenarios, including single and multiple abnormal events. Results demonstrated that the proposed approach maintained diagnostic performance for 15 single abnormal events and significantly improved performance for 105 multiple abnormal events compared to the base model.
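
A minimal sketch of the two components named in this abstract, under assumed data shapes: extremely randomized trees rank plant parameters per target event, and one independent per-event classifier is trained on single-event data only, so that multiple simultaneous events can still be flagged. A plain tree classifier stands in for the paper's two-channel CNN sub-models; the variable names (`X`, `single_event_labels`) are illustrative.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def select_parameters(X, y_event, top_k=10):
    """Rank plant parameters by importance for one abnormal event."""
    trees = ExtraTreesClassifier(n_estimators=200, random_state=0)
    trees.fit(X, y_event)                       # y_event: 1 if this event occurred
    ranked = np.argsort(trees.feature_importances_)[::-1]
    return ranked[:top_k]                       # indices of dedicated parameters

def train_event_models(X, single_event_labels):
    """One sub-model per event, trained on single-event data only."""
    models = {}
    for event, y in single_event_labels.items():        # dict: event name -> 0/1 labels
        cols = select_parameters(X, y)
        clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X[:, cols], y)
        models[event] = (cols, clf)
    return models

def diagnose(models, x_new):
    """Independent per-event decisions allow multiple simultaneous events."""
    return [event for event, (cols, clf) in models.items()
            if clf.predict(x_new[cols][None, :])[0] == 1]
```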

KNOWLEDGE-BASED BOUNDARY EXTRACTION OF MULTI-CLASSES OBJECTS

  • Park, Hae-Chul;Shin, Ho-Chul;Lee, Jin-Sung;Cho, Ju-Hyun;Kim, Seong-Dae
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.1968-1971
    • /
    • 2003
  • We propose a knowledge-based algorithm for extracting an object boundary from low-quality images such as forward-looking infrared images. With a multi-class training data set, the global shape is modeled by multispace KL (MKL) [1] and a curvature model, and the objective function for fitting the deformable boundary template represented by the shape model to the true boundary in an input image is formulated by Bayes' rule. Simulation results show that our method is more accurate in the multi-class training case and has a lower computational cost than the point distribution model (PDM) [2]. It also works well under noise, pose variation, and some kinds of occlusion.
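
The entry above relies on a KL (PCA-style) shape subspace learned from multi-class training boundaries. The sketch below shows only that generic ingredient, scoring a candidate boundary by its reconstruction error in the learned subspace; it is not the paper's MKL/curvature formulation or its Bayesian objective, and it assumes boundaries are already aligned and resampled to a fixed number of points.

```python
import numpy as np

def fit_shape_model(shapes, n_modes=8):
    """shapes: (num_samples, 2*N) flattened, aligned boundary coordinates."""
    mean = shapes.mean(axis=0)
    # eigenvectors of the covariance = KL basis of boundary variation
    _, _, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, vt[:n_modes]

def reconstruction_error(mean, basis, candidate):
    """Distance between a candidate boundary and the learned shape subspace."""
    coeffs = basis @ (candidate - mean)
    reconstructed = mean + basis.T @ coeffs
    return np.linalg.norm(candidate - reconstructed)
```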

Training-Free Fuzzy Logic Based Human Activity Recognition

  • Kim, Eunju;Helal, Sumi
    • Journal of Information Processing Systems
    • /
    • v.10 no.3
    • /
    • pp.335-354
    • /
    • 2014
  • The accuracy of training-based activity recognition depends on the training procedure and on the extent to which the training dataset comprehensively represents the activity and its varieties. Additionally, training incurs substantial cost and effort in the process of collecting training data. To address these limitations, we have developed a training-free activity recognition approach based on a fuzzy logic algorithm that utilizes a generic activity model and associated activity semantic knowledge. The approach is validated through experimentation with real activity datasets. Results show that the fuzzy logic-based algorithms exhibit comparable or better accuracy than other training-based approaches.
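
As a toy illustration of training-free fuzzy inference in the spirit of this entry, the sketch below hand-specifies membership functions over sensor features and combines them with simple fuzzy rules, with no training data involved. The activities, features, and thresholds are invented for illustration and are not the authors' generic activity model.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], plateau on [b, c], falls on [c, d]."""
    rising = (x - a) / (b - a + 1e-9)
    falling = (d - x) / (d - c + 1e-9)
    return float(np.clip(min(rising, 1.0, falling), 0.0, 1.0))

def activity_scores(motion_level, in_kitchen):
    """Membership degrees for two toy activities; no training data is used."""
    low_motion = trapezoid(motion_level, -0.1, 0.0, 0.2, 0.4)
    high_motion = trapezoid(motion_level, 0.3, 0.6, 1.0, 1.1)
    return {
        # rule: cooking = high motion AND in the kitchen (fuzzy AND = min)
        "cooking": min(high_motion, 1.0 if in_kitchen else 0.0),
        "resting": low_motion,
    }

print(activity_scores(0.7, True))   # cooking membership dominates
```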

A study on the speech recognition by HMM based on multi-observation sequence (다중 관측열을 토대로한 HMM에 의한 음성 인식에 관한 연구)

  • 정의봉
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.34S no.4
    • /
    • pp.57-65
    • /
    • 1997
  • The purpose of this paper is to propose an HMM (hidden Markov model) based on multi-observation sequences for isolated word recognition. The proposed model generates the MSVQ codebook by dividing each word into several sections and dividing the training data into the corresponding sections. The multi-observation sequence for each section is then obtained by weighting the distance vectors from lower values to higher ones, and the sequence with the highest probability is selected during recognition. 146 DDD area names are selected as the target recognition vocabulary, and 10 LPC cepstrum coefficients are used as the feature parameters. In addition to the speech recognition experiments with the proposed model, experiments using DP, MSVQ, and general HMM are carried out on the same data under the same conditions for comparison. The experimental results show that the HMM based on multi-observation sequences proposed in this paper is superior to the other methods (DP, MSVQ, and general HMM) in both recognition rate and recognition time.
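
A rough sketch of the multi-observation idea described in this abstract, under assumptions: each utterance is split into a fixed number of sections, a VQ codebook is built per section, and each frame is mapped to its k nearest codewords with distance-based weights rather than to a single codeword. The HMM training and decoding steps are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def section_codebooks(train_frames, n_sections=4, codebook_size=32):
    """train_frames: list of (T_i, D) cepstral feature arrays, one per utterance."""
    books = []
    for s in range(n_sections):
        # pool the s-th section of every training utterance
        pooled = np.vstack([f[len(f) * s // n_sections: len(f) * (s + 1) // n_sections]
                            for f in train_frames])
        books.append(KMeans(n_clusters=codebook_size, n_init=5,
                            random_state=0).fit(pooled).cluster_centers_)
    return books

def multi_observation(frame, codebook, k=3):
    """Return the k nearest codeword indices and weights (closer = heavier)."""
    d = np.linalg.norm(codebook - frame, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)
    return idx, w / w.sum()
```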

Feature Extraction on a Periocular Region and Person Authentication Using a ResNet Model (ResNet 모델을 이용한 눈 주변 영역의 특징 추출 및 개인 인증)

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.12
    • /
    • pp.1347-1355
    • /
    • 2019
  • Deep learning approaches based on convolutional neural networks (CNN) have been extensively studied in the field of computer vision. However, periocular feature extraction using a CNN has not been well studied because it is practically impossible to collect a large volume of biometric data. This study uses a ResNet model pretrained on the ImageNet dataset. To overcome the problem of insufficient training data, we focus on training a multi-layer perceptron (MLP) with a simple structure rather than training a CNN with a complex structure. The method first extracts features using the pretrained ResNet model, reduces the feature dimension by principal component analysis (PCA), and then trains an MLP classifier. Experimental results on the public periocular dataset UBIPr show that the proposed method is effective for person authentication using the periocular region. In particular, it has the advantage of being directly applicable to other biometric traits.
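
The pipeline in this entry (pretrained ResNet features, PCA, MLP classifier) maps directly onto standard library calls; a minimal sketch follows. Dataset loading and preprocessing are omitted, and the tensor `images` and array `labels` are assumed to already exist.

```python
import torch
import torchvision.models as models
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# ImageNet-pretrained backbone with the classification layer removed
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()          # keep the 2048-d pooled features
resnet.eval()

with torch.no_grad():
    feats = resnet(images).numpy()       # images: (N, 3, 224, 224), already normalized

feats_low = PCA(n_components=128).fit_transform(feats)   # reduce feature dimension
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(feats_low, labels)
```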

Interpolation based Single-path Sub-pixel Convolution for Super-Resolution Multi-Scale Networks

  • Alao, Honnang;Kim, Jin-Sung;Kim, Tae Sung;Oh, Juhyen;Lee, Kyujoong
    • Journal of Multimedia Information System
    • /
    • v.8 no.4
    • /
    • pp.203-210
    • /
    • 2021
  • Deep learning convolutional neural networks (CNN) have successfully been applied to image super-resolution (SR). Despite their great performance, SR techniques tend to focus on a single upscale factor when training a particular model. Algorithms for single-model multi-scale networks can easily be constructed if images are upscaled prior to input, but sub-pixel convolution upsampling works differently for each scale factor. Recent SR methods employ multi-scale and multi-path learning as a solution. However, this causes unshared parameters and an unbalanced parameter distribution across the scale factors. We present a multi-scale single-path upsample module as a solution, exploiting the advantages of sub-pixel convolution and interpolation algorithms. The proposed model employs sub-pixel convolution for the highest scale factor among the learned upscale factors and then utilizes one-dimensional interpolation to compress the learned features along the channel axis to match the desired output image size. Experiments are performed for the single-path upsample module and compared against the multi-path upsample module. Based on the experimental results, the proposed algorithm reduces the upsample module's parameters by 24% and achieves comparable or slightly better performance than the previous algorithm.
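
A rough PyTorch sketch of the single-path upsample idea described above: one sub-pixel convolution sized for the largest training scale, with the channel dimension linearly interpolated down when a smaller scale factor is requested. This is an interpretation of the abstract, not the authors' exact module; the layer sizes are placeholders.

```python
import torch.nn as nn
import torch.nn.functional as F

class SinglePathUpsample(nn.Module):
    def __init__(self, in_ch=64, max_scale=4, out_ch=3):
        super().__init__()
        self.max_scale, self.out_ch = max_scale, out_ch
        # enough channels for the largest scale: out_ch * max_scale^2
        self.conv = nn.Conv2d(in_ch, out_ch * max_scale ** 2, 3, padding=1)

    def forward(self, x, scale):
        y = self.conv(x)                                   # (B, out_ch*max_scale^2, H, W)
        if scale != self.max_scale:
            b, c, h, w = y.shape
            target_c = self.out_ch * scale ** 2
            # treat the channel axis as a 1-D signal and linearly interpolate it
            y = y.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)
            y = F.interpolate(y, size=target_c, mode='linear', align_corners=False)
            y = y.reshape(b, h, w, target_c).permute(0, 3, 1, 2)
        return F.pixel_shuffle(y, scale)                   # (B, out_ch, H*scale, W*scale)
```

With this single path, one module serves every scale factor, e.g. `up(features, 2)` and `up(features, 4)` share the same convolution parameters.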

No-Reference Image Quality Assessment based on Quality Awareness Feature and Multi-task Training

  • Lai, Lijing;Chu, Jun;Leng, Lu
    • Journal of Multimedia Information System
    • /
    • v.9 no.2
    • /
    • pp.75-86
    • /
    • 2022
  • Existing image quality assessment (IQA) datasets contain only a small number of samples, and some methods based on transfer learning or data augmentation cannot make good use of image quality-related features. A No-Reference (NR)-IQA method based on multi-task training and quality awareness is therefore proposed. First, single or multiple distortion types and levels are imposed on the original image, and different strategies are used to augment the different types of distortion datasets. Following the idea of weak supervision, Full-Reference (FR)-IQA methods are used to obtain pseudo-score labels for the generated images. Then, the classification information of distortion type and level is combined with the image quality score information. The ResNet50 network is trained in the pre-training stage on the augmented dataset to obtain quality-aware pre-training weights. Finally, fine-tuning is performed on the target IQA dataset using the quality-aware weights to predict the final quality score. Various experiments on synthetic and authentic distortion datasets (LIVE, CSIQ, TID2013, LIVEC, KonIQ-10K) show that the proposed method utilizes image quality-related features better than a method using only single-task training. The extracted quality-aware features improve the accuracy of the model.
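
A minimal sketch of the multi-task pre-training head this abstract describes: a shared ResNet50 backbone with three outputs, for distortion type, distortion level, and the FR-IQA pseudo quality score. The head sizes and loss weights are placeholders, not values from the paper.

```python
import torch.nn as nn
import torchvision.models as models

class QualityAwareNet(nn.Module):
    def __init__(self, n_types=10, n_levels=5):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.type_head = nn.Linear(2048, n_types)    # distortion type classification
        self.level_head = nn.Linear(2048, n_levels)  # distortion level classification
        self.score_head = nn.Linear(2048, 1)         # regression to FR-IQA pseudo-score

    def forward(self, x):
        f = self.backbone(x)
        return self.type_head(f), self.level_head(f), self.score_head(f).squeeze(1)

def multitask_loss(outputs, type_y, level_y, score_y, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three pre-training objectives."""
    t, l, s = outputs
    return (w[0] * nn.functional.cross_entropy(t, type_y)
            + w[1] * nn.functional.cross_entropy(l, level_y)
            + w[2] * nn.functional.mse_loss(s, score_y))
```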

Slime mold and four other nature-inspired optimization algorithms in analyzing the concrete compressive strength

  • Yinghao Zhao;Hossein Moayedi;Loke Kok Foong;Quynh T. Thi
    • Smart Structures and Systems
    • /
    • v.33 no.1
    • /
    • pp.65-91
    • /
    • 2024
  • The use of five optimization techniques for finding the best-fit model for predicting the compressive strength of a concrete mixture is examined in this work. Five optimization techniques are utilized for this purpose: the Slime Mold Algorithm (SMA), Black Hole Algorithm (BHA), Multi-Verse Optimizer (MVO), Vortex Search (VS), and Whale Optimization Algorithm (WOA). In MATLAB, a hybrid learning strategy combining least-squares estimation with backpropagation is employed to train an artificial neural network. In total, 103 samples are used: 72 as the training dataset and 31 as the testing dataset. A multi-layer perceptron (MLP) is used to analyze all data, and the results are verified by comparison. For the best-fit models of SMA-MLP, BHA-MLP, MVO-MLP, VS-MLP, and WOA-MLP, the coefficients of determination (R2) are 0.9603, 0.9679, 0.9827, 0.9841, and 0.9770 in the training phase, and 0.9567, 0.9552, 0.9594, 0.9888, and 0.9695 in the testing phase, respectively. In addition, the best-fit structures for SMA, BHA, MVO, VS, and WOA (all combined with the MLP) are achieved when the population size is set to 450, 500, 250, 150, and 500, respectively. Among all the suggested options, VS offers the strongest prediction network for training the MLP.
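
A generic sketch of the overall scheme in this entry: a population-based optimizer (a stand-in update rule, not an implementation of SMA, BHA, MVO, VS, or WOA) searches the weight vector of a small MLP to minimize training error, after which the fit is scored with R2. The data arrays `X_train`, `y_train`, `X_test`, and `y_test` are assumed to exist.

```python
import numpy as np
from sklearn.metrics import r2_score

def mlp_forward(w, X, hidden=8):
    """Tiny 1-hidden-layer MLP whose weights come packed in the flat vector w."""
    d = X.shape[1]
    W1 = w[:d * hidden].reshape(d, hidden)
    b1 = w[d * hidden:d * hidden + hidden]
    W2 = w[d * hidden + hidden:d * hidden + 2 * hidden]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def optimize_weights(X, y, hidden=8, pop_size=150, iters=300, seed=0):
    """Generic population-based search over the MLP weight vector."""
    rng = np.random.default_rng(seed)
    mse = lambda w: float(np.mean((mlp_forward(w, X, hidden) - y) ** 2))
    dim = X.shape[1] * hidden + 2 * hidden + 1
    pop = rng.normal(0.0, 0.5, (pop_size, dim))
    best = min(pop, key=mse)
    for _ in range(iters):
        # pull every candidate toward the current best with random steps
        pop = best + rng.uniform(0.0, 1.0, (pop_size, 1)) * (pop - best) \
              + rng.normal(0.0, 0.05, (pop_size, dim))
        cand = min(pop, key=mse)
        best = cand if mse(cand) < mse(best) else best
    return best

# w = optimize_weights(X_train, y_train)
# print(r2_score(y_train, mlp_forward(w, X_train)),
#       r2_score(y_test, mlp_forward(w, X_test)))
```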

Num Worker Tuner: An Automated Spawn Parameter Tuner for Multi-Processing DataLoaders

  • Synn, DoangJoo;Kim, JongKook
    • Annual Conference of KIPS
    • /
    • 2021.11a
    • /
    • pp.446-448
    • /
    • 2021
  • In training a deep learning model, it is crucial to tune various hyperparameters to gain speed and accuracy. While hyperparameters that mathematically induce convergence impact training speed, system parameters that affect host-to-device transfer are also crucial. Therefore, properly tuning and selecting the system parameters that influence the data loader is important for accelerating overall training time. We propose an automated framework called Num Worker Tuner (NWT) to address this problem. NWT searches the space of possible subprocess counts, finds an appropriate number of multi-processing subprocesses, and accelerates learning accordingly. Furthermore, it gains memory efficiency and speed-up by tuning this system-dependent parameter, the number of multi-process spawns.
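
A minimal sketch of the kind of tuning NWT automates: time a fixed number of batches from a PyTorch DataLoader for several `num_workers` settings and keep the fastest. NWT's actual search strategy may differ from this exhaustive sweep; the `dataset` object is assumed to exist.

```python
import time
from torch.utils.data import DataLoader

def tune_num_workers(dataset, batch_size=64, candidates=(0, 2, 4, 8, 16), n_batches=50):
    """Measure data-loading time per worker count and return the fastest setting."""
    timings = {}
    for workers in candidates:
        loader = DataLoader(dataset, batch_size=batch_size,
                            num_workers=workers, shuffle=True)
        start = time.perf_counter()
        for i, _batch in enumerate(loader):     # only the loading cost matters here
            if i >= n_batches:
                break
        timings[workers] = time.perf_counter() - start
    return min(timings, key=timings.get), timings
```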