• Title/Summary/Keyword: Recognition Enhancement

Search Result 362, Processing Time 0.024 seconds

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.234-240 / 2021
  • In this paper, mask-based speech enhancement is improved for effective speech recognition in noisy environments. In mask-based speech enhancement, the enhanced spectrum is obtained by multiplying the noisy speech spectrum by the mask. The VoiceFilter (VF) model is used for mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise from the enhanced spectrum. We propose a combined loss to further improve speech enhancement. To effectively remove the residual noise in the speech, the positive part of the Triplet loss is used together with the component loss. For the experiments, the TIMIT database is reconstructed using NOISEX92 noise and background music samples under various Signal to Noise Ratio (SNR) conditions. Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as the performance metrics. When the VF was trained with the mean squared error and the SI model was trained with the combined loss, SDR, PESQ, and STOI improved by 0.5, 0.06, and 0.002, respectively, compared to the system trained only with the mean squared error.
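
The masking step described in this abstract (enhanced spectrum = mask × noisy spectrum) can be sketched as below. This is a minimal illustration, not the authors' VoiceFilter or Spectrogram Inpainting models: the function names `apply_mask` and `simple_sdr_db` are hypothetical, the mask is assumed to lie in [0, 1], and the SDR shown is the plain energy-ratio definition, which may differ from the evaluation toolkit used in the paper.

```python
import numpy as np

def apply_mask(noisy_mag, mask):
    """Mask-based enhancement: multiply the noisy magnitude spectrum
    element-wise by a mask (assumed here to lie in [0, 1])."""
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    return mask * noisy_mag

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Plain energy-ratio SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2).
    An assumed, simplified definition used only for illustration."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))

# Toy example: an ideal ratio mask recovers the clean magnitudes exactly.
clean = np.array([[1.0, 2.0], [0.5, 4.0]])
noise = np.array([[1.0, 0.5], [0.5, 1.0]])
noisy = clean + noise
ideal_mask = clean / noisy
enhanced = apply_mask(noisy, ideal_mask)
```

In practice the mask would come from a trained estimator rather than from the clean signal, which is unknown at test time.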

The Moderating Effects of Age and Gender on the Relationship between Values and Communication styles of Korean Adults (한국 성인의 가치와 의사소통 방식 간의 관계에서 연령과 성별의 조절효과)

  • Eunjung Son
    • Korean Journal of Culture and Social Issue / v.29 no.2 / pp.199-221 / 2023
  • This study examined the moderating effects of age and gender on the relationship between values and communication styles of Korean adults. Five hundred adult men and women across the country responded to questionnaires regarding culturally universal values (openness to change, self-enhancement, conservatism, and self-transcendence), culture-specific values (collectivism, conformity to norms, emotional self-control, family recognition through achievement, and humility), high-context communication style, and low-context communication style. The results are as follows. First, in exploring the factors influencing communication style, self-enhancement, emotional self-control, and self-transcendence significantly predicted the high-context communication style, whereas openness to change, self-enhancement, conformity to norms, emotional self-control, and gender significantly predicted the low-context communication style. Second, age moderated the relationship between self-enhancement and high-context communication style: the high-context communication style increased significantly when the level of self-enhancement was high and the respondent was younger. Third, age and gender moderated the relationship between conformity to norms and high-context communication style: for younger males with high conformity to norms, the high-context communication style increased significantly. Fourth, gender moderated the relationship between collectivism and low-context communication: as collectivism increased, men tended to use the low-context communication style more, while women tended to use it less. Fifth, gender moderated the relationship between humility and low-context communication: women with high humility showed a significantly lower low-context communication style. The implications and limitations of these results are discussed.

An Improvement of Stochastic Feature Extraction for Robust Speech Recognition (강인한 음성인식을 위한 통계적 특징벡터 추출방법의 개선)

  • 김회린;고진석
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.180-186 / 2004
  • The presence of noise in speech signals degrades the performance of recognition systems when there are mismatches between the training and test environments. To make a speech recognizer robust, it is necessary to compensate for these mismatches. In this paper, we study an improvement of stochastic feature extraction based on band-SNR for robust speech recognition. First, we propose a modified version of the multi-band spectral subtraction (MSS) method, which adjusts the subtraction level of the noise spectrum according to the band-SNR. In the proposed method, referred to as M-MSS, a noise normalization factor is newly introduced to finely control the over-estimation factor depending on the band-SNR. We also modify the architecture of the stochastic feature extraction (SFE) method: better performance was obtained when the spectral subtraction was applied in the power spectrum domain than in the mel-scale domain. This method is denoted as M-SFE. Last, we apply the M-MSS method to the modified stochastic feature extraction structure, which is denoted as the MMSS-MSFE method. The proposed methods were evaluated on isolated word recognition under various noise environments. Relative to the ordinary spectral subtraction (SS) method, the average error rates of the M-MSS, M-SFE, and MMSS-MSFE methods were reduced by 18.6%, 15.1%, and 33.9%, respectively. From these results, we conclude that the proposed methods are good candidates for robust feature extraction in noisy speech recognition.
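
The idea of adjusting the subtraction level per band according to band-SNR can be sketched as follows. This is not the M-MSS method itself: the abstract does not give the noise normalization factor, so the sketch substitutes a commonly used piecewise-linear over-subtraction rule (factor 4.75 at or below -5 dB, falling linearly to 1 at or above 20 dB). The function names, the band layout, and the spectral floor value are all assumptions.

```python
import numpy as np

def over_subtraction_factor(band_snr_db):
    """Assumed piecewise-linear rule: 4.75 at or below -5 dB,
    decreasing linearly to 1.0 at or above 20 dB."""
    snr = np.clip(band_snr_db, -5.0, 20.0)
    return 4.0 - 0.15 * snr

def multiband_subtract(noisy_power, noise_power, band_edges, floor=0.01):
    """Subtract a noise power estimate band by band.

    noisy_power, noise_power: 1-D power spectra of equal length.
    band_edges: bin indices delimiting the bands, e.g. [0, 5, 10].
    floor: fraction of the noisy power kept as a spectral floor (assumed).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    enhanced = np.empty_like(noisy_power)
    eps = 1e-12
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        y = noisy_power[lo:hi]
        n = noise_power[lo:hi]
        # Band SNR: ratio of total noisy power to total noise power in the band.
        band_snr_db = 10.0 * np.log10((np.sum(y) + eps) / (np.sum(n) + eps))
        alpha = over_subtraction_factor(band_snr_db)
        # Subtract more aggressively in low-SNR bands, then apply the floor.
        enhanced[lo:hi] = np.maximum(y - alpha * n, floor * y)
    return enhanced

# Toy example: a high-SNR band (20 dB) and a low-SNR band (0 dB).
noisy = np.array([100.0] * 5 + [1.0] * 5)
noise = np.ones(10)
result = multiband_subtract(noisy, noise, band_edges=[0, 5, 10])
```

In the high-SNR band the factor is 1, so only the noise estimate is removed; in the 0 dB band the factor is 4, the subtraction goes negative, and the floor takes over.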

Recognition for Nursing Competency Importance, Nursing Competency Level, and Their Influencing Factors of Nurses in the Long-term Care Hospitals (요양병원 간호사의 간호역량 중요성 인식과 간호역량수준 및 영향요인 분석)

  • Kim, Eun-Jae;Gu, Mee-Ock
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.3 / pp.1989-2001 / 2015
  • This study was conducted to identify the recognition of nursing competency importance, the nursing competency level, and their influencing factors among nurses in long-term care hospitals. Participants were 243 nurses working in 11 long-term care hospitals. Data were collected from August 25 to September 3, 2014, and analyzed using descriptive statistics, t-test, ANOVA, Pearson's correlation, and multiple regression with SPSS 19.0. Mean scores for the recognition of nursing competency importance and the nursing competency level were $4.21{\pm}0.48$ and $3.47{\pm}0.46$, respectively. The nursing competency level was significantly lower than the recognition of nursing competency importance. The variable influencing the recognition of nursing competency importance was position (${\beta}=.19$). The variables influencing the nursing competency level were the recognition of nursing competency importance (${\beta}=.37$), age (${\beta}=.20$), current work experience (${\beta}=.13$), and health status (${\beta}=.13$). The results suggest the need to develop a measurement tool and nursing competency enhancement programs that reflect the characteristics of the nursing competency required in long-term care hospitals.

A Semi-Noniterative VQ Design Algorithm for Text Dependent Speaker Recognition (문맥종속 화자인식을 위한 준비반복 벡터 양자기 설계 알고리즘)

  • Lim, Dong-Chul;Lee, Haing-Sei
    • The KIPS Transactions:PartB / v.10B no.1 / pp.67-72 / 2003
  • In this paper, we study the enhancement of VQ (Vector Quantization) design for text-dependent speaker recognition. Specifically, we present a non-iterative method for building a vector quantization codebook; because this method does not rely on iterative learning, the computational complexity is dramatically reduced. The proposed semi-noniterative VQ design method contrasts with the existing design method, which uses an iterative learning algorithm for every training speaker. The characteristics of the semi-noniterative VQ design are as follows. First, the proposed method performs iterative learning only for the reference speaker, whereas the existing method performs iterative learning for every speaker. Second, the quantization region of a non-reference speaker is equivalent to the quantization region of the reference speaker, and the quantization point of the non-reference speaker is the optimal point for the statistical distribution of that speaker. In the numerical experiment, we use 12th-order mel-cepstrum feature vectors of 20 speakers and compare the proposed method with the existing method, changing the codebook size from 2 to 32. The recognition rate of the proposed method is 100% for a suitable codebook size and adequate training data, which equals the recognition rate of the existing method. Therefore, the proposed semi-noniterative VQ design method is a new alternative that reduces computational complexity while maintaining the recognition rate.
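
One possible reading of the two characteristics above is sketched here: iterative clustering is run only on the reference speaker, and each non-reference speaker reuses the reference quantization regions and takes the centroid of its own vectors in each region as the quantization point. The function names, the use of plain k-means as the iterative learner, and the handling of empty regions are assumptions, not details taken from the paper.

```python
import numpy as np

def nearest_codeword(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) per vector."""
    diffs = vectors[:, None, :] - codebook[None, :, :]
    return np.sum(diffs ** 2, axis=2).argmin(axis=1)

def iterative_codebook(vectors, size, iters=20, seed=0):
    """Reference-speaker codebook via plain k-means (assumed iterative learner)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_codeword(vectors, codebook)
        for k in range(size):
            members = vectors[labels == k]
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook

def noniterative_codebook(vectors, reference_codebook):
    """Non-reference speaker codebook in a single pass.

    Regions are those of the reference codebook; each quantization point is
    the centroid of this speaker's vectors falling in that region. Empty
    regions keep the reference codeword (an assumption).
    """
    labels = nearest_codeword(vectors, reference_codebook)
    codebook = reference_codebook.copy()
    for k in range(len(reference_codebook)):
        members = vectors[labels == k]
        if len(members) > 0:
            codebook[k] = members.mean(axis=0)
    return codebook

# Toy example with 2-D features instead of 12th-order mel-cepstra.
reference = np.array([[0.0, 0.0], [10.0, 10.0]])
speaker = np.array([[1.0, 1.0], [-1.0, 1.0], [9.0, 11.0], [11.0, 11.0]])
speaker_codebook = noniterative_codebook(speaker, reference)
```

The non-reference pass costs one nearest-neighbor assignment and one averaging step per speaker, which is where the complexity reduction over per-speaker iterative training would come from.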

Improvement of Face Recognition Algorithm for Residential Area Surveillance System Based on Graph Convolution Network (그래프 컨벌루션 네트워크 기반 주거지역 감시시스템의 얼굴인식 알고리즘 개선)

  • Tan Heyi;Byung-Won Min
    • Journal of Internet of Things and Convergence / v.10 no.2 / pp.1-15 / 2024
  • The construction of smart communities is a new method and an important measure for ensuring the security of residential areas. To address the low face recognition accuracy caused by facial features being distorted by surveillance camera angles and other external factors, this paper proposes the following optimization strategies for designing a face recognition network. First, a global graph convolution module is designed to encode facial features as graph nodes, and a multi-scale feature enhancement residual module is designed to extract facial keypoint features in conjunction with the global graph convolution module. Second, the extracted facial keypoints are constructed as a directed graph, and graph attention mechanisms are used to enhance the representation power of the graph features. Finally, tensor computations are performed on the graph features of two faces, and the aggregated features are extracted and discriminated by a fully connected layer to determine whether the two faces belong to the same individual. In experiments, the proposed network achieves an AUC of 85.65% for facial keypoint localization on the 300W public dataset and 88.92% on a self-built dataset. For face recognition, it achieves an accuracy of 83.41% on the IBUG public dataset and 96.74% on a self-built dataset. The experimental results demonstrate that the proposed network achieves high detection and recognition accuracy for faces in surveillance videos.
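
To make the phrase "encode facial features as graph nodes" concrete, here is a sketch of a single generic graph convolution step using the widely used symmetric normalization with self-loops. It is a stand-in only: the paper's global graph convolution module, directed graph construction, and attention mechanism are not described in enough detail to reproduce, and the function name `graph_conv_layer` and the toy graph are assumptions.

```python
import numpy as np

def graph_conv_layer(adjacency, features, weight):
    """One generic graph convolution step:
    ReLU(D^{-1/2} (A + I) D^{-1/2} X W), for an undirected adjacency A.

    adjacency: (n, n) matrix of 0/1 edges between keypoint nodes.
    features:  (n, f_in) node feature matrix X.
    weight:    (f_in, f_out) weight matrix W (learned in a real network).
    """
    adjacency = np.asarray(adjacency, dtype=float)
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    inv_sqrt_deg = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^{-1/2} diagonal
    normalized = a_hat * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    return np.maximum(normalized @ features @ weight, 0.0)

# Toy example: two connected keypoint nodes with one-hot features.
adj = np.array([[0, 1], [1, 0]])
x = np.eye(2)
w = np.eye(2)
out = graph_conv_layer(adj, x, w)
```

Each node's output mixes its own features with those of its neighbors, so after the step the two connected nodes share information.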

Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory (통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.23 no.7 / pp.417-423 / 2017
  • Recently, many embedded devices with the computing capability required for deep learning have become available, and many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and high-performance servers. In this paper, we propose a method that improves the performance of a deep-learning framework by considering the architecture of an embedded device that shares memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and is evaluated on an NVIDIA Jetson TK1 embedded device. In the experiment, we investigate the image recognition performance of several state-of-the-art deep-learning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method can achieve a significant performance gain. For instance, with AlexNet, we reduced image recognition latency by about 33% and energy consumption by about 50%.

Preprocessing Technique for Improvement of Speech Recognition in a Car (차량에서의 음성인식율 향상을 위한 전처리 기법)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.9 no.1 / pp.139-146 / 2009
  • This paper addresses a modified spectral subtraction scheme suited to speech recognition in low signal-to-noise ratio (SNR) noisy environments, such as an automatic speech recognition (ASR) system in a car. Conventional spectral subtraction schemes rely on the SNR: attenuation is imposed on the parts of the spectrum that appear to have low SNR, and accentuation is applied to the parts with high SNR. Although this assumption is adequate for high-SNR environments, it is grossly inadequate for low-SNR scenarios such as the car environment. The proposed method specifically targets low-SNR noisy environments by using a weighting function to enhance the speech-dominant regions of the speech spectrum. Experimental results using voice commands for cars show the superior performance of the proposed method over conventional methods.

Estimating Three-Dimensional Scattering Centers of a Target Using the 3D MEMP Method in Radar Target Recognition (레이다 표적 인식에서 3D MEMP 기법을 이용한 표적의 3차원 산란점 예측)

  • Shin, Seung-Yong;Myung, Noh-Hoon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.19 no.2 / pp.130-137 / 2008
  • This paper presents high-resolution techniques for three-dimensional (3D) scattering center extraction from a radar backscattered signal in radar target recognition. We propose a 3D pairing procedure, a new approach to estimating 3D scattering centers. This pairing procedure is more accurate and robust than the general criterion. 3D MEMP (Matrix Enhancement and Matrix Pencil) with the 3D pairing procedure first creates an autocorrelation matrix from radar backscattered field data samples. A matrix pencil method is then used to extract 3D scattering centers from the principal eigenvectors of the autocorrelation matrix. The autocorrelation matrix is constructed by the MSSP (modified spatial smoothing preprocessing) method. The observation matrix required for estimating the 3D scattering center locations is built using the sparse scanning order concept. To demonstrate the performance of the proposed technique, we use backscattered field data generated by ideal point scatterers.
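
The matrix pencil step can be illustrated in one dimension. The sketch below is not the 3D MEMP method: it omits the enhanced matrix, MSSP, and the pairing procedure, and only shows how poles of a sum of complex exponentials are recovered from a Hankel data matrix. The function names, the default pencil parameter, and the stepped-frequency signal model used to convert a pole to a range are all assumptions.

```python
import numpy as np

SPEED_OF_LIGHT = 299792458.0  # m/s

def matrix_pencil_poles(samples, num_poles, pencil=None):
    """Estimate poles z_k of x[n] = sum_k a_k * z_k**n (1-D matrix pencil).

    samples:   uniformly spaced complex samples.
    num_poles: assumed number of exponentials (scattering centers).
    pencil:    pencil parameter L; defaults to len(samples) // 3.
    """
    x = np.asarray(samples, dtype=complex)
    n = len(x)
    L = pencil if pencil is not None else n // 3
    # Hankel data matrix with rows x[i : i + L + 1].
    hankel = np.array([x[i:i + L + 1] for i in range(n - L)])
    # Keep the principal right singular vectors (signal subspace).
    _, _, vh = np.linalg.svd(hankel, full_matrices=False)
    basis = vh[:num_poles].T
    # Shift invariance: rows 1..L equal rows 0..L-1 times the poles.
    return np.linalg.eigvals(np.linalg.pinv(basis[:-1]) @ basis[1:])

def pole_to_range(pole, freq_step_hz):
    """Range of a point scatterer under the assumed model
    pole = exp(-1j * 4 * pi * freq_step_hz * r / c)."""
    return -np.angle(pole) * SPEED_OF_LIGHT / (4.0 * np.pi * freq_step_hz)

# Toy example: two noise-free exponentials.
true_poles = np.array([np.exp(0.5j), 0.9 * np.exp(1.2j)])
n_idx = np.arange(20)
data = 1.0 * true_poles[0] ** n_idx + 0.5 * true_poles[1] ** n_idx
estimated = matrix_pencil_poles(data, num_poles=2)
estimated = estimated[np.argsort(np.angle(estimated))]
```

With noisy data the singular value truncation acts as the noise filter, which is the role the principal eigenvectors play in the abstract's description.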

Application of Multi-Frame Based Super-Resolution Algorithm for a Color Recognition Enhancement for the UAV (복수영상기반 초해상도 색상인식능력향상 알고리즘의 무인기 적용)

  • Park, Jihoon;Kim, Jeongho;Lee, Daewoo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.45 no.3 / pp.180-190 / 2017
  • This paper describes the application of a multi-frame based super-resolution method to enhance the resolution of image information from a UAV and to improve the UAV's ground target recognition ability. To verify the algorithm, we designed a flight/ground control system and a UAV, and validated the algorithm using the UAV system with a ground target. Comparing the images before and after applying the algorithm shows that RMSE decreased from 0.0677 to 0.0315, NRMSE decreased from 7.4030% to 3.5726%, PSNR increased from 23.3885 dB to 30.0036 dB, and SSIM increased from 0.6996 to 0.8948. These results confirm that the multi-frame based super-resolution algorithm can enhance the resolution of UAV images.
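
The reported RMSE and PSNR values are related by a standard formula, sketched below. The sketch assumes pixel values normalized to [0, 1] (the abstract does not state the range), and the function names are hypothetical. Under that assumption, an RMSE of 0.0677 corresponds to roughly 23.39 dB, consistent with the reported 23.3885 dB.

```python
import numpy as np

def rmse(reference, test):
    """Root mean squared error between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(np.sqrt(np.mean((reference - test) ** 2)))

def psnr_from_rmse(rmse_value, peak=1.0):
    """PSNR in dB: 20 * log10(peak / RMSE). The default peak of 1.0
    assumes images normalized to [0, 1]."""
    return float(20.0 * np.log10(peak / rmse_value))

# PSNR implied by the RMSE reported before applying super-resolution.
psnr_before = psnr_from_rmse(0.0677)
```

A lower RMSE always gives a higher PSNR for a fixed peak value, so the two metrics in the abstract move together by construction, while SSIM measures structural similarity separately.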