• Title/Summary/Keyword: speech recognition rate (음성인식률)

Search Result 549, Processing Time 0.03 seconds

A study on the Smart Door System For Single Households (1인 가구를 위한 스마트 도어 시스템에 대한 연구)

  • Kim, Donghyeon;Park, Yeeun;Moon, Juhyuk;Im, Yunkyung;Ko, Dongbeom;Kim, Jungjoon;Park, Jeongmin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.5 / pp.267-274 / 2018
  • This paper introduces a smart door system composed of a security system and a secretary system. As the ratio of single-person households increases, household security has become more important. Many voice-based artificial intelligence secretary systems, known as smart home technology, already exist, but they have limits: they cannot work without a user's request, so they are not automatic, and their voice recognition depends on the user's pronunciation. In this paper, we therefore design and develop a smart door system that adds security and secretary functions. It can inform users in real time that an outsider is in front of their house, and it can speak information such as the user's requirements, deliveries, and the weather using TTS. As a result, users can prevent crimes and use a convenient secretary system.

A Study of Automatic Detection of Music Signal from Broadcasting Audio Signal (방송 오디오 신호로부터 음악 신호 검출에 관한 연구)

  • Yoon, Won-Jung;Park, Kyu-Sik
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.81-88 / 2010
  • In this paper, we propose an automatic music/non-music discrimination system for broadcast audio signals as a preliminary study toward building a sound source monitoring system in a real broadcasting environment. Reflecting the articulation characteristics of human speech, we use three simple time-domain features: energy standard deviation, log energy standard deviation, and log energy mean. Based on experimentally determined threshold values for each feature, we develop a rule-based algorithm to classify the music portions of the input audio signal. To verify the proposed algorithm, an actual FM broadcast signal was recorded for 24 hours and used as the input audio. The experimental results show that the proposed system recognizes music sections with an accuracy of 96% and non-music sections with an accuracy of 87%, which is good enough for use as a pre-processing module for a sound source monitoring system.
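
The three time-domain features and the threshold rule described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the frame length, the threshold values in `THRESHOLDS`, and the direction of each rule (steady energy is treated as music, bursty energy with pauses as speech) are all assumptions, since the abstract does not state them.

```python
import math
from statistics import mean, pstdev


def frame_energies(samples, frame_len=400):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_features(samples, frame_len=400, eps=1e-10):
    """Compute the three features named in the abstract for one segment."""
    energies = frame_energies(samples, frame_len)
    log_energies = [math.log10(e + eps) for e in energies]
    return {
        "energy_std": pstdev(energies),
        "log_energy_std": pstdev(log_energies),
        "log_energy_mean": mean(log_energies),
    }


# Hypothetical thresholds; the paper derives its own values experimentally.
THRESHOLDS = {"energy_std": 0.05, "log_energy_std": 0.5, "log_energy_mean": -3.0}


def is_music(features, thresholds=THRESHOLDS):
    """Assumed rule: music has steadier and higher frame energy than speech."""
    return (
        features["energy_std"] < thresholds["energy_std"]
        and features["log_energy_std"] < thresholds["log_energy_std"]
        and features["log_energy_mean"] > thresholds["log_energy_mean"]
    )
```

With these assumed thresholds, a steady tone is labeled music, while the same tone interrupted by silent frames (a crude stand-in for the pauses in speech) is labeled non-music.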

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.242-247 / 2004
  • Conventional keyword spotting systems use a connected word recognition network consisting of keyword models and filler models. As a result, such systems cannot effectively construct language models of word occurrence for detecting keywords in a large vocabulary continuous speech recognition system with large text data. To solve this problem, we propose a keyword spotting system that uses a pseudo N-gram language model to detect keywords, and we investigate how its performance changes with the occurrence frequencies of the keyword and filler models. When the unigram probabilities of the keyword and filler models were set to 0.2 and 0.8, respectively, CA (Correctly Accept for In-Vocabulary) and CR (Correctly Reject for Out-Of-Vocabulary) were 91.1% and 91.7%, respectively, which corresponds to a 14% improvement in average CA-CR performance over conventional methods in terms of ERR (Error Reduction Rate).

A TCP-like flow control algorithm for RTP/RTCP (TCP 와 RTP/RTCP 유사한 흐름제어 알고리즘)

  • 나승구;윤성덕;안종석
    • Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.480-482 / 1998
  • Recently, multimedia applications that use multicast have been appearing on the Internet. The success of these applications depends on the quality of the audio and video delivered to receivers. Because the Internet cannot guarantee an application's QoS (Quality of Service) requirements, much research is under way on flow control that lets multicast traffic use the Internet's capacity as efficiently as possible. Among these efforts, the multicast traffic flow control algorithm proposed in IVS (INRIA Video conferencing System) has the sender adjust its transmission rate according to the packet loss information in the RTCP reports that receivers send periodically. However, because its feedback is slow, this algorithm cannot quickly discover the available network bandwidth even when the network is unloaded; as a result, it shares bandwidth unfairly when competing with TCP traffic and fails to settle on a transmission rate suited to the network state. This paper proposes two techniques that share bandwidth more fairly and raise overall link utilization. First, the transmission frequency of RTCP feedback is adjusted dynamically according to the measured network congestion. Second, the transmission rate is increased and decreased in the same way as TCP so that the network is shared fairly. Through simulation, we show that these two techniques raise link utilization by sharing network bandwidth fairly, without affecting TCP traffic and without increasing the amount of RTCP feedback.
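
The two techniques named in this abstract (TCP-like rate adjustment and congestion-dependent RTCP feedback frequency) could be sketched as below. This is a hedged illustration only: the additive-increase/multiplicative-decrease form, every constant, the function names, and the choice to report more often under congestion are assumptions made for the example, not details taken from the paper.

```python
def next_rate(rate_kbps, loss_fraction, *, loss_threshold=0.02,
              increase_kbps=10.0, decrease_factor=0.5,
              min_kbps=16.0, max_kbps=1000.0):
    """TCP-like update: additive increase, multiplicative decrease.

    All parameter values are hypothetical placeholders.
    """
    if loss_fraction > loss_threshold:
        rate = rate_kbps * decrease_factor   # congestion: back off sharply
    else:
        rate = rate_kbps + increase_kbps     # no congestion: probe gently
    return max(min_kbps, min(max_kbps, rate))


def next_feedback_interval(loss_fraction, *, loss_threshold=0.02,
                           congested_s=1.0, idle_s=5.0):
    """Assumed policy: report more often while congested, less often otherwise."""
    return congested_s if loss_fraction > loss_threshold else idle_s
```

For instance, a sender at 100 kbps that sees no loss would move to 110 kbps, while a loss report of 10% would halve it to 50 kbps; lengthening the reporting interval when the network is idle is one way the total feedback volume might be kept from growing.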


Development of a Raspberry Pi-based Banknote Recognition System for the Visually Impaired (시각장애인을 위한 라즈베리 파이 기반 지폐 인식기 개발)

  • Lee, Jiwan;Ahn, Jihoo;Lee, Ki Yong
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.21-31 / 2018
  • Korean banknotes are similar in size, and their braille marks tend to wear out as the notes age. These characteristics make it even harder for blind people, who mainly rely on the braille, to distinguish banknotes, and this can even lead to economic loss. Systems for recognizing banknotes already exist, but they do not support Korean banknotes. Furthermore, because they are developed as mobile applications, they are not easy for blind people to use. Therefore, in this paper, we develop a Raspberry Pi-based banknote recognition system that not only recognizes Korean banknotes but is also easily accessible to blind people. Our system starts recognition with a very simple user action, and the user hears the recognition result as sound. To choose the best feature extraction algorithm, which directly affects system performance, we compare SIFT, SURF, and ORB, currently representative feature extraction algorithms, in real environments. Based on experiments in various real environments, we adopted SIFT, which showed the highest accuracy of 95%, to implement our system.

Design and implementation of a 3-axis Motion Sensor based SWAT Hand-signal Motion-recognition System (3축 모션 센서 기반 SWAT 수신호 모션 인식 시스템 설계 및 구현)

  • Yun, June;Pyun, Kihyun
    • Journal of Internet Computing and Services / v.15 no.4 / pp.33-42 / 2014
  • Hand signals are an effective means of communication in situations where voice cannot be used, especially for soldiers. Vision-based approaches that use cameras as input devices are widely suggested in the literature. However, these approaches are not suitable for soldiers, who in many cases operate without a clear line of sight. In addition, existing special-glove approaches use finger information only, so they are still inadequate for recognizing soldiers' hand signals, which involve not only finger motions but also additional information such as the rotation of the hand. In this paper, we design and implement a new recognition system for six military hand-signal motions: 'ready', 'move', 'quick move', 'crawl', 'stop', and 'lying-down'. For this purpose, we propose a finger-recognition method and motion-recognition methods. The finger-recognition method discriminates how much each finger is bent: 'completely flattened', 'slightly flattened', 'slightly bent', or 'completely bent'. The motion-recognition algorithms are based on characterizing each hand-signal motion in terms of the three axes. In repeated experiments, our system achieved a correct recognition rate of 91.2%.

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in the speaker's utterance pattern, and voice detection errors. Such outliers may severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector of reduced dimension is obtained by robust PCA derived from M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the feature vectors' covariance matrix. Second, a GMM with diagonal covariance matrices is trained on these transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM with PCA and with the conventional GMM with diagonal covariance matrices. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, whereas the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.

Design of detection method for malicious URL based on Deep Neural Network (뉴럴네트워크 기반에 악성 URL 탐지방법 설계)

  • Kwon, Hyun;Park, Sangjun;Kim, Yongchul
    • Journal of Convergence for Information Technology / v.11 no.5 / pp.30-37 / 2021
  • Various devices are connected to the Internet, and attacks that exploit the Internet are occurring. Some of these attacks use malicious URLs to lead users to phishing sites or to distribute malicious viruses. Detecting such malicious URL attacks is therefore an important security issue. Among recent deep learning technologies, neural networks perform well in image recognition, speech recognition, and pattern recognition, and they can be applied to research that analyzes and detects patterns in the characteristics of malicious URLs. In this paper, we analyze the performance of a neural-network-based method for detecting malicious URLs under various parameters, varying the activation function, learning rate, and network structure. The experimental data were built by crawling Alexa top 1 million and Whois, and TensorFlow was used as the machine learning library. In the experiment, with 4 layers, a learning rate of 0.005, and 100 nodes per layer, the method obtained an accuracy of 97.8% and an F1 score of 92.94%.

A Study on the Weight Allocation Method of Humanist Input Value and Multiplex Modality using Tacit Data (암묵 데이터를 활용한 인문학 인풋값과 다중 모달리티의 가중치 할당 방법에 관한 연구)

  • Lee, Won-Tae;Kang, Jang-Mook
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.4 / pp.157-163 / 2014
  • A user's sensibility is recognized as a very important parameter for communication among companies, governments, and individuals. In many studies, researchers use voice tone, speech rate, facial expression, the direction and speed of body movement, and gestures to recognize this sensibility. Multiplex modality is more precise than a single modality; however, its recognition rate is limited, multi-sensing overloads data processing, and a good algorithm is needed to infer the sensed values. That is, because each modality has a different concept and different properties, errors may occur when human sensibility is converted to standard values. To address this, the modality that expresses sensibility needs to be extracted from the multiplex modality using techniques such as relational network analysis, context understanding, and digital filtering. If, when recognizing sensibility in a specific situation, the priority modality and the other surrounding modalities are processed into implicit values, a robust system can be built relative to the computing resources consumed. This paper proposes a method for assigning weights to multiplex modalities using implicit data.

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.3 / pp.357-362 / 2004
  • Language models are essential for predicting the next word in a spoken sentence, thereby enhancing speech recognition accuracy, among other things. However, spoken language domains are too numerous, so developers suffer from a lack of sufficiently large corpora. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the right domain of interest and the other from a large but less adequate corpus, resulting in a significantly enhanced language model. The method is based on the observation that a small corpus from the right domain has high-quality n-grams but a serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but is incorrectly biased. In our approach, the two sets of n-gram statistics are combined by extending the idea of Katz's backoff; the method is therefore called dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of millions of words, together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained a significant improvement (30%) by incorporating a small corpus around one thirtieth the size of the newspaper corpus.
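
The general idea of trusting a small in-domain corpus where it has evidence and backing off to a large out-of-domain corpus elsewhere could be sketched as below. This is a simplified bigram illustration, not the paper's formulation: the class name, the use of absolute discounting instead of Katz's Good-Turing discounts, and the add-one smoothing of the large-corpus model are all assumptions made to keep the example short.

```python
from collections import Counter


class DualSourceBackoff:
    """Toy bigram model: in-domain counts first, large corpus as fallback."""

    def __init__(self, small_tokens, large_tokens, discount=0.5):
        self.vocab = sorted(set(small_tokens) | set(large_tokens))
        self.small = Counter(zip(small_tokens, small_tokens[1:]))
        self.large = Counter(zip(large_tokens, large_tokens[1:]))
        self.small_hist = Counter(small_tokens[:-1])
        self.large_hist = Counter(large_tokens[:-1])
        self.discount = discount  # hypothetical absolute discount

    def _p_large(self, h, w):
        # Add-one smoothed estimate from the large out-of-domain corpus.
        return (self.large[(h, w)] + 1) / (self.large_hist[h] + len(self.vocab))

    def prob(self, h, w):
        n_h = self.small_hist[h]
        if n_h == 0:
            # History never seen in-domain: rely on the large corpus alone.
            return self._p_large(h, w)
        if self.small[(h, w)] > 0:
            # Seen in-domain: discounted relative frequency.
            return (self.small[(h, w)] - self.discount) / n_h
        # Unseen in-domain: share the discounted mass using the large corpus.
        seen = [v for v in self.vocab if self.small[(h, v)] > 0]
        leftover = self.discount * len(seen) / n_h
        unseen_mass = 1.0 - sum(self._p_large(h, v) for v in seen)
        if unseen_mass <= 0.0:
            return 0.0
        return leftover * self._p_large(h, w) / unseen_mass
```

In this sketch the probabilities for a given history sum to one over the joint vocabulary, and a bigram observed in the small corpus keeps its in-domain estimate even when the large corpus prefers a different continuation.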