• 제목/요약/키워드: Query-by-speech

검색결과 14건 처리시간 0.02초

음성 질의 기반 디지털 사진 검색 기법 (A Query-by-Speech Scheme for Photo Albuming)

  • 김태성;서영주;이용주;김회린
    • 대한음성학회지:말소리
    • /
    • 제57호
    • /
    • pp.99-112
    • /
    • 2006
  • In this paper, we introduce two retrieval methods for photos with speech documents. We compare the pattern of speech query with those of speech documents recorded in digital cameras, and measure the similarities, and retrieve photos corresponding to the speech documents which have high similarity scores. As the first approach, a phoneme recognition scheme is used as the pre-processor for the pattern matching, and in the second one, the vector quantization (VQ) and the dynamic time warping (DTW) are applied to match the speech query with the documents in signal domain itself. Experimental results show that the performance of the first approach is highly dependent on that of phoneme recognition while the processing time is short. The second method provides a great improvement of performance. While the processing time is longer than that of the first method due to DTW, but we can reduce it by taking approximated methods.

  • PDF

허밍 운율정보를 이용한 곡목 검색 기술 (Study on the song title query by humming melody information)

  • 이지연;한민수
    • 대한음성학회지:말소리
    • /
    • 제44호
    • /
    • pp.131-143
    • /
    • 2002
  • Music query by humming is a challenging problem since the humming signal inevitably contains much variation and inaccuracy. In this paper, we suggest an algorithm for querying a wanted song from music database by humming its melody. In order to suit or adapt the inaccurate peoples humming, a new melody representation technique is proposed. Our algorithm is basically a pitch and duration information-based one and performs fairly well. 85% of correct query rate of the song is achieved for the top 3 matches when tested with 20 songs.

  • PDF

음성정보 내용분석을 통한 골프 동영상에서의 선수별 이벤트 구간 검색 (Retrieval of Player Event in Golf Videos Using Spoken Content Analysis)

  • 김형국
    • 한국음향학회지
    • /
    • 제28권7호
    • /
    • pp.674-679
    • /
    • 2009
  • 본 논문은 골프 동영상에 포함된 오디오 정보로부터 검출된 이벤트 사운드 구간과 골프 선수이름이 포함된 음성구간을 결합하여 선수별 이벤트 구간을 검색하는 방식을 제안한다. 전체적인 시스템은 동영상으로부터 분할된 오디오 스트림으로부터 잡음제거, 오디오 구간분할, 음성 인식 등의 과정을 통한 자동색인 모듈과 사용자가 텍스트로 입력한 선수 이름을 발음열로 변환하고, 색인된 데이터베이스에서 질의된 선수 이름과 상응하는 음성구간과 연결되는 이벤트 구간을 찾아주는 검색 모듈로 구성된다. 선수이름 검색을 위해서 본 논문에서는 음소 기반, 단어 기반, 단어와 음소를 결합한 하이브리드 방식을 적용한 선수별 이벤트 구간 검색결과를 비교하였다.

음성 질의 처리를 위한 의미 기반 오류 수정 (Semantic-oriented Error Correction for Spoken Query Processing)

  • 정민우;김병창;이근배
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 10월 학술대회지
    • /
    • pp.153-156
    • /
    • 2003
  • Voice input is often required in many new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increase the accuracy of the recognition rate have been researched by post-processing of the recognition results, but previous approaches were mainly lexical-oriented ones in post error correction. We suggest a new semantic-oriented approach to correct both semantic level and lexical errors, which is also more accurate for especially domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

  • PDF

Speech Query Recognition for Tamil Language Using Wavelet and Wavelet Packets

  • Iswarya, P.;Radha, V.
    • Journal of Information Processing Systems
    • /
    • 제13권5호
    • /
    • pp.1135-1148
    • /
    • 2017
  • Speech recognition is one of the fascinating fields in the area of Computer science. Accuracy of speech recognition system may reduce due to the presence of noise present in speech signal. Therefore noise removal is an essential step in Automatic Speech Recognition (ASR) system and this paper proposes a new technique called combined thresholding for noise removal. Feature extraction is process of converting acoustic signal into most valuable set of parameters. This paper also concentrates on improving Mel Frequency Cepstral Coefficients (MFCC) features by introducing Discrete Wavelet Packet Transform (DWPT) in the place of Discrete Fourier Transformation (DFT) block to provide an efficient signal analysis. The feature vector is varied in size, for choosing the correct length of feature vector Self Organizing Map (SOM) is used. As a single classifier does not provide enough accuracy, so this research proposes an Ensemble Support Vector Machine (ESVM) classifier where the fixed length feature vector from SOM is given as input, termed as ESVM_SOM. The experimental results showed that the proposed methods provide better results than the existing methods.

오디오 Fingerprint를 이용한 음악인식 연구 동향 (Music Recognition Using Audio Fingerprint: A Survey)

  • 이동현;임민규;김지환
    • 말소리와 음성과학
    • /
    • 제4권1호
    • /
    • pp.77-87
    • /
    • 2012
  • Interest in music recognition has been growing dramatically after NHN and Daum released their mobile applications for music recognition in 2010. Methods in music recognition based on audio analysis fall into two categories: music recognition using audio fingerprint and Query-by-Singing/Humming (QBSH). While music recognition using audio fingerprint receives music as its input, QBSH involves taking a user-hummed melody. In this paper, research trends are described for music recognition using audio fingerprint, focusing on two methods: one based on fingerprint generation using energy difference between consecutive bands and the other based on hash key generation between peak points. Details presented in the representative papers of each method are introduced.

이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구 (A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment)

  • 조윤호;박규식
    • 한국음향학회지
    • /
    • 제25권6호
    • /
    • pp.269-276
    • /
    • 2006
  • 본 논문은 이동전화 (Cellular phone)를 통해 실시간으로 습득된 음성으로부터 사람의 감성 상태를 평상 혹은 화남으로 인식할 수 있는 음성 감성인식 시스템을 제안하였다. 일반적으로 이동전화를 통해 수신된 음성은 화자의 환경 잡음과 네트워크 잡음을 포함하고 있어 음성 신호의 감성특정을 왜곡하게 되고 이로 인해 인식 시스템에 심각한 성능저하를 초래하게 된다. 본 논문에서는 이러한 잡음 영향을 최소화하기 위해 비교적 단순한 구조와 적은 연산량을 가진 MA (Moving Average) 필터를 감성 특정벡터에 적용해서 잡음에 의한 시스템 성능저하를 최소화하였다. 또한 특정벡터를 최적화할 수 있는 SFS (Sequential Forward Selection) 기법을 사용해서 제안 감성인식 시스템의 성능을 한층 더 안 정화시켰으며 감성 패턴 분류기로는 k-NN과 SVM을 비교하였다. 실험 결과 제안 시스템은 이동통신 잡음 환경에서 약 86.5%의 높은 인식률을 달성할 수 있어 향후 고객 센터 (Call-center) 등에 유용하게 사용될 수 있을 것으로 기대된다.

온톨로지와 시맨틱 중재 에이전트를 이용한 실시간 통합 환경 구축에 관한 연구 (Real-time Data Integration using Ontology and Semantic Mediators)

  • 박진수
    • Asia pacific journal of information systems
    • /
    • 제16권4호
    • /
    • pp.151-178
    • /
    • 2006
  • The objective of this research is to develop a formal framework and methodology to facilitate real-time data integration, thus enabling semantic interoperability among distributed and heterogeneous information systems. The proposed approach is based on the concepts of "ontology" and "semantic mediators." An ontology is developed and used to capture the intension (including structure, integrity rules and meta-properties) of the database schema. We also develop the agent communication protocol for semantic reconciliation, which is based on the theory of speech acts and agent communication language. This protocol is used by a set of semantic mediators, which automatically detect and resolve various semantic conflicts at the data- and schema-levels by referring to the ontology. A mediation-based query processing technique is developed to provide uniform and integrated access to the multiple heterogeneous information sources. Prototype tools are being implemented to provide proof of concept for this work.

질의 효율적인 의사 결정 공격을 통한 오디오 적대적 예제 생성 연구 (Generating Audio Adversarial Examples Using a Query-Efficient Decision-Based Attack)

  • 서성관;문현준;손배훈;윤주범
    • 정보보호학회논문지
    • /
    • 제32권1호
    • /
    • pp.89-98
    • /
    • 2022
  • 딥러닝 기술이 여러 분야에 적용되면서 딥러닝 모델의 보안 문제인 적대적 공격기법 연구가 활발히 진행되었다. 적대적 공격은 이미지 분야에서 주로 연구가 되었는데 최근에는 모델의 분류 결과만 있으면 공격이 가능한 의사 결정 공격기법까지 발전했다. 그러나 오디오 분야의 경우 적대적 공격을 적용하는 연구가 비교적 더디게 이루어지고 있는데 본 논문에서는 오디오 분야에 최신 의사 결정 공격기법을 적용하고 개선한다. 최신 의사 결정 공격기법은 기울기 근사를 위해 많은 질의 수가 필요로 하는 단점이 있는데 본 논문에서는 기울기 근사에 필요한 벡터 탐색 공간을 축소하여 질의 효율성을 높인다. 실험 결과 최신 의사 결정 공격기법보다 공격 성공률을 50% 높였고, 원본 오디오와 적대적 예제의 차이를 75% 줄여 같은 질의 수 대비 더욱 작은 노이즈로 적대적 예제가 생성 가능함을 입증하였다.

Multimodal Approach for Summarizing and Indexing News Video

  • Kim, Jae-Gon;Chang, Hyun-Sung;Kim, Young-Tae;Kang, Kyeong-Ok;Kim, Mun-Churl;Kim, Jin-Woong;Kim, Hyung-Myung
    • ETRI Journal
    • /
    • 제24권1호
    • /
    • pp.1-11
    • /
    • 2002
  • A video summary abstracts the gist from an entire video and also enables efficient access to the desired content. In this paper, we propose a novel method for summarizing news video based on multimodal analysis of the content. The proposed method exploits the closed caption data to locate semantically meaningful highlights in a news video and speech signals in an audio stream to align the closed caption data with the video in a time-line. Then, the detected highlights are described using MPEG-7 Summarization Description Scheme, which allows efficient browsing of the content through such functionalities as multi-level abstracts and navigation guidance. Multimodal search and retrieval are also within the proposed framework. By indexing synchronized closed caption data, the video clips are searchable by inputting a text query. Intensive experiments with prototypical systems are presented to demonstrate the validity and reliability of the proposed method in real applications.

  • PDF