• Title/Summary/Keyword: Audio Segmentation

Search Result 22, Processing Time 0.023 seconds

A Robust Audio Fingerprinting Method Based on Segmentation Boundaries

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.4
    • /
    • pp.260-265
    • /
    • 2012
  • A robust audio fingerprinting method is presented based on segmentation boundaries. In order to obtain robustness against linear speed changes, fingerprint extraction and matching are synchronized with the segmentation boundaries. Experimental results show that the proposed method is also robust against other common audio processing steps including low bit-rate compression, equalization, and time-scale modification.

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartB
    • /
    • v.19B no.1
    • /
    • pp.19-26
    • /
    • 2012
  • The rapid increase of information imposes new demands of content management. The purpose of automatic audio segmentation and classification is to meet the rising need for efficient content management. With this reason, this paper proposes a high-accuracy algorithm that segments audio signals and classifies them into different classes such as speech, music, silence, and environment sounds. The proposed algorithm utilizes support vector machine (SVM) to detect audio-cuts, which are boundaries between different kinds of sounds using the parameter sequence. We then extract feature vectors that are composed of statistical data and they are used as an input of fuzzy c-means (FCM) classifier to partition audio-segments into different classes. To evaluate segmentation and classification performance of the proposed SVM-FCM based algorithm, we consider precision and recall rates for segmentation and classification accuracy for classification. Furthermore, we compare the proposed algorithm with other methods including binary and FCM classifiers in terms of segmentation performance. Experimental results show that the proposed algorithm outperforms other methods in both precision and recall rates.

Retrieval of Broadcast News Using Audio Content Analysis

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.3E
    • /
    • pp.74-79
    • /
    • 2007
  • In this paper, we report our recent work on a indexing and retrieval system of broadcast news using audio content analysis. Key issues addressed in this work are two major parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, developed in the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of Britisch broadcast news videos. We discuss the development of the retrieval system, and the evaluation of each part and the retrieval system.

Sinusoidal Modeling of Polyphonic Audio Signals Using Dynamic Segmentation Method (동적 세그멘테이션을 이용한 폴리포닉 오디오 신호의 정현파 모델링)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.58-68
    • /
    • 2000
  • This paper proposes a sinusoidal modeling of polyphonic audio signals. Sinusoidal modeling which has been applied well to speech and monophonic signals cannot be applied directly to polyphonic signals because a window size for sinusoidal analysis cannot be determined over the entire signal. In addition, for high quality synthesized signal transient parts like attacks should be preserved which determines timbre of musical instrument. In this paper, a multiresolution filter bank is designed which splits the input signal into six octave-spaced subbands without aliasing and sinusoidal modeling is applied to each subband signal. To alleviate smearing of transients in sinusoidal modeling a dynamic segmentation method is applied to subbands which determines the analysis-synthesis frame size adaptively to fit time-frequency characteristics of the subband signal. The improved dynamic segmentation is proposed which shows better performance about transients and reduced computation. For various polyphonic audio signals the result of simulation shows the suggested sinusoidal modeling can model polyphonic audio signals without loss of perceptual quality.

  • PDF

A Study on the Market Segmentation Approach by the Use of Fashion Information Sources (패션 정보원 활용에 따른 시장세분화에 관한 연구)

  • Chung Myung Sun
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.16 no.3 s.43
    • /
    • pp.257-269
    • /
    • 1992
  • In the area of fashion business, market segmentation strategy has been paid attention for the purpose of assigning proficiently marketing resources. The use of fashion information in purchase decision process can serve as a base for market segmentation strategies. The purpose of this study was to identify four segmented profiles which are labelled as Print-oriented, Audio-visual oriented, Store intensive, and Pal advice group. Objectives were to determine fashion interest, fashion attitude and apparel selection criteria from each segment. For this study, the questionnaire was adminisered to 261 teachers and data were analyzed by using ANOVA, Regression and Pearson's Correlations. The results were as follows. 1. Print-oriented group had a positive attitude about fashion and they tended to place value on aesthetic and other-oriented criteria in selecting apparel. 2. Audio-visual oriented group had a strongly positive attitude about fashion but they tended to be less active toward buying fashion products and place much more value on other-oriented criteria in selecting apparel. 3. Store intensive group tended to be active toward buying fashion products. They had a positive attitude about fashion and placed value on aesthetic criteria in selecting apparel. 4. Pal advice group had a less positive attitude about fashion and tended to place value on economical and practical criteria in selecting apparel. Based on the profile of the each segment, it was suggested to express lingual or non-lingual symbols and to create concepts for meeting the needs of the segment

  • PDF

Application of Speech Recognition with Closed Caption for Content-Based Video Segmentations

  • Son, Jong-Mok;Bae, Keun-Sung
    • Speech Sciences
    • /
    • v.12 no.1
    • /
    • pp.135-142
    • /
    • 2005
  • An important aspect of video indexing is the ability to segment video into meaningful segments, i.e., content-based video segmentation. Since the audio signal in the sound track is synchronized with image sequences in the video program, a speech signal in the sound track can be used to segment video into meaningful segments. In this paper, we propose a new approach to content-based video segmentation. This approach uses closed caption to construct a recognition network for speech recognition. Accurate time information for video segmentation is then obtained from the speech recognition process. For the video segmentation experiment for TV news programs, we made 56 video summaries successfully from 57 TV news stories. It demonstrates that the proposed scheme is very promising for content-based video segmentation.

  • PDF

Speaker Change Detection Based on a Graph-Partitioning Criterion

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.2
    • /
    • pp.80-85
    • /
    • 2011
  • Speaker change detection involves the identification of time indices of an audio stream, where the identity of the speaker changes. In this paper, we propose novel measures for the speaker change detection based on a graph-partitioning criterion over the pairwise distance matrix of feature-vector stream. Experiments on both synthetic and real-world data were performed and showed that the proposed approach yield promising results compared with the conventional statistical measures.

Speaker Segmentation System Using Eigenvoice-based Speaker Weight Distance Method (Eigenvoice 기반 화자가중치 거리측정 방식을 이용한 화자 분할 시스템)

  • Choi, Mu-Yeol;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.4
    • /
    • pp.266-272
    • /
    • 2012
  • Speaker segmentation is a process of automatically detecting the speaker boundary points in the audio data. Speaker segmentation methods are divided into two categories depending on whether they use a prior knowledge or not: One is the model-based segmentation and the other is the metric-based segmentation. In this paper, we introduce the eigenvoice-based speaker weight distance method and compare it with the representative metric-based methods. Also, we employ and compare the Euclidean and cosine similarity functions to calculate the distance between speaker weight vectors. And we verify that the speaker weight distance method is computationally very efficient compared with the method directly using the distance between the speaker adapted models constructed by the eigenvoice technique.

Retrieval of Player Event in Golf Videos Using Spoken Content Analysis (음성정보 내용분석을 통한 골프 동영상에서의 선수별 이벤트 구간 검색)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.7
    • /
    • pp.674-679
    • /
    • 2009
  • This paper proposes a method of player event retrieval using combination of two functions: detection of player name in speech information and detection of sound event from audio information in golf videos. The system consists of indexing module and retrieval module. At the indexing time audio segmentation and noise reduction are applied to audio stream demultiplexed from the golf videos. The noise-reduced speech is then fed into speech recognizer, which outputs spoken descriptors. The player name and sound event are indexed by the spoken descriptors. At search time, text query is converted into phoneme sequences. The lists of each query term are retrieved through a description matcher to identify full and partial phrase hits. For the retrieval of the player name, this paper compares the results of word-based, phoneme-based, and hybrid approach.

Time-Scale Modification of Polyphonic Audio Signals Using Sinusoidal Modeling (정현파 모델링을 이용한 폴리포닉 오디오 신호의 시간축 변화)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.77-85
    • /
    • 2001
  • This paper proposes a method of time-scale modification of polyphonic audio signals based on a sinusoidal model. The signals are modeled with sinusoidal component and noise component. A multiresolution filter bank is designed which splits the input signal into six octave-spaced subbands without aliasing and sinusoidal modeling is applied to each subband signal. To alleviate smearing of transients in time-scale modification a dynamic segmentation method is applied to subbands which determines the analysis-synthesis frame size adaptively to fit time-frequency characteristics of the subband signal. For extracting sinusoidal components and calculating their parameters matching pursuit algorithm is applied to each analysis frame of subband signal. In accordance with spectrum analysis a psychoacoustic model implementing the effect of frequency masking is incorporated with matching pursuit to provide a resonable stop condition of iteration and reduce the number of sinusoids. The noise component obtained by subtracting the synthesized signal with sinusoidal components from the original signal is modeled by line-segment model of short time spectrum envelope. For various polyphonic audio signals the result of simulation shows suggested sinusoidal modeling can synthesize original signal without loss of perceptual quality and do more robust and high quality time-scale modification for large scale factor because of representing transients without any perceptual loss.

  • PDF