Search | Korea Science

A Hybrid Neural Network model for Enhancement of Speaker Recognition in Video Stream (비디오 화자 인식 성능 향상을 위한 복합 신경망 모델)

Lee, Beom-Jin;Zhang, Byoung-Tak
- Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.396-398 / 2012
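Most real-world data are temporal, so machine learning methods that can analyze temporal data are very important. From this perspective, video is a representative kind of temporal data that combines multiple modalities, which makes machine learning methods for video data highly significant. As a preliminary study toward audio-channel-based video analysis, this paper introduces a simple method for recognizing the speakers who appear in video data. The proposed method is characterized by a hybrid neural network model that uses MFCC (Mel-frequency cepstral coefficients) to analyze the distribution of human voice characteristics and then feeds the analysis results into a neural network to recognize the target speaker. A comparison of the speaker recognition performance of a Gaussian mixture model, a Gaussian mixture neural network model, and the proposed method on real TV drama data confirmed that the proposed method achieves the best recognition performance.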

A Study on the Use of Speech Recognition Technology for Content-based Video Indexing and Retrieval (내용기반 비디오 색인 및 검색을 위한 음성인식기술 이용에 관한 연구)

손종목;배건성;강경옥;김재곤
- The Journal of the Acoustical Society of Korea / v.20 no.2 / pp.16-20 / 2001
An important aspect of video program indexing and retrieval is the ability to segment a video program into meaningful segments, that is, content-based video program segmentation. This paper proposes a new approach to content-based video program segmentation that uses speech recognition technology to synchronize closed captions with the speech signal. Experimental results demonstrate that the proposed scheme is very promising for content-based video program segmentation.

A Logo Transition Detection Method for Opaque and Semi-Transparent TV Logo Recognition in Video (비디오에서 불투명 및 반투명 TV 로고 인식을 위한 로고 전이 검출 방법)

Roh, Myung-Cheol;Kang, Seung-Yeon;Lee, Seong-Whan
- Journal of KIISE:Software and Applications / v.35 no.12 / pp.753-763 / 2008
The amount of UCC (user-created content) has been increasing rapidly, bringing with it a serious copyright problem. Automatic logo detection in videos is an efficient means of addressing this problem. However, logos have varying characteristics, which makes logo detection and recognition very difficult. In particular, a video composed of several pieces of video content contains frequent logo transitions, which disrupt accurate logo-based video segmentation. This paper therefore proposes an accurate logo transition detection method for recognizing logos in digital video content. The proposed method accurately segments a video according to logo and efficiently recognizes various types of logos. Experimental results demonstrate the effectiveness of the proposed method for logo detection and logo-based video segmentation.

Hand Gesture Tracking and Recognition for Video Editing (비디오 편집을 위한 손동작 추적 및 인식)

Park Ho-Sik;Cha Seung-Joo;Jung Ha-Young;Ra Sang-Dong;Bae Cheol-Soo
- Proceedings of the Korea Information Processing Society Conference / 2006.05a / pp.697-700 / 2006
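This paper proposes a new gesture-based video editing method. Electronic slide content is automatically detected in a lecture video and synchronized with the video. After the gestures in each synchronized topic are continuously tracked and recognized, the changed content is located in the registered frame and slide to identify the region where the gesture occurs. By extracting slide information from the recognized gesture and the registered location, the method can partially enlarge the slide region or automatically edit the original video, thereby improving video quality. Experiments on two videos yielded gesture recognition rates of 95.5% and 96.4%, respectively.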

Sign language translation using video captioning and sign language recognition using action recognition (비디오 캡셔닝을 적용한 수어 번역 및 행동 인식을 적용한 수어 인식)

Gi-Duk Kim;Geun-Hoo Lee
- Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.317-319 / 2024
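This paper proposes a sign language translation algorithm that applies video captioning and a sign language recognition algorithm that applies action recognition. The video captioning algorithm embeds 40 consecutive input frames through a CNN and feeds the embeddings to a Transformer, which outputs a sentence. The action recognition algorithm uses random sampling to select 40 indices per video, embeds the resulting 40 consecutive data frames through a CNN, and outputs the recognition result through an RNN model that combines a GRU with a Transformer. Sign language translation achieved a BLEU-4 score of 7.85 and a CIDEr score of 53.12, and sign language recognition achieved an accuracy of 96.26%.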

A Study on the Content-Based Video Information Indexing and Retrieval Using Closed Caption and Speech Recognition (캡션정보 및 음성인식을 이용한 내용기반 비디오 정보 색인 및 검색에 관한 연구)

손종목;김진웅;배건성
- Proceedings of the Korean Society of Broadcast Engineers Conference / 1999.11b / pp.141-145 / 1999
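To obtain results that best match general users' needs when searching videos such as news, dramas, and movies, the video data must be semantically analyzed and indexed. Because the speech signal generally represents the content of video data well and is synchronized with the video, it can be used efficiently to segment video data for content-based retrieval. Targeting broadcast news programs that come with caption information, this paper proposes a method that applies speech recognition technology to video data segmentation for efficient retrieval and indexing, and presents the corresponding experimental results.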

A New Residual Attention Network based on Attention Models for Human Action Recognition in Video

Kim, Jee-Hyun;Cho, Young-Im
- Journal of the Korea Society of Computer and Information / v.25 no.1 / pp.55-61 / 2020
With the development of deep learning technology and advances in computing power, video-based research is gaining more and more attention. Video data contains a large amount of temporal and spatial information, which is its biggest difference from image data, and it is also larger in volume. It has attracted intense attention in computer vision, where action recognition is one of the main research focuses. However, recognizing human actions in video is an extremely complex and challenging problem. Research on human cognition suggests that attention mechanisms are an efficient model for cognition, one well suited to processing image information and complex continuous video information. We introduce this attention mechanism into video action recognition, attending to human actions in video and effectively improving recognition efficiency. In this paper, we propose a new 3D residual attention network, a convolutional neural network based on two attention models, to identify human actions in video. In our evaluation, the model achieved up to 90.7% accuracy.
https://doi.org/10.9708/jksci.2020.25.01.055

The Key Frame Extraction and Anchor Recognition in News Videos (뉴스 비디오에서 키 프레임 추출과 앵커 인식)

신성윤;임정훈;이양원;표성배
- Proceedings of the Korea Multimedia Society Conference / 2001.11a / pp.286-289 / 2001
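In a news video, the first frame in which the anchor appears can be regarded as the key frame that serves as the reference for delimiting one news story as a shot. To detect scene changes in news videos, this paper extracts key frames using a method that combines the color histogram with the $\chi^2$ histogram. Anchors are then recognized among the extracted key frames through similarity measurement based on prior knowledge of the spatial composition of anchor frames and of facial feature information. A frame recognized as an anchor frame becomes the key frame for one news scene and plays an important role in indexing the news video.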
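
As a rough illustration of histogram-based scene change detection, the sketch below flags a cut wherever the $\chi^2$ distance between the histograms of consecutive frames exceeds a threshold. It is not the authors' method: the abstract does not say how the color and $\chi^2$ histograms are combined, so this sketch assumes grayscale frames given as nested lists and uses one common form of the $\chi^2$ distance. The function names, the bin count, and the threshold are all hypothetical.

```python
def gray_histogram(frame, bins=16, max_value=256):
    """Count the pixel intensities of a 2-D frame into equal-width bins."""
    hist = [0] * bins
    width = max_value / bins
    for row in frame:
        for pixel in row:
            hist[min(int(pixel / width), bins - 1)] += 1
    return hist


def chi_square_distance(h1, h2):
    """Chi-square distance: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def detect_cuts(frames, threshold):
    """Return each index i where frame i differs strongly from frame i - 1."""
    hists = [gray_histogram(f) for f in frames]
    return [
        i for i in range(1, len(hists))
        if chi_square_distance(hists[i - 1], hists[i]) > threshold
    ]


# Toy example (assumed data): two dark frames followed by two bright frames.
dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
cuts = detect_cuts([dark, dark, bright, bright], threshold=1.0)
```

In practice the threshold would have to be tuned on real footage, and the first frame after each detected cut would be a candidate key frame for the anchor recognition step.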

Recognition-Based Gesture Spotting for Video Game Interface (비디오 게임 인터페이스를 위한 인식 기반 제스처 분할)

Han, Eun-Jung;Kang, Hyun;Jung, Kee-Chul
- Journal of Korea Multimedia Society / v.8 no.9 / pp.1177-1186 / 2005
In vision-based interfaces for video games, gestures are used as game commands instead of key presses or mouse input. In these interfaces, unintentional movements and continuous gestures must be permitted to give the user a more natural interface. To address this problem, this paper proposes a novel gesture spotting method that combines spotting with recognition. It recognizes meaningful movements while concurrently separating unintentional movements from a given image sequence. We applied our method to the recognition of upper-body gestures for interfacing between a video game (Quake II) and its user. Experimental results show that the proposed method achieves 93.36% on average in spotting gestures from continuous gestures, confirming its potential for gesture-based interfaces for computer games.

Caption Detection and Recognition for Video Image Information Retrieval (비디오 영상 정보 검색을 위한 문자 추출 및 인식)

구건서
- Journal of the Korea Computer Industry Society / v.3 no.7 / pp.901-914 / 2002
In this paper, we propose an efficient automatic caption detection and localization method, together with caption recognition using an FE-MCBP (Feature Extraction based Multichained BackPropagation) neural network, for content-based video retrieval. Frames are sampled from the video at a fixed time interval, and key frames are selected by a gray-scale histogram method. For each key frame, segmentation is performed and caption lines are detected using a line scan method. Finally, the individual characters are separated. This research improves speed and efficiency by performing color segmentation with a local maximum analysis method before line scanning. Caption detection is the first stage of multimedia database organization, and the detected captions are used as input to a text recognition system. Recognized captions can then be searched by content-based retrieval.

