• Title/Summary/Keyword: Feature representation

Ontology-based Image Understanding Systems (온톨로지 기반 영상이해 시스템)

  • Lee, In-K.;Seo, Suk-T.;Jeong, Hye-C.;Son, Seo-H.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.3 / pp.328-335 / 2007
  • An ontology represents shared concepts and the relations among them. Many studies have used ontologies to share human knowledge with systems; a typical example is the design and implementation of ontology systems for image understanding. However, conventional studies on ontology-based image understanding have proposed conceptual ideas rather than concrete methods. In this paper, we propose an ontology-based image understanding system with the following four processes: i) knowledge representation of a specific domain by the ontology, ii) feature extraction of objects through image processing and image analysis, iii) image interpretation by object features, and iv) reduction of the ambiguity in image interpretation by ontology reasoning. We implement an image understanding system based on the proposed processes and show its effectiveness through experimental results in a specific domain.

The Comparison of the Long-Take Technique of Cinemas and the Continuity of Architectural Space Based on Lacan's Visual-Art Theory (라깡의 시지각 예술이론에 의한 영화의 롱 테이크 기법과 건축 공간의 연속성 비교)

  • Choi, Hyo-Sik
    • Korean Institute of Interior Design Journal / v.26 no.6 / pp.81-96 / 2017
  • This study aims to establish a basic theory for combining architecture and film. Based on Jacques Lacan's visual-art theory, it compares the long-take technique of film with the continuity of space, a principle of spatial composition that is important in digital architecture, and identifies their common features and differences. The conclusions are summarized as follows. First, analyzing the long-take technique on the basis of Lacan's visual-art theory shows that the subject of representation is the scenes of a movie and that the gaze has narrative features. Second, the long-take technique can be regarded as a cinematic technique that tries to realize the real order beyond the symbolic order in real life through the continuous replication of replications of a scene within one shot. Third, in contemporary architecture, the inclined space of the open gaze, which is comparable to the classic long-take technique, resembles an attempt to bring an architectural space of reality, which belongs to the symbolic order, close to the real order, which belongs to the signifiant in the human unconscious. Fourth, the freeform continuous space of the closed gaze, which is comparable to the contemporary long take combined with computer graphics technology, has more difficulty in realizing the real order than the classic long-take technique and inclined continuous space, because the features that belong to the signifié in human consciousness are strengthened through circulation that repeats and expands along an observer's movement. Fifth, when the contemporary long-take technique and freeform continuous space expand a gaze that opens from the inside to the outside, they can create a space that is closer to the real order than the classic long-take technique and inclined continuous space.

Electromyographic evidence for a gestural-overlap analysis of vowel devoicing in Korean

  • Jun, Sun-A;Beckman, M.;Niimi, Seiji;Tiede, Mark
    • Speech Sciences / v.1 / pp.153-200 / 1997
  • In languages such as Japanese, it is very common to observe that short peripheral vowels are completely voiceless when surrounded by voiceless consonants. This phenomenon has also been reported in Montreal French, Shanghai Chinese, Greek, and Korean. Traditionally it has been described as a phonological rule that either categorically deletes the vowel or changes the [+voice] feature of the vowel to [-voice]. This analysis was supported by the observation of Sawashima (1971) and Hirose (1971) that there are two distinct EMG patterns for voiced and devoiced vowels in Japanese. Close examination of the phonetic evidence based on acoustic data, however, shows that these phonological characterizations are not tenable (Jun & Beckman 1993, 1994). In this paper, we examined the vowel devoicing phenomenon in Korean using EMG, fiberscopic, and acoustic recordings of 100 sentences produced by one Korean speaker. The results show that there is variability in the 'degree of devoicing' in both acoustic and EMG signals, and in the patterns of glottal closing and opening across different devoiced tokens. There seems to be no categorical difference between devoiced and voiced tokens, for either EMG activity events or glottal patterns. All of these observations support the notion that vowel devoicing in Korean cannot be described as the result of the application of a phonological rule. Rather, devoicing seems to be a highly variable 'phonetic' process, a more or less subtle variation in the specification of such phonetic metrics as the degree and timing of glottal opening, or of the subglottal pressure or intra-oral airflow associated with concurrent tone and stricture specifications. Some of the token-pair comparisons are amenable to an explanation in terms of gestural overlap and undershoot. However, the effect of gestural timing on vocal fold state seems to be a highly nonlinear function of the interaction among specifications for the relative timing of glottal adduction and abduction gestures, of the amplitudes of the overlapped gestures, of aerodynamic conditions created by concurrent oral and tonal gestures, and so on. In summary, to understand devoicing, it will be necessary to examine its effect on the phonetic representation of events in many parts of the vocal tract, and at many stages of the speech chain between the motor intent and the acoustic signal that reaches the hearer's ear.

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a network of supervised denoising sparse auto-encoders to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
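
The score-level fusion over available modalities described in this abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the plain mean used here is an assumption, as are the function name `fuse_scores`, the modality keys, and the example scores.

```python
from typing import Dict, List, Optional


def fuse_scores(scores: Dict[str, Optional[List[float]]]) -> int:
    """Return the index of the best-matching identity.

    `scores` maps a modality name to per-identity match scores
    (higher is better), or to None when the modality is missing.
    Assumption: fusion is a plain mean over the available modalities;
    the paper's actual fusion rule is not given in the abstract.
    """
    available = [s for s in scores.values() if s is not None]
    if not available:
        raise ValueError("no modality available")
    n = len(available[0])
    if any(len(s) != n for s in available):
        raise ValueError("all modalities must score the same identities")
    # Average each identity's score over the modalities that are present.
    fused = [sum(s[i] for s in available) / len(available) for i in range(n)]
    return max(range(n), key=fused.__getitem__)


# Hypothetical probe: the left ear was not detected in the clip.
probe = {
    "frontal_face": [0.2, 0.7, 0.1],
    "left_ear": None,
    "right_ear": [0.3, 0.4, 0.3],
}
best_identity = fuse_scores(probe)
```

Because missing modalities are simply skipped, the same function works whether one or all five modalities are detected.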

Investigation of Timbre-related Music Feature Learning using Separated Vocal Signals (분리된 보컬을 활용한 음색기반 음악 특성 탐색 연구)

  • Lee, Seungjin
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1024-1034 / 2019
  • Preference for music is determined by a variety of factors, and identifying characteristics that reflect specific factors is important for music recommendation. In this paper, we propose a method to extract singing-voice-related music features that reflect various musical characteristics by using a model trained for singer identification. The model can be trained using music sources that contain background accompaniment, but this may degrade singer identification performance. To mitigate this problem, this study first separates the background accompaniment and creates a dataset composed of separated vocals, using a proven model structure that appeared in SiSEC, the Signal Separation Evaluation Campaign. Finally, we use the separated vocals to discover singing-voice-related music features that reflect the singer's voice. We compare the effects of source separation against existing methods that use music sources without source separation.

Human Action Recognition Via Multi-modality Information

  • Gao, Zan;Song, Jian-Ming;Zhang, Hua;Liu, An-An;Xue, Yan-Bing;Xu, Guang-Ping
    • Journal of Electrical Engineering and Technology / v.9 no.2 / pp.739-748 / 2014
  • In this paper, we propose pyramid appearance and global structure action descriptors computed on both RGB and depth motion history images (MHIs), together with a model-free method for human action recognition. In the proposed algorithm, we first construct MHIs for both the RGB and depth channels, using the depth information to filter the RGB information. Different action descriptors are then extracted from the depth and RGB MHIs to represent the actions. Finally, we propose a multi-modality information collaborative representation and recognition model to classify human actions, in which multi-modality information is incorporated naturally into the objective function and information fusion and action recognition are performed together. To demonstrate the superiority of the proposed method, we evaluate it on the MSR Action3D and DHA datasets, two well-known datasets for human action recognition. Large-scale experiments show that our descriptors are robust, stable, and efficient; they outperform state-of-the-art algorithms, and the combined descriptors perform much better than any single descriptor. Moreover, the proposed model outperforms state-of-the-art methods on both the MSR Action3D and DHA datasets.
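
As a rough illustration of the motion history image on which these descriptors are built, the sketch below applies the commonly used MHI update rule: a pixel is set to `tau` where motion is detected and otherwise decays by one per frame. Detecting motion by thresholded frame differencing, the name `update_mhi`, and the parameter values are assumptions; the abstract does not give the paper's exact construction or its depth-based filtering step.

```python
from typing import List

Frame = List[List[int]]


def update_mhi(mhi: Frame, prev: Frame, curr: Frame,
               tau: int = 5, threshold: int = 10) -> Frame:
    """Apply one motion-history update step.

    A pixel whose intensity changed by more than `threshold` between
    `prev` and `curr` is set to `tau`; every other pixel decays by 1,
    never dropping below 0. Thresholded frame differencing is an
    assumed motion detector, used here only for illustration.
    """
    out = []
    for h_row, p_row, c_row in zip(mhi, prev, curr):
        out.append([
            tau if abs(c - p) > threshold else max(0, h - 1)
            for h, p, c in zip(h_row, p_row, c_row)
        ])
    return out


# Hypothetical 1x3 grayscale frames: a bright blob grows left to right.
frames = [
    [[0, 0, 0]],
    [[50, 0, 0]],
    [[50, 50, 0]],
]
mhi = [[0, 0, 0]]
for prev, curr in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, curr, tau=5)
```

Pixels that moved most recently hold the largest values, so a single image encodes both where and how recently motion occurred.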

Computational Analysis of PCA-based Face Recognition Algorithms (PCA기반의 얼굴인식 알고리즘들에 대한 연산방법 분석)

  • Hyeon Joon Moon;Sang Hoon Kim
    • Journal of Korea Multimedia Society / v.6 no.2 / pp.247-258 / 2003
  • Principal component analysis (PCA) based algorithms form the basis of numerous algorithms and studies in the face recognition literature. PCA is a statistical technique, and its incorporation into a face recognition system requires numerous design decisions. We make these design decisions explicit by introducing a generic modular PCA algorithm, since some of these decisions are not documented in the literature. We experiment with different implementations of each module and evaluate them using the September 1996 FERET evaluation protocol (the de facto standard method for evaluating face recognition algorithms). We experiment with (1) changing the illumination normalization procedure; (2) studying the effects on algorithm performance of compressing images using JPEG and wavelet compression algorithms; (3) varying the number of eigenvectors in the representation; and (4) changing the similarity measure in the classification process. We perform two experiments. In the first experiment, we report performance results on the standard September 1996 FERET large gallery image sets. The results show that empirical analysis of preprocessing, feature extraction, and matching performance is extremely important in order to produce optimized performance. In the second experiment, we examine variations in algorithm performance based on 100 randomly generated image sets (galleries) of the same size. The results show that a reasonable threshold for measuring a significant difference in performance for the classifiers is 0.10.
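
Design decisions (3) and (4) can be made concrete with a small sketch of nearest-neighbor matching on PCA coefficient vectors. The abstract does not list which similarity measures were tested, so L1, L2, and angle distance are assumed here as common choices; the names `nearest`, `l1`, `l2`, and `angle`, and the toy coefficients, are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    """City-block distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angle(a: Vector, b: Vector) -> float:
    """One minus the cosine of the angle; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(probe: Vector, gallery: Dict[str, Vector],
            dist: Callable[[Vector, Vector], float], k: int) -> str:
    """Match `probe` to the closest gallery identity.

    Only the first `k` coefficients are compared, which mimics
    varying the number of eigenvectors kept in the representation.
    """
    return min(gallery, key=lambda name: dist(probe[:k], gallery[name][:k]))


# Hypothetical PCA coefficients, not taken from FERET.
gallery = {"alice": [10.0, 1.0], "bob": [1.0, 2.0]}
probe = [3.0, 0.3]
match_l2 = nearest(probe, gallery, l2, k=2)
match_angle = nearest(probe, gallery, angle, k=2)
```

In this toy example the probe is parallel to one gallery vector but closer in magnitude to the other, so the two measures return different identities, which is why the choice of measure is treated as a design decision.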

Region-Based Image Retrieval System using Spatial Location Information as Weights for Relevance Feedback (공간 위치 정보를 적합성 피드백을 위한 가중치로 사용하는 영역 기반 이미지 검색 시스템)

  • Song Jae-Won;Kim Deok-Hwan;Lee Ju-Hong
    • Journal of the Korea Society of Computer and Information / v.11 no.4 s.42 / pp.1-7 / 2006
  • Recently, studies of relevance feedback for improving image retrieval performance have been active. In this paper, a new region weighting method for region-based image retrieval with relevance feedback is proposed to reduce the semantic gap between the low-level feature representation and the high-level concept in a given query image. The new weighting method determines the importance of regions according to their spatial locations in an image. Experimental results demonstrate that the recall of our method is about 18% higher than that of the area percentage approach and about 11% higher than that of the region frequency weighted by inverse image frequency approach, and that the retrieval time of our method is one tenth of that of the region frequency approach.
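
The idea of weighting regions by spatial location can be sketched as follows. The abstract does not give the weighting formula, so this example assumes, purely for illustration, that regions whose centroids lie nearer the image center receive larger weights; `location_weights` and its parameters are hypothetical names.

```python
import math
from typing import List, Tuple


def location_weights(centroids: List[Tuple[float, float]],
                     width: float, height: float) -> List[float]:
    """Assign each region a weight that sums to 1 over all regions.

    Assumption: importance falls off linearly with the distance of the
    region centroid from the image center. The paper's actual rule is
    not stated in the abstract.
    """
    cx, cy = width / 2.0, height / 2.0
    max_dist = math.hypot(cx, cy)  # distance from the center to a corner
    raw = [1.0 - math.hypot(x - cx, y - cy) / max_dist for x, y in centroids]
    total = sum(raw)
    if total == 0:
        # Every centroid sits on a corner; fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Hypothetical 100x100 image with a central region and a top-edge region.
weights = location_weights([(50, 50), (50, 0)], 100, 100)
```

Such weights could then scale each region's contribution to the overall image-to-image distance during relevance feedback.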

LDI (Layered Depth Image) Representation Method using 3D GIS Implementation (LDI 표현방법을 이용한 3D GIS 구현)

  • Song Sang-Hun;Jung Young-Kee
    • KSCI Review / v.14 no.1 / pp.231-239 / 2006
  • A geographic information system (GIS) is a software system that handles geographically referenced information. Because the representation of geographic information is a central problem for such systems, research and development is actively moving from conventional 2D representations toward 3D representations. However, processing huge volumes of geographic information quickly and efficiently raises many problems. In this paper, we propose using LDI (Layered Depth Images), an image-based modeling and rendering technique, for efficient 3D scene rendering in GIS. We acquired 3D terrain data with a measurement-based method and created LDIs from the depth information contained in the acquired terrain data. We also created LDIs from models built with a traditional modeling tool, 3DS-Max. Using the LDI information acquired in this way, more efficient 3D GIS rendering was possible.

Recognition and Modeling of 3D Environment based on Local Invariant Features (지역적 불변특징 기반의 3차원 환경인식 및 모델링)

  • Jang, Dae-Sik
    • Journal of the Korea Society of Computer and Information / v.11 no.3 / pp.31-39 / 2006
  • This paper presents a novel approach to real-time recognition of 3D environments and objects for various applications such as intelligent robots, intelligent vehicles, and intelligent buildings. First, we establish the three fundamental principles that humans use for recognizing and interacting with the environment. These principles have led to the development of an integrated approach to real-time 3D recognition and modeling, as follows: 1) It starts with a rapid but approximate characterization of the geometric configuration of the workspace by identifying global plane features. 2) It quickly recognizes known objects in the environment and replaces them with their models from a database based on 3D registration. 3) It models the geometric details on the fly, adapting to the needs of the given task, based on a multi-resolution octree representation. SIFT features with their 3D position data, referred to here as stereo-sis SIFT, are used extensively, together with point clouds, for fast extraction of global plane features, for fast recognition of objects, for fast registration of scenes, and for overcoming the incomplete and noisy nature of point clouds.
