• Title/Summary/Keyword: Multimodal model

Anomaly Detection Methodology Based on Multimodal Deep Learning (멀티모달 딥 러닝 기반 이상 상황 탐지 방법론)

  • Lee, DongHoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.101-125 / 2022
  • With recent advances in computing technology and cloud environments, deep learning has matured, and attempts to apply it to various fields are increasing. A typical example is anomaly detection, a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly, contextual anomalies are especially difficult to detect because doing so requires understanding the overall situation. Anomaly detection in image data generally relies on a model pre-trained on large-scale data. However, because such models are built for object classification, they are of limited use for anomaly detection that must understand complex situations created by multiple objects. This study therefore proposes a two-step pre-trained model for detecting abnormal situations. The methodology performs additional learning through image captioning so that the model understands not only individual objects but also the complicated situations they create. Specifically, it transfers the knowledge of a model pre-trained for object classification on ImageNet to an image captioning model, using captions that describe the situation shown in each image. The weights obtained by learning situational characteristics from images and captions are then extracted and fine-tuned to produce an anomaly detection model. In an anomaly detection experiment on 400 situational images, the proposed methodology outperformed the existing pre-trained model in both accuracy and F1-score.
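
The abstract above reports results in terms of accuracy and F1-score. As a minimal sketch of how those two metrics are computed for a binary normal/anomalous labelling, the following uses made-up labels; the function name `accuracy_and_f1` and the example data are assumptions for illustration and do not come from the paper.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels: 1 = anomalous situation, 0 = normal situation.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it is less flattering than accuracy when anomalies are rare and most predictions are trivially "normal".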

Social Network Analysis of TV Drama via Location Knowledge-learned Deep Hypernetworks (장소 정보를 학습한 딥하이퍼넷 기반 TV드라마 소셜 네트워크 분석)

  • Nan, Chang-Jun;Kim, Kyung-Min;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.22 no.11 / pp.619-624 / 2016
  • Social-aware video displays not only the relationships between characters but also diverse information on topics such as economics, politics and culture as a story unfolds. In particular, the speaking habits and behavioral patterns of people in different situations are very important for analyzing social relationships. However, it is difficult for a computer to analyze such dynamic multimodal drama data effectively. To address this problem, previous studies employed the deep concept hierarchy (DCH) model to automatically construct and analyze social networks in a TV drama. Because location knowledge was not included, however, those studies could only analyze the social network of a story as a whole. In this research, we include location knowledge and analyze the social relations in different locations. Our dataset consists of approximately 4400 minutes of the TV drama Friends. We perform face recognition on the characters using a convolutional-recursive neural network model and use a bag-of-features model to classify scenes. Then, for each scene type, we build the social network between the characters using a deep concept hierarchy model and analyze how the social network changes as the story unfolds.

A Study on Biometric Model for Information Security (정보보안을 위한 생체 인식 모델에 관한 연구)

  • Jun-Yeong Kim;Se-Hoon Jung;Chun-Bo Sim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.317-326 / 2024
  • Biometric recognition is a technology that identifies a person by extracting information on the person's biometric and behavioral characteristics with a specific device. Cyber threats such as forgery, duplication, and hacking of biometric characteristics are increasing in the field of biometrics. In response, security systems are becoming stronger but also more complex, which makes them difficult for individuals to use. To address this, multiple biometric models are being studied. Existing studies have suggested feature fusion methods, but comparisons between those methods are insufficient. Therefore, in this paper, we compared and evaluated fusion methods for multiple biometric models using fingerprint, face, and iris images. VGG-16, ResNet-50, EfficientNet-B1, EfficientNet-B4, EfficientNet-B7, and Inception-v3 were used for feature extraction, and the 'Sensor-Level', 'Feature-Level', 'Score-Level', and 'Rank-Level' fusion methods were compared and evaluated. In the comparative evaluation, the EfficientNet-B7 model showed 98.51% accuracy and high stability with the 'Feature-Level' fusion method. However, because the EfficientNet-B7 model is large, studies on making the model lightweight are needed for biometric feature fusion.
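
To make the difference between two of the fusion levels named above concrete, here is a minimal sketch. It assumes that feature-level fusion means concatenating per-modality feature vectors before classification and that score-level fusion means a weighted average of per-modality match scores. These are common forms of each, but the abstract does not say which variants the paper used, and all names and numbers below are hypothetical.

```python
def feature_level_fusion(feature_vectors):
    """Concatenate per-modality feature vectors into one fused vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def score_level_fusion(scores, weights=None):
    """Combine per-modality match scores with a normalized weighted average."""
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per score")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical features, e.g. as produced by a CNN backbone for each modality.
fingerprint_feat = [0.1, 0.9]
face_feat = [0.4, 0.2, 0.7]
iris_feat = [0.3]
fused_features = feature_level_fusion([fingerprint_feat, face_feat, iris_feat])

# Hypothetical per-modality match scores in [0, 1].
fused_score = score_level_fusion([0.9, 0.6, 0.3])
print(fused_features, round(fused_score, 3))
```

Feature-level fusion hands a single classifier the joint representation, whereas score-level fusion keeps one matcher per modality and only merges their outputs.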

Choice Factors of Freight Transport Mode in Korea: Literature Review and Directions for Future Research (국내 화물운송수단 선택요인의 문헌 연구와 향후 연구 방향)

  • Choi, Chang-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.4 / pp.1-13 / 2018
  • The present study analyzed the transport mode choice factors of shippers in Korea and suggests policy implications and directions for future research. The findings showed that research on freight mode choice factors in Korea is somewhat insufficient compared to that of other countries. To strengthen this research, it is necessary to increase the number of studies and to reflect the characteristics of each transport mode. In particular, it is necessary to focus on identifying the characteristics of multimodal transport, including railway and shipping linked with trucking. On the other hand, the major factors influencing the transport mode choice of shippers in Korea were confirmed to overlap with those found in foreign research. In addition, policy implications were derived by analyzing Korea and other countries separately with respect to individual transport mode and transport range. These results could be applied in various fields, such as policy making to improve the efficiency of shippers' transport mode selection and the estimation of transport mode choice models.

Voice and Video Call Continuity for Enterprise Users (기업형 사용자들을 위한 음성/영상 서비스 이동성 제공 방안)

  • Jung, Chang-Yong;Kim, Hyeon-Soo;Moon, Jeong-Hyeon;Kim, Hee-Dong
    • 한국정보통신설비학회:학술대회논문집 / 2009.08a / pp.99-103 / 2009
  • Recently, as wired and wireless communication services have developed rapidly and multimodal mobile devices with various characteristics have spread widely, the need for new convergence services has increased. With the growing popularity of VoIP technologies and high communication costs, IP-based telephony such as WiFi phones and IP phones is replacing conventional PSTN telephony. Taking advantage of this trend, wireline network operators want to enter the mobile market, and they focus on Fixed Mobile Convergence (FMC) service as a key means of doing so. FMC services provide mobility of voice services between circuit-switched and packet-switched networks. IP Multimedia Subsystem (IMS) based Voice Call Continuity (VCC) is one scheme for realizing FMC services. Because an Application Server (AS) with the VCC function provides seamless handover of services between heterogeneous networks, FMC subscribers can communicate seamlessly with others in the WiFi domain and the CDMA domain using a WiFi-CDMA dual phone. Most enterprises have already introduced IP network infrastructure and an IP-PBX (Private Branch eXchange) for telephony. However, the problems of high communication cost and work inefficiency caused by frequent off-site work or business trips remain, and demand for enterprise FMC services is increasing as a result. In this paper, we introduce a new IP-PBX based VCC model that provides seamless handover of voice services between WiFi and CDMA networks for enterprise users, and we investigate interworking and security issues between the Soft Switch (SSW) and IMS, or between IMSs. In addition, we introduce a new service that provides continuity of video sessions as well as voice sessions using Multimedia Session Continuity (MMSC) technology, which evolved from VCC. This service is expected to be one of the next-generation personalized services based on the user's context.


Reliability Analysis Using Parametric and Nonparametric Input Modeling Methods (모수적·비모수적 입력모델링 기법을 이용한 신뢰성 해석)

  • Kang, Young-Jin;Hong, Jimin;Lim, O-Kaung;Noh, Yoojeong
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.1 / pp.87-94 / 2017
  • Reliability analysis (RA) and reliability-based design optimization (RBDO) require statistical modeling of input random variables, which is determined parametrically or nonparametrically from experimental data. For the parametric approach, goodness-of-fit (GOF) tests and model selection methods are widely used, and a sequential statistical modeling (SSM) method that combines the merits of the two has recently been proposed. Kernel density estimation (KDE) is often used as a nonparametric method; it describes a distribution well when the number of data is small or the density function is multimodal. Although accurate statistical models are needed to obtain accurate RA and RBDO results, accurate statistical modeling is difficult when the number of data is small. In this study, the accuracy of two statistical modeling methods, SSM and KDE, was compared according to the number of data. Through numerical examples, the RA results obtained with the input models from the two methods were compared, and an appropriate modeling method was proposed according to the number of data.
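
The claim that KDE can describe a multimodal density can be illustrated with a minimal sketch of a one-dimensional Gaussian kernel density estimate. The sample values and the fixed bandwidth `h = 0.5` below are assumptions chosen for illustration; the abstract does not state the kernel or the bandwidth selection rule used in the paper.

```python
import math


def gaussian_kde(samples, h):
    """Return a function estimating the density of `samples` with bandwidth `h`."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, centered on that point.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Hypothetical bimodal sample: two clusters around -2 and +2.
samples = [-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2]
pdf = gaussian_kde(samples, h=0.5)

for x in (-2.0, 0.0, 2.0):
    print(f"f({x:+.1f}) = {pdf(x):.4f}")
```

The estimate keeps both peaks and the trough between them, which a single fitted unimodal parametric distribution could not reproduce. In practice the bandwidth controls how much the two modes are smoothed together.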

Vision-based Walking Guidance System Using Top-view Transform and Beam-ray Model (탑-뷰 변환과 빔-레이 모델을 이용한 영상기반 보행 안내 시스템)

  • Lin, Qing;Han, Young-Joon;Hahn, Hern-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.93-102 / 2011
  • This paper presents a walking guidance system for blind pedestrians in outdoor environments that uses a single camera. Unlike many existing travel-aid systems that rely on stereo vision, the proposed system obtains the necessary information about the road environment from a single camera fixed at the user's belly. To achieve this, a top-view image of the road is used, on which obstacles are detected by first extracting local extreme points and then verifying them with a polar edge histogram. Meanwhile, user motion is estimated using optical flow in an area close to the user. Based on the information extracted from the image domain, an audio message generation scheme delivers guidance instructions to the blind user via synthetic voice. Experiments with several sidewalk video clips show that the proposed system is able to provide useful guidance instructions under certain sidewalk environments.

Developing the Design Guideline of Auditory User Interface for Domestic Appliances (가전제품의 청각 사용자 인터페이스(AUI) 설계를 위한 가이드라인 개발 연구)

  • Lee, Ju-Hwan;Jeon, Myoung-Hoon;Ahn, Jeong-Hee;Han, Kwang-Hee
    • 한국HCI학회:학술대회논문집 / 2006.02b / pp.1-8 / 2006
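  • This study aimed to establish a cognitive and affective Auditory User Interface (AUI) Design Guideline that can be differentiated by domestic appliance product group and function, and to provide guidance for creating auditory signals that can be intuitively associated with information about an appliance's operating functions, so that a design method that extends beyond GUI-centered product design and reflects users' multisensory characteristics can be applied in practice. In particular, it also considered the goal of enhancing brand identity and corporate image by establishing a systematic framework for AUI. The study was needed because of the demand for an approach grounded in consumers' mental models of domestic appliances and in affective considerations; this demand stems from frequent cases in which buzzer signals are annoying because they were mapped arbitrarily rather than through a systematic application of AUI. The study also reflects the need to upgrade AUI, which has not kept pace with the changes in and level of GUI, and the trend toward emotional marketing in domestic appliances. In addition, it is a timely attempt given that the rapid spread of multimedia environments calls for multimodal display. The study sought to extract the relationships by which various properties of auditory signals evoke the cognitive and affective attributes of a particular appliance or function, and to present empirical data on the underlying mechanisms, thereby providing useful guidelines for the AUI design of domestic appliances. This paper, however, introduces the overall plan and procedure of the study rather than its specific, detailed results, in order to provide a reference framework for research in related fields.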


A Comprehensive Framework for Estimating Pedestrian OD Matrix Using Spatial Information and Integrated Smart Card Data (공간정보와 통합 스마트카드 자료를 활용한 도시철도 역사 보행 기종점 분석 기법 개발)

  • JEONG, Eunbi;YOU, Soyoung Iris;LEE, Jun;KIM, Kyoungtae
    • Journal of Korean Society of Transportation / v.35 no.5 / pp.409-422 / 2017
  • TOD (Transit-Oriented Development) is an urban structure concentrated on multifunctional spaces and districts served by public transportation, introduced to sustain future cities. Following this trend, projects to build complex transfer centers at urban railway stations have become widespread, and a comprehensive and systematic analytical framework is required to clarify the complicated estimation procedure that such large-scale projects involve. This study develops a comprehensive analytical framework for estimating a pedestrian OD matrix using spatial information and integrated smart card data (a so-called data depository) and applies it to Samseong station for model validation. The proposed framework offers a basis for extension with digitalized and automated data collection technologies and big data mining methods.
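
For readers unfamiliar with the term, an origin-destination (OD) matrix simply counts trips between each origin and each destination. The sketch below shows only that counting step, assuming trips have already been reduced to (origin, destination) pairs. The record format and the names `gate_A`, `platform_1` and so on are hypothetical, and the paper's actual framework, which combines spatial information with smart card data, is not reproduced here.

```python
from collections import Counter


def build_od_matrix(trips):
    """Count trips per (origin, destination) pair and lay them out as a matrix."""
    counts = Counter(trips)
    origins = sorted({o for o, _ in trips})
    destinations = sorted({d for _, d in trips})
    # Counter returns 0 for pairs that never occur, so empty cells need no special case.
    matrix = [[counts[(o, d)] for d in destinations] for o in origins]
    return origins, destinations, matrix


# Hypothetical pedestrian movements inside a station.
trips = [
    ("gate_A", "platform_1"),
    ("gate_A", "platform_1"),
    ("gate_A", "platform_2"),
    ("gate_B", "platform_1"),
]
origins, destinations, matrix = build_od_matrix(trips)

print("destinations:", destinations)
for origin, row in zip(origins, matrix):
    print(origin, row)
```

Each row is an origin and each column a destination, so row sums give departures per origin and column sums give arrivals per destination.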

Interactive Shape Analysis of the Hippocampus in a Virtual Environment (가상 환경에서의 해마 모델에 대한 대화식 형상 분석)

  • Kim, Jeong-Sik;Choi, Soo-Mi
    • Journal of Internet Computing and Services / v.10 no.5 / pp.165-181 / 2009
  • This paper presents an effective representation scheme for shape analysis of the hippocampal structure and a stereoscopic-haptic environment that enhances the sense of realism. A parametric model and a 3D skeleton represent various types of hippocampal shapes and are stored in an octree data structure, so they can be used for interactive shape analysis. 3D skeleton-based pose normalization aligns the position and orientation of 3D hippocampal models constructed from multimodal medical imaging data. We also trained a Support Vector Machine (SVM) to classify normal controls and epileptic patients. Results suggest that the presented scheme provides various levels of shape representation and that the SVM can be a useful classifier for analyzing shape differences between the two groups. A stereoscopic-haptic virtual environment that combines an auto-stereoscopic display with a force-feedback (haptic) device benefits 3D medical applications because it improves space and depth perception.
