• Title/Summary/Keyword: Multimodal Information

257 search results

Multimodal Brain Image Registration based on Surface Distance and Surface Curvature Optimization (표면거리 및 표면곡률 최적화 기반 다중모달리티 뇌영상 정합)

  • Park Ji-Young;Choi Yoo-Joo;Kim Min-Jeong;Tae Woo-Suk;Hong Seung-Bong;Kim Myoung-Hee
    • The KIPS Transactions:PartA / v.11A no.5 / pp.391-400 / 2004
  • Among multimodal medical image registration techniques, which correlate different images and provide integrated information, surface-based methods generally minimize only the surface distance between the two modalities. However, since surfaces acquired from the same subject also share similar shape features, registration accuracy can be improved by optimizing both surface distance and shape feature. This research proposes a registration method that optimizes the surface distance and surface curvature of two brain modalities. The registration process has two steps: surface information is first extracted from the reference and test images, and then the optimization is performed. In the first step, the surface boundaries of the regions of interest are extracted from both modalities, and a distance map and a curvature map are generated for the boundary of the reference volume image. In the optimization step, a transformation that minimizes both the surface distance and the surface curvature difference is determined by a cost function referring to the distance map and the curvature map. Applying the resulting transformation registers the test volume to the reference volume. The suggested cost function yields a more robust and accurate registration result than a cost function using surface distance only. The research also provides an efficient means for image analysis through volume visualization of the registration result.
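The two-term cost described above can be sketched as follows. The function and array names are hypothetical, and the equal weighting `alpha` is an assumption, since the abstract does not specify how the distance and curvature terms are balanced:

```python
import numpy as np

def registration_cost(points, curvatures, distance_map, curvature_map, alpha=0.5):
    """Combined surface-distance / curvature-difference cost (hypothetical sketch).

    points        : (N, 3) integer voxel coordinates of the transformed test surface
    curvatures    : (N,) curvature at each transformed test-surface point
    distance_map  : 3-D distance map built from the reference surface boundary
    curvature_map : 3-D curvature map built from the reference surface boundary
    alpha         : weight between the two terms (an assumption, not from the paper)
    """
    idx = tuple(points.T)                                  # look up each voxel
    dist_term = distance_map[idx]                          # distance to reference surface
    curv_term = np.abs(curvature_map[idx] - curvatures)    # curvature mismatch
    return float(np.mean(alpha * dist_term + (1.0 - alpha) * curv_term))
```

An optimizer would then search over transformation parameters for the transform whose resampled surface points minimize this cost.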

Deformable Registration for MRI Medical Image

  • Li, Binglu;Kim, YoungSeop;Lee, Yong-Hwan
    • Journal of the Semiconductor & Display Technology / v.18 no.2 / pp.63-66 / 2019
  • Advances in medical imaging technology mean that different imaging modalities provide large amounts of useful information, but any single image type is limited in the information it can convey. Combining different images lets doctors interpret medical images more comprehensively. Registration based on mutual information has become one of the hotspots in the field of image registration owing to its high accuracy and wide applicability; because information-theoretic registration does not depend on gray-value differences between the images, it is well suited to multimodal medical image registration. However, mutual-information-based methods have a robustness problem. The essential reason is that mutual information alone does not carry enough information about pixel pairs, so it is unstable during the registration process, producing many local extrema that ultimately cause mismatches. To overcome these shortcomings, this paper proposes a registration method that combines image spatial-structure information with mutual information.
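A minimal sketch of the mutual-information measure that such registration methods maximize, computed from the joint intensity histogram; the bin count and function name are illustrative, not taken from the paper:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual information of two images from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                     # joint probability
    px = pxy.sum(axis=1, keepdims=True)           # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)           # marginal of img_b
    nz = pxy > 0                                  # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

During registration, one image is repeatedly transformed and this quantity is re-evaluated; it peaks when corresponding structures overlap, regardless of the absolute gray values in each modality.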

Activity Recognition of Workers and Passengers onboard Ships Using Multimodal Sensors in a Smartphone (선박 탑승자를 위한 다중 센서 기반의 스마트폰을 이용한 활동 인식 시스템)

  • Piyare, Rajeev Kumar;Lee, Seong Ro
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.9 / pp.811-819 / 2014
  • Activity recognition is a key component in identifying the context of a user for providing services based on the application, such as medical, entertainment, and tactical scenarios. Instead of applying numerous sensor devices, as observed in many previous investigations, we propose the use of a smartphone with its built-in multimodal sensors as an unobtrusive sensing device for recognizing six physical daily activities. As an improvement over previous work, accelerometer, gyroscope, and magnetometer data are fused to recognize activities more reliably. The evaluation indicates that the IBK classifier using a window size of 2 s with 50% overlap yields the highest accuracy (up to 99.33%). To achieve this peak accuracy, simple time-domain and frequency-domain features were extracted from the raw sensor data of the smartphone.
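The windowing scheme above can be sketched as follows. Assuming a 50 Hz sampling rate (an assumption; the abstract does not state one), a 2 s window is 100 samples with a 50-sample step at 50% overlap; the feature set shown is a small illustrative subset of typical time-domain features:

```python
import numpy as np

def sliding_windows(signal, win_len, overlap=0.5):
    """Split a 1-D sensor stream into fixed-length windows with fractional overlap."""
    step = max(1, int(win_len * (1.0 - overlap)))
    starts = range(0, len(signal) - win_len + 1, step)
    return np.array([signal[s:s + win_len] for s in starts])

def time_domain_features(windows):
    """Per-window mean, standard deviation, and peak magnitude."""
    return np.column_stack([windows.mean(axis=1),
                            windows.std(axis=1),
                            np.abs(windows).max(axis=1)])
```

The resulting feature matrix (one row per window) is what a classifier such as IBK would be trained on, with the same windowing applied to each fused sensor axis.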

Modelling on Multi-modal Circular Data using von Mises Mixture Distribution

  • Jang, Young-Mi;Yang, Dong-Yoon;Lee, Jin-Young;Na, Jong-Hwa
    • Communications for Statistical Applications and Methods / v.14 no.3 / pp.517-530 / 2007
  • We study a modelling process for unimodal and multimodal circular data using the von Mises distribution and its mixtures. In particular, we suggest an EM algorithm to find maximum likelihood estimates of the mixture model. Simulation results show that the suggested methods are very accurate. Applications to two kinds of real data sets are also included.
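A compact sketch of EM for a von Mises mixture. The concentration update uses the common approximation κ̂ = R̄(2 − R̄²)/(1 − R̄²) from the mean resultant length R̄; the paper's exact estimator and initialization may differ:

```python
import numpy as np

def vm_pdf(theta, mu, kappa):
    """von Mises density with mean direction mu and concentration kappa."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

def vm_mixture_em(theta, k=2, iters=100, seed=0):
    """Fit a k-component von Mises mixture by EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                      # mixing weights
    mu = rng.uniform(-np.pi, np.pi, k)            # random initial mean directions
    kappa = np.ones(k)                            # broad initial concentrations
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each angle
        dens = np.stack([p * vm_pdf(theta, m, c) for p, m, c in zip(pi, mu, kappa)])
        resp = dens / dens.sum(axis=0, keepdims=True)
        # M-step: weighted circular mean and approximate concentration update
        for j in range(k):
            w = resp[j]
            C, S = np.sum(w * np.cos(theta)), np.sum(w * np.sin(theta))
            mu[j] = np.arctan2(S, C)
            rbar = min(np.hypot(C, S) / w.sum(), 0.99)   # mean resultant length
            kappa[j] = rbar * (2.0 - rbar ** 2) / (1.0 - rbar ** 2)
            pi[j] = w.sum() / len(theta)
    return pi, mu, kappa
```

On bimodal data the two fitted mean directions settle near the two modes, which is the multimodal case the paper addresses.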

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

  • Lyu Zhi;Eunju Lee;Youngsoo Kim
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
  • Video captioning technology, a significant outcome of the integration of computer vision and natural language processing, has emerged as a key research direction in artificial intelligence. It aims at automatic understanding and linguistic expression of video content, enabling computers to transform the visual information in videos into text. This paper provides an initial analysis of research trends in deep learning-based video captioning and categorizes them into four main groups: CNN-RNN-based, RNN-RNN-based, Multimodal-based, and Transformer-based models, explaining the concept of each model and discussing its features, pros, and cons. The paper also lists the datasets and performance evaluation methods commonly used in the video captioning field: the datasets cover diverse domains and scenarios, offering extensive resources for training and validating video captioning models, and the evaluation methods cover the major indicators, providing practical references for assessing model performance from various angles. Finally, as future research tasks for video captioning, major challenges that require continued improvement are identified, such as maintaining temporal consistency and accurately describing dynamic scenes, which add complexity in real-world applications, and new tasks to be studied are presented, such as temporal relationship modeling and multimodal data integration.

Development of a Electronic Commerce System of Multi-Modal Information (다중모달을 이용한 전자상거래시스템 개발)

  • 장찬용;류갑상
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.10a / pp.729-732 / 2001
  • Individual authentication systems that take advantage of multimodal information, such as speech recognition, face recognition, and electronic signatures, are an efficient way to protect important information from the many dangers present on communication networks when building a security system. This paper describes an implemented system for purchasing hardware products over the internet based on public-key cryptography and electronic signatures, and shows that a secure commercial transaction system is feasible when individual authentication is applied.


A Development Method of SmartPhone E-book Supporting Multimodal Interactions (멀티모달 상호작용을 지원하는 스마트폰용 전자책 개발방법)

  • Lee, Sungjae;Kwon, Daehyeon;Cho, Soosun
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.1678-1680 / 2010
  • Recently, smartphones have spread rapidly and the e-book market has grown, so attempts to provide various educational services, including e-books, through smartphones have become active. In the future, instead of carrying several books in a bag, people will carry a single smartphone or e-book-capable device in place of books. This paper proposes a development method for e-books that support multimodal interaction by using multimedia devices and various sensors, rather than relying on simple text.

Abnormal Active Pig Detection System using Audio-visual Multimodal Information (Audio-visual 멀티모달 정보 기반의 비정상 활성 돼지 탐지 시스템)

  • Chae, Heechan;Lee, Junhee;Lee, Jonguk;Chung, Yonghwa;Park, Daihee
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.661-664 / 2022
  • Building a pig-farming system that can identify abnormal individuals and track or isolate them in advance is essential for efficient pig-house management. Although studies on detecting abnormal situations in a pig house have been reported, studies that identify the specific pig involved are hard to find. This study therefore proposes a system that uses sound to detect that an abnormal situation has occurred and then uses video to identify the specific pig that produced the sound. The core algorithm, inspired by the active speaker detection problem and adapted to the pig house, identifies the active pig making abnormal sounds. Simulated experiments confirmed that the proposed method can identify the pig in which an abnormal situation occurred.

A Deep Learning Based Approach to Recognizing Accompanying Status of Smartphone Users Using Multimodal Data (스마트폰 다종 데이터를 활용한 딥러닝 기반의 사용자 동행 상태 인식)

  • Kim, Kilho;Choi, Sangwoo;Chae, Moon-jung;Park, Heewoong;Lee, Jaehong;Park, Jonghun
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.163-177 / 2019
  • As smartphones become widely used, human activity recognition (HAR) tasks for recognizing the personal activities of smartphone users from multimodal data have been actively studied. The research area is expanding from recognizing the simple body movements of an individual user to recognizing low-level and high-level behavior. However, HAR tasks for recognizing interaction behavior with other people, such as whether the user is accompanying or communicating with someone else, have received less attention so far. Previous research on recognizing interaction behavior has usually depended on audio, Bluetooth, and Wi-Fi sensors, which are vulnerable to privacy issues and require much time to collect enough data, whereas physical sensors such as the accelerometer, magnetic field sensor, and gyroscope are less privacy-sensitive and can collect a large amount of data within a short time. In this paper, a method for detecting accompanying status with a deep learning model using only multimodal physical sensor data (accelerometer, magnetic field, and gyroscope) is proposed. Accompanying status is defined as a subset of user interaction behavior: whether the user is accompanied by an acquaintance at close distance and whether the user is actively communicating with that acquaintance. A framework based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent networks for classifying accompanying and conversation is proposed. First, a data preprocessing method is introduced, consisting of time synchronization of the multimodal data from the different physical sensors, data normalization, and sequence data generation; nearest-neighbour interpolation is applied to synchronize the timestamps of the data collected from the different sensors.
Normalization is performed for each x, y, z axis value of the sensor data, and sequence data is generated with a sliding window. The sequence data then becomes the input to the CNN, which extracts feature maps representing local dependencies of the original sequence. The CNN consists of 3 convolutional layers and has no pooling layer, so as to preserve the temporal information of the sequence data. Next, LSTM recurrent networks receive the feature maps, learn long-term dependencies from them, and extract features; the LSTM consists of two layers of 128 cells each. Finally, the extracted features are classified by a softmax classifier. The loss function is cross entropy, and the weights are randomly initialized from a normal distribution with mean 0 and standard deviation 0.1. The model is trained with the adaptive moment estimation (ADAM) optimizer with a mini-batch size of 128, and dropout is applied to the inputs of the LSTM to prevent overfitting. The initial learning rate is 0.001 and decays exponentially by a factor of 0.99 at the end of each training epoch. An Android smartphone application was developed and released to collect data from a total of 18 subjects. Using the data, the model classified accompanying and conversation with 98.74% and 98.83% accuracy, respectively; both the F1 score and the accuracy of the model were higher than those of a majority-vote classifier, a support vector machine, and a deep recurrent neural network. Future research will focus on more rigorous multimodal sensor data synchronization methods that minimize timestamp differences, and on transfer learning methods that allow models tailored to the training data to transfer to evaluation data that follows a different distribution. A model that maintains robust recognition performance against data variation not considered during training is expected to result.
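The preprocessing pipeline described above (nearest-neighbour time synchronization, per-axis normalization, sliding-window sequence generation) can be sketched as follows; function names and window parameters are illustrative, not the authors' code:

```python
import numpy as np

def nearest_sync(t_target, t_src, values):
    """Resample a sensor stream onto target timestamps by nearest-neighbour lookup."""
    idx = np.abs(t_src[None, :] - t_target[:, None]).argmin(axis=1)
    return values[idx]

def normalize_axes(data):
    """Zero-mean, unit-variance scaling per axis (columns are x, y, z)."""
    return (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-8)

def make_sequences(data, win_len, step):
    """Sliding-window sequence generation for the CNN input."""
    starts = range(0, len(data) - win_len + 1, step)
    return np.stack([data[s:s + win_len] for s in starts])
```

Each sensor stream would be synced onto a common time base, normalized, and windowed into sequences of shape (window length, axes) before being stacked across sensors and fed to the CNN-LSTM model.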

Environmental IoT-Enabled Multimodal Mashup Service for Smart Forest Fires Monitoring

  • Elmisery, Ahmed M.;Sertovic, Mirela
    • Journal of Multimedia Information System / v.4 no.4 / pp.163-170 / 2017
  • The internet of things (IoT) is a new paradigm for collecting, processing, and analyzing various contents in order to detect anomalies and monitor particular patterns in a specific environment. The collected data can be used to discover new patterns and offer new insights. IoT-enabled data mashup is a technology for combining various types of information from multiple sources into a single web service, opening a new horizon for different applications. Environmental monitoring is an essential tool for state and private organizations located in regions with environmental hazards that seek insights to detect and clearly locate those hazards. Such organizations can use an IoT-enabled data mashup service to merge different types of datasets from different IoT sensor networks in order to improve their data analytics performance and prediction accuracy. This paper presents an IoT-enabled data mashup service in which multimedia data is collected from various IoT platforms and fed into an environmental cognition service that executes image processing techniques such as noise removal, segmentation, and feature extraction to detect interesting patterns in hazardous areas. Noise in the captured images is eliminated by noise removal and background subtraction processes, a Markov-based approach is used to segment possible regions of interest, and viable features within each region are extracted with a multiresolution wavelet transform and fed into a discriminative classifier to extract various patterns. Experimental results show accurate detection performance and adequate processing time for the proposed approach. A data mashup scenario for an IoT-enabled environmental hazard detection service and experimentation results are also provided.
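The background subtraction stage of such a pipeline can be sketched with a simple running-average background model; the threshold, update rate, and function names are assumptions, and the paper's Markov segmentation and wavelet feature extraction are not reproduced here:

```python
import numpy as np

def subtract_background(frame, background, thresh=25):
    """Foreground mask by absolute differencing against a background model."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh            # boolean mask of changed pixels

def update_background(background, frame, alpha=0.05):
    """Running-average background update (a common, simple model)."""
    return (1.0 - alpha) * background + alpha * frame
```

Pixels flagged by the mask would be handed to the segmentation step to delimit candidate regions of interest, while the background model slowly adapts to gradual scene changes such as lighting.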