• Title/Summary/Keyword: multimodal system


Design of Lightweight Artificial Intelligence System for Multimodal Signal Processing (멀티모달 신호처리를 위한 경량 인공지능 시스템 설계)

  • Kim, Byung-Soo;Lee, Jea-Hack;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.13 no.5 / pp.1037-1042 / 2018
  • Neuromorphic technology, which learns and processes information by imitating the human brain, has been researched for decades. Hardware implementations of neuromorphic systems are configured as highly parallel processing structures built from many simple computational units, achieving high processing speed, low power consumption, and low hardware complexity. Recently, interest in neuromorphic technology for low-power, small embedded systems has been increasing rapidly. To implement low-complexity hardware, it is necessary to reduce the input data dimension without accuracy loss. This paper proposes a low-complexity artificial intelligence engine that consists of parallel neuron engines and a feature extractor. The engine contains a number of neuron engines and a controller that together process multimodal sensor data. We verified the performance of the proposed design, including the artificial intelligence engines, the feature extractor, and a Micro Controller Unit (MCU).
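
As an editorial illustration of the architecture this abstract describes, the sketch below pairs a dimensionality-reducing feature extractor with parallel distance-based neuron engines queried by a controller; the class names and the random-projection extractor are assumptions, not details from the paper:

```python
import numpy as np

class FeatureExtractor:
    """Reduces input dimensionality so the neuron engines stay small.
    A random projection stands in for the paper's actual extractor."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def __call__(self, x):
        return x @ self.proj

class NeuronEngine:
    """One simple computational unit: stores a prototype and a label."""
    def __init__(self, prototype, label):
        self.prototype = prototype
        self.label = label

    def distance(self, feature):
        # L1 distance keeps the hardware (and this sketch) simple.
        return np.abs(feature - self.prototype).sum()

class AIEngine:
    """Controller that queries all neuron engines; in hardware the
    engines would evaluate in parallel rather than in this loop."""
    def __init__(self, extractor, neurons):
        self.extractor = extractor
        self.neurons = neurons

    def classify(self, x):
        f = self.extractor(x)
        distances = [n.distance(f) for n in self.neurons]
        return self.neurons[int(np.argmin(distances))].label
```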

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Recently, chatbots have been applied in various fields with good results, and many attempts to use chatbots in shopping-mall product recommendation services are being made on e-commerce platforms. For a conversation system that recommends the fashion a user wants based on the dialogue between the user and the system together with fashion image information, we adopt the transformer model, which currently performs well in various AI fields such as natural language processing, speech recognition, and image recognition. We propose a multimodal improved transformer model that raises recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation, and we also propose a method to improve accuracy through data refinement based on data analysis. The proposed system achieves a recommendation accuracy of 0.6563 WKT (weighted Kendall's tau), a 0.3191 WKT improvement over the existing system's 0.3372 WKT.
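
The WKT metric quoted above can be computed with SciPy's weighted Kendall's tau, which weights agreement among top-ranked items more heavily; a minimal sketch with made-up relevance scores (not the paper's data):

```python
from scipy.stats import weightedtau

# Hypothetical example: ground-truth vs. model relevance scores for
# five fashion items (higher = more relevant). Not the paper's data.
true_scores  = [5.0, 4.0, 3.0, 2.0, 1.0]
model_scores = [4.5, 5.0, 3.0, 1.0, 2.0]

# weightedtau emphasizes concordance among the highest-scored items,
# which suits recommendation, where the head of the list matters most.
result = weightedtau(true_scores, model_scores)
print(f"WKT = {result.correlation:.4f}")
```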

Application of self organizing genetic algorithm

  • Jeong, Il-Kwon;Lee, Ju-Jang
    • Proceedings of the Institute of Control, Robotics and Systems Conference / 1995.10a / pp.18-21 / 1995
  • In this paper we describe a new method for multimodal function optimization using genetic algorithms (GAs). We propose adaptation rules for GA parameters such as population size, crossover probability, and mutation probability. In the self-organizing genetic algorithm (SOGA), these parameters change according to the adaptation rules, so they do not have to be set manually. We discuss SOGA and compare it with other approaches for adapting operator probabilities in GAs. The validity of the proposed algorithm is verified in a simulation example of system identification.
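
The abstract does not spell out SOGA's adaptation rules; the sketch below shows one common diversity-based heuristic for adapting crossover and mutation probabilities, as an illustrative stand-in rather than the paper's actual rules:

```python
def adapt_parameters(fitnesses, p_c, p_m):
    """Illustrative adaptation heuristic (not the paper's exact rules):
    when fitness diversity collapses, raise mutation and lower crossover
    to escape premature convergence on a single peak of a multimodal
    function; otherwise drift back toward exploitation."""
    f_max = max(fitnesses)
    f_avg = sum(fitnesses) / len(fitnesses)
    diversity = (f_max - f_avg) / (abs(f_max) + 1e-12)
    if diversity < 0.05:               # population nearly converged
        p_m = min(0.5, p_m * 1.5)      # more exploration
        p_c = max(0.3, p_c * 0.9)
    else:
        p_m = max(0.001, p_m * 0.95)   # more exploitation
        p_c = min(0.95, p_c * 1.05)
    return p_c, p_m

# Inside a GA generation loop one would call:
# p_c, p_m = adapt_parameters([f(x) for x in population], p_c, p_m)
```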


Speech and Textual Data Fusion for Emotion Detection: A Multimodal Deep Learning Approach (감정 인지를 위한 음성 및 텍스트 데이터 퓨전: 다중 모달 딥 러닝 접근법)

  • Edward Dwijayanto Cahyadi;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.526-527 / 2023
  • Speech emotion recognition (SER) is one of the interesting topics in the machine learning field. Developing a multimodal speech emotion recognition system brings numerous benefits. This paper explains how we fuse BERT as the text recognizer and a CNN as the speech recognizer to build a multimodal SER system.
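
A minimal late-fusion sketch of the BERT-plus-CNN design the abstract names, assuming text goes through `bert-base-uncased` and speech through a small CNN over spectrograms; the layer sizes and fusion head are illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultimodalSER(nn.Module):
    """Late-fusion sketch: BERT encodes the transcript, a small CNN
    encodes the speech spectrogram, and the two embeddings are
    concatenated before a shared emotion classifier."""
    def __init__(self, n_emotions=4):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.speech_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (batch, 32)
        )
        self.classifier = nn.Linear(768 + 32, n_emotions)

    def forward(self, input_ids, attention_mask, spectrogram):
        # spectrogram: (batch, 1, n_mels, n_frames)
        text_emb = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output
        speech_emb = self.speech_cnn(spectrogram)
        fused = torch.cat([text_emb, speech_emb], dim=-1)
        return self.classifier(fused)
```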

Implementation of a Vehicle Monitoring System using Multimodal Information (다중 정보를 활용하는 차량 모니터링 시스템의 구현)

  • Park, Su-Wan;Son, Jun-U
    • Journal of Korean Society of Transportation / v.29 no.3 / pp.41-48 / 2011
  • In order to detect the driver's state in a driver safety system, both overt and covert measures such as driving performance, visual attention, physiological arousal, and the traffic situation should be collected and interpreted in the driving context. In this paper, we present a vehicle monitoring system that provides multimodal information on a broad set of measures simultaneously collected from multiple domains, including the driver, the vehicle, and the road environment, using an elaborate timer as a soft synchronization mechanism. Using a master timer that records key values from the various modules against the same master time at short, precise intervals, the monitoring system provides more accurate context awareness through data synchronized at any given time. This paper also discusses data collected through this system from nine young drivers performing a cognitive secondary task while driving.
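
A minimal sketch of the master-timer idea described above: modules publish values asynchronously and one shared clock snapshots them all at each tick, so later analysis sees aligned rows; the class and method names are assumptions, not the paper's implementation:

```python
import time

class MasterTimer:
    """Soft-synchronization sketch: each module (driver, vehicle, road
    environment) pushes its latest value; on every master tick, all
    current values are recorded under one shared master timestamp."""
    def __init__(self):
        self.latest = {}      # module name -> most recent sample
        self.log = []         # rows of (master_time, {module: value})

    def update(self, module, value):
        self.latest[module] = value               # called by each module

    def tick(self):
        t = time.monotonic()                      # one master timestamp
        self.log.append((t, dict(self.latest)))   # snapshot all modules
        return t

# Usage sketch: modules call update() asynchronously, while a short
# fixed-interval loop calls tick() so every record shares one time base.
```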

An On-line Speech and Character Combined Recognition System for Multimodal Interfaces (멀티모달 인터페이스를 위한 음성 및 문자 공용 인식시스템의 구현)

  • 석수영;김민정;김광수;정호열;정현열
    • Journal of Korea Multimedia Society / v.6 no.2 / pp.216-223 / 2003
  • In this paper, we present SCCRS (Speech and Character Combined Recognition System) for speaker/writer-independent, on-line multimodal interfaces. In general, the CHMM (Continuous Hidden Markov Model) is known to be a very useful method for both speech recognition and on-line character recognition. In the proposed method, the same CHMM framework is applied to both speech and character recognition so as to construct a combined system. For this purpose, 115 CHMMs, each having 3 states and 9 transitions, are constructed using the MLE (Maximum Likelihood Estimation) algorithm. Different features are extracted for speech and character recognition: MFCCs (Mel Frequency Cepstrum Coefficients) are used for speech in preprocessing, while position parameters are used for cursive characters. At the recognition step, the proposed SCCRS employs OPDP (One Pass Dynamic Programming) so as to be a practical combined recognition system. Experimental results show that the recognition rates for spoken phonemes, spoken words, cursive character graphemes, and cursive character words are 51.65%, 88.6%, 85.3%, and 85.6%, respectively, without any language models. This demonstrates the efficiency of the proposed system.
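
As an illustration of the frame-synchronous recursion that OPDP-style decoding runs over every model, here is a minimal Viterbi scoring sketch; the function signature and names are assumptions, not the paper's code:

```python
import numpy as np

def viterbi_score(log_pi, log_A, log_b, obs):
    """Frame-synchronous Viterbi recursion for one CHMM, the core step
    a one-pass search (OPDP) runs for all models at once.
    log_pi: (S,) initial log-probs; log_A: (S, S) transition log-probs;
    log_b(o): function returning per-state observation log-likelihoods
    of shape (S,) for one frame."""
    delta = log_pi + log_b(obs[0])
    for o in obs[1:]:
        # best predecessor state for each current state, then emit
        delta = (delta[:, None] + log_A).max(axis=0) + log_b(o)
    return delta.max()   # log-score of the best state path

# Combined recognition then amounts to scoring speech frames (MFCCs)
# or pen trajectories (position features) against the same model set
# and picking the model with the highest score.
```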


Implementation of Web Game System using Multi Modal Interfaces (멀티모달 인터페이스를 사용한 웹 게임 시스템의 구현)

  • Lee, Jun;Ahn, Young-Seok;Kim, Jee-In;Park, Sung-Jun
    • Journal of Korea Game Society / v.9 no.6 / pp.127-137 / 2009
  • Web Games deliver computer games through a web browser and have several benefits. First, the game is easily accessible through a web browser wherever an internet connection is available. Second, little local disk space is usually needed, since game data does not have to be downloaded. The Web Game industry now has an opportunity to grow through advances in mobile computing technologies and the age of Web 2.0. This study proposes a Web Game system in which users manipulate the game with multimodal interfaces and mobile devices for intuitive interaction. Multimodal interfaces are used to control the game efficiently, and both ordinary computers and mobile devices are applied to the game scenarios. The proposed system is evaluated for both performance and user acceptability in comparison with previous approaches. It reduced the total clear time and the number of errors in the experiment on a mobile device, and it also achieved good user satisfaction.


A study of using quality for Radial Basis Function based score-level fusion in multimodal biometrics (RBF 기반 유사도 단계 융합 다중 생체 인식에서의 품질 활용 방안 연구)

  • Choi, Hyun-Soek;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.192-200 / 2008
  • Multimodal biometrics is a method for personal authentication and verification using two or more types of biometric data. RBF-based score-level fusion uses a pattern recognition algorithm for multimodal biometrics, seeking the optimal decision boundary to classify score feature vectors, each of which consists of the matching scores obtained from several unimodal biometric systems for each sample. In this case, all matching scores are assumed to have the same reliability. However, recent research reports that the quality of the input sample affects biometric results. Matching scores with low reliability caused by low-quality samples are currently not considered in pattern recognition modeling for multimodal biometrics. To solve this problem, we propose an RBF-based score-level fusion approach that employs the quality information of the input biometric data to adjust the decision boundary. As a result, the proposed method using quality information showed better recognition performance than both unimodal biometrics and the usual RBF-based score-level fusion without quality information.
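
A minimal sketch of quality-aware score-level fusion, using an RBF-kernel SVM as a stand-in for the paper's RBF classifier and simply appending quality measures to the score vector; the synthetic data and this particular quality mechanism are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: each row is [face_score, finger_score,
# face_quality, finger_quality]; labels 1 = genuine, 0 = impostor.
rng = np.random.default_rng(0)
genuine  = np.c_[rng.normal(0.8, 0.1, (50, 2)), rng.uniform(0.5, 1, (50, 2))]
impostor = np.c_[rng.normal(0.3, 0.1, (50, 2)), rng.uniform(0.5, 1, (50, 2))]
X = np.vstack([genuine, impostor])
y = np.r_[np.ones(50), np.zeros(50)]

# RBF decision boundary over (scores + quality): samples with low
# quality land in different regions of the input space, so the fused
# boundary can discount their matching scores.
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([[0.75, 0.70, 0.9, 0.9]]))   # -> likely genuine
```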

Multimodal Emotional State Estimation Model for Implementation of Intelligent Exhibition Services (지능형 전시 서비스 구현을 위한 멀티모달 감정 상태 추정 모형)

  • Lee, Kichun;Choi, So Yun;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.1-14 / 2014
  • Both researchers and practitioners are showing increased interest in interactive exhibition services. Interactive exhibition services are designed to respond directly to visitor responses in real time, so as to fully engage visitors' interest and enhance their satisfaction. In order to deploy an effective interactive exhibition service, it is essential to adopt intelligent technologies that enable accurate estimation of a visitor's emotional state from responses to the exhibited stimulus. Studies undertaken so far have attempted to estimate the human emotional state, most of them by gauging either facial expressions or audio responses. However, the most recent research suggests that a multimodal approach that uses people's multiple responses simultaneously may lead to better estimation. Given this context, we propose a new multimodal emotional state estimation model that uses various responses, including facial expressions, gestures, and movements measured by the Microsoft Kinect sensor. In order to handle a large amount of sensory data effectively, we propose stratified sampling-based MRA (multiple regression analysis) as our estimation method. To validate the usefulness of the proposed model, we collected 602,599 responses and emotional state data with 274 variables from 15 people. When we applied our model to the data set, we found that it estimated the levels of valence and arousal within a 10~15% error range. Since the proposed model is simple and stable, we expect it to be applied not only in intelligent exhibition services but also in other areas such as e-learning and personalized advertising.
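
A minimal sketch of stratified sampling-based MRA on synthetic frame data: bin the frames, sample equally from each bin so no regime dominates, then fit a multiple regression for one target (e.g., valence); the binning variable, sample sizes, and feature count are assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical Kinect-style features per frame and a valence target.
rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 20))          # 20 illustrative variables
y = X[:, :3] @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 0.1, 10_000)

# Stratified sampling: bin frames by one context variable (here the
# first feature) and draw equally from each bin.
bins = np.digitize(X[:, 0], np.quantile(X[:, 0], [0.25, 0.5, 0.75]))
idx = np.concatenate([
    rng.choice(np.flatnonzero(bins == b), size=500, replace=False)
    for b in range(4)
])

model = LinearRegression().fit(X[idx], y[idx])   # MRA on the sample
print("R^2 on all frames:", model.score(X, y))
```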

A study of effective contents construction for AR based English learning (AR기반 영어학습을 위한 효과적 콘텐츠 구성 방향에 대한 연구)

  • Kim, Young-Seop;Jeon, Soo-Jin;Lim, Sang-Min
    • Journal of The Institute of Information and Telecommunication Facilities Engineering / v.10 no.4 / pp.143-147 / 2011
  • A system using augmented reality can save time and cost, and its potential as a technology has been verified in various fields by resolving the sense of unreality in virtual space. Augmented reality therefore has a wide range of potential uses. Generally, multimodal senses such as visual/auditory/tactile feedback are well known as a method for enhancing immersion when interacting with virtual objects. By adopting a tangible object, we can provide a touch sensation to users. A 3D model of the same scale overlays the whole area of the tangible object, so the marker area is invisible; this contributes to presenting immersive and natural images to users. Finally, multimodal feedback also creates better immersion; in this paper, sound feedback is considered. An augmented reality learning content for children at the initial step of English learning is presented, with further improved immersion. Augmented reality lies at an intermediate stage between the virtual world and the real world, and its adaptability is estimated to be greater than that of virtual reality.
