• Title/Summary/Keyword: Learned images

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun; Cha, Kyung-Ae; Koh, Jaepil; Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. Whereas existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text input specific to Korean. Since the developed montage generation app can be used to produce draft montages, it can dramatically reduce the manual labor of existing montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the artificial intelligence training data necessary for developing AI technology and services. The image generation system was implemented using VQGAN, a deep learning model used to generate high-resolution images, and KoDALLE, a Korean-language image generation model. The trained AI model creates montage images of faces that closely match what was described by voice and text. To verify the practicality of the developed montage generation app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and image facial features.
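
As a concrete illustration of the pipeline this abstract describes (speech or text in, montage image out), the following Python sketch shows its overall shape. The function and class names are hypothetical stubs, not the paper's API; a real system would plug a Korean speech recognizer, a KoDALLE-style text-to-image-token model, and a VQGAN decoder into the marked points.

```python
# Minimal sketch of a speech/text-to-montage pipeline. All names below are
# hypothetical placeholders, not the paper's actual API.
from dataclasses import dataclass

@dataclass
class MontageRequest:
    audio_path: str | None = None  # spoken description of the face
    text: str | None = None        # typed description of the face

def transcribe(audio_path: str) -> str:
    """Hypothetical STT stub: a real system would call a Korean speech
    recognizer here and return the transcript."""
    return "각진 턱, 짙은 눈썹, 짧은 머리"  # placeholder transcript

def generate_montage(description: str) -> list:
    """Hypothetical generator stub: a real system would tokenize the Korean
    description (KoDALLE-style), predict image tokens, and decode them with
    a VQGAN decoder into a high-resolution montage image."""
    return [[0] * 256 for _ in range(256)]  # placeholder 256x256 image

def montage_pipeline(req: MontageRequest):
    # Multi-modal input: prefer text, fall back to transcribed speech.
    description = req.text or transcribe(req.audio_path)
    return generate_montage(description)

draft = montage_pipeline(MontageRequest(text="둥근 얼굴, 안경을 쓴 남성"))
print(len(draft), "rows")  # the draft montage an investigator would refine
```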

Skin Color Detection Using Partially Connected Multi-layer Perceptron of Two Color Models (두 칼라 모델의 부분연결 다층 퍼셉트론을 사용한 피부색 검출)

  • Kim, Sung-Hoon; Lee, Hyon-Soo
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.3 / pp.107-115 / 2009
  • Skin color detection classifies input pixels into skin and non-skin areas, and it requires a classifier with a high classification rate. In previous work, most classifiers used a single color model for skin color detection. However, the classification rate can be increased by using more than one color model, because skin color distributions have different characteristics in different color models, and the MLP has also been investigated as a more efficient classifier with fewer parameters than other classifiers. But the input dimension and required parameters of an MLP increase when two color models are used for skin color detection, and the increased parameters cause a huge learning time in the MLP. In this paper, we propose an MLP-based classifier with fewer parameters over two color models. The proposed partially connected MLP based on two color models can reduce the number of weights and improve the classification rate, because the characteristics of each color model can be learned in separate partial networks. In our experiments, we obtained a 91.8% classification rate when testing various images in the RGB and CbCr models.
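
The partially connected structure described above can be made concrete with a small sketch. The following PyTorch model is an illustration under assumptions, not the paper's exact network: each color model (RGB and CbCr) feeds its own hidden sub-network, and only the output layer connects the two, which is what reduces the weight count relative to a fully connected MLP over all five inputs.

```python
# A partially connected MLP: RGB and CbCr inputs feed separate hidden
# sub-networks, so each branch learns its own color model's characteristics.
import torch
import torch.nn as nn

class PartiallyConnectedMLP(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.rgb_branch = nn.Sequential(nn.Linear(3, hidden), nn.Sigmoid())
        self.cbcr_branch = nn.Sequential(nn.Linear(2, hidden), nn.Sigmoid())
        # Only the output layer connects the two branches, keeping the
        # weight count below a fully connected MLP over all 5 inputs.
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, rgb, cbcr):
        h = torch.cat([self.rgb_branch(rgb), self.cbcr_branch(cbcr)], dim=1)
        return torch.sigmoid(self.out(h))  # probability the pixel is skin

model = PartiallyConnectedMLP()
rgb = torch.rand(4, 3)   # 4 pixels in normalized RGB
cbcr = torch.rand(4, 2)  # the same pixels in CbCr
print(model(rgb, cbcr).shape)  # torch.Size([4, 1])
```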

Fruit price prediction study using artificial intelligence (인공지능을 이용한 과일 가격 예측 모델 연구)

  • Im, Jin-mo; Kim, Weol-Youg; Byoun, Woo-Jin; Shin, Seung-Jung
    • The Journal of the Convergence on Culture Technology / v.4 no.2 / pp.197-204 / 2018
  • One of the hottest issues of the 21st century is AI. Just as the Industrial Revolution automated manual labor in the agricultural society, the software revolution has brought the intelligent information society to the information society. With the advent of Google's AlphaGo, computers that learn and predict through machine learning have come to surpass humans, even in the game of Baduk. Machine learning (ML) is a field of artificial intelligence in which technology is developed to allow computers to learn by themselves. Many companies already use machine learning: for example, Facebook keeps learning from images and then tells users who appears in them; Google used a neural network to build an efficient energy-usage model for its data-center optimization; and Microsoft's real-time interpretation model becomes a more sophisticated translation model as language-related input data increases through translation learning. As machine learning is increasingly used in many fields, we must move into the AI industry to advance in 21st-century society.
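
The abstract gives no details of the prediction model itself, so the following is only a hypothetical sketch of the simplest form such a model could take: a small scikit-learn neural-network regressor that predicts tomorrow's price from a sliding window of recent prices, trained here on synthetic data.

```python
# Illustrative only: the paper's actual model and data are not reproduced.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 200))  # synthetic daily fruit prices

window = 7  # predict tomorrow's price from the last 7 days
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:150], y[:150])                      # train on the first 150 days
print("held-out R^2:", model.score(X[150:], y[150:]))
```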

A Technical Analysis on Deep Learning based Image and Video Compression (딥 러닝 기반의 이미지와 비디오 압축 기술 분석)

  • Cho, Seunghyun; Kim, Younhee; Lim, Woong; Kim, Hui Yong; Choi, Jin Soo
    • Journal of Broadcast Engineering / v.23 no.3 / pp.383-394 / 2018
  • In this paper, we investigate image and video compression techniques based on deep learning, which have been actively studied recently. Deep learning based image compression feeds the image to be compressed into a deep neural network, extracts a latent vector recurrently or all at once, and encodes it. To increase compression efficiency, the neural network is trained so that the encoded latent vector can be expressed with fewer bits while the quality of the reconstructed image is enhanced. These techniques can produce images of superior quality, especially at low bit rates, compared to conventional image compression techniques. Deep learning based video compression, on the other hand, takes the approach of improving the coding tools employed in existing video codecs rather than directly inputting and processing the video to be compressed. The deep neural network technologies introduced in this paper replace the in-loop filter of the latest video codec or are used as an additional post-processing filter, improving compression efficiency by improving the quality of the reconstructed image. Likewise, deep neural network techniques applied to intra prediction and encoding are used together with the existing intra prediction tool, improving compression efficiency by increasing prediction accuracy or adding a new intra coding process.
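
The learned-image-compression idea (encode an image to a latent vector, train so the latent costs fewer bits while reconstruction quality stays high) can be sketched minimally as below. This is an illustration under stated assumptions, not any codec from the paper: real systems use learned entropy models for the rate term, while here an L1 penalty on the latent stands in as a crude proxy.

```python
# A tiny autoencoder trained with a rate-distortion trade-off.
import torch
import torch.nn as nn

class TinyImageCodec(nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)               # latent vector to be entropy-coded
        return self.decoder(z).view_as(x), z

codec = TinyImageCodec()
opt = torch.optim.Adam(codec.parameters(), lr=1e-3)
x = torch.rand(8, 1, 28, 28)              # stand-in batch of images
for _ in range(100):
    recon, z = codec(x)
    distortion = nn.functional.mse_loss(recon, x)
    rate_proxy = z.abs().mean()           # crude proxy for coding cost
    loss = distortion + 0.01 * rate_proxy # rate-distortion trade-off
    opt.zero_grad(); loss.backward(); opt.step()
print(float(distortion))
```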

Place Recognition Using Ensemble Learning of Mobile Multimodal Sensory Information (모바일 멀티모달 센서 정보의 앙상블 학습을 이용한 장소 인식)

  • Lee, Chung-Yeon; Lee, Beom-Jin; On, Kyoung-Woon; Ha, Jung-Woo; Kim, Hong-Il; Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.21 no.1 / pp.64-69 / 2015
  • Place awareness is essential for location-based services that are widely provided to smartphone users. However, traditional GPS-based methods are only valid outdoors, where the GPS signal is strong, and they also require symbolic place information about the physical location. In this paper, environmental sounds and images are used to recognize important aspects of each place. The proposed method extracts feature vectors from visual, auditory, and location data recorded by a smartphone with built-in camera, microphone, and GPS sensor modules. The heterogeneous feature vectors are then combined by an ensemble learning method that trains a separate classifier on each group of feature vectors and votes to produce the highest-weighted result. The proposed method is evaluated for place recognition on a dataset of 3,000 samples from six places, and the experimental results show remarkably improved recognition accuracy when using all kinds of sensory data compared to results using data from a single sensor or audio-visual integrated data only.
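
A minimal sketch of the ensemble scheme described above follows, using synthetic stand-ins for the visual, auditory, and GPS feature vectors and logistic-regression base learners (both are assumptions for brevity): one classifier is trained per modality, and their probability outputs are combined by a weighted soft vote.

```python
# One classifier per sensory modality, combined by weighted soft voting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, places = 300, 6
y = rng.integers(0, places, n)
# Synthetic stand-ins for visual, auditory, and location feature vectors.
modalities = {m: rng.normal(0, 1, (n, d)) + y[:, None]
              for m, d in [("visual", 20), ("audio", 12), ("gps", 2)]}

clfs = {m: LogisticRegression(max_iter=1000).fit(X[:250], y[:250])
        for m, X in modalities.items()}
# Weight each modality by its training accuracy.
weights = {m: clf.score(modalities[m][:250], y[:250]) for m, clf in clfs.items()}

# Weighted soft vote over the held-out samples.
proba = sum(weights[m] * clfs[m].predict_proba(modalities[m][250:])
            for m in clfs)
pred = np.argmax(proba, axis=1)
print("ensemble accuracy:", (pred == y[250:]).mean())
```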

A Study On User Experience Based Storydoing Operating Principles (사용자 경험 기반 스토리두잉의 작동원리에 관한 연구)

  • Shin, Dong-Hee; Kim, Hee-Kyung
    • Journal of Digital Contents Society / v.16 no.3 / pp.425-436 / 2015
  • Along with the spotlight on storytelling, storydoing has attracted public attention as it has been utilized in various areas. Storydoing contains a message valued by the producer and a story to back it up. The recipient acknowledges the affordance encouraged by the producer and confirms the message by practicing it; finally, the producer evaluates the practice process. Through this cycle, storydoing promotes a company's product, strengthens its brand image, and delivers a message and value. Ultimately, storydoing operates based on the user's experience. In this study, based on the experience theory of John Dewey, and in order to discover how interactivity and continuity operate in storydoing, we studied the concept of storydoing, its national and international status, the difference between storytelling and storydoing, the elements of storydoing, its relationship with user experience, and its operating principles. As a result, we found that storydoing has five elements, namely message, story, characters, action, and confirmation, and operates through the interaction and continuity between producer and recipient. Through this research into the nature of storydoing, we identified new trends in the cultural industries and discovered possibilities for expanding the application scope of storydoing, currently used by companies to promote their brand images, into the contents field. More importantly, the proposal of theoretical differences between storytelling and storydoing makes this study meaningful in sociocultural, industrial, and academic respects.

Development of Learning Algorithm using Brain Modeling of Hippocampus for Face Recognition (얼굴인식을 위한 해마의 뇌모델링 학습 알고리즘 개발)

  • Oh, Sun-Moon; Kang, Dae-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.5 s.305 / pp.55-62 / 2005
  • In this paper, we propose a face recognition system using HNMA (Hippocampal Neuron Modeling Algorithm), which remodels, in engineering terms, the cerebral cortex and hippocampal neurons according to the principles of the human brain; it can learn the feature vectors of face images very quickly and construct optimized features for each image. The system is composed of two parts: feature extraction, and learning and recognition. The feature extraction part constructs well-separated features by applying PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) in order. In the learning part, the features of the input image data are labeled into response patterns in the dentate gyrus region, following the order of the hippocampal neuron structure, and noise is removed through associative memory in the CA3 region. The CA1 region, receiving the information from CA3, forms long-term memories learned by its neurons. Experiments measure the recognition rate under face changes, pose changes, and low-quality images. The experimental results allow the feature extraction and learning method proposed in this paper to be compared with other methods and confirm that the proposed method is superior to existing methods.
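
The feature-extraction stage, PCA followed by LDA, is standard enough to sketch directly with scikit-learn; the synthetic face vectors below are stand-ins, and the hippocampal learning stage (HNMA) itself is not reproduced here.

```python
# PCA to reduce dimensionality, then LDA to sharpen class separation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
n_people, per_person, dim = 10, 20, 400  # 10 identities, 400-pixel vectors
y = np.repeat(np.arange(n_people), per_person)
X = rng.normal(0, 1, (n_people * per_person, dim)) + y[:, None]

X_pca = PCA(n_components=30).fit_transform(X)               # compress
X_lda = LinearDiscriminantAnalysis(n_components=9).fit_transform(X_pca, y)
print(X_lda.shape)  # (200, 9): at most n_classes - 1 LDA components
```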

Development of the Hippocampal Learning Algorithm Using Associate Memory and Modulator of Neural Weight (연상기억과 뉴런 연결강도 모듈레이터를 이용한 해마 학습 알고리즘 개발)

  • Oh, Sun-Moon; Kang, Dae-Seong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.4 s.310 / pp.37-45 / 2006
  • In this paper, we propose the development of MHLA (Modulatory Hippocampus Learning Algorithm), which remodels the working principles of the hippocampus. The hippocampus is responsible for auto-associative memory and for controlling the strengthening of long-term and short-term memory. We organize an auto-associative-memory-based three-step system (DG, CA3, CA1) and improve learning speed by adding a modulator to long-term memory learning. In the hippocampal system, following the three-step order, information is given statistical deviation in the dentate gyrus region and labeled into response patterns. In the CA3 region, the pattern is reorganized by auto-associative memory. In the CA1 region, the convergence of the connection weights used for long-term memory is learned quickly by a neural network to which the modulator is applied. To measure the performance of MHLA, PCA (Principal Component Analysis) is applied to face images classified by pose, expression, and picture quality. We then calculate feature vectors and learn them with MHLA. Finally, we measure the recognition rate. The experimental results allow the proposed method to be compared with other methods and confirm that the proposed method is superior to the existing method.
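
The CA3-style auto-associative memory at the core of this design can be illustrated with a classic Hopfield-type network: patterns are stored with Hebbian weights, and a noisy input converges back toward the nearest stored pattern. This is a sketch of the principle, not the paper's MHLA.

```python
# Hopfield-style auto-associative memory: store patterns, remove noise.
import numpy as np

rng = np.random.default_rng(3)
patterns = rng.choice([-1, 1], size=(3, 64))       # stored memories

# Hebbian weights; no self-connections.
W = sum(np.outer(p, p) for p in patterns) / patterns.shape[1]
np.fill_diagonal(W, 0)

noisy = patterns[0].copy()
noisy[rng.choice(64, 8, replace=False)] *= -1      # flip 8 of 64 bits

state = noisy
for _ in range(10):                                # synchronous updates
    state = np.sign(W @ state)
print("bits recovered:", int((state == patterns[0]).sum()), "of 64")
```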

A Bio-Edutainment System to Virus-Vaccine Discovery based on Collaborative Molecular in Real-Time with VR

  • Park, Sung-Jun
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.109-117 / 2020
  • An edutainment system aims to help learners recognize problems effectively, grasp and classify the important information needed to solve them, and convey the contents of what they have learned. Edutainment contents can be usefully applied to education and training in both scientific and industrial areas. Our present work proposes an edutainment system that can be applied to a drug discovery process, including virtual screening, by using intuitive multi-modal interfaces. In this system, a stereoscopic monitor is used to render three-dimensional (3D) macro-molecular images, with multi-modal interfaces supported to manipulate 3D models of molecular structures effectively. Our system makes the docking simulation, one of the important virtual drug-screening methods, easy to perform by applying gaming factors. A level-up concept is implemented to realize the bio-game approach, in which the gaming factor depends on the number of objects and users. The quality of the proposed system is evaluated by comparing the finishing times of a drug docking process that screens new inhibitors against target proteins of the human immunodeficiency virus (HIV) in an e-drug discovery process.

Pedestrian Classification using CNN's Deep Features and Transfer Learning (CNN의 깊은 특징과 전이학습을 사용한 보행자 분류)

  • Chung, Soyoung; Chung, Min Gyo
    • Journal of Internet Computing and Services / v.20 no.4 / pp.91-102 / 2019
  • In autonomous driving systems, the ability to classify pedestrians in images captured by cameras is very important for pedestrian safety. In the past, features of pedestrians were extracted with HOG (Histogram of Oriented Gradients) or SIFT (Scale-Invariant Feature Transform) and then classified with an SVM (Support Vector Machine). However, extracting pedestrian characteristics in such a handcrafted manner has many limitations. Therefore, this paper proposes a method to classify pedestrians reliably and effectively using a CNN's (Convolutional Neural Network) deep features and transfer learning. We experimented with both the fixed feature extractor and the fine-tuning methods, the two representative transfer learning techniques. In the fine-tuning method in particular, we added a new scheme, called M-Fine (Modified Fine-tuning), which divides layers into transferred and non-transferred parts in three different sizes and adjusts weights only for layers belonging to the non-transferred parts. Experiments on the INRIA Person data set with five CNN models (VGGNet, DenseNet, Inception V3, Xception, and MobileNet) showed that CNN deep features perform better than handcrafted features such as HOG and SIFT, and that the accuracy of Xception (threshold = 0.5) is the highest at 99.61%. MobileNet, which achieved performance similar to Xception while training 80% fewer parameters, was the best in terms of efficiency. Among the three transfer learning schemes tested, the fine-tuning method performed best. The performance of the M-Fine method was comparable to or slightly lower than that of the fine-tuning method, but higher than that of the fixed feature extractor method.
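
The three transfer-learning schemes compared above differ only in which parameters are frozen, which a short PyTorch sketch can show on a toy CNN (not the paper's models; the 50% transferred/non-transferred split for M-Fine is an assumption, since the paper tests three different sizes).

```python
# Freeze everything but the head (fixed feature extractor), freeze nothing
# (fine-tuning), or freeze only the first fraction of layers (M-Fine-style).
import torch.nn as nn

def build_cnn():
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 2),                # pedestrian / non-pedestrian head
    )

def apply_scheme(model, scheme, frozen_fraction=0.5):
    params = list(model.parameters())
    if scheme == "fixed":                # train only the classifier head
        cut = len(params) - 2            # last Linear's weight and bias
    elif scheme == "fine_tune":          # update every layer
        cut = 0
    elif scheme == "m_fine":             # freeze only the transferred part
        cut = int(len(params) * frozen_fraction)
    else:
        raise ValueError(scheme)
    for i, p in enumerate(params):
        p.requires_grad = i >= cut       # freeze params before the cut
    return model

model = apply_scheme(build_cnn(), "m_fine", frozen_fraction=0.5)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", trainable)
```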