• Title/Summary/Keyword: 입력처리 지도

Search Result 1,640, Processing Time 0.03 seconds

Performance Improvement Method of Fully Connected Neural Network Using Combined Parametric Activation Functions (결합된 파라메트릭 활성함수를 이용한 완전연결신경망의 성능 향상)

  • Ko, Young Min;Li, Peng Hang;Ko, Sun Woo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.1
    • /
    • pp.1-10
    • /
    • 2022
  • Deep neural networks are widely used to solve various problems. In a fully connected neural network, the nonlinear activation function is a function that nonlinearly transforms the input value and outputs it. The nonlinear activation function plays an important role in solving the nonlinear problem, and various nonlinear activation functions have been studied. In this study, we propose a combined parametric activation function that can improve the performance of a fully connected neural network. Combined parametric activation functions can be created by simply adding parametric activation functions. The parametric activation function is a function that can be optimized in the direction of minimizing the loss function by applying a parameter that converts the scale and location of the activation function according to the input data. By combining the parametric activation functions, more diverse nonlinear intervals can be created, and the parameters of the parametric activation functions can be optimized in the direction of minimizing the loss function. The performance of the combined parametric activation function was tested through the MNIST classification problem and the Fashion MNIST classification problem, and as a result, it was confirmed that it has better performance than the existing nonlinear activation function and parametric activation function.

Deep Learning Based Group Synchronization for Networked Immersive Interactions (네트워크 환경에서의 몰입형 상호작용을 위한 딥러닝 기반 그룹 동기화 기법)

  • Lee, Joong-Jae
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.10
    • /
    • pp.373-380
    • /
    • 2022
  • This paper presents a deep learning based group synchronization that supports networked immersive interactions between remote users. The goal of group synchronization is to enable all participants to synchronously interact with others for increasing user presence Most previous methods focus on NTP-based clock synchronization to enhance time accuracy. Moving average filters are used to control media playout time on the synchronization server. As an example, the exponentially weighted moving average(EWMA) would be able to track and estimate accurate playout time if the changes in input data are not significant. However it needs more time to be stable for any given change over time due to codec and system loads or fluctuations in network status. To tackle this problem, this work proposes the Deep Group Synchronization(DeepGroupSync), a group synchronization based on deep learning that models important features from the data. This model consists of two Gated Recurrent Unit(GRU) layers and one fully-connected layer, which predicts an optimal playout time by utilizing the sequential playout delays. The experiments are conducted with an existing method that uses the EWMA and the proposed method that uses the DeepGroupSync. The results show that the proposed method are more robust against unpredictable or rapid network condition changes than the existing method.

Single Image Super Resolution Based on Residual Dense Channel Attention Block-RecursiveSRNet (잔여 밀집 및 채널 집중 기법을 갖는 재귀적 경량 네트워크 기반의 단일 이미지 초해상도 기법)

  • Woo, Hee-Jo;Sim, Ji-Woo;Kim, Eung-Tae
    • Journal of Broadcast Engineering
    • /
    • v.26 no.4
    • /
    • pp.429-440
    • /
    • 2021
  • With the recent development of deep convolutional neural network learning, deep learning techniques applied to single image super-resolution are showing good results. One of the existing deep learning-based super-resolution techniques is RDN(Residual Dense Network), in which the initial feature information is transmitted to the last layer using residual dense blocks, and subsequent layers are restored using input information of previous layers. However, if all hierarchical features are connected and learned and a large number of residual dense blocks are stacked, despite good performance, a large number of parameters and huge computational load are needed, so it takes a lot of time to learn a network and a slow processing speed, and it is not applicable to a mobile system. In this paper, we use the residual dense structure, which is a continuous memory structure that reuses previous information, and the residual dense channel attention block using the channel attention method that determines the importance according to the feature map of the image. We propose a method that can increase the depth to obtain a large receptive field and maintain a concise model at the same time. As a result of the experiment, the proposed network obtained PSNR as low as 0.205dB on average at 4× magnification compared to RDN, but about 1.8 times faster processing speed, about 10 times less number of parameters and about 1.74 times less computation.

Artificial Intelligence for Assistance of Facial Expression Practice Using Emotion Classification (감정 분류를 이용한 표정 연습 보조 인공지능)

  • Dong-Kyu, Kim;So Hwa, Lee;Jae Hwan, Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.6
    • /
    • pp.1137-1144
    • /
    • 2022
  • In this study, an artificial intelligence(AI) was developed to help with facial expression practice in order to express emotions. The developed AI used multimodal inputs consisting of sentences and facial images for deep neural networks (DNNs). The DNNs calculated similarities between the emotions predicted by the sentences and the emotions predicted by facial images. The user practiced facial expressions based on the situation given by sentences, and the AI provided the user with numerical feedback based on the similarity between the emotion predicted by sentence and the emotion predicted by facial expression. ResNet34 structure was trained on FER2013 public data to predict emotions from facial images. To predict emotions in sentences, KoBERT model was trained in transfer learning manner using the conversational speech dataset for emotion classification opened to the public by AIHub. The DNN that predicts emotions from the facial images demonstrated 65% accuracy, which is comparable to human emotional classification ability. The DNN that predicts emotions from the sentences achieved 90% accuracy. The performance of the developed AI was evaluated through experiments with changing facial expressions in which an ordinary person was participated.

Learning Data Model Definition and Machine Learning Analysis for Data-Based Li-Ion Battery Performance Prediction (데이터 기반 리튬 이온 배터리 성능 예측을 위한 학습 데이터 모델 정의 및 기계학습 분석 )

  • Byoungwook Kim;Ji Su Park;Hong-Jun Jang
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.3
    • /
    • pp.133-140
    • /
    • 2023
  • The performance of lithium ion batteries depends on the usage environment and the combination ratio of cathode materials. In order to develop a high-performance lithium-ion battery, it is necessary to manufacture the battery and measure its performance while varying the cathode material ratio. However, it takes a lot of time and money to directly develop batteries and measure their performance for all combinations of variables. Therefore, research to predict the performance of a battery using an artificial intelligence model has been actively conducted. However, since measurement experiments were conducted with the same battery in the existing published battery data, the cathode material combination ratio was fixed and was not included as a data attribute. In this paper, we define a training data model required to develop an artificial intelligence model that can predict battery performance according to the combination ratio of cathode materials. We analyzed the factors that can affect the performance of lithium-ion batteries and defined the mass of each cathode material and battery usage environment (cycle, current, temperature, time) as input data and the battery power and capacity as target data. In the battery data in different experimental environments, each battery data maintained a unique pattern, and the battery classification model showed that each battery was classified with an error of about 2%.

The Noise Robust Algorithm to Detect the Starting Point of Music for Content Based Music Retrieval System (노이즈에 강인한 음악 시작점 검출 알고리즘)

  • Kim, Jung-Soo;Sung, Bo-Kyung;Koo, Kwang-Hyo;Ko, Il-Ju
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.9
    • /
    • pp.95-104
    • /
    • 2009
  • This paper proposes the noise robust algorithm to detect the starting point of music. Detection of starting point of music is necessary to solve computational-waste problem and retrieval-comparison problem with inconsistent input data in music content based retrieval system. In particular, such detection is even more necessary in time sequential retrieval method that compares data in the sequential order of time in contents based music retrieval system. Whereas it has the long point that the retrieval is fast since it executes simple comparison in the order of time, time sequential retrieval method has the short point that data starting time to be compared should be the same. However, digitalized music cannot guarantee the equity of starting time by bit rate conversion. Therefore, this paper ensured that recognition rate shall not decrease even while executing high speed retrieval by applying time sequential retrieval method through detection of music starting point in the pre-processing stage of retrieval. Starting point detection used minimum wave model that can detect effective sound, and for strength against noise, the noises existing in mute sound were swapped. The proposed algorithm was confirmed to produce about 38% more excellent performance than the results to which starting point detection was not applied, and was verified for the strength against noise.

CKFont2: An Improved Few-Shot Hangul Font Generation Model Based on Hangul Composability (CKFont2: 한글 구성요소를 이용한 개선된 퓨샷 한글 폰트 생성 모델)

  • Jangkyoung, Park;Ammar, Ul Hassan;Jaeyoung, Choi
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.12
    • /
    • pp.499-508
    • /
    • 2022
  • A lot of research has been carried out on the Hangeul generation model using deep learning, and recently, research is being carried out how to minimize the number of characters input to generate one set of Hangul (Few-Shot Learning). In this paper, we propose a CKFont2 model using only 14 letters by analyzing and improving the CKFont (hereafter CKFont1) model using 28 letters. The CKFont2 model improves the performance of the CKFont1 model as a model that generates all Hangul using only 14 characters including 24 components (14 consonants and 10 vowels), where the CKFont1 model generates all Hangul by extracting 51 Hangul components from 28 characters. It uses the minimum number of characters for currently known models. From the basic consonants/vowels of Hangul, 27 components such as 5 double consonants, 11/11 compound consonants/vowels respectively are learned by deep learning and generated, and the generated 27 components are combined with 24 basic consonants/vowels. All Hangul characters are automatically generated from the combined 51 components. The superiority of the performance was verified by comparative analysis with results of the zi2zi, CKFont1, and MX-Font model. It is an efficient and effective model that has a simple structure and saves time and resources, and can be extended to Chinese, Thai, and Japanese.

Gear Fault Diagnosis Based on Residual Patterns of Current and Vibration Data by Collaborative Robot's Motions Using LSTM (LSTM을 이용한 협동 로봇 동작별 전류 및 진동 데이터 잔차 패턴 기반 기어 결함진단)

  • Baek Ji Hoon;Yoo Dong Yeon;Lee Jung Won
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.10
    • /
    • pp.445-454
    • /
    • 2023
  • Recently, various fault diagnosis studies are being conducted utilizing data from collaborative robots. Existing studies performing fault diagnosis on collaborative robots use static data collected based on the assumed operation of predefined devices. Therefore, the fault diagnosis model has a limitation of increasing dependency on the learned data patterns. Additionally, there is a limitation in that a diagnosis reflecting the characteristics of collaborative robots operating with multiple joints could not be conducted due to experiments using a single motor. This paper proposes an LSTM diagnostic model that can overcome these two limitations. The proposed method selects representative normal patterns using the correlation analysis of vibration and current data in single-axis and multi-axis work environments, and generates residual patterns through differences from the normal representative patterns. An LSTM model that can perform gear wear diagnosis for each axis is created using the generated residual patterns as inputs. This fault diagnosis model can not only reduce the dependence on the model's learning data patterns through representative patterns for each operation, but also diagnose faults occurring during multi-axis operation. Finally, reflecting both internal and external data characteristics, the fault diagnosis performance was improved, showing a high diagnostic performance of 98.57%.

Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images (멀티-뷰 영상들을 활용하는 3차원 의미적 분할을 위한 효과적인 멀티-모달 특징 융합)

  • Hye-Lim Bae;Incheol Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.12
    • /
    • pp.505-518
    • /
    • 2023
  • 3D point cloud semantic segmentation is a computer vision task that involves dividing the point cloud into different objects and regions by predicting the class label of each point. Existing 3D semantic segmentation models have some limitations in performing sufficient fusion of multi-modal features while ensuring both characteristics of 2D visual features extracted from RGB images and 3D geometric features extracted from point cloud. Therefore, in this paper, we propose MMCA-Net, a novel 3D semantic segmentation model using 2D-3D multi-modal features. The proposed model effectively fuses two heterogeneous 2D visual features and 3D geometric features by using an intermediate fusion strategy and a multi-modal cross attention-based fusion operation. Also, the proposed model extracts context-rich 3D geometric features from input point cloud consisting of irregularly distributed points by adopting PTv2 as 3D geometric encoder. In this paper, we conducted both quantitative and qualitative experiments with the benchmark dataset, ScanNetv2 in order to analyze the performance of the proposed model. In terms of the metric mIoU, the proposed model showed a 9.2% performance improvement over the PTv2 model using only 3D geometric features, and a 12.12% performance improvement over the MVPNet model using 2D-3D multi-modal features. As a result, we proved the effectiveness and usefulness of the proposed model.

Development of Five Finger type Myoelectric Hand Prosthesis for State Transition-Based Multi-Hand Gestures change (다중 손동작 변환을 위한 상태 전이 기반 5손가락 근전전동의수 개발)

  • Seung-Gi Kim;Sung-Yoon Jung;Beom-ki Hong;Hyun-Jun Shin;Kyoung-Ho Kim;Se-Hoon Park
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.25 no.2
    • /
    • pp.67-76
    • /
    • 2024
  • Various types of assistive devices have been developed for upper limb amputees over the years, with myoelectric prosthesis particularly aimed at improving user convenience by enabling a range of hand gestures beyond simple grasping, tailored to the size and shape of objects. In this study, we developed a five-finger myoelectric prosthesis mimicking human hand size and finger movements, utilizing motor and worm gear mechanisms for stable and independent operation. Based on this, we designed a control system for independent finger control through electromyographic signal input, proposed a state transition-based hand gesture conversion algorithm by selecting representative eight hand gestures and defining conversion condition parameters. We introduced training and usability evaluation methods, and conducted usability assessments among upper limb amputees using dedicated tools, confirming the potential for commercial application of the algorithm and observing adaptive capabilities and high performance through iterative evaluations.