• Title/Summary/Keyword: Pooling Algorithm


Improving Test Accuracy on the MNIST Dataset using a Simple CNN with Batch Normalization

  • Seungbin Lee;Jungsoo Rhee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.9
    • /
    • pp.1-7
    • /
    • 2024
  • In this paper, we propose a Convolutional Neural Network (CNN) equipped with Batch Normalization (BN) for handwritten digit recognition, trained on the MNIST dataset. Aiming to surpass the performance of LeNet-5 by LeCun et al., a 6-layer neural network was designed. The proposed model processes 28×28 pixel images through convolution, max pooling, and fully connected layers, with batch normalization added to improve learning stability and performance. The experiment utilized 60,000 training images and 10,000 test images, applying the momentum optimization algorithm. The model configuration used 30 filters of size 5×5, padding 0, stride 1, and ReLU as the activation function. The training process was set with a mini-batch size of 100, 20 epochs in total, and a learning rate of 0.1. As a result, the proposed model achieved a test accuracy of 99.22%, surpassing LeNet-5's 99.05%, and recorded an F1-score of 0.9919, demonstrating the model's performance. Moreover, the 6-layer model proposed in this paper emphasizes model efficiency with a simpler structure than LeCun et al.'s LeNet-5 (a 7-layer model) and the model proposed by Ji, Chun and Kim (a 10-layer model). The results of this study show potential for real industrial applications such as AI vision inspection systems, and the model is expected to be effectively applied in smart factories, particularly in determining whether parts are defective.
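A minimal PyTorch sketch of the kind of 6-layer CNN with batch normalization described in this abstract (conv 30×5×5, stride 1, padding 0, ReLU, max pooling, fully connected layers, SGD with momentum, lr 0.1, mini-batch 100). The hidden layer width of 100, the momentum value of 0.9, and the exact placement of BN are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class SimpleBNCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 30, kernel_size=5, stride=1, padding=0),  # 28x28 -> 24x24
            nn.BatchNorm2d(30),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # 24x24 -> 12x12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(30 * 12 * 12, 100),   # hidden width of 100 is assumed
            nn.BatchNorm1d(100),
            nn.ReLU(inplace=True),
            nn.Linear(100, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleBNCNN()
# Momentum SGD with the hyperparameters quoted in the abstract (lr=0.1, mini-batch 100);
# momentum=0.9 is a common default and an assumption here.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
print(model(torch.randn(100, 1, 28, 28)).shape)  # torch.Size([100, 10])
```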

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min;Chun, Sun-Kyung;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.1
    • /
    • pp.67-75
    • /
    • 2018
  • Multidimensional scaling (MDS) is an exploratory analysis of multivariate data that represents the dissimilarity among objects in a low-dimensional geometric space. However, a general MDS map only shows information about the objects, without any information about the variables. In this study, we used MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize clusters of objects in categorical data. To do this, we converted the given data into a multiple indicator matrix. Additionally, we added the level information of each categorical variable to the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). Therefore, using the proposed MDS map, we can examine the similarity among objects as well as the associations among the categorical variables.
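A numpy sketch of the pipeline this abstract describes: categorical data converted to a multiple indicator (one-hot) matrix, pairwise dissimilarities computed, and classical (Torgerson) MDS applied. The toy data and the Euclidean dissimilarity on indicator rows are illustrative assumptions; the partition method for displaying variable levels is not reproduced here.

```python
import numpy as np

def indicator_matrix(data):
    """One-hot encode each categorical column and stack the blocks side by side."""
    blocks = []
    for col in data.T:
        levels = np.unique(col)
        blocks.append((col[:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)

def classical_mds(D, k=2):
    """Torgerson's classical scaling of a dissimilarity matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered inner products
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]             # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

data = np.array([["a", "x"], ["a", "y"], ["b", "y"], ["b", "x"], ["a", "x"]])
G = indicator_matrix(data)
D = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)  # Euclidean dissimilarity
coords = classical_mds(D, k=2)
print(coords.shape)  # (5, 2) coordinates for the MDS map
```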

A scene search method based on principal character identification using convolutional neural network (컨볼루셔널 뉴럴 네트워크를 이용한 주인공 식별 기반의 영상장면 탐색 기법)

  • Kwon, Myung-Kyu;Yang, Hyeong-Sik
    • Journal of Convergence for Information Technology
    • /
    • v.7 no.2
    • /
    • pp.31-36
    • /
    • 2017
  • In this paper, we search for and replay the video segments in which a specific cast member appears, from a large collection of videos. Conventional methods require the offset value to be set manually when searching for a scene or browsing by segment. The proposed method instead learns the main character's face, recognizes the main character in the video, and jumps to the scenes where the main character appears in order to play them back. Data for specific performers are extracted and collected using crawling techniques. Based on the collected data, a convolutional neural network is trained and then used for performance evaluation. In the evaluation, the learned performer is detected in key frames extracted while the drama is playing, and the accuracy is measured. Verifying how quickly and accurately the learned scenes are found yielded an accuracy of about 93%. Based on this performance, the method can be applied to video services such as viewing, person search, and per-segment detailed information retrieval.
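A rough sketch of the scene-search idea: scan keyframes of a video, ask a trained face classifier whether the main character appears, and return the first matching timestamp so playback can jump there. `is_main_character` is a hypothetical stand-in for the CNN classifier trained on crawled face images; it is not from the paper.

```python
import cv2

def find_first_scene(video_path, is_main_character, step_sec=1.0):
    """Return the timestamp (seconds) of the first sampled frame containing the main character."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * step_sec)               # sample one keyframe per step_sec seconds
    frame_idx = 0
    while True:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ok, frame = cap.read()
        if not ok:
            break
        if is_main_character(frame):         # the trained CNN face classifier decides
            cap.release()
            return frame_idx / fps           # seek position for playback
        frame_idx += step
    cap.release()
    return None
```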

A fully deep learning model for the automatic identification of cephalometric landmarks

  • Kim, Young Hyun;Lee, Chena;Ha, Eun-Gyu;Choi, Yoon Jeong;Han, Sang-Sun
    • Imaging Science in Dentistry
    • /
    • v.51 no.3
    • /
    • pp.299-306
    • /
    • 2021
  • Purpose: This study aimed to propose a fully automatic landmark identification model based on a deep learning algorithm using real clinical data and to verify its accuracy considering inter-examiner variability. Materials and Methods: In total, 950 lateral cephalometric images from Yonsei Dental Hospital were used. Two calibrated examiners manually identified the 13 most important landmarks to set as references. The proposed deep learning model has a 2-step structure (a region-of-interest machine and a detection machine), each consisting of 8 convolution layers, 5 pooling layers, and 2 fully connected layers. The distance errors of detection between 2 examiners were used as a clinically acceptable range for performance evaluation. Results: The 13 landmarks were automatically detected using the proposed model. Inter-examiner agreement for all landmarks indicated excellent reliability based on the 95% confidence interval. The average clinically acceptable range for all 13 landmarks was 1.24 mm. The mean radial error between the reference values assigned by 1 expert and the proposed model was 1.84 mm, exhibiting a successful detection rate of 36.1%. The A-point, the incisal tip of the maxillary and mandibular incisors, and ANS showed lower mean radial error than the calibrated expert variability. Conclusion: This experiment demonstrated that the proposed deep learning model can perform fully automatic identification of cephalometric landmarks and achieve better results than examiners for some landmarks. It is meaningful to consider between-examiner variability for clinical applicability when evaluating the performance of deep learning methods in cephalometric landmark identification.
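A minimal PyTorch sketch of one of the two "machines" described above: 8 convolution layers, 5 pooling layers, and 2 fully connected layers regressing 13 (x, y) landmark coordinates. The channel widths, the 256×256 input size, and the coordinate-regression head are assumptions for illustration; the paper's exact configuration is not reproduced.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class LandmarkMachine(nn.Module):
    def __init__(self, n_landmarks: int = 13):
        super().__init__()
        chans = [1, 16, 16, 32, 32, 64, 64, 128, 128]   # 8 conv layers (assumed widths)
        layers, pool_after = [], {2, 4, 6, 7, 8}        # 5 pooling layers
        for i in range(1, 9):
            layers.append(conv_block(chans[i - 1], chans[i]))
            if i in pool_after:
                layers.append(nn.MaxPool2d(2))
        self.backbone = nn.Sequential(*layers)
        self.n_landmarks = n_landmarks
        self.head = nn.Sequential(                      # 2 fully connected layers
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 2 * n_landmarks),            # (x, y) per landmark
        )

    def forward(self, x):
        return self.head(self.backbone(x)).view(-1, self.n_landmarks, 2)

print(LandmarkMachine()(torch.randn(2, 1, 256, 256)).shape)  # torch.Size([2, 13, 2])
```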

Optimization of Pose Estimation Model based on Genetic Algorithms for Anomaly Detection in Unmanned Stores (무인점포 이상행동 인식을 위한 유전 알고리즘 기반 자세 추정 모델 최적화)

  • Sang-Hyeop Lee;Jang-Sik Park
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.26 no.1
    • /
    • pp.113-119
    • /
    • 2023
  • In this paper, we propose an optimization of a deep learning pose estimation model for recognizing abnormal behavior in unmanned stores using radio frequencies. The radio frequencies used are millimeter waves in the 30 GHz to 300 GHz band. Owing to their short wavelength and strong straightness, millimeter waves experience little diffraction and little interference from radio absorption by objects. A millimeter wave radar is used to avoid the privacy infringement that may occur in conventional CCTV image-based pose estimation. Deep learning-based pose estimation models generally use convolutional neural networks. A convolutional neural network is a combination of convolution layers and pooling layers of different types; there are many possible choices of convolution filter size, filter count, and convolution operation, and even more ways of combining these components. It is therefore difficult to find the structure and components of the optimal pose estimation model for given input data. Compared with conventional millimeter wave-based pose estimation studies, the proposed approach can explore the structure and components of the optimal pose estimation model for the input data using genetic algorithms, and the optimized pose estimation model shows excellent performance. Data were collected in an actual unmanned store: point cloud data from a millimeter wave radar and three-dimensional keypoint information from a Kinect Azure were collected for falls and property damage occurring in unmanned stores. The experimental results confirm that the error is reduced compared to the conventional pose estimation model.
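A toy genetic-algorithm loop of the kind described above: each chromosome encodes CNN hyperparameters (number of convolution layers, filters, kernel size), and the fitness function is a hypothetical placeholder standing in for "train the pose estimation model on the mmWave point-cloud data and measure keypoint error". The search space, operators, and fitness are illustrative assumptions, not the paper's setup.

```python
import random

SEARCH_SPACE = {
    "n_conv_layers": [2, 3, 4, 5],
    "n_filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}

def random_chromosome():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(chrom):
    # Placeholder: in the real setting this would train the candidate model and
    # return, e.g., the negative mean keypoint error on a validation split.
    return -(abs(chrom["n_conv_layers"] - 4) + abs(chrom["n_filters"] - 64) / 32)

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(chrom, rate=0.2):
    return {k: (random.choice(v) if random.random() < rate else chrom[k])
            for k, v in SEARCH_SPACE.items()}

population = [random_chromosome() for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                            # simple truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children
print(max(population, key=fitness))                     # best hyperparameter set found
```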

Deep learning-based AI constitutive modeling for sandstone and mudstone under cyclic loading conditions

  • Luyuan Wu;Meng Li;Jianwei Zhang;Zifa Wang;Xiaohui Yang;Hanliang Bian
    • Geomechanics and Engineering
    • /
    • v.37 no.1
    • /
    • pp.49-64
    • /
    • 2024
  • Rocks undergoing repeated loading and unloading over an extended period, such as due to earthquakes, human excavation, and blasting, may gradually accumulate stress and deformation within the rock mass, eventually reaching an unstable state. In this study, a CNN-CCM is proposed to model this mechanical behavior. The structure and hyperparameters of the CNN-CCM include 5 Conv2D layers, 4 MaxPooling2D layers, 4 Dense layers, a learning rate of 0.001, 50 epochs, a batch size of 64, and a dropout rate of 0.5. The training and validation data for deep learning comprise 71 rock samples and 122,152 data points. The AI Rock Constitutive Model learned by the CNN-CCM can predict strain values (ε1) using mass (M), axial stress (σ1), density (ρ), cyclic number (N), confining pressure (σ3), and Young's modulus (E). The five evaluation indicators R2, MAPE, RMSE, MSE, and MAE yield respective values of 0.929, 16.44%, 0.954, 0.913, and 0.542, illustrating the good predictive performance and generalization ability of the model. Finally, interpreting the AI Rock Constitutive Model using the SHAP explanation method reveals that feature importance follows the order N > M > σ1 > E > ρ > σ3. Positive SHAP values indicate positive effects on predicting strain ε1 for N, M, σ1, and σ3, while negative SHAP values have negative effects. For E, a positive value has a negative effect on predicting strain ε1, consistent with the influence patterns of conventional physical rock constitutive equations. The present study offers a novel approach to investigating the mechanical constitutive model of rocks under cyclic loading and unloading conditions.
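A hedged Keras sketch that mirrors the layer counts and hyperparameters quoted above (5 Conv2D, 4 MaxPooling2D, 4 Dense, dropout 0.5, learning rate 0.001, batch size 64, 50 epochs). The 64×64 single-channel input map built from the six features, the layer widths, and the choice of the Adam optimizer are assumptions, not the paper's exact setup.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                  # assumed input map size
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                                   # predicted strain value
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
# model.fit(x_train, y_train, batch_size=64, epochs=50)  # training data not included here
```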

A Study on Person Re-Identification System using Enhanced RNN (확장된 RNN을 활용한 사람재인식 시스템에 관한 연구)

  • Choi, Seok-Gyu;Xu, Wenjie
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.2
    • /
    • pp.15-23
    • /
    • 2017
  • Person re-identification is one of the most challenging problems in computer vision because of the significant changes in human pose and the background clutter with occlusions. Pictures from non-overlapping cameras make it even harder to distinguish one person from another. To obtain a better match, most methods use feature selection and distance metrics separately to get discriminative representations and a proper distance to describe the similarity between persons, and in doing so tend to ignore some significant features. This situation has encouraged us to consider a novel method to deal with this problem. In this paper, we propose an enhanced recurrent neural network with a three-tier hierarchical network for person re-identification. Specifically, the proposed recurrent neural network (RNN) model contains an iterative expectation-maximization (EM) algorithm and a three-tier hierarchical network to jointly learn both the discriminative features and the distance metric. The iterative EM algorithm can make full use of the feature extraction ability of the convolutional neural network (CNN) placed in series before the RNN. Through unsupervised learning, the EM framework can change the labels of the patches and train on larger datasets. Through the three-tier hierarchical network, the convolutional neural network, recurrent network, and pooling layer jointly act as a feature extractor to better train the network. The experimental results show that, compared with other researchers' approaches in this field, this method also achieves competitive accuracy. The influence of the different components of this method will be analyzed and evaluated in future research.
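A simplified PyTorch sketch of the CNN, RNN, and pooling pipeline described above: a small CNN embeds each image patch, an RNN models the patch sequence, and the pooled output serves as the person descriptor compared with a Euclidean distance. Layer sizes are illustrative assumptions, and the EM relabeling step is not reproduced.

```python
import torch
import torch.nn as nn

class ReIDNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-patch feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.rnn = nn.GRU(64, feat_dim, batch_first=True)

    def forward(self, patches):                         # patches: (B, T, 3, H, W)
        b, t = patches.shape[:2]
        f = self.cnn(patches.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(f)
        return out.mean(dim=1)                          # temporal pooling descriptor

net = ReIDNet()
a = net(torch.randn(1, 6, 3, 64, 32))
b = net(torch.randn(1, 6, 3, 64, 32))
print(torch.dist(a, b))                                 # smaller distance = more similar
```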

New Hybrid Approach of CNN and RNN based on Encoder and Decoder (인코더와 디코더에 기반한 합성곱 신경망과 순환 신경망의 새로운 하이브리드 접근법)

  • Jongwoo Woo;Gunwoo Kim;Keunho Choi
    • Information Systems Review
    • /
    • v.25 no.1
    • /
    • pp.129-143
    • /
    • 2023
  • In the era of big data, the field of artificial intelligence is showing remarkable growth, and deep learning-based image classification has become a particularly important area. Various studies have been actively conducted to further improve the performance of CNNs, which are widely used for image classification; a representative approach is the Convolutional Recurrent Neural Network (CRNN) algorithm. The CRNN algorithm combines a CNN for image classification with RNNs for recognizing time series elements. However, since the inputs used in the RNN part of a CRNN are the flattened values extracted by applying convolution and pooling to the image, pixel values from the same region of the image can appear in a different order. This makes it difficult for the RNN to properly learn the spatial arrangement in the image that it is intended to capture. Therefore, this study aims to improve image classification performance by proposing a novel hybrid method of CNN and RNN that applies the concepts of an encoder and a decoder. The effectiveness of the new hybrid method was verified through various experiments. This study has academic implications in that it broadens the applicability of the encoder and decoder concepts, and the proposed method has advantages in terms of model training time and infrastructure construction costs, as it does not significantly increase complexity compared to conventional hybrid methods. In addition, this study has practical implications in that it presents the possibility of improving the quality of services provided in various fields that require accurate image classification.
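A hedged PyTorch sketch of an encoder-decoder style CNN-RNN hybrid in the spirit of this abstract: a CNN encoder produces a feature map whose columns are fed to an RNN in spatial order, and the final RNN state is classified. The layer sizes and the column-wise sequencing are illustrative assumptions about the general idea, not the authors' exact model.

```python
import torch
import torch.nn as nn

class CNNRNNClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.GRU(input_size=64 * 8, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):                   # x: (B, 3, 32, 32)
        fmap = self.encoder(x)              # (B, 64, 8, 8)
        seq = fmap.permute(0, 3, 1, 2)      # width as the sequence axis: (B, 8, 64, 8)
        seq = seq.flatten(2)                # (B, 8, 64*8): one vector per image column
        _, h = self.decoder(seq)            # h: (1, B, 128)
        return self.fc(h.squeeze(0))

print(CNNRNNClassifier()(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
```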

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.414-423
    • /
    • 2020
  • In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods are forced to maintain the size of the feature map in order to extract the segmentation map, as the model must be constructed without a pooling operation. As a result, their performance deteriorates due to a lack of sparsity and a small receptive field. To mitigate these problems, we utilize a GLU to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that our proposed DCGLU model outperforms the other baselines. In particular, our method exhibits robustness against natural sound noise at three Signal-to-Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).
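A minimal PyTorch sketch of a dilated-convolution GLU block in the spirit of the DCGLU described above: one dilated convolution produces the features and a second one the sigmoid gate that controls information flow, enlarging the receptive field without pooling and without changing the feature map size. Channel counts and the dilation rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedGLUBlock(nn.Module):
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        pad = dilation                                   # keeps the time-frequency size
        self.feat = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)
        self.gate = nn.Conv2d(channels, channels, 3, padding=pad, dilation=dilation)

    def forward(self, x):
        return self.feat(x) * torch.sigmoid(self.gate(x))   # gated linear unit

x = torch.randn(1, 64, 128, 64)      # (batch, channels, time frames, mel bins), for example
block = DilatedGLUBlock()
print(block(x).shape)                # torch.Size([1, 64, 128, 64]), size preserved
```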

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.167-181
    • /
    • 2018
  • Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been popularly applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes to apply CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN has strength in interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts the stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future price movements. Our proposed model, named CNN-FG (Convolutional Neural Network using Fluctuation Graph), consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn as a 40×40 pixel image, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices: each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. The CNN classifiers are then trained on the images of the training dataset in the final step. Regarding the parameters of CNN-FG, we adopted two convolution filters (5×5×6 and 5×5×9) in the convolution layer and a 2×2 max pooling filter in the pooling layer. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation function for the convolution layer and the hidden layer was set to ReLU (Rectified Linear Unit), and that for the output layer to the softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 over 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups of the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset from 80% of the total dataset (1,560 samples) and the validation dataset from the remaining 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies, such as Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), and CCI (commodity channel index). To confirm the superiority of CNN-FG, we compared its prediction accuracy with those of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on those graphs can be effective in terms of prediction accuracy. Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
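A hedged PyTorch sketch following the CNN-FG configuration quoted above: 40×40 RGB fluctuation-graph images, two 5×5 convolution stages with 6 and 9 filters, 2×2 max pooling, hidden layers of 900 and 32 nodes, and a 2-way softmax output. The use of "same" padding (which makes the flattened feature count exactly 900) is an assumption made to reconcile the quoted layer sizes; it is not stated in the abstract.

```python
import torch
import torch.nn as nn

class CNNFG(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5, padding=2), nn.ReLU(),   # 40x40 -> 40x40
            nn.MaxPool2d(2),                                        # -> 20x20
            nn.Conv2d(6, 9, kernel_size=5, padding=2), nn.ReLU(),   # -> 20x20
            nn.MaxPool2d(2),                                        # -> 10x10
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # 9 * 10 * 10 = 900 hidden nodes
            nn.Linear(900, 32), nn.ReLU(),     # second hidden layer of 32 nodes
            nn.Linear(32, 2),                  # upward vs. downward market direction
        )

    def forward(self, x):
        return torch.softmax(self.classifier(self.features(x)), dim=1)

print(CNNFG()(torch.randn(8, 3, 40, 40)).shape)  # torch.Size([8, 2])
```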