• Title/Summary/Keyword: deep machine learning

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content using text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied from various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, information is not only easy to collect but also affects business. In marketing, real-world information from customers is gathered on websites rather than through surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of polarity analysis using the IMDB review data set. 
First, as comparative models for text classification in sentiment analysis, the study adopts popular machine learning algorithms such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve the problem of long-term dependency. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to understand how, and how well, the models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be opened and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can address the CNN's long-term dependency problem. 
Furthermore, when LSTM is applied to CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be learned simultaneously. The combined CNN-LSTM achieved 90.33% accuracy. It is slower than CNN but faster than LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
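
The input, forget, and output gates mentioned in this abstract can be illustrated with a minimal sketch of a single LSTM step. This is not the authors' model: it is a scalar cell (one input, one hidden unit) in plain Python, and the weight names in the dictionary `w` and the helper `run_lstm` are hypothetical choices made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell (one input, one hidden unit).

    `w` holds hypothetical weights: w* acts on the input, u* on the
    previous hidden state, and b* is the bias of each gate.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g  # memory cell: keep part of the old, add part of the new
    h = o * math.tanh(c)    # hidden state passed on to the next step
    return h, c


def run_lstm(sequence, w):
    """Feed a sequence through the cell, starting from a zero state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h, c
```

Because the forget gate scales the previous cell state at every step, the cell decides how much earlier context survives, which is the property the abstract relies on when it pairs LSTM with CNN features.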

Human Walking Detection and Background Noise Classification by Deep Neural Networks for Doppler Radars (사람 걸음 탐지 및 배경잡음 분류 처리를 위한 도플러 레이다용 딥뉴럴네트워크)

  • Kwon, Jihoon;Ha, Seoung-Jae;Kwak, Nojun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.29 no.7 / pp.550-559 / 2018
  • This study investigates the effectiveness of deep neural networks (DNNs) for detecting and classifying micro-Doppler signals generated by human walking and background noise sources. Previous research relied on a complex process for extracting meaningful features that directly affect classifier performance, and this feature extraction was based on experience and statistical analysis. However, because a DNN gradually reconstructs and generates features as data pass through the layers of the network, no separate preprocessing step for feature extraction is required. Therefore, binary and multiclass classifiers based on multilayer perceptrons (MLPs) and DNNs were designed and analyzed, and the effectiveness of DNNs for recognizing micro-Doppler signals was demonstrated. Experimental results showed that, with MLPs, the classification accuracies of the binary classifier and the multiclass classifier on the test dataset were 90.3% and 86.1%, respectively. With DNNs, the classification accuracies of the binary classifier and the multiclass classifier on the test dataset were 97.3% and 96.1%, respectively.

Performance Evaluation of Recurrent Neural Network Algorithms for Recommendation System in E-commerce (전자상거래 추천시스템을 위한 순환신경망 알고리즘들의 성능평가)

  • Seo, Jihye;Yong, Hwan-Seung
    • KIISE Transactions on Computing Practices / v.23 no.7 / pp.440-445 / 2017
  • With the advance of e-commerce systems, the number of people shopping online and the number of products available have increased significantly. The need for an accurate recommendation system is therefore becoming increasingly important. A recurrent neural network is a deep learning algorithm that uses sequential information in training. This paper evaluates the application of recurrent neural networks to recommendation systems. We evaluated three commonly used recurrent algorithms (RNN, LSTM, and GRU) and three commonly used optimization algorithms (Adagrad, RMSProp, and Adam). In the experiments, we used the TensorFlow open-source library produced by Google and e-commerce session data from RecSys Challenge 2015. The results obtained with the optimal hyperparameters found in this study are compared with those of RecSys Challenge 2015 participants.
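
To make the difference between the three optimizers named in this abstract concrete, here is a minimal sketch of their textbook update rules applied to a toy one-dimensional loss. The loss f(w) = (w - 3)^2, the learning rate, and the other hyperparameter defaults are assumptions chosen for illustration; they are not the settings used in the paper.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def adagrad(w, steps, lr=0.1, eps=1e-8):
    """Adagrad: divide by the root of the sum of all past squared gradients."""
    g_sq_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g_sq_sum += g * g
        w -= lr * g / (math.sqrt(g_sq_sum) + eps)
    return w


def rmsprop(w, steps, lr=0.1, rho=0.9, eps=1e-8):
    """RMSProp: like Adagrad, but with a decaying average of squared gradients."""
    v = 0.0
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1.0 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)
    return w


def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: decaying averages of the gradient and its square, bias-corrected."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # correct the bias toward zero at early steps
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Calling, for example, `adam(0.0, 200)` moves the parameter from 0 toward the minimum at 3; the three functions differ only in how they scale each gradient step.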

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching (Doc2Vec 모형에 기반한 자기소개서 분류 모형 구축 및 실험)

  • Kim, Young Soo;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.19 no.1 / pp.103-112 / 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Job searching through self-introduction essays is nowadays one of the most active processes. Companies spend time and money reviewing the numerous self-introduction essays submitted by job seekers. Job seekers, in turn, worry about whether companies will accept their self-introduction essays. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Real-world data were sampled using stratified sampling to alleviate the data imbalance between passed and failed self-introduction essays. Documents were embedded using the Doc2Vec method, which was developed from the existing Word2Vec, and were classified using logistic regression analysis. A decision tree model was chosen as the benchmark model, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM was better than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, whereas PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the classification performance of the logistic regression model with PV-DM embeddings is better than that of the decision tree-based classification model. The experimental results imply that companies can reduce the cost of recruiting good job seekers. In addition, the suggested model can help job candidates pre-evaluate their self-introduction essays.
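
The evaluation ideas named in this abstract (AUC, stratified sampling, K-fold cross-validation) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' pipeline: the function names `auc` and `stratified_folds` are hypothetical, labels are assumed to be 1 for pass and 0 for fail, and the round-robin fold assignment is one simple way to keep class proportions similar across folds.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive example has the higher score and
    0.5 if the two scores are tied.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k):
    """Split indices into k folds, dealing each class out round-robin."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for j, idx in enumerate(members):
            folds[j % k].append(idx)
    return folds
```

In a cross-validation loop, each fold in turn would serve as the test set while the remaining folds train the classifier, and `auc` would be computed on the held-out scores.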

Ensemble Method for Predicting Particulate Matter and Odor Intensity (미세먼지, 악취 농도 예측을 위한 앙상블 방법)

  • Lee, Jong-Yeong;Choi, Myoung Jin;Joo, Yeongin;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.203-210 / 2019
  • Recently, a number of researchers have produced studies and reports aimed at forecasting air quality, such as particulate matter and odor, more accurately. However, such research mainly focuses on the atmospheric diffusion models that have been used for air quality prediction in environmental engineering. Although these models have various merits, they are limited in that they use very few spatial attributes, such as geographical attributes. Thus, we propose a new approach to forecasting air quality using a deep learning-based ensemble model that combines a temporal and a spatial predictor. The temporal predictor employs an RNN LSTM, and the spatial predictor is based on a geographically weighted regression model. The ensemble model also uses an RNN LSTM, which combines the two models in a stacking structure. The ensemble model is capable of inferring the air quality of areas without an air quality monitoring station and even of forecasting future air quality. We installed IoT sensors measuring PM2.5, PM10, H2S, NH3, and VOC at 8 stations in Jeonju to gather air quality data. The numerical results showed that the new model predicts very accurately when compared with the measured data. This implies that spatial attributes should be considered for more accurate air quality prediction.

A Study on Text Mining Analysis of Presidential Maritime Concept in KOREA (텍스트마이닝을 이용한 한국 대통령의 해양관에 관한 연구)

  • Kim, Sung-Kuk;Lee, Tae-Hwee
    • Journal of Korea Port Economic Association / v.36 no.3 / pp.39-54 / 2020
  • In a presidential political system, the president's words have great influence on the formation of national policy and on the decision-making process. Policy priorities are determined according to the president's ideology and core values, and various policies are established and executed according to those priorities. Therefore, this paper analyzes the contents of presidential speeches. Since the president's speeches are semantic data, big data analysis using machine learning and deep learning methods is conducted to analyze the unstructured text. In this study, the presidents' speeches at the "National Sea Day" commemoration from 1996 onwards were collected and analyzed using topic modeling. The analysis showed that all the presidents delivered speeches with a view of the ocean that was consistent with the direction of their administration. It was confirmed that the ocean-industry-resource topics, which are the intrinsic values of the ocean, remained intact and were consistently emphasized by all presidents.

Estimation of Sweet Pepper Crop Fresh Weight with Convolutional Neural Network (합성곱 신경망을 이용한 온실 파프리카의 작물 생체중 추정)

  • Moon, Taewon;Park, Junyoung;Son, Jung Eek
    • Journal of Bio-Environment Control / v.29 no.4 / pp.381-387 / 2020
  • Various studies have attempted to estimate and measure the fresh weight of crops. However, no studies have used raw images of sweet peppers to estimate fresh weight. Recently, image processing research using convolutional neural networks (CNNs), which can use raw data, has been increasing. In this study, crop fresh weight was estimated by using images of sweet peppers as inputs to a CNN. The experiment was performed in a greenhouse growing sweet pepper (Capsicum annuum L.). The fresh weight, the output of the CNN, was regressed on data collected through destructive investigation. The highest coefficient of determination (R2) of the trained CNN was 0.95. The estimated fresh weight showed a trend very similar to the actual measured values.
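
For readers unfamiliar with the metric reported in this abstract, the coefficient of determination can be computed as in the sketch below. The function name `r_squared` is a hypothetical choice, and this is the standard definition (one minus the ratio of residual to total sum of squares); the abstract does not state which variant the authors used.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    mean = sum(actual) / len(actual)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are constant, so R2 is undefined")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions match the measurements exactly, while a value of 0 means the model does no better than predicting the mean measured weight.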

Particulate Matter Prediction using Multi-Layer Perceptron Network (다층 퍼셉트론 신경망을 이용한 미세먼지 예측)

  • Cho, Kyoung-woo;Jung, Yong-jin;Kang, Chul-gyu;Oh, Chang-heon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.620-622 / 2018
  • The need for particulate matter prediction algorithms has grown as public interest in the effects of particulate matter on humans has increased. Many studies have proposed prediction models based on statistical modelling and machine learning techniques using weather data, but it is difficult to set the environment and detailed conditions of these models accurately. In addition, a new prediction model is needed to handle missing data from domestic weather monitoring stations. In this paper, fine dust prediction is performed using a multi-layer perceptron network as a preliminary study for particulate matter prediction. For this purpose, a prediction model is designed based on weather data from three monitoring stations, and the suitability of the algorithm for particulate matter prediction is evaluated through comparison with actual data.

Comparison of Histogram Equalization Techniques by Using Normalization in Thoracic Computed Tomography (흉부 컴퓨터 단층 촬영에서 정규화를 사용한 다양한 히스토그램 평준화 기법을 비교)

  • Lee, Young-Jun;Min, Jung-Whan
    • Journal of radiological science and technology / v.44 no.5 / pp.473-480 / 2021
  • The purpose of this study is to present a method for improving image quality in CT and X-ray scans, especially in the lung region. We also examined image parameters, such as the mean and median values of the histogram, before and after applying Histogram Equalization (HE). These techniques are mainly used for all types of medical images, such as chest X-ray and low-dose Computed Tomography (CT). They are also used to enhance tiny anatomical structures such as vessels, lung nodules, airways, and pulmonary fissures. The proposed technique consists of two main steps implemented in MATLAB (R2021a). First, normalization is applied to improve the base image, after which the technique rearranges the intensities that determine image contrast. Second, the Contrast Limited Adaptive Histogram Equalization (CLAHE) method is used to enhance small details, textures, and local contrast of the image. As a result, this paper presents modern, improved HE techniques and some of their advantages over traditional HE. The paper therefore concludes that various HE-related techniques can be helpful for many processes, especially image pre-processing for Machine Learning (ML) and Deep Learning (DL).
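
The normalization and equalization steps described in this abstract can be sketched as follows. The paper used MATLAB; this is a plain-Python illustration of min-max normalization followed by global histogram equalization on a small integer-valued grayscale image. The function names are hypothetical, and CLAHE, which additionally works on local tiles with a clipped histogram, is not implemented here.

```python
def normalize(img, levels=256):
    """Min-max normalization: stretch integer pixels to 0..levels-1."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: no range to stretch
        return [[0 for _ in row] for row in img]
    return [[(p - lo) * (levels - 1) // (hi - lo) for p in row] for row in img]


def equalize(img, levels=256):
    """Global histogram equalization via the cumulative distribution."""
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = running
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:  # single gray level: nothing to spread out
        return [row[:] for row in img]
    # Lookup table mapping each old gray level to its equalized level.
    lut = [max(0, (c - cdf_min) * (levels - 1) // (total - cdf_min)) for c in cdf]
    return [[lut[p] for p in row] for row in img]
```

For a low-contrast input such as `[[10, 10, 10, 20], [20, 30, 30, 30]]`, `equalize` spreads the three gray levels across the full 0 to 255 range, which is the contrast gain that HE-based pre-processing aims for.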

Technical Trading Rules for Bitcoin Futures (비트코인 선물의 기술적 거래 규칙)

  • Kim, Sun Woong
    • Journal of Convergence for Information Technology / v.11 no.5 / pp.94-103 / 2021
  • This study proposes technical trading rules for Bitcoin futures and empirically analyzes their investment performance. The investment strategies include standard trading rules such as VMA, TRB, FR, MACD, RSI, and BB, applied to daily Bitcoin futures data from December 18, 2017 to March 31, 2021. The trend-following rules showed higher investment performance than the comparative strategy, B&H. Bitcoin futures investment performance was also higher than that of KOSPI200 index futures. In particular, investment performance increased significantly in terms of the Sortino ratio, which reflects downside risk. This study is academically significant in that it is the first attempt to systematically analyze the investment performance of standard technical trading rules for Bitcoin futures. Future research should improve investment performance by using deep learning or machine learning models to predict the price of Bitcoin futures.
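
Two ideas from this abstract, a moving-average trading rule and the Sortino ratio, can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: `ma_signal` is a generic short-versus-long moving-average rule with hypothetical window parameters, and `sortino_ratio` uses a common definition (mean excess return divided by downside deviation over all observations), since the abstract gives neither the rule parameters nor the exact formula.

```python
import math


def ma_signal(prices, short_win, long_win):
    """Return 1 (long) when the short moving average is above the long one, else 0.

    One signal is produced for each day from index long_win - 1 onward.
    """
    signals = []
    for t in range(long_win - 1, len(prices)):
        short_ma = sum(prices[t - short_win + 1:t + 1]) / short_win
        long_ma = sum(prices[t - long_win + 1:t + 1]) / long_win
        signals.append(1 if short_ma > long_ma else 0)
    return signals


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation.

    Only returns below `target` contribute to the risk term, which is
    why the ratio reflects downside risk rather than total volatility.
    """
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    downside_sq = [min(0.0, e) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    if downside_dev == 0.0:
        raise ValueError("no returns below target, so the ratio is undefined")
    return mean_excess / downside_dev
```

Unlike the Sharpe ratio, this measure does not penalize large upward moves, which matters for an asset as volatile as Bitcoin futures.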