• Title/Summary/Keyword: Convolutional LSTM

Short-Term Precipitation Forecasting based on Deep Neural Network with Synthetic Weather Radar Data (기상레이더 강수 합성데이터를 활용한 심층신경망 기반 초단기 강수예측 기술 연구)

  • An, Sojung;Choi, Youn;Son, MyoungJae;Kim, Kwang-Ho;Jung, Sung-Hwa;Park, Young-Youn
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.43-45 / 2021
  • Short-term quantitative precipitation forecasting (QPF) is socially and economically important for preventing damage from severe weather. Many recent studies have applied deep neural networks (DNNs) to short-term QPF. These studies require sophisticated preprocessing, because mishandling the large and varied meteorological datasets lowers QPF performance. In particular, to predict non-linear precipitation trends more accurately, the dataset must be handled carefully on the basis of a physical and dynamical understanding of the data. This paper therefore proposes the following approach: i) refining and combining the major factors related to precipitation development (weather radar, terrain, air temperature, and so on) to construct training data for precipitation pattern analysis; ii) producing predicted precipitation fields with a convolutional model based on ConvLSTM. The proposed algorithm was evaluated on rainfall events in 2020. It showed improved performance in predicting the magnitude and intensity of precipitation and clearly predicted the non-linear precipitation pattern. The algorithm can be useful as a forecasting tool for preventing damage from severe weather.
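
For readers unfamiliar with ConvLSTM, the update below is the commonly used peephole-free form of the cell, given only as background. The abstract does not state which variant the authors used, so the symbols are assumptions: $X_t$ is the stacked input field at time $t$ (for example radar, terrain, and temperature channels), $H_t$ and $C_t$ are the hidden and cell states, $*$ is 2D convolution, and $\circ$ is the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Because the gates are computed by convolution rather than by full matrix products, the states keep the spatial layout of the input grid, which is why this cell is a natural fit for predicting precipitation fields.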

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can lead to high computational cost and overfitting in the model. A dimension reduction step is therefore necessary to improve model performance. Diverse methods have been proposed, ranging from simply reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of classifiers for sentence classification, which is one of the fields of natural language processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have suggested methods that modify the word dictionary according to the positive and negative scores of predefined words. The basic idea of this study is that similar words have similar vector representations. Once a feature selection algorithm identifies words that are unimportant, we assume that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification, both of which eliminate words selectively under specific rules and construct word embeddings based on Word2Vec.
To select words of low importance from the text, we use information gain to measure importance and cosine similarity to find similar words. First, we eliminate words that have comparatively low information gain from the raw text and build the word embedding. Second, we additionally select words that are similar to the words with low information gain, and build the word embedding. Finally, the filtered text and word embeddings are fed to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes, with a helpful-vote ratio over 70%, were classified as helpful reviews. Yelp, however, shows only the number of helpful votes. We extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods outperformed the embeddings that used all the words. Removing unimportant words improves performance; removing too many words, however, lowers it. Future research should consider diverse preprocessing methods and in-depth analysis of word co-occurrence for measuring similarity between words. In addition, we applied the proposed method only with Word2Vec. Other embedding methods, such as GloVe, fastText, and ELMo, could be combined with the proposed methods, making it possible to identify effective combinations of word embedding and elimination methods.
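
The two elimination steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: information gain is computed for the binary feature "word occurs in the document", the second method is read as also removing the words similar to the low-gain ones, the function names and both thresholds are hypothetical, and the small hand-written vectors stand in for trained Word2Vec embeddings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Return (low-gain words, remaining words similar to a low-gain word)."""
    vocab = set().union(*docs)
    low_ig = {w for w in vocab
              if information_gain(docs, labels, w) < ig_threshold}
    similar = set()
    for w in vocab - low_ig:
        if w not in vectors:
            continue
        if any(s in vectors and cosine(vectors[w], vectors[s]) >= sim_threshold
               for s in low_ig):
            similar.add(w)
    return low_ig, similar


# Toy corpus: tokenized reviews with binary labels (illustrative data only).
docs = [["great", "book"], ["great", "novel"], ["bad", "book"], ["bad", "story"]]
labels = [1, 1, 0, 0]

# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings.
vectors = {
    "book": (1.0, 0.1), "novel": (0.9, 0.2), "story": (0.3, 0.9),
    "great": (0.0, 1.0), "bad": (0.0, -1.0),
}

low_ig, similar = select_words_to_remove(
    docs, labels, vectors, ig_threshold=0.1, sim_threshold=0.9)

# Method 1: drop only the low-information-gain words.
method_1 = [[t for t in d if t not in low_ig] for d in docs]
# Method 2: also drop the words similar to them.
method_2 = [[t for t in d if t not in low_ig and t not in similar] for d in docs]
```

In this toy corpus "book" appears equally in both classes, so its information gain is zero and it is removed by both methods; "novel" survives the first method but is removed by the second because its vector is close to that of "book". In practice the filtered token lists would then be used to train the embedding and the classifiers, and both thresholds would need tuning, since the abstract reports that removing too many words lowers performance.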