• Title/Summary/Keyword: Time Series Data Analysis


A Study on Commodity Asset Investment Model Based on Machine Learning Technique (기계학습을 활용한 상품자산 투자모델에 관한 연구)

  • Song, Jin Ho;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.127-146 / 2017
  • Services using artificial intelligence have begun to appear in daily life. Artificial intelligence is applied to consumer electronics and communications products such as AI refrigerators and speakers. In the financial sector, Goldman Sachs used Kensho's artificial intelligence technology to improve its stock trading process: two stock traders could handle the work of 600, and analytical work that took 15 people four weeks could be processed in five minutes. In particular, big data analysis through machine learning is actively applied throughout the financial industry, and stock market analysis and investment modeling through machine learning are also actively studied. Machine learning models such as artificial intelligence prediction models overcome the linearity limits of traditional financial time series studies. Quantitative studies based on past stock market numerical data widely use artificial intelligence to forecast future movements of stock prices or indices, and other studies predict the future direction of the market or of individual stock prices by learning from large amounts of text data such as news and comments related to the stock market. Investment in commodity assets, one class of alternative assets, is usually used to enhance the stability and safety of a traditional stock and bond portfolio. There is relatively little research on investment models for commodity assets compared with mainstream assets such as equities and bonds. Recently, machine learning techniques have been widely applied in finance, especially to stock and bond investment models, producing better trading models and changing the whole financial field. In this study we built an investment model using the Support Vector Machine (SVM), one of the machine learning models.
Some studies focus on price prediction for specific commodities, but it is hard to find research on commodity investment models for asset allocation using machine learning. We propose a method of forecasting four major commodity indices, a portfolio of commodity futures, and individual commodity futures using an SVM model. The four major commodity indices are the Goldman Sachs Commodity Index (GSCI), the Dow Jones UBS Commodity Index (DJUI), the Thomson Reuters/Core Commodity CRB Index (TRCI), and the Rogers International Commodity Index (RI). We selected two individual futures from each of three sectors (energy, agriculture, and metals) that are actively traded on the CME market and have sufficient liquidity: Crude Oil, Natural Gas, Corn, Wheat, Gold, and Silver futures. We built an equally weighted portfolio of the six commodity futures for comparison with the commodity indices. Because commodity assets are closely related to macroeconomic activity, we used 19 macroeconomic indicators as model inputs, including stock market indices, export and import trade data, labor market data, and composite leading indicators: 14 US, two Chinese, and two Korean indicators. The data period runs from January 1990 to May 2017; the first 195 monthly observations were used for training and the remaining 125 for testing. We verified that the performance of the equally weighted commodity futures portfolio rebalanced by the SVM model is better than that of the other commodity indices. The prediction accuracy of the model for the commodity indices does not exceed 50% regardless of the SVM kernel function, whereas the prediction accuracy for the equally weighted commodity futures portfolio is 53%. The prediction accuracy of the individual commodity futures models is better than that of the commodity index models, especially in the agriculture and metals sectors.
The individual commodity futures portfolio excluding the energy sector outperformed the portfolio covering all three sectors. To verify the validity of the model, the analysis results should remain similar under variations in the data period, so we also used odd-numbered-year data for training and even-numbered-year data for testing and confirmed that the results are similar. As a result, when allocating commodity assets to a traditional portfolio of stocks, bonds, and cash, more effective investment performance can be obtained by investing in commodity futures rather than commodity indices, and especially by using a commodity futures portfolio rebalanced by the SVM model.
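The paper's core setup — an SVM classifier trained on monthly macroeconomic indicators to predict the next-month direction of a commodity portfolio — can be sketched as below. The data are synthetic stand-ins (the actual 19 indicator series and portfolio returns are not reproduced here); only the split sizes follow the paper.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in: 320 monthly observations of 19 macro indicators
# (the paper uses 14 US, 2 Chinese, 2 Korean series, Jan 1990 - May 2017).
X = rng.normal(size=(320, 19))
# Direction label: +1 if the portfolio rises next month, else -1.
# Here the label is tied to a noisy linear signal purely for illustration.
signal = X @ rng.normal(size=19)
y = np.where(signal + rng.normal(scale=2.0, size=320) > 0, 1, -1)

# Same split as the paper: first 195 months for training, last 125 for test.
X_tr, X_te, y_tr, y_te = X[:195], X[195:], y[:195], y[195:]

scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF is one kernel the paper tries
clf.fit(scaler.transform(X_tr), y_tr)

acc = clf.score(scaler.transform(X_te), y_te)
print(f"directional accuracy: {acc:.2%}")
```

In the paper this directional prediction drives the monthly rebalancing rule for the equally weighted futures portfolio.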

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has produced various kinds of online news media, news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data would therefore help users survey many events at a glance, and building an event network based on the relevance between events would greatly help readers understand current affairs. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and through preprocessing using NPMI and Word2Vec we kept only meaningful words and merged synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate topic distributions by date and to detect events at the peaks of those distributions. A total of 32 topics were extracted, and the occurrence time of each event was inferred from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, of which 16 final events were retained after filtering with a Gaussian smoothing technique. We then calculated relevance scores between the detected events to construct the event network: using the cosine coefficient between co-occurring events, we computed the relevance between events and connected them. Finally, each event became a vertex, and the relevance score between two events became the weight of the edge connecting them.
The event network constructed with our method allowed us to arrange the major political and social events in Korea over the past year in chronological order and, at the same time, to identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relations between events that were difficult to detect before. We also applied various text mining techniques and Word2Vec in the preprocessing to improve the extraction of proper nouns and compound nouns, which has been a difficulty in analyzing Korean text. The event detection and network construction techniques in this study have the following practical advantages. First, LDA topic modeling, an unsupervised learning method, can easily extract topics, topic words, and their distributions from a huge amount of data, and by using the date information of the collected news articles the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores from the co-occurrence of topics, which is difficult to capture with existing event detection, we can present the connections between events in a condensed, summarized form. This is supported by the fact that the relevance-based event network proposed in this study was actually constructed in order of occurrence time; the network also makes it possible to identify the event that served as the starting point of a chain of events. A limitation of this study is that LDA topic modeling produces different results depending on the initial parameters and the number of topics, and the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher.
Also, since each topic is assumed to be exclusive and independent, the relevance between topics themselves is not taken into account. Subsequent studies should calculate the relevance between events not covered in this study, or between events belonging to the same topic.
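The LDA-then-cosine-similarity pipeline can be sketched as follows on a toy English corpus; the documents, topic count, and similarity threshold are illustrative, not the paper's Korean news data or its 32-topic configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for a year of news articles (the paper uses
# Korean political/social news from March 2016 to March 2017).
docs = [
    "election candidate vote campaign ballot",
    "election vote ballot turnout candidate",
    "protest rally square citizens demonstration",
    "rally protest demonstration police square",
    "budget parliament bill vote committee",
    "parliament committee bill budget session",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# The paper extracts 32 topics; 3 suffice for this toy corpus.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(X)          # per-document topic distribution

# Treat each topic's document-occurrence profile as an "event" vector and
# connect events whose cosine similarity (co-occurrence) exceeds a threshold.
topic_profiles = doc_topic.T              # shape: (n_topics, n_docs)
sim = cosine_similarity(topic_profiles)
edges = [(i, j, round(sim[i, j], 3))
         for i in range(3) for j in range(i + 1, 3) if sim[i, j] > 0.2]
print(edges)                              # weighted edges of the event network
```

Each surviving `(i, j, weight)` triple corresponds to one weighted edge between two event vertices in the network described above.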

An Analysis of Statistical Characteristics of Nonlinear Ocean Waves (비선형 해양파의 통계적 특성에 대한 해석)

  • Kim, Do-Young
    • Journal of the Korean Society for Marine Environment & Energy / v.13 no.2 / pp.112-120 / 2010
  • In this paper, time series wave data measured continuously for 24 hours during a storm in the Yura sea area are used to investigate the statistical characteristics of nonlinear waves. The exceedance probability of wave height is compared using the Rayleigh distribution and the Edgeworth-Rayleigh (ER) distribution. The wave data, which show a stationary state for 10 hours, contain approximately 4,600 waves. The Gram-Charlier distribution fits the probability of wave elevation better than the Gaussian distribution. The Rayleigh ($H_{rms}$) distribution follows the exceedance probability of wave height in general and predicts the probability of freak waves well. The ER distribution overpredicts the exceedance probability of wave heights and the occurrence of freak waves. If wave data measured over a 30-minute period, containing about 250 waves, are used, the ER distribution can predict the occurrence probability of freak waves well, but it overpredicts the probability of overall wave heights. If no freak wave occurs, the Rayleigh ($H_{rms}$) distribution agrees well with the wave height distribution over most of the wave height range. The wave height distribution of freak waves whose heights are less than 10 m shows a similar tendency to that of freak waves greater than 10 m. The value of $H_{max}/H_{1/3}$ is related to the kurtosis of the wave elevation, and there appears to be a threshold value of the kurtosis for the occurrence of freak waves.
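The Rayleigh exceedance probability referred to above can be evaluated directly; the sketch below uses the standard $H_{rms}$ parameterisation and the common "freak wave" criterion $H > 2H_{1/3}$, with an illustrative $H_{rms}$ value that is not from the paper's Yura data.

```python
import numpy as np

def rayleigh_exceedance(h, h_rms):
    """Exceedance probability P(H > h) under the Rayleigh wave-height
    distribution parameterised by the root-mean-square height H_rms."""
    return np.exp(-(np.asarray(h) / h_rms) ** 2)

# A wave is commonly called "freak" when H > 2 * H_1/3; with the
# deep-water relation H_1/3 ~ sqrt(2) * H_rms this is H > 2*sqrt(2)*H_rms.
h_rms = 3.0                               # metres, illustrative value
freak_threshold = 2 * np.sqrt(2) * h_rms
p_freak = rayleigh_exceedance(freak_threshold, h_rms)
print(f"P(freak wave) = {p_freak:.2e}")   # exp(-8), roughly 3.4e-4
```

Comparing this curve against the empirical exceedance counts of the measured waves is essentially the comparison the paper performs for the Rayleigh and ER distributions.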

A Performance Analysis by Adjusting Learning Methods in Stock Price Prediction Model Using LSTM (LSTM을 이용한 주가예측 모델의 학습방법에 따른 성능분석)

  • Jung, Jongjin;Kim, Jiyeon
    • Journal of Digital Convergence / v.18 no.11 / pp.259-266 / 2020
  • Researchers have steadily applied knowledge-based expert systems and machine learning algorithms to the financial field, and knowledge-based system trading on stock prices is now common. Recently, deep learning technologies have been applied to real stock trading as GPU performance and large-scale data have become sufficiently available. In particular, LSTM has been applied to stock price prediction because of its suitability for time series data. In this paper, we implement stock price prediction using LSTM. In modeling the LSTM, we propose a combination of model parameters and activation functions for best performance: suitable selection methods for weight and bias initializers, regularizers to avoid over-fitting, activation functions, and optimization methods. We also compare model performance across different selections of these modeling factors on real-world stock price data of major global companies. Finally, our experimental work yields a suitable method of applying an LSTM model to stock price prediction.
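The gating mechanism that makes LSTM suitable for time series can be shown with a minimal numpy forward pass. This is a pedagogical sketch, not the paper's model; the feature count, hidden size, and Glorot-style initialisation (one of the initialiser choices such a study would compare) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a single LSTM cell (numpy sketch).
    W, U, b hold the stacked input/forget/cell/output gate parameters."""
    z = W @ x + U @ h_prev + b
    i, f, g, o = np.split(z, 4)
    i = 1 / (1 + np.exp(-i))          # input gate
    f = 1 / (1 + np.exp(-f))          # forget gate
    g = np.tanh(g)                    # candidate cell state
    o = 1 / (1 + np.exp(-o))          # output gate
    c = f * c_prev + i * g            # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

n_in, n_hid = 5, 8                    # e.g. 5 daily price features
# Glorot-style initialisation for the weights, zeros for the bias.
W = rng.normal(0, np.sqrt(2 / (n_in + n_hid)), (4 * n_hid, n_in))
U = rng.normal(0, np.sqrt(1 / n_hid), (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x in rng.normal(size=(30, n_in)):  # a 30-day window of features
    h, c = lstm_step(x, h, c, W, U, b)
print("final hidden state:", h.round(3))
```

In a full prediction model, the final hidden state would feed a dense output layer whose initializer, regularizer, and activation are exactly the factors the paper tunes.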

Prediction and Causality Examination of the Environment Service Industry and Distribution Service Industry (환경서비스업과 물류서비스업의 예측 및 인과성 검정)

  • Sun, Il-Suck;Lee, Choong-Hyo
    • Journal of Distribution Science / v.12 no.6 / pp.49-57 / 2014
  • Purpose - The world now recognizes environmental disruption as a serious issue with growth-oriented strategies; therefore, environmental preservation has become pertinent and green distribution is continuously emphasized. However, studies on the prediction of, and association between, distribution and the environment are insufficient. Most existing studies about green distribution concern its necessity, detailed operation methods, and policy suggestions; for green distribution it is necessary to study the distribution service industry and the environmental service industry together. Research design, data, and methodology - ARIMA (autoregressive integrated moving average) models were used to forecast the environmental service and distribution service industries, and the Granger causality test based on a VAR (vector autoregressive) model was used to analyze their causal relationship. This study used 48 quarters of time-series data, from the 4th quarter of 2001 to the 3rd quarter of 2013, on each industry's production index. The production index is published as both a current index and a constant index; the constant index divides the current index by deflators to remove price fluctuation, making it easier to analyze actual production, so this study used the constant index. Results - Considering the autocorrelation and partial autocorrelation coefficients of the production indices of the distribution service industry and the environmental service industry, ARIMA(0,0,2)(0,1,1)4 and ARIMA(3,1,0)(0,1,1)4 were established as the final forecasting models, and every production index of both industries is predicted to improve gradually. The distribution service industry's production index is predicted to be 114.35 in the 4th quarter of 2014 and 123.48 in the 4th quarter of 2015.
The environmental service industry's production index is predicted to be 110.95 in the 4th quarter of 2014 and 111.67 in the 4th quarter of 2015. In the causal relationship analysis, the environmental service industry affects the distribution service industry, but not vice versa. Conclusions - This study forecast the distribution service industry and the environmental service industry with ARIMA models and examined the causal relationship between them through the Granger causality test based on a VAR model. The forecasts reveal seasonality and a gradual increase in both industries, and the causality runs only from the environmental service industry to the distribution service industry. This study contributes academically by offering baseline data needed to establish future management styles and policy directions for the two industries, and by testing a causal relationship between them, which existing studies lack. Its limitations are that engagement with prior advanced studies is limited, and the effect of the causality between the two industries on actual industry practice was not established.
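The Granger causality test used above asks whether lags of one series improve a regression of the other on its own lags. A minimal F-test sketch on synthetic quarterly data (with the causal direction planted to mirror the paper's finding) is:

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, lags=2):
    """Minimal Granger causality F-test: do lags of x improve a
    regression of y on its own lags? Returns (F statistic, p-value)."""
    n = len(y)
    Y = y[lags:]
    # Restricted model: intercept + own lags of y.
    Xr = np.column_stack([np.ones(n - lags)] +
                         [y[lags - k:n - k] for k in range(1, lags + 1)])
    # Unrestricted model: additionally the lags of x.
    Xu = np.column_stack([Xr] +
                         [x[lags - k:n - k] for k in range(1, lags + 1)])
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(Xr), rss(Xu)
    df1, df2 = lags, n - lags - Xu.shape[1]
    F = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return F, stats.f.sf(F, df1, df2)

rng = np.random.default_rng(0)
# Synthetic quarterly series: env drives dist with a one-lag delay,
# mirroring the paper's one-directional finding (environment -> distribution).
env = rng.normal(size=120)
dist = np.zeros(120)
for t in range(1, 120):
    dist[t] = 0.8 * env[t - 1] + 0.1 * rng.normal()

F_fwd, p_fwd = granger_f_test(dist, env)   # env Granger-causes dist
F_rev, p_rev = granger_f_test(env, dist)   # reverse direction
print(f"env->dist p={p_fwd:.4f}, dist->env p={p_rev:.4f}")
```

The planted direction yields a tiny p-value while the reverse does not, which is the asymmetry the Granger test detects; the paper applies the same logic within a fitted VAR.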

A Study on Resolving Barriers to Entry into the Resell Market by Exploring and Predicting Price Increases Using the XGBoost Model (XGBoost 모형을 활용한 가격 상승 요인 탐색 및 예측을 통한 리셀 시장 진입 장벽 해소에 관한 연구)

  • Yoon, HyunSeop;Kang, Juyoung
    • The Journal of Society for e-Business Studies / v.26 no.3 / pp.155-174 / 2021
  • This study focuses on Resell investment in the fashion market, one of several emerging investment techniques. The market is growing rapidly worldwide, and a craze is currently taking place throughout Korea. We therefore use shoe data from StockX, the representative Resell site, to present basic guidelines to consumers and lower the barriers to entry into the Resell market. We first review the current status of the Resell craze based on information from various media outlets, then present the state of the Resell market and a research model drawn from prior research. Raw data were collected and analyzed using the XGBoost algorithm and the Prophet model. The analysis identified the factors that affect the Resell market and the shoes suited to it, and historical shoe data allowed us to predict future prices and thus future profitability. Through this study, consumers unfamiliar with the market can participate actively with the given information. The study also provides a variety of vital information on Resell investment, thus forming a fundamental guideline for the market and further contributing to lowering entry barriers.
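The factor-exploration step — fit a boosted-tree regressor on shoe attributes and read off feature importances — can be sketched as follows. The features, their effects, and the data are invented for illustration, and scikit-learn's gradient boosting stands in for XGBoost here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for StockX shoe records: retail price, days since
# release, collaboration flag, and shoe size as hypothetical features.
n = 500
retail = rng.uniform(100, 300, n)
days = rng.uniform(0, 365, n)
collab = rng.integers(0, 2, n)
size = rng.uniform(4, 14, n)
X = np.column_stack([retail, days, collab, size])
# Resell price driven mainly by the collaboration flag, plus noise (toy rule).
y = retail * (1 + 1.5 * collab) + 0.1 * days + rng.normal(0, 20, n)

# Gradient boosting as a stand-in for the paper's XGBoost model.
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  random_state=0)
model.fit(X[:400], y[:400])
r2 = model.score(X[400:], y[400:])
importances = dict(zip(["retail", "days_since_release", "collab", "size"],
                       model.feature_importances_.round(3)))
print(f"holdout R^2 = {r2:.3f}")
print(importances)
```

The importance ranking plays the role of the paper's "factors that affect the Resell market"; the time-series forecasting of individual shoes' prices is handled separately by the Prophet model.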

The Characteristics of Submarine Groundwater Discharge in the Coastal Area of Nakdong River Basin (낙동강 유역의 연안 해저지하수 유출특성에 관한 연구)

  • Kim, Daesun;Jung, Hahn Chul
    • Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1589-1597 / 2021
  • Submarine groundwater discharge (SGD) in coastal areas is gaining importance as a major transport route that brings nutrients and trace metals into the ocean. This paper analyzes the seasonal changes and spatiotemporal characteristics of SGD by modeling monthly SGD over 35 years, from 1986 to 2020, for the Nakdong river basin. We extracted 210 watersheds and SGD estimation points using the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model). The average annual SGD of the Nakdong river basin was estimated at 466.7 m²/yr from the 10 km FLDAS (Famine Early Warning Systems Network Land Data Assimilation System) recharge data, the highest-resolution global model applicable to Korea. There was no significant time-series variation of SGD in the Nakdong river basin, but the period of concentrated SGD expanded from summer into autumn. In addition, we confirmed a large amount of SGD, regardless of season, in coastal areas near large rivers, with a slight increasing trend since the 1980s. These characteristics appear to be related to a shift in the major precipitation period in the study area, and spatially to the high baseflow groundwater in the vicinity of large rivers. This study is a precedent study presenting a modeling technique for exploring the characteristics of SGD in Korea, and is expected to serve as foundational information for coastal management and for evaluating the impact of SGD on the ocean.
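The monthly-to-annual aggregation at the heart of such a workflow — converting per-watershed recharge depths to discharge volumes and summing by calendar year — can be sketched as follows. The watershed names, areas, and recharge values are invented; the paper's actual inputs are 10 km FLDAS recharge fields over 210 SRTM-derived watersheds.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for FLDAS monthly recharge (mm/month) over 3 years for
# two hypothetical coastal watersheds ("w1", "w2"); the paper models
# 210 watersheds over 1986-2020.
dates = pd.date_range("2018-01-01", periods=36, freq="MS")
recharge = pd.DataFrame({
    "w1": rng.uniform(20, 120, 36),   # mm/month
    "w2": rng.uniform(10, 80, 36),
}, index=dates)

areas_km2 = {"w1": 350.0, "w2": 120.0}  # hypothetical watershed areas

# Convert recharge depth to discharge volume, then sum per calendar
# year: mm * km^2 -> m^3 is a factor of 1000.
volume = recharge.mul(pd.Series(areas_km2)) * 1000.0
annual_sgd = volume.groupby(volume.index.year).sum()
print(annual_sgd.round(0))              # m^3 per watershed per year
```

Seasonal concentration (the summer-to-autumn shift the paper reports) would be examined the same way, grouping by month instead of year.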

Prediction of Traffic Congestion in Seoul by Deep Neural Network (심층인공신경망(DNN)과 다각도 상황 정보 기반의 서울시 도로 링크별 교통 혼잡도 예측)

  • Kim, Dong Hyun;Hwang, Kee Yeon;Yoon, Young
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.18 no.4 / pp.44-57 / 2019
  • Various studies have been conducted to relieve traffic congestion in metropolitan cities through accurate traffic flow prediction. Most are based on the assumption that past traffic patterns repeat in the future; models built on this assumption fall short when irregular traffic patterns occur abruptly. Instead, approaches that predict traffic patterns through big data analytics and artificial intelligence have emerged. In particular, deep learning algorithms such as RNNs have been prevalent for predicting temporal traffic flow as a time series, but these algorithms do not perform well for long-term prediction. In this paper, we take into account various external factors that may affect traffic flow and model the correlation between multi-dimensional context information and temporal traffic speed patterns using deep neural networks. Our model, trained on traffic data from the TOPIS system operated by the city of Seoul, Korea, can predict traffic speed on a specific date with accuracy reaching nearly 90%. We expect that the accuracy can be improved further by taking additional factors such as accidents and construction into account.
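Mapping multi-dimensional context features to a link speed with a feed-forward network can be sketched as below. The context features (hour, day of week, rain, holiday), the toy speed rule, and the network size are all assumptions; the paper's TOPIS data and architecture are not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic per-link records with multi-dimensional context features.
n = 2000
hour = rng.integers(0, 24, n)
dow = rng.integers(0, 7, n)
rain = rng.integers(0, 2, n)
holiday = rng.integers(0, 2, n)
X = np.column_stack([hour, dow, rain, holiday]).astype(float)

# Toy ground truth: speed drops in rush hours and rain, rises on holidays.
rush = ((hour >= 7) & (hour <= 9)) | ((hour >= 17) & (hour <= 19))
speed = 45 - 15 * rush - 8 * rain + 6 * holiday + rng.normal(0, 2, n)

scaler = StandardScaler().fit(X[:1600])
mlp = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                   random_state=0)
mlp.fit(scaler.transform(X[:1600]), speed[:1600])
r2 = mlp.score(scaler.transform(X[1600:]), speed[1600:])
print(f"holdout R^2 = {r2:.3f}")
```

Because the context features, not just the speed history, drive the prediction, the model can respond to conditions (rain, holidays) that a purely autoregressive time-series model would miss.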

An Empirical Study on the Cryptocurrency Investment Methodology Combining Deep Learning and Short-term Trading Strategies (딥러닝과 단기매매전략을 결합한 암호화폐 투자 방법론 실증 연구)

  • Yumin Lee;Minhyuk Lee
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.377-396 / 2023
  • As the cryptocurrency market continues to grow, it has developed into a new financial market, and the need for research on cryptocurrency investment strategies is emerging. This study conducts an empirical analysis of a cryptocurrency investment methodology that combines a short-term trading strategy with deep learning. Daily price data for Ethereum were collected through the API of Upbit, the Korean cryptocurrency exchange, and the investment performance of each experimental model was analyzed after finding optimal parameters on past data. The experimental models are a volatility breakout strategy (VBS), a Long Short-Term Memory (LSTM) model, a moving average cross strategy, and a combined model. The VBS is a short-term trading strategy that buys when intraday volatility rises significantly and sells at the closing price of the day. The LSTM is well suited to time series data among deep learning models; the closing price predicted by the model was applied to a simple trading rule. The moving average cross strategy decides whether to buy or sell when the moving averages cross. The combined model is a trading rule that joins the derived variables of the VBS and the LSTM model with AND/OR buy conditions. The results show that the combined model delivers better investment performance than any single model. This study has academic significance in that it goes beyond simple deep-learning-based cryptocurrency price prediction and improves investment performance by combining deep learning with short-term trading strategies, and practical significance in that it demonstrates applicability to actual investment.
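The AND-combination of the VBS breakout signal with a model-predicted up-day can be sketched on a toy OHLC path. The price path is simulated, the breakout parameter `k` is an assumed value, and a noisy copy of the realised close stands in for the LSTM's prediction, purely to show the rule logic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy daily OHLC path standing in for Upbit Ethereum data.
n = 200
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.03, n)))
open_ = np.r_[close[0], close[:-1]] * (1 + rng.normal(0, 0.005, n))
high = np.maximum(open_, close) * (1 + np.abs(rng.normal(0, 0.01, n)))
low = np.minimum(open_, close) * (1 - np.abs(rng.normal(0, 0.01, n)))

k = 0.5                                   # assumed VBS breakout parameter
prev_range = np.r_[np.nan, (high - low)[:-1]]
vbs_buy = high > open_ + k * prev_range   # breakout level reached intraday

# Stand-in for the LSTM's predicted next closing price.
pred_close = close * (1 + rng.normal(0, 0.01, n))
lstm_buy = pred_close > open_             # model expects an up day

# Combined rule: AND of the two buy conditions (one of the paper's
# AND/OR variants); positions are closed at the day's closing price.
combined_buy = vbs_buy & lstm_buy
entry = open_ + k * prev_range            # breakout entry price
ret = np.where(combined_buy, close / entry - 1, 0.0)[1:]
print(f"trades: {combined_buy[1:].sum()}, mean daily return: {ret.mean():.4%}")
```

The AND condition trades less often but only when both the breakout and the model agree, which is the filtering effect the combined model exploits.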

Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation (웹툰 콘텐츠 추천을 위한 소비자 감성 패턴 맵 개발)

  • Lee, Junsik;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.67-88 / 2019
  • Webtoon is a Korean-style digital comics platform that distributes comics content produced with the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon recommendation methods is growing. Webtoons are digital content products that combine pictorial, literary, and digital elements; they stimulate consumer sentiment by entertaining readers and making them engage and empathize with the situations depicted. In this context, the sentiments a webtoon evokes can be expected to serve as an important criterion in consumers' choice of webtoons, yet research on using consumer sentiment to improve webtoon recommendation is lacking. This study aims to develop consumer sentiment pattern maps that can support effective recommendation of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. Metadata and consumer sentiment data were collected for 200 works serviced on the Korean webtoon platform 'Naver Webtoon'. After excluding works that did not meet the purpose of the analysis, 488 sentiment terms were collected for 127 works. Similar or duplicate terms were then combined or abstracted following a bottom-up approach, yielding a webtoon-specialized sentiment index reduced to 63 emotive adjectives. By performing exploratory factor analysis on this sentiment index, through Principal Component Analysis (PCA) with varimax factor rotation, we derived three important dimensions for classifying webtoon types, named 'Immersion', 'Touch', and 'Irritant'.
Based on this, K-Means clustering was performed and the webtoons were classified into four types, named 'Snack', 'Drama', 'Irritant', and 'Romance'. For each type, we constructed webtoon-sentiment 2-mode network graphs and examined the characteristics of each type's sentiment pattern, and through profiling analysis we derived meaningful strategic implications for each type. First, the 'Snack' cluster is a collection of fast-paced, highly entertaining webtoons. Many consumers are interested in these webtoons but do not rate them highly, and mostly use simple expressions of sentiment when discussing them. Webtoons in 'Snack' are expected to appeal to modern people who want to consume content easily and quickly during short spells such as commuting time. Second, webtoons in 'Drama' are expected to evoke realistic, everyday sentiments rather than exaggerated, light comic ones. When consumers talk about 'Drama' webtoons online, they express a wide variety of sentiments, so it is appropriate to establish an OSMU (one source multi-use) strategy extending these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of 'Irritant' shows sentiments that discourage customer interest by causing discomfort; webtoons that evoke these sentiments struggle to gain public attention, and artists should pay attention to such sentiments when creating webtoons. Finally, webtoons in 'Romance' do not evoke a wide variety of consumer sentiments but are interpreted as touching; they are expected to be consumed as 'healing content' targeted at consumers with high levels of stress or mental fatigue.
The results of this study are meaningful in that they identify the applicability of consumer sentiment to the recommendation and classification of webtoons, and provide guidelines that help members of the webtoon ecosystem better understand consumers and formulate strategies.
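The dimension-reduction-then-clustering pipeline described above can be sketched as follows. The work-by-adjective matrix is synthetic (the paper's is 127 works by 63 adjectives), plain PCA stands in for the paper's varimax-rotated factor analysis, and the component and cluster counts simply mirror the paper's three dimensions and four types.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy work-by-sentiment matrix: 40 works x 12 emotive adjectives, built
# from three planted latent sentiment dimensions plus noise.
base = rng.normal(size=(40, 3))                 # latent dimensions
loadings = rng.normal(size=(3, 12))             # adjective loadings
X = base @ loadings + rng.normal(0, 0.3, (40, 12))

# Three components, mirroring the Immersion/Touch/Irritant axes
# (varimax rotation is omitted in this plain-PCA sketch).
pca = PCA(n_components=3)
scores = pca.fit_transform(X)

# Four clusters, mirroring the Snack/Drama/Irritant/Romance types.
km = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = km.fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```

Each cluster's centroid in the three-dimensional score space is the kind of sentiment pattern the paper profiles to name and characterise its webtoon types.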