• Title/Summary/Keyword: Big 5 Model

Search results: 444

Outliers and Level Shift Detection of the Mean-sea Level, Extreme Highest and Lowest Tide Level Data (평균 해수면 및 최극조위 자료의 이상자료 및 기준고도 변화(Level Shift) 진단)

  • Lee, Gi-Seop; Cho, Hong-Yeon
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.5 / pp.322-330 / 2020
  • Modeling of outliers in time series was carried out using the MSL and extreme high and low tide level (EHL, ELL) data sets from the Busan and Mokpo stations. The time-series model is a seasonal ARIMA model including AO (additive outlier) and LS (level shift) components. The optimal model was selected based on the AIC value, and the model parameters were estimated using the 'tso' function (in the 'tsoutliers' package of R). The main results of the model application, i.e., outlier and level shift detection, are as follows. (1) Two AOs were detected in the Busan monthly EHL data, with magnitudes estimated at 65.5 cm (typhoon MAEMI) and 29.5 cm (typhoon SANBA), respectively. (2) One level shift, in 1983, was detected in the Mokpo monthly MSL data, with an LS magnitude estimated at 21.2 cm, attributed to the Youngsan River tidal estuary barrier construction. The RMS errors are about 1.95 cm (MSL), 5.11 cm (EHL), and 6.50 cm (ELL) at the Busan station, and about 2.10 cm (MSL), 11.80 cm (EHL), and 9.14 cm (ELL) at the Mokpo station.
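The AO/LS decomposition described in this abstract can be sketched without R. The following is a minimal NumPy stand-in for the 'tso' routine (not the paper's actual method): it scans for the split point that maximizes the mean difference to find one level shift, then screens for additive outliers with a MAD-based robust z-score. All names and the synthetic series are illustrative.

```python
import numpy as np

def detect_ao_ls(x, threshold=3.0, min_seg=12):
    """Toy detector for one level shift (LS) and additive outliers (AO),
    a simplified stand-in for the 'tso' function in R's 'tsoutliers'."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Level shift: the split point maximizing the mean difference.
    t = max(range(min_seg, n - min_seg),
            key=lambda t: abs(x[:t].mean() - x[t:].mean()))
    shift = x[t:].mean() - x[:t].mean()
    adj = x.copy()
    adj[t:] -= shift                  # remove the shift before AO screening
    # Additive outliers: robust z-score (MAD-based) above the threshold.
    med = np.median(adj)
    sigma = np.median(np.abs(adj - med)) * 1.4826
    ao = np.where(np.abs(adj - med) > threshold * sigma)[0]
    return ao, t, shift

# Synthetic monthly series: one spike (like a typhoon surge) and a
# 20 cm step (like a barrier construction changing the reference level).
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 120)
y[40] += 30.0
y[80:] += 20.0
ao, ls, shift = detect_ao_ls(y)
print(ao, ls, round(shift, 1))
```

A real seasonal ARIMA fit would model autocorrelation before flagging residuals; this sketch only shows why the spike and the step are separable effects.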

MapReduce-based Localized Linear Regression for Electricity Price Forecasting (전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델)

  • Han, Jinju; Lee, Ingyu; On, Byung-Won
    • The Transactions of the Korean Institute of Electrical Engineers P / v.67 no.4 / pp.183-190 / 2018
  • Predicting electricity prices accurately is an important task in the electricity trading market. Various approaches to the electricity price forecasting problem have been proposed, and linear regression-based approaches are known to perform best. However, the use of such methods is limited by low accuracy and performance. In traditional linear regression, it is not practical to find a nonlinear regression model that explains the training data well. If the training data are complex (i.e., few individual samples with many features), it is difficult to find a polynomial function with n terms that fits the training data. On the other hand, when a linear regression model is used to approximate a nonlinear one, the accuracy of the model drops considerably because it does not accurately reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple splits and finds the best linear regression model for each split. To improve performance, we further implement the proposed localized linear regression method in the map-and-reduce style, a framework for parallel processing of data stored in a Hadoop distributed file system. Our experimental results show that the proposed model outperforms the existing linear regression model: its accuracy is improved by 45% and it runs 5 times faster than the existing linear regression-based model.
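The core idea of localized linear regression, per-split linear fits approximating a nonlinear target, can be sketched on a single machine. This is an illustrative reconstruction, not the paper's code: the data are assumed sorted so that contiguous splits are local regions, and the MapReduce version would fit each split in a separate reducer.

```python
import numpy as np

def fit_local_models(X, y, n_splits=4):
    """Split the (sorted) data, fit one least-squares linear model per
    split, and remember each split's centroid for routing new queries."""
    models = []
    for Xs, ys in zip(np.array_split(X, n_splits), np.array_split(y, n_splits)):
        A = np.hstack([Xs, np.ones((len(Xs), 1))])    # add intercept column
        coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
        models.append((Xs.mean(axis=0), coef))
    return models

def predict_local(models, x):
    # Route the query to the model whose training centroid is nearest.
    centroid, coef = min(models, key=lambda m: np.linalg.norm(m[0] - x))
    return np.append(x, 1.0) @ coef

# Nonlinear target y = x^2, approximated piecewise by local linear fits.
X = np.linspace(-2, 2, 200).reshape(-1, 1)
y = (X ** 2).ravel()
models = fit_local_models(X, y, n_splits=4)
print(round(predict_local(models, np.array([1.5])), 2))
```

A single global linear fit to this data would be flat (slope near zero); the four local fits track the curve closely, which is the effect the abstract exploits.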

An Open Source Mobile Cloud Service: Geo-spatial Image Filtering Tools Using R (오픈소스 모바일 클라우드 서비스: R 기반 공간영상정보 필터링 사례)

  • Kang, Sanggoo; Lee, Kiwon
    • Spatial Information Research / v.22 no.5 / pp.1-8 / 2014
  • Globally, mobile, cloud computing, and big data are recent marketable key terms. These trend technologies and paradigms in the ICT (Information and Communication Technology) field strongly influence most application areas, including geo-spatial applications. Among them, cloud computing, though still at an early stage in Korea, plays an important role as a platform for the other trend technologies. In particular, the mobile cloud, an integrated platform combining mobile devices and cloud computing, can be a good solution for overcoming the well-known limitations of mobile applications and for providing more information-processing functionality to mobile users. This work is a case study on the design and implementation of a mobile application system for geo-spatial image filtering, operated on a mobile cloud platform built using OpenStack and various open sources. Filtering is carried out in an R environment, recently recognized as one of the big data analysis technologies. This approach is expected to link geo-spatial information to new service model development and to geo-spatial analysis services using R.
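The kind of raster filtering delegated to the cloud side can be illustrated with a basic smoothing filter. This NumPy sketch is not the paper's R code; it only shows the per-pixel neighborhood operation such a service would run.

```python
import numpy as np

def mean_filter(img, k=3):
    """k x k mean (smoothing) filter with edge padding, a simple example
    of the raster operations a geo-spatial filtering service performs."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    # Sum each pixel's k x k neighborhood by shifting the padded array.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.zeros((5, 5))
img[2, 2] = 9.0          # a single bright pixel
print(mean_filter(img))  # the pixel's value spreads over its neighborhood
```

In the paper's architecture the same operation would run server-side in R, with the mobile client only uploading the image and displaying the result.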

A Study on the Document Topic Extraction System Based on Big Data (빅데이터 기반 문서 토픽 추출 시스템 연구)

  • Hwang, Seung-Yeon; An, Yoon-Bin; Shin, Dong-Jin; Oh, Jae-Kon; Moon, Jin Yong; Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.5 / pp.207-214 / 2020
  • Nowadays, the use of smartphones and various electronic devices is increasing, the Internet and SNS are active, and we live in a flood of information. The amount of information has grown exponentially, making it difficult to survey it all; more and more people want to see only the key keywords of a document, and research on extracting the topics at the core of this information is increasingly important. It is also an important task to extract topics and compare them with the past to infer current trends. Topic modeling techniques can extract topics from a large volume of documents, and the extracted topics can be used in various fields such as trend prediction and data analysis. In this paper, to analyze rapidly changing trends and keep pace with the times, we extract the topics of papers in the computing field from 2016, 2017, and 2018 using the LDA algorithm, one of the probabilistic topic modeling techniques, and then analyze the trends and flows of research.
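The first step of any such pipeline, reducing documents to ranked key terms, can be shown with plain term counting. This is a deliberately simplified stand-in, not LDA itself: real topic modeling infers per-topic word distributions rather than a single global ranking, and the stopword list here is illustrative.

```python
from collections import Counter

STOP = {"the", "of", "and", "a", "in", "for", "on", "to", "with"}

def top_terms(docs, k=3):
    """Rank the most frequent non-stopword terms across a corpus,
    the crudest possible form of 'key keyword' extraction."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in doc.lower().split() if w not in STOP)
    return [term for term, _ in counts.most_common(k)]

# Toy corpus standing in for three years of paper titles.
papers = [
    "deep learning for image classification",
    "big data analysis with deep learning",
    "topic modeling of big data papers",
]
print(top_terms(papers))
```

LDA replaces this global count with a generative model in which each document mixes several topics, which is what lets the paper track how topic proportions shift across 2016-2018.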

Personality Traits and Response Styles (응답자의 성격특성과 응답스타일)

  • Kim, Seok-Ho; Shin, In-Cheol; Jeong, Jae-Ki
    • Survey Research / v.12 no.2 / pp.51-76 / 2011
  • Analyzing the 2009 Korean General Social Survey (KGSS), this study attempts to elucidate how content-irrelevant response patterns are formed in social surveys. It investigates the relationship between personality traits and response styles. Specifically, the effects of the Big Five personality factors (extraversion, agreeableness, conscientiousness, emotional stability, openness to experience) on acquiescent response style (ARS) and extreme response style (ERS) are examined, controlling for individual characteristics and interview contexts. The results show that ERS is positively affected by extraversion, openness to experience, agreeableness, and conscientiousness, whereas ARS is not significantly associated with any dimension of personality traits. The implications of the findings and methods to reduce response bias are discussed.


The Study of Facebook Marketing Application Method: Facebook 'Likes' Feature and Predicting Demographic Information (페이스북 마케팅 활용 방안에 대한 연구: 페이스북 '좋아요' 기능과 인구통계학적 정보 추출)

  • Yu, Seong Jong; Ahn, Seun; Lee, Zoonky
    • The Journal of Bigdata / v.1 no.1 / pp.61-66 / 2016
  • With big data analysis, companies use customized marketing strategies based on customer information. However, because of concerns about privacy and identity theft, people have started erasing their personal information or tightening the privacy settings on social network sites. Facebook, the most-used social networking site, has a feature called 'Likes' that can be used to predict users' demographic profiles, such as sex and age range. To build an accurate analysis model, the 'Likes' data were processed using Gaussian RBF and nFactors for dimensionality reduction. With random forest and 5-fold cross-validation, the results show an accuracy rate of 75% for sex and 97.85% for age. From this study, we expect to provide a useful guideline for companies and marketers who struggle to collect customer data.
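The 5-fold cross-validation protocol the study relies on can be written out explicitly. This dependency-free sketch substitutes a nearest-centroid classifier for the paper's random forest (a labeled assumption, purely to keep it self-contained); the fold logic is the part being illustrated.

```python
import numpy as np

def kfold_accuracy(X, y, train_and_predict, k=5, seed=0):
    """Plain k-fold cross-validation: shuffle, split into k folds,
    train on k-1 folds, score on the held-out fold, average."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        preds = train_and_predict(X[train], y[train], X[test])
        scores.append(np.mean(preds == y[test]))
    return float(np.mean(scores))

def centroid_classifier(Xtr, ytr, Xte):
    # Stand-in for the paper's random forest: assign each test point
    # to the class whose training centroid is nearest.
    cents = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    labels = np.array(sorted(cents))
    d = np.stack([np.linalg.norm(Xte - cents[c], axis=1) for c in labels])
    return labels[np.argmin(d, axis=0)]

# Two well-separated synthetic classes standing in for the two sexes.
X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(kfold_accuracy(X, y, centroid_classifier))
```

Averaging over the five held-out folds is what makes the reported 75% / 97.85% figures estimates of generalization accuracy rather than training accuracy.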


Analysis of Factors Affecting Satisfaction with Commuting Time in the Era of Autonomous Driving (자율주행시대에 통근시간 만족도에 영향을 미치는 요인분석)

  • Jang, Jae-min; Cheon, Seung-hoon; Lee, Soong-bong
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.5 / pp.172-185 / 2021
  • As the era of autonomous driving approaches, it is expected to have a significant impact on our lives. When autonomous cars emerge, it will be necessary to develop an index that can evaluate them, since they enhance the productive value of the car by reducing the burden on the driver. This study analyzed how the autonomous driving era affects commuting time and commuting time satisfaction among commuters using a car in Gyeonggi-do. First, a V-shaped nonlinear relationship was derived between commuting time and commuting time satisfaction. The factors affecting commuting time satisfaction were then analyzed through a binomial logistic model, centered on the sample belonging to the nonlinear section (commuting times of 70 minutes or more), which is most likely to be affected by the autonomous driving era. The analysis shows that the variables affected by the autonomous driving era were health, sleeping hours, working hours, and leisure time. Since the emergence of autonomous cars is highly likely to improve these variables, long-distance commuters are likely to feel higher commuting time satisfaction.

Method of Similarity Hash-Based Malware Family Classification (유사성 해시 기반 악성코드 유형 분류 기법)

  • Kim, Yun-jeong; Kim, Moon-sun; Lee, Man-hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.5 / pp.945-954 / 2022
  • Billions of malicious codes are detected every year, of which only 0.01% are new types of malware. In this situation, an effective malware type classification tool is needed, but previous studies are limited in quickly analyzing large amounts of malicious code because they require complex and massive data pre-processing. To solve this problem, this paper proposes a method to classify malware types based on similarity hashes without complex data preprocessing. The approach trains an XGBoost model on the similarity hash information of the malware. To evaluate it, we used the BIG-15 dataset, which is widely used in the field of malware classification. As a result, malicious code was classified with 98.9% accuracy, and 3,432 benign files were identified with 100% accuracy. These results are superior to most recent studies using complex preprocessing and deep learning models, so more efficient malware classification is expected to be possible with the proposed approach.
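Why a similarity hash makes a usable feature vector can be shown with a toy digest. This is not the paper's hash or its XGBoost pipeline: it hashes sliding byte trigrams into a small fixed-length histogram, a crude stand-in for ssdeep/TLSH-style digests, and the sample byte strings are invented.

```python
import zlib

def similarity_hash(data: bytes, n_buckets=64):
    """Hash sliding byte trigrams into a fixed-length, normalized
    histogram. Similar inputs share trigrams, so their histograms
    are close; this is what lets a classifier learn on the digest."""
    buckets = [0] * n_buckets
    for i in range(len(data) - 2):
        buckets[zlib.crc32(data[i:i + 3]) % n_buckets] += 1
    total = sum(buckets) or 1
    return [b / total for b in buckets]

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Two samples from the same hypothetical "family" vs. an unrelated one.
fam_a1 = similarity_hash(b"mov eax, ebx; push eax; call dword ptr [ecx]")
fam_a2 = similarity_hash(b"mov eax, ecx; push eax; call dword ptr [edx]")
other  = similarity_hash(b"The quick brown fox jumps over the lazy dog")
print(l1_distance(fam_a1, fam_a2) < l1_distance(fam_a1, other))
```

Because the digest is fixed-length regardless of file size, it can be fed directly to a gradient-boosted model, which is how the paper avoids heavy per-file preprocessing.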

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung; Joo, Jihwan; Han, Ingoo
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
  • The government recently announced various policies for developing the big data and artificial intelligence fields, providing the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public financial-policy institution in Korea, strongly committed to backing export companies through various systems. Nevertheless, there are still few realized business models based on big data analyses. In this situation, this paper aims to develop a new business model for ex-ante prediction of the likelihood of credit guarantee insurance accidents. We utilize internal data from KSURE, which supports export companies in Korea, and apply machine learning models, comparing the performance of predictive models including logistic regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. Prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), based on multiple discriminant analysis and widely used in both research and practice to this day; it uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduced a logit model to complement some limitations of previous models. Furthermore, Elmer and Borowski (1988) developed and examined a rule-based, automated system that conducts financial analysis of savings and loans.
Since the 1980s, researchers in Korea have examined the prediction of financial distress or bankruptcy. Kim (1987) analyzed financial ratios and developed a prediction model. Han et al. (1995, 1996, 1997, 2003, 2005, 2006) constructed prediction models using various techniques, including artificial neural networks. Yang (1996) introduced multiple discriminant analysis and a logit model, and Kim and Kim (2001) utilized artificial neural network techniques for ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with diverse models such as Random Forest or SVM. One major distinction of our research from previous work is that we examine the predicted probability of default for each sample case, not only the classification accuracy of each model over the entire sample. Most predictive models in this paper achieve a classification accuracy of about 70% on the entire sample: LightGBM shows the highest accuracy, 71.1%, and the logit model the lowest, 69%. However, these results are open to multiple interpretations. In a business context, more emphasis must be placed on minimizing type 2 errors, which cause more harmful operating losses for the guaranty company. We therefore also compare classification accuracy after splitting the predicted probability of default into ten equal intervals. The logit model has the highest accuracy, 100%, for the 0~10% interval of predicted default probability, but a relatively low accuracy of 61.5% for the 90~100% interval.
On the other hand, Random Forest, XGBoost, LightGBM, and DNN show more desirable results: they have higher accuracy for both the 0~10% and 90~100% intervals of predicted default probability, but lower accuracy around the 50% interval. Regarding the distribution of samples across predicted probabilities, both the LightGBM and XGBoost models place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy on a small number of cases, LightGBM or XGBoost may be more desirable since they classify a large number of cases into the two extreme intervals, even allowing for their relatively lower classification accuracy. Considering the importance of type 2 errors and total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, with logistic regression performing worst. However, each predictive model has a comparative advantage under different evaluation standards; for instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, a more comprehensive ensemble model could combine multiple machine learning classifiers and conduct majority voting to maximize overall performance.
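The majority-voting ensemble the abstract closes with is mechanically simple. This sketch combines hypothetical per-sample default/no-default predictions from three models (the prediction lists are invented, not the paper's results):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models by majority voting,
    the ensemble strategy proposed for maximizing overall performance."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical default (1) / no-default (0) calls for four companies.
logit = [0, 1, 1, 0]
rf    = [0, 1, 0, 0]
xgb   = [1, 1, 1, 0]
print(majority_vote([logit, rf, xgb]))
```

Voting lets each model's comparative advantage (e.g. Random Forest's near-100% accuracy in the high-probability interval) compensate for another model's weak region.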

Fraud Detection System Model Using Generative Adversarial Networks and Deep Learning (생성적 적대 신경망과 딥러닝을 활용한 이상거래탐지 시스템 모형)

  • Ye Won Kim; Ye Lim Yu; Hong Yong Choi
    • Information Systems Review / v.22 no.1 / pp.59-72 / 2020
  • Artificial intelligence is establishing itself as a familiar tool rather than an intractable concept. Following this trend, the financial sector is also looking to improve existing systems, including the Fraud Detection System (FDS). It has become difficult to detect sophisticated cyber financial fraud using the original rule-based FDS, because the payment environment has diversified and the number of electronic financial transactions has increased. To overcome the present FDS, this paper suggests three types of artificial intelligence models: Generative Adversarial Network (GAN), Deep Neural Network (DNN), and Convolutional Neural Network (CNN). The GAN shows how the data imbalance problem can be mitigated, while the DNN and CNN show how abnormal financial trading patterns can be precisely detected. In conclusion, among the experiments in this paper, WGAN has the strongest effect on the data imbalance problem, and the DNN model is comparatively more effective for fraud classification.
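Why synthesizing minority-class samples helps can be shown with a far simpler generator than a WGAN. This SMOTE-like interpolation sketch is a stand-in, not the paper's method: the paper trains a WGAN to synthesize fraud samples, while this only interpolates between real ones, and the toy fraud points are invented.

```python
import numpy as np

def interpolate_minority(X_min, n_new, seed=0):
    """Generate synthetic minority-class samples by interpolating between
    random pairs of real minority samples (a SMOTE-like sketch)."""
    rng = np.random.default_rng(seed)
    a = X_min[rng.integers(0, len(X_min), n_new)]
    b = X_min[rng.integers(0, len(X_min), n_new)]
    t = rng.random((n_new, 1))
    return a + t * (b - a)        # points on segments between real samples

fraud = np.array([[1.0, 9.0], [2.0, 8.0], [1.5, 9.5]])   # 3 rare samples
synthetic = interpolate_minority(fraud, n_new=5)
print(synthetic.shape)
```

A classifier trained on the augmented set sees a denser fraud region instead of three isolated points; a WGAN pushes the same idea further by learning the fraud distribution rather than interpolating within its convex hull.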