• Title/Summary/Keyword: logistic regression (로지스틱회귀)

Multi-currencies portfolio strategy using principal component analysis and logistic regression (주성분 분석과 로지스틱 회귀분석을 이용한 다국 통화포트폴리오 전략)

  • Shim, Kyung-Sik;Ahn, Jae-Joon;Oh, Kyong-Joo
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.151-159 / 2012
  • This paper proposes a multi-currency portfolio strategy for the foreign exchange market using principal component analysis (PCA) and logistic regression (LR). While there is a great deal of literature analyzing exchange markets, there is relatively little work on developing trading strategies for them. The paper has two objectives. The first is to suggest a portfolio allocation method based on PCA. The second is to determine market timing, that is, when to buy or sell, using LR. The results show that the proposed model is a useful trading strategy in the foreign exchange market and can provide investors with important investment information.

Learning algorithms for big data logistic regression on RHIPE platform (RHIPE 플랫폼에서 빅데이터 로지스틱 회귀를 위한 학습 알고리즘)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.911-923 / 2016
  • Machine learning is becoming increasingly important in the big data era. Logistic regression is a classification method in machine learning and has been widely used in various fields, including medicine, economics, marketing, and the social sciences. Rhipe, which integrates R with the Hadoop environment, has not been discussed by many researchers owing to the difficulty of its installation and of MapReduce implementation. In this paper, we present MapReduce implementations of the gradient descent algorithm and the Newton-Raphson algorithm for logistic regression using Rhipe. The Newton-Raphson algorithm does not require a learning rate, whereas the gradient descent algorithm requires one to be picked manually. We choose the learning rate by a mixed procedure of grid search and binary search so that big data can be processed efficiently. In the performance study, our Newton-Raphson algorithm outperforms the gradient descent algorithm on all the tested data.
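The contrast between the two update rules can be illustrated with a minimal single-machine sketch in plain Python. This is not the paper's Rhipe/MapReduce implementation: the function names, the toy data, the fixed learning rate of 0.1, and the iteration counts are all assumptions chosen for illustration, and the model is restricted to an intercept plus one feature so that the 2x2 Newton system can be solved by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_and_hessian(beta, xs, ys):
    """Gradient and Hessian of the negative log-likelihood for
    p = sigmoid(beta[0] + beta[1] * x). Both are plain sums over records."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta[0] + beta[1] * x)
        g0 += p - y
        g1 += (p - y) * x
        w = p * (1.0 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def gradient_descent(xs, ys, learning_rate=0.1, iterations=10000):
    """Gradient descent: needs a hand-picked learning rate (hypothetical default)."""
    beta = [0.0, 0.0]
    n = len(xs)
    for _ in range(iterations):
        (g0, g1), _ = gradient_and_hessian(beta, xs, ys)
        beta[0] -= learning_rate * g0 / n
        beta[1] -= learning_rate * g1 / n
    return beta

def newton_raphson(xs, ys, iterations=25):
    """Newton-Raphson: the step is H^{-1} g, so no learning rate is involved."""
    beta = [0.0, 0.0]
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = gradient_and_hessian(beta, xs, ys)
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g in closed form.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        beta[0] -= step0
        beta[1] -= step1
    return beta

# Toy, non-separable data (made up for this sketch).
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
print("gradient descent:", gradient_descent(xs, ys))
print("newton-raphson:  ", newton_raphson(xs, ys))
```

Because the gradient and Hessian are sums of per-record terms, they are the kind of quantity a map step could compute per partition and a reduce step could add up, although how the paper actually partitions the work is not stated in the abstract.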

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model and Decision Tree Model (로지스틱 회귀모형과 의사결정나무 모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Unuzaya, Enkhjargal;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.4 / pp.777-786 / 2018
  • This study proposes a new method to detect Cochlodinium polykrikoides in satellite images using logistic regression and decision trees. We used 918 spectral profiles extracted from red tide, clear water, and turbid water as training data. Of the entire data set, 70% was extracted and used for model training, and the classification accuracy of the models was evaluated on the remaining 30%. In the accuracy evaluation, the logistic regression model showed about 97% classification accuracy, and the decision tree model showed about 86%.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
  • Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we therefore propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and an ensemble that can decide whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents that have tags containing the main body text.

Development of heavy rain damage prediction function using logistic regression model (로지스틱 회귀모형을 이용한 호우피해 예측함수 개발)

  • Choi, Chang Hyun;Kim, Jong Sung;Kim, Dong Hyun;Lee, Jong So;Kim, Hung Soo
    • Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.41-41 / 2017
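  • Damage from natural disasters is becoming larger, more diverse, and more concentrated, and the resulting social and economic losses continue to increase compared with the past. If the likelihood of a disaster and the extent of its damage could be predicted through statistical analysis of past disaster damage data and meteorological phenomena, disasters could be managed efficiently. This study therefore targets heavy rain damage, a representative type of natural disaster damage, and develops a heavy rain damage prediction function through statistical analysis of hydrometeorological data, based on disaster statistics for 69 cities, counties, and districts (si/gun/gu) in the Nakdong River region. Heavy rain damage amounts for each damage period were analyzed from the annual disaster report published by the Ministry of Public Safety and Security and used as the dependent variable of the prediction function. Hourly rainfall data from synoptic weather stations were analyzed to construct antecedent rainfall, maximum rainfall by duration, and total rainfall, and regional characteristics such as the area of each si/gun/gu were collected; these were used as explanatory variables. Previous studies of damage prediction functions reported that predictive power declines where the damage amount is large. To address this problem, the prediction function was developed with a logistic regression model, which makes it possible to develop separate functions for a large-damage group and a small-damage group. The NRMSE of the developed prediction function ranged from 6.34% to 18.79%, and the function predicted heavy rain damage adequately in most cases. In summary, this study used a logistic regression model that can separate large-damage and small-damage groups to develop heavy rain damage prediction functions for each si/gun/gu in the Nakdong River region. If heavy rain damage can be predicted in advance using the proposed functions, the amount of heavy rain damage is expected to decrease considerably.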
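The NRMSE figures quoted above depend on how the RMSE is normalized, which the abstract does not state. The following Python sketch assumes one common convention, dividing the RMSE by the range of the observed values and expressing the result as a percentage; the function name `nrmse_percent` is hypothetical and the normalization may differ from the one the authors used.

```python
import math

def nrmse_percent(observed, predicted):
    """RMSE divided by the range of the observed values, in percent.
    Range normalization is an assumption; mean normalization is another common choice."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * math.sqrt(mse) / value_range

# Made-up damage amounts, for illustration only.
print(nrmse_percent([0.0, 10.0], [1.0, 9.0]))
```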

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used for predicting and classifying categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, that algorithm suffers from slow convergence and from the difficulty of ensuring an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the 2020 Community Health Survey of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data used for the analysis were collected by scraping 334,498 reviews of meal kit products from the Naver shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were produced. A logistic regression model was applied to predict the polarity of reviews of meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. The results confirm that text analysis can be a useful tool that provides a basis for maximizing positive factors and removing negative risk factors for a specific category, menu, and ingredient when developing a meal kit product.
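As a rough illustration of what "normalized TF-IDF" can mean, the sketch below computes L2-normalized TF-IDF weights in plain Python. The abstract does not give the exact weighting, so the smoothed inverse document frequency used here, the whitespace tokenizer (a simplification that would not suit Korean review text), and the function name `tfidf_vectors` are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, L2-normalized.
    Weight = raw term count * (ln((1 + N) / (1 + df)) + 1), an assumed smoothed variant."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

# Made-up English reviews, for illustration only.
reviews = ["fresh and tasty", "fresh but salty", "late delivery"]
for vector in tfidf_vectors(reviews):
    print(vector)
```

Vectors of this kind could then serve as the input features of a polarity classifier such as logistic regression; terms that occur in many reviews receive lower weights than rarer ones.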

Estimation of Logistic Regression for Two-Stage Case-Control Data (2단계 사례-대조자료를 위한 로지스틱 회귀모형의 추론)

  • 신미영;신은순
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.237-245 / 2000
  • In this paper we consider a logistic regression model based on two-stage case-control sampling and study the Weighted Exogenous Sampling Maximum Likelihood (WESML) method for obtaining asymptotically normal estimates of the parameters of the logistic regression model. A numerical example demonstrates the differences between the Conditional Maximum Likelihood (CML) estimates and the WESML estimates for two-stage case-control data.
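For orientation, the general idea behind a weighted likelihood of this kind can be written as follows. This is a sketch of the classical single-stage form, in which each observation is weighted by the ratio of the population share to the sample share of its outcome class; the symbols $w_i$, $Q$, and $H$ are notation assumed here, and the specific two-stage weights used in the paper are not given in the abstract.

```latex
% Weighted log-likelihood for a binary logistic model (assumed notation)
\ell_w(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i(\beta) + (1 - y_i) \log\bigl(1 - p_i(\beta)\bigr) \right],
\qquad
p_i(\beta) = \frac{1}{1 + \exp(-x_i^{\top}\beta)},
\qquad
w_i = \frac{Q(y_i)}{H(y_i)}
% Q(y): share of outcome class y in the population
% H(y): share of outcome class y in the sample
```

With $w_i = 1$ for all $i$ this reduces to the ordinary logistic log-likelihood, which makes explicit that the weights only correct for the outcome-dependent sampling.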

Prediction Model with a Logistic Regression of Sequencing Two Arrival Flows (합류하는 두 항공기간 도착순서 결정에 대한 로지스틱회귀 예측 모형)

  • Jung, Soyeon;Lee, Keumjin
    • Journal of the Korean Society for Aviation and Aeronautics / v.23 no.4 / pp.42-48 / 2015
  • The purpose of this paper is to construct a prediction model of the arrival sequencing strategy that reflects the actual sequencing patterns of air traffic controllers. As a first step, we analyzed the pairwise sequencing of two aircraft entering the TMA from different entry points. Based on historical trajectory data, several traffic factors such as time, speed, and traffic density were examined for the model. Using the statistically significant factors, we constructed a prediction model of arrival sequencing through binary logistic regression analysis. With the estimated coefficients, the performance of the model was evaluated through cross-validation.