• Title/Summary/Keyword: 변수선별 (variable selection)

Variable selection in partial linear regression using the least angle regression (부분선형모형에서 LARS를 이용한 변수선택)

  • Seo, Han Son; Yoon, Min; Lee, Hakbae
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.937-944 / 2021
  • This work addresses variable selection in partial linear regression. Model selection for partial linear models is difficult because it involves both nonparametric estimation, such as choosing a smoothing parameter, and estimation of the coefficients of the linear explanatory variables. Several variable-selection approaches are proposed based on least angle regression (LARS), a fast forward selection algorithm. The proposed procedures apply t-tests, all-possible-regressions comparisons, or a stepwise selection process to the variables selected by LARS. An example based on real data and a simulation study of the performance of the proposed procedures are presented.
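
To make the LARS step concrete, here is a minimal pure-Python sketch, not the authors' code, that returns the order in which plain least angle regression (without the lasso modification) brings predictors into an ordinary linear model. Assumptions: the nonparametric part of the partial linear model is ignored, columns are non-constant, the loop stops after min(p, n-1) entries, and the helper names (`lars_order`, `_standardize`, `_solve`) are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _standardize(X):
    """Return the columns of X (a list of rows), centered and scaled to unit L2 norm."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        col = [v - mean for v in col]
        norm = math.sqrt(_dot(col, col))  # assumes the column is not constant
        cols.append([v / norm for v in col])
    return cols


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(m):
        piv = max(range(k, m), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, m):
            f = M[r][k] / M[k][k]
            for c in range(k, m + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * m
    for k in range(m - 1, -1, -1):
        x[k] = (M[k][m] - sum(M[k][c] * x[c] for c in range(k + 1, m))) / M[k][k]
    return x


def lars_order(X, y, tol=1e-12):
    """Hypothetical helper: indices of predictors in the order plain LARS adds them."""
    cols = _standardize(X)
    n, p = len(y), len(cols)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residual y - mu, with mu = 0 at the start
    c = [_dot(col, resid) for col in cols]
    j_new = max(range(p), key=lambda j: abs(c[j]))  # most correlated predictor enters first
    active = []
    while True:
        active.append(j_new)
        if len(active) == min(p, n - 1):
            break
        c = [_dot(col, resid) for col in cols]
        C = max(abs(c[j]) for j in active)  # common absolute correlation of the active set
        s = {j: 1.0 if c[j] >= 0 else -1.0 for j in active}
        G = [[s[i] * s[j] * _dot(cols[i], cols[j]) for j in active] for i in active]
        g = _solve(G, [1.0] * len(active))  # G^{-1} 1
        A_A = 1.0 / math.sqrt(sum(g))
        # Equiangular unit vector u: equal angles with every signed active column.
        u = [sum(A_A * g[k] * s[j] * cols[j][i] for k, j in enumerate(active)) for i in range(n)]
        a = [_dot(col, u) for col in cols]
        gamma, j_new = C / A_A, None
        for j in range(p):
            if j in active:
                continue
            # Step length at which predictor j's correlation ties the active set's.
            for num, den in ((C - c[j], A_A - a[j]), (C + c[j], A_A + a[j])):
                if abs(den) > tol and tol < num / den < gamma:
                    gamma, j_new = num / den, j
        if j_new is None:
            break  # no inactive predictor catches up before the full least-squares fit
        resid = [r - gamma * ui for r, ui in zip(resid, u)]
    return active
```

The first predictor to enter is the one most correlated with y; each later step moves along the equiangular direction u until an inactive predictor's correlation ties that of the active set. In the paper's procedures, t-tests, all-possible-regressions comparisons, or stepwise selection would then be applied to the variables LARS selects.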

Development of an Expert System to Improve the Methods of Parameter Estimation (매개변수 추정방법의 개선을 위한 전문가 시스템의 개발)

  • Lee, Beom-Hui; Lee, Gil-Seong
    • Journal of Korea Water Resources Association / v.31 no.6 / pp.641-655 / 1998
  • Methods for developing and applying an expert system are proposed to solve the water-quantity and water-quality problems caused by rapid urbanization more efficiently. The major water-quantity and water-quality parameters of urban areas are selected, and their characteristics are presented through sensitivity analysis. Based on these characteristics, rules for determining the parameters effectively are proposed. ESPE (Expert System for Parameter Estimation), an expert system based on 'facts' and 'rules', is developed using CLIPS 6.0 and applied to the An-Yang Stream basin. The estimates of the water-quantity parameters show high applicability, whereas the water-quality results indicate that the present methods need improvement, owing to both the complexity of the estimation process and the lack of decision rules.

A Mathematical Model to Evaluate the Radiological Risks for the Reuse of Decommissioning Site (원자력시설 해체부지의 재이용을 위한 방사선학적 리스크 평가모델)

  • Cheong, Jae-Hak
    • Journal of Nuclear Fuel Cycle and Waste Technology (JNFCWT) / v.4 no.4 / pp.353-363 / 2006
  • To evaluate the potential radiological risks of reusing a site after the decommissioning of nuclear facilities, a mathematical model was developed and implemented as a Microsoft Excel® spreadsheet. A set of input parameter values was proposed for use in the preliminary risk-screening step, before a detailed evaluation with site-specific data. The screening levels calculated by the present model agreed with the derived concentration guideline limits obtained from RESRAD Ver. 6.2 and with the German dose criteria for releasing a nuclear site from regulatory control.

Comparison of L, LH, LQ-moments and Parameter Estimation of GEV Distribution (L, LH, LQ-모멘트의 비교와 GEV 분포의 매개변수 추정)

  • Lee, Kil Seong; Jin, Lak Sun
    • Proceedings of the Korea Water Resources Association Conference / 2004.05b / pp.1137-1141 / 2004
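  • In this study, the parameters of the GEV distribution were estimated using LQ-moments, a new linear-combination technique based on probability weighted moments, and a flood frequency analysis of the Independence River at Donnattsburg, New York, was performed using L-, LH-, and LQ-moments. LH- and LQ-moments were originally proposed to remedy the excessive sensitivity of L-moments to extreme values; in this study, however, LH- and LQ-moments themselves responded sensitively to extreme values and produced inaccurate results. LH- and LQ-moments are therefore not always a viable alternative to L-moments. In their mathematical derivation, L-, LH-, and LQ-moments share the feature of being constructed from linear combinations of probability weighted moments to make parameter estimation simpler and easier; apart from this, however, their derivations differ substantially. The choice among L-, LH-, and LQ-moments should therefore take into account regional characteristics and the properties of the probability distribution.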
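
The L-moment part of the comparison can be sketched as follows: sample L-moments are computed from unbiased probability weighted moments, and GEV parameters then follow from the standard rational approximation for the shape parameter used in L-moment fitting, in the parameterization F(x) = exp{-[1 - k(x - ξ)/α]^(1/k)}. This is an illustrative sketch only: it covers L-moments but not the LH- or LQ-moments the study also evaluates, the function names are hypothetical, and the k ≈ 0 (Gumbel) limit is not handled.

```python
import math


def sample_l_moments(data):
    """First three sample L-moments (l1, l2, l3) from unbiased probability weighted moments."""
    x = sorted(data)
    n = len(x)  # requires n >= 3
    # b_r = (1/n) * sum over ordered values of C(i, r) / C(n-1, r) * x_(i), with 0-based i
    b0 = sum(x) / n
    b1 = sum((i / (n - 1)) * x[i] for i in range(n)) / n
    b2 = sum((i * (i - 1) / ((n - 1) * (n - 2))) * x[i] for i in range(n)) / n
    return b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0


def gev_from_l_moments(l1, l2, l3):
    """GEV location xi, scale alpha, shape k from the first three L-moments."""
    t3 = l3 / l2  # L-skewness
    z = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * z + 2.9554 * z * z  # approximation for the shape parameter
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k  # undefined when k == 0 (Gumbel limit)
    return xi, alpha, k


l1, l2, l3 = sample_l_moments([1, 2, 3, 4, 5])
xi, alpha, k = gev_from_l_moments(l1, l2, l3)
```

For the symmetric sample 1..5 the sketch gives l1 = 3, l2 = 1 and l3 = 0 (up to rounding), so the fitted GEV has zero L-skewness and a positive shape parameter of roughly 0.28 in this parameterization.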

Movie Box-office Prediction using Deep Learning and Feature Selection : Focusing on Multivariate Time Series

  • Byun, Jun-Hyung; Kim, Ji-Ho; Choi, Young-Jin; Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.35-47 / 2020
  • Box-office prediction is important to movie stakeholders, who need both to predict box-office results accurately and to identify the important variables. In this paper, we propose a multivariate time-series classification method with important-variable selection to improve the accuracy of box-office prediction. We collected daily data on South Korean movies from KOBIS and NAVER, selected important variables using a random forest, and performed multivariate time-series prediction using deep learning. With reference to the Korean screen quota system, deep learning models were used to compare the accuracy of box-office predictions for the 73rd day after release using the important variables versus all variables, and the differences were tested for statistical significance. The deep learning models used were a Multi-Layer Perceptron, a Fully Convolutional Neural Network, and a Residual Network. The Residual Network using the important variables achieved the highest prediction accuracy, 93%.

Identifying Consumer Response Factors in Live Commerce : Based on Consumer-Generated Text Data (라이브 커머스에서의 소비자 반응 요인 도출 : 소비자 생성 텍스트 데이터를 기반으로)

  • Park, Jae-Hyeong; Lee, Han-Sol; Kang, Ju-Young
    • Informatization Policy / v.30 no.2 / pp.68-85 / 2023
  • In this study, we collected live commerce streaming data, categorized the streams by the degree of chat activity, and analyzed the distribution of consumer-generated text responses. From a total of 2,282 streams on NAVER Shopping Live, which has the largest share of the domestic live commerce market, we selected the 200 streams with the most active viewer responses and then chose those showing a steep increase or decrease in viewer responses. We synthesized variables from the existing literature on live commerce viewing intentions and participation motivations into a variable table for this study and matched them to events in the broadcasts. Through this study, we identified which components of a broadcast stimulate the consumer-response variables found in previous studies; moreover, we empirically identified, from the data, consumers' motivations for participating in live commerce.

Research on Selecting Influential Climatic Factors and Optimal Timing Exploration for a Rice Production Forecast Model Using Weather Data

  • Jin-Kyeong Seo; Da-Jeong Choi; Juryon Paik
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.57-65 / 2023
  • Studies aiming to improve rice production forecasting have focused mainly on improving model accuracy; by contrast, relatively little research has addressed the data to which the prediction models are applied. When the same dependent variable and prediction model are applied to two rice production datasets composed of different features, the results can differ, and it is difficult to determine which dataset yields better results. To address this issue, potentially influential features can be identified within the data before the prediction model is applied, and the modeling can be centered on them, yielding stable prediction results regardless of how the data are composed. In this study, we propose a method that adjusts the composition of the data's features to select optimal base variables, supporting stable and consistent rice production predictions. The method uses the Korea Meteorological Administration's ASOS data. The findings are expected to contribute substantially to the usefulness of performance evaluations in future research.

Real-time private consumption prediction using big data (빅데이터를 이용한 실시간 민간소비 예측)

  • Seung Jun Shin; Beomseok Seo
    • The Korean Journal of Applied Statistics / v.37 no.1 / pp.13-38 / 2024
  • As economic uncertainty has recently increased because of COVID-19, there is a growing need to grasp quickly the private consumption trends that directly reflect the economic situation of private economic agents. This study proposes a method for estimating private consumption in real time using big data together with existing macroeconomic indicators. In particular, it aims to improve the accuracy of private consumption estimation by comparing various machine learning methods capable of fitting ultra-high-dimensional big data. The empirical analysis shows that when the number of covariates, including big data, is large, selecting variables in advance and using them for model fitting improves private consumption prediction. In addition, because including big data greatly improves the prediction of private consumption after COVID-19, the benefit of big data, which reflects new information in a timely manner, is shown to increase when economic uncertainty is high.
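
The abstract does not say how variables are selected in advance. One common option when covariates are very numerous, assumed here purely for illustration, is marginal correlation screening in the spirit of sure independence screening: rank covariates by the absolute Pearson correlation with the response and keep the top d before fitting the model. `screen_top_d` is a hypothetical helper, not the authors' procedure.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # assumes neither sequence is constant


def screen_top_d(columns, y, d):
    """Indices of the d covariates with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:d]
```

Only the retained columns would then be passed to the downstream machine learning model; screening is cheap because each covariate is scored independently, which is what makes it attractive in ultra-high dimensions.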

Estimation of storm events frequency analysis using copula function (Copula 함수를 이용한 호우사상의 빈도해석 산정)

  • An, Heejin; Lee, Moonyoung; Kim, Si Yeon; Jeon, Seol; Ahn, Youngmin; Jung, Donghwa; Park, Daeryong
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / p.200 / 2022
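  • This study presents a methodology for estimating probabilistic storm events in which annual maximum storm events are selected through a bivariate analysis of total rainfall and rainfall intensity, and the two variables are combined with copula functions to find the optimal model combination. Using observations through 2020 from 69 stations in Korea, rainfall of 1 mm or less was removed, and the rainfall records were then separated into independent storm events using an IETD (Inter-Event Time Definition) of 12 hours. Among the storm-event characteristics, total rainfall and rainfall intensity, which are positively correlated, were selected as variables and fitted to a bivariate exponential distribution, from which an annual maximum storm-event series was generated for each station. The parameters of the bivariate exponential distribution were estimated both over the entire period and year by year; because they varied considerably between years, year-by-year estimation was adopted. For the total rainfall and rainfall intensity of the annual maximum storm-event series, CDF (cumulative distribution function) values were estimated separately using five probability distributions commonly applied to extreme rainfall: lognormal, gamma, Gumbel, GEV (generalized extreme value), and GPD (generalized Pareto distribution). The resulting CDF values were combined using three copula models to obtain joint CDF values. A Cramér-von Mises (CVM) goodness-of-fit test was performed to find the best of the 75 model combinations, and the combination with the smallest CVM statistic $S_n$ was selected as the optimal model combination for each station.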
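
The event-separation step can be sketched as below, assuming an hourly rainfall series for one station and treating hours of 1 mm or less as dry (the abstract does not specify whether the 1 mm threshold applies to hourly values or to event totals). A dry spell of at least IETD = 12 hours closes an event, and total rainfall and mean intensity (total divided by duration, an assumed definition) are computed per event. `separate_events` is a hypothetical helper; the later steps (annual maxima, marginal fitting, copulas, CVM test) are not shown.

```python
def separate_events(rain, ietd=12, threshold=1.0):
    """Split an hourly rainfall series (mm) into independent storm events.

    Hours with depth <= threshold are treated as dry (assumed reading of the
    1 mm rule). A dry gap of at least `ietd` hours starts a new event.
    Returns (start_hour, end_hour, total_mm, intensity_mm_per_h) tuples, where
    duration runs from the first to the last wet hour inclusive.
    """
    events = []
    start = end = None
    total = 0.0
    for t, depth in enumerate(rain):
        if depth <= threshold:
            continue  # dry hour: only the gap length matters
        if start is not None and t - end - 1 >= ietd:
            # The dry spell since the last wet hour is long enough: close the event.
            events.append((start, end, total, total / (end - start + 1)))
            start, total = None, 0.0
        if start is None:
            start = t
        end = t
        total += depth
    if start is not None:
        events.append((start, end, total, total / (end - start + 1)))
    return events
```

For example, with wet hours at 0, 1 and 4 followed by 12 dry hours and one more wet hour, the sketch yields two events, because the 12-hour gap meets the IETD exactly; with `ietd=13` the same series would form a single event.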

A Study on Optimization of Welding Process Variables in MIG Welding of Aluminum Alloy Sheets for automotive door (자동차 Door용 박판 알루미늄합금의 MIG 용접공정변수 최적화에 관한 연구)

  • Lee, Young-Gi; Han, Hyun-Uk; Kim, Jae-Seong; Lee, Bo-Young; Kim, Cheol-Hee
    • Proceedings of the KWS Conference / 2009.11a / p.28 / 2009
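  • Recently, to cope with rising oil prices and with environmental regulations on reducing exhaust emissions worldwide, leading automakers have focused on $CO_2$ emission-reduction technologies and on improving the fuel efficiency of conventional internal combustion vehicles through better engine performance, optimized drive systems, vehicle weight reduction, and lower aerodynamic drag. In particular, lightening the car body, which accounts for 30% of vehicle weight, raises engine efficiency, maximizes vehicle performance, and thereby improves fuel economy, making it the most suitable and effective way to prevent environmental pollution and save fuel. Accordingly, research is under way on applying to car bodies aluminum alloys, which have a lower specific gravity than conventional steel together with similar strength and superior corrosion resistance. In this study, to assess the feasibility of applying an aluminum alloy (Al 5052) to automotive doors, the process variables of low-heat-input pulse MIG welding were optimized using response surface methodology. First, the effects of changes in the welding process variables (welding voltage, welding speed, and gap) on bead geometry in low-heat-input pulse MIG welding were evaluated. The main and interaction effects between the welding process variables and the bead-geometry variables were analyzed by factorial analysis; from this, the process variables with a large influence on bead geometry were selected, and a multiple regression model predicting bead geometry from the process variables was proposed. Second, to provide welding conditions robust to gaps of 0 to 1 mm in thin aluminum-alloy lap joints on the automotive door production line, the low-heat-input pulse MIG welding process variables were optimized using response surface methodology, and their applicability was confirmed.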
