Search | Korea Science

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
- Journal of Korean Society for Quality Management
- v.34 no.2
- pp.129-135
- 2006
With advances in computer technology, it is now possible to store in a database all the information needed to monitor equipment and keep it in control, together with huge amounts of real-time manufacturing data. Statistical analysis of large data sets is therefore needed, even though it is a formidable computational task: such data sets can have hundreds of thousands of observations and hundreds of independent variables, with values missing at many observations. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables from the trivial many. In this paper we review and summarize the CART, C4.5 and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables common to all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
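The screening rule described in the abstract above (keep only the variables selected by all three tree algorithms) amounts to a set intersection. The following is a minimal sketch of that idea only; the variable names and the per-algorithm selections are hypothetical and are not taken from the paper, and the tree fitting itself is assumed to have been done elsewhere.

```python
def screen_vital_few(selected_by_method):
    """Return the variables chosen by every screening method, sorted by name."""
    sets = [set(variables) for variables in selected_by_method.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical output of three tree-based screeners (illustrative names only).
selections = {
    "CART": ["debt_ratio", "cash_flow", "firm_age", "sales_growth"],
    "C4.5": ["debt_ratio", "cash_flow", "sales_growth", "region"],
    "CHAID": ["cash_flow", "debt_ratio", "industry"],
}

vital_few = screen_vital_few(selections)
print(vital_few)  # ['cash_flow', 'debt_ratio']
```

The intersection is deliberately conservative: a variable survives only if every method flags it, which keeps the candidate list for the later logistic regression short.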

Review of statistical methods for survival analysis using genomic data

Lee, Seungyeoun;Lim, Heeju
- Genomics & Informatics
- v.17 no.4
- pp.41.1-41.12
- 2019
Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. The common characteristic of survival data is that they contain "censored" observations, in which the time to event cannot be completely observed and the recorded value is instead a lower bound on the time to event. For each subject, only the event time or the censoring time is observed. Many traditional statistical methods have been used effectively to analyze survival data with censored observations. However, with the development of high-throughput technologies for producing "omics" data, more advanced statistical methods, such as regularization, are required to construct predictive survival models from high-dimensional genomic data. Furthermore, machine learning approaches have been adapted for survival analysis in order to fit nonlinear and complex interaction effects between predictors and to predict individual survival probabilities more accurately. Since most clinicians and medical researchers now have easy access to statistical programs for analyzing survival data, a review article is helpful for understanding the statistical methods used in survival analysis. We review traditional survival methods and regularization methods, with various penalty functions, for the analysis of high-dimensional genomic data, and describe machine learning techniques that have been adapted to survival analysis.
https://doi.org/10.5808/GI.2019.17.4.e41

Unstructured Data based a Study of Effectiveness about Prediction of Corporate Bankruptcy with a Real Case (실제 사례 기반 비정형 데이터를 활용한 기업의 부실징후 예측에 관한 효용성 연구)

JIN, Hoon;Hong, Jeoung-Pyo;Lee, Kang-Ho;Joo, Dong-Won
- Annual Conference on Human and Language Technology
- 2018.10a
- pp.487-492
- 2018
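In the wake of the Fourth Industrial Revolution, many fields in Korea are applying artificial intelligence and big data technologies to extend and complement existing services. In particular, major domestic banks are actively working to predict the likelihood of distress among companies that have borrowed from financial institutions, using online news articles and SNS data, in order to secure credit stability and respond preemptively, and to bring these predictions into actual operations. We describe the analysis methods tried, the results obtained, and the problems encountered while developing an unstructured-data-based system for predicting signs of corporate distress at a state-run bank in Korea, and we discuss the related issues. This paper reports on the development of an automatic tagger for labeling a large volume of unlabeled articles and of a model that predicts the likelihood of distress from news-article predictions. In terms of performance, the article prediction accuracy was 92% (AUC 0.96), and the prediction of companies likely to become distressed achieved results comparable to those of structured data analysis.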

A Study on the Sustainability of New SMEs through the Analysis of Altman Z-Score: Focusing on New and Renewable Energy Industry in Korea (알트만 Z-스코어를 이용한 신생 중소기업의 지속가능성 분석: 신재생에너지산업을 중심으로)

Oh, Nak-Kyo;Yoon, Sung-Soo;Park, Won-Koo
- Journal of Technology Innovation
- v.22 no.2
- pp.185-220
- 2014
The purpose of this study is to obtain an overall picture of the financial condition of the new and renewable energy sector, which has been growing rapidly, and to predict bankruptcy risk quantitatively. There has been much research on methodologies for predicting company failure, such as financial ratios as predictors of failure, analysis of corporate governance, risk factors, and survival analysis. This study uses the Altman Z-score, which is widely used around the world. The data set consisted of 121 companies with financial statements from KIS-Value, covering the period from 2006 to 2011. We found that 38 percent of the data set fell into the "Distress" zone (on alert) and another 38 percent was on watch, for a combined 76 percent, a level that casts doubt on the sector's sustainability. The average for SMEs in the wind energy sector was worse than that for SMEs in the solar energy sector, and the average for SMEs in the "Distress" zone (on alert) was worse than that for large-group companies in the same zone. In conclusion, the Altman Z-score proved effective for the new and renewable energy industry in Korea. The importance of this study lies in demonstrating empirically that the majority of solar and wind enterprises face the risk of bankruptcy. It is also meaningful for having examined the relationship between SMEs and large companies, in addition to advancing research on new start-up companies.
https://doi.org/10.14383/SIME.2014.22.2.185
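For readers unfamiliar with the Altman Z-score mentioned in the abstract above, the sketch below computes the classic 1968 version for publicly traded manufacturing firms. This is an assumption for illustration: the abstract does not say which Z-score variant or cutoffs the study used, and the balance-sheet figures and zone labels here are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit, market_equity,
             sales, total_assets, total_liabilities):
    """Classic Altman (1968) Z-score; assumed variant, not confirmed by the abstract."""
    x1 = working_capital / total_assets       # liquidity
    x2 = retained_earnings / total_assets     # cumulative profitability
    x3 = ebit / total_assets                  # operating efficiency
    x4 = market_equity / total_liabilities    # market-based leverage
    x5 = sales / total_assets                 # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z):
    """Map a score to the classic cutoffs (1.81 and 2.99)."""
    if z < 1.81:
        return "distress"
    if z <= 2.99:
        return "grey"
    return "safe"


# Hypothetical company figures, all in the same currency unit.
z = altman_z(working_capital=100, retained_earnings=50, ebit=40,
             market_equity=300, sales=800,
             total_assets=1000, total_liabilities=600)
print(round(z, 3), zone(z))  # 1.422 distress
```

Altman later published variants with different coefficients for private and non-manufacturing firms, so the weights above should be swapped out if a different variant applies.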

A study on forecasting of consumers' choice using artificial neural network (인공신경망을 이용한 소비자 선택 예측에 관한 연구)

송수섭;이의훈
- Journal of the Korean Operations Research and Management Science Society
- v.26 no.4
- pp.55-70
- 2001
Artificial neural network (ANN) models have been widely used for classification problems in business such as bankruptcy prediction and credit evaluation. Although applying ANNs to the classification of consumers' choice behavior is a promising research area, only a few studies exist. In general, most studies have reported that ANN models classify better than conventional statistical models. Because survey data on consumer behavior may include much noise and missing data, ANN models should be more robust than conventional statistical models, which require various assumptions. The purpose of this paper is to study the potential of the ANN model for forecasting consumers' choice behavior based on survey data. The data were collected through questionnaires given to shoppers at department stores and discount stores. The correct classification rates of the ANN models on the training and test samples were then compared with those of multiple discriminant analysis (MDA) and logistic regression (Logit) models. The ANN models performed better than the MDA and Logit models with respect to correct classification rate. Using the input variables identified as significant in the stepwise MDA further improved the performance of the ANN models.

Improving an Ensemble Model Using Instance Selection Method (사례 선택 기법을 활용한 앙상블 모형의 성능 개선)

Min, Sung-Hwan
- Journal of Korean Society of Industrial and Systems Engineering
- v.39 no.1
- pp.105-115
- 2016
Ensemble classification combines individually trained classifiers to yield more accurate predictions than the individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it randomly draws a subset of the features for each classifier in the ensemble. The instance selection technique selects critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposes a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, levels of diversity, and average classification rates of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed other models, including the single model, the instance selection model, and the original random subspace ensemble model.
https://doi.org/10.11627/jkise.2016.39.1.105
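The random subspace idea summarized in the abstract above can be sketched in a few lines: each base classifier sees only a random subset of the features, and the ensemble predicts by majority vote. This is an illustrative sketch, not the paper's model. It uses a nearest-centroid base learner chosen here for brevity, omits the GA-based instance selection entirely, and runs on hypothetical toy data.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Nearest-centroid base learner restricted to the given feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, j in enumerate(features):
            acc[k] += row[j]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Return the label whose centroid is closest on the selected features."""
    def dist(center):
        return sum((row[j] - center[k]) ** 2 for k, j in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_random_subspace(X, y, n_estimators=15, subspace_size=2, seed=0):
    """Train each base learner on its own random subset of the features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict(ensemble, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: four features, two well-separated classes.
X = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1],
     [9, 9, 9, 9], [8, 9, 8, 9], [9, 8, 9, 8]]
y = [0, 0, 0, 1, 1, 1]

ensemble = fit_random_subspace(X, y)
low = predict(ensemble, [0.5, 0.5, 0.5, 0.5])
high = predict(ensemble, [8.5, 8.5, 8.5, 8.5])
print(low, high)  # 0 1
```

In the hybrid model the abstract describes, an instance selection step would filter the rows of the training data before a function like `fit_random_subspace` is called; that step is left out here.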

Prediction of bankruptcy data using machine learning techniques (기계학습 방법을 이용한 기업부도의 예측)

Park, Dong-Joon;Yun, Ye-Boon;Yoon, Min
- Journal of the Korean Data and Information Science Society
- v.23 no.3
- pp.569-577
- 2012
The analysis and management of business failure is recognized as important in financial management, both for evaluating firms' performance and for assessing their viability. To this end, effective failure-prediction models are needed. This paper describes a new approach to predicting business failure using the total margin algorithm, which is a kind of support vector machine. Using real data, it is shown that the proposed method can evaluate the risk of failure better than existing methods.
https://doi.org/10.7465/jkdi.2012.23.3.569

Impacts of Financial Distress and ICT on Operating Performance and Efficiency: Empirical Evidence from Commercial Banks in India

RAWAL, Aashi;RASTOGI, Shailesh;SHARMA, Rahul;RASTOGI, Samaksh
- The Journal of Asian Finance, Economics and Business
- v.9 no.6
- pp.105-114
- 2022
This study investigates the influence of financial distress (FD) and information and communication technology (ICT) on the operating performance and efficiency of banks in the Indian banking sector. FD refers to a situation in which a company or individual is unable to pay its obligations on time. Panel data analysis (PDA) was used to analyze data from 33 Indian banks over ten years (2010 to 2019). According to the findings, FD has a positive and significant impact on bank operational performance and efficiency. The study gives the banking industry a better understanding of how a bank's performance can be negatively impacted by distressing conditions that render it inefficient and ineffective. It also shows investors how the level of distress can have a significant impact on bank performance in the market, ultimately resulting in the loss of money invested.
https://doi.org/10.13106/jafeb.2022.vol9.no6.0105

Financial Distress Prediction Models for Wind Energy SMEs

Oh, Nak-Kyo
- International Journal of Contents
- v.10 no.4
- pp.75-82
- 2014
The purpose of this paper was to identify suitable variables for financial distress prediction models and to compare the accuracy of multiple discriminant analysis (MDA) and logit analysis (LA) as early warning signals for wind energy companies in Korea. Both research methods, discriminant analysis and logit analysis, have been widely used. The data set consisted of 15 wind energy SMEs listed on KOSDAQ, with 2012 financial statements from KIS-Value. We found that five financial ratio variables were statistically significant and that the accuracy of MDA was 86%, while that of LA was 100%. The importance of this study is that it demonstrates empirically that financial distress prediction models are applicable to the wind energy industry in Korea as early warning signs of impending bankruptcy.
https://doi.org/10.5392/IJoC.2014.10.4.075

Feature Selection for Multi-Class Support Vector Machines Using an Impurity Measure of Classification Trees: An Application to the Credit Rating of S&P 500 Companies

Hong, Tae-Ho;Park, Ji-Young
- Asia pacific journal of information systems
- v.21 no.2
- pp.43-58
- 2011
Support vector machines (SVMs), a machine learning technique, have been applied not only to binary classification problems such as bankruptcy prediction but also to multi-class problems such as corporate credit ratings. In general, however, the performance of SVMs depends on the selection of predictors and can easily fall below that of the best alternative model, even though SVMs classify and predict successfully in many dichotomous and multi-class problems. To overcome this weakness, this study proposes an approach to feature selection for multi-class SVMs that uses the impurity measures of classification trees. To select the input features, we employed the C4.5 and CART algorithms as well as the stepwise method of discriminant analysis, a well-known feature selection method. We built a multi-class SVM model for credit rating using this approach and present experimental results on data for S&P 500 companies.

