• Title/Summary/Keyword: Classification and Prediction

A Study about Internal Control Deficient Company Forecasting and Characteristics - Based on listed and unlisted companies - (내부통제 취약기업 예측과 특성에 관한 연구 - 상장기업군과 비상장기업군 중심으로 -)

  • Yoo, Kil-Hyun;Kim, Dae-Lyong
    • Journal of Digital Convergence, v.15 no.2, pp.121-133, 2017
  • The purpose of this study is to examine the characteristics of companies that are likely to develop internal control weaknesses, using a forecasting model. The study uses actual data on listed and unlisted companies from K_financial institution. The first conclusion is that a discriminant model is more valid than a logit model for predicting companies with weak internal control. The discriminant model for predicting internal control vulnerability has high classification accuracy and a low Type II error, that is, the error of incorrectly classifying vulnerable companies as normal companies. The second conclusion is that companies with weak internal control are characterized by low credit ratings, low asset soundness assessments, high delinquency rates, lower operating cash flow, high debt ratios, and a negative ratio of operating profit to net sales. Because this study covers not only listed companies but also unlisted companies, which previous studies did not address, its results, including the forecasting model, can be used by financial institutions as a tool for identifying companies with a high potential for internal control weakness and thereby preventing asset losses.
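
The abstract above compares models by classification accuracy and by Type II error, defined there as classifying a vulnerable company as normal. A minimal sketch of how those two figures could be computed from predicted and actual labels follows. The function name, the label strings, and the sample data are hypothetical illustrations, not material from the study.

```python
def classification_summary(actual, predicted, positive="weak"):
    """Return (accuracy, type_ii_error_rate) for a binary classifier.

    Type II error follows the abstract's definition: a company that is
    actually weak (the positive class) but is classified as normal.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    # Predictions made for the companies that are truly weak.
    positive_predictions = [p for a, p in zip(actual, predicted) if a == positive]
    missed = sum(1 for p in positive_predictions if p != positive)
    accuracy = correct / len(actual)
    type_ii = missed / len(positive_predictions) if positive_predictions else 0.0
    return accuracy, type_ii


# Hypothetical labels for illustration only (not data from the study).
actual = ["weak", "weak", "weak", "normal", "normal", "normal", "normal", "weak"]
predicted = ["weak", "normal", "weak", "normal", "normal", "weak", "normal", "weak"]
accuracy, type_ii = classification_summary(actual, predicted)
print(f"accuracy={accuracy:.3f}, type II error rate={type_ii:.3f}")
```

Reporting the Type II rate separately from accuracy matters here because a missed weak company is the costly case for a lender, and overall accuracy can hide it when normal companies dominate the sample.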

A Method of Developing a Ground Layer with Risk of Ground Subsidence based on the 3D Ground Modeling (3차원 지반모델링 기반의 지반함몰 위험 지반 레이어 개발 방법)

  • Kang, Junggoo;Kang, Jaemo;Parh, Junhwan;Mun, Duhwan
    • Journal of the Korean GEO-environmental Society, v.22 no.12, pp.33-40, 2021
  • The deterioration of underground facilities, disturbance of the ground due to underground development activities, and changes in groundwater can cause ground subsidence accidents in urban areas. Investigating the geotechnical and hydraulic factors that affect ground subsidence accidents is essential for predicting the subsidence risk in advance. In this study, an analysis DB was constructed through 3D ground modeling so that the geotechnical survey information DB currently in operation and groundwater behavior information could be used for risk prediction. Using these results, the relationship between the actual history of ground subsidence occurrences and ground conditions and groundwater level changes was confirmed. Furthermore, a methodology for visualizing the risk of ground subsidence was presented: the engineering characteristics of the soil, given according to the Unified Soil Classification System (USCS) in the existing geotechnical survey information, were re-expressed as the internal erosion sensitivity of the soil. Based on the result, it was confirmed that the ground in the areas where ground subsidence occurred consists of more than 40% sand (SM, SC, SP, SW) vulnerable to internal erosion. In addition, the effect of changes in groundwater level on the occurrence frequency of ground subsidence was confirmed.

DISEASE DIAGNOSED AND DESCRIBED BY NIRS

  • Tsenkova, Roumiana N.
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference, 2001.06a, pp.1031-1031, 2001
  • The mammary gland is made up of remarkably sensitive tissue, which is capable of producing a large volume of secretion, milk, under normal or healthy conditions. When bacteria enter the gland and establish an infection (mastitis), inflammation is initiated, accompanied by an influx of white cells from the blood stream, altered secretory function, and changes in the volume and composition of the secretion. Cell numbers in milk are closely associated with inflammation and udder health. These somatic cell counts (SCC) are accepted as the international standard measurement of milk quality in dairy and for mastitis diagnosis. NIR spectra of unhomogenized composite milk samples from 14 cows (healthy and mastitic) were measured 7 days after parturition and during the next 30 days of lactation. Different multivariate analysis techniques were used to diagnose the disease at a very early stage and to determine how the spectral properties of milk vary with its composition and animal health. A PLS model for predicting somatic cell count (SCC) from NIR milk spectra was built. The best accuracy of determination for the 1100-2500 nm range was found using smoothed absorbance data and 10 PLS factors. For the independent validation set of samples, the standard error of prediction was 0.382, the correlation coefficient 0.854, and the coefficient of variation 7.63%. SCC determination by NIR milk spectra was found to be indirect, based on the related changes in milk composition. From the spectral changes, we learned that when mastitis occurred, the most significant factors that simultaneously influenced milk spectra were alteration of milk proteins and changes in the ionic concentration of milk. This was consistent with the results we later obtained by applying 2DCOS. Two-dimensional correlation analysis of NIR milk spectra was done to assess the changes in milk composition that occur when somatic cell count (SCC) levels vary. 
The synchronous correlation map revealed that when SCC increases, protein levels increase while water and lactose levels decrease. Results from the analysis of the asynchronous plot indicated that changes in water and fat absorptions occur before changes in other milk components. In addition, the technique was used to assess the changes in milk during a period when SCC levels do not vary appreciably. Results indicated that milk components are in equilibrium and no appreciable change in a given component was seen with respect to another. This was found in both healthy and mastitic animals. However, milk components were found to vary with SCC content regardless of the range considered. This important finding demonstrates that 2-D correlation analysis may be used to track even subtle changes in milk composition in individual cows. To find the right SCC threshold for mastitis diagnosis at cow level, milk samples were classified using soft independent modeling of class analogy (SIMCA) and different spectral data pretreatments. Two SCC levels, 200 000 cells/mL and 300 000 cells/mL, were set up and compared as thresholds to discriminate between healthy and mastitic cows. The best detection accuracy was found with 200 000 cells/mL as the threshold for mastitis and smoothed absorbance data: 98% of the milk samples in the calibration set and 87% of the samples in the independent test set were correctly classified. When the spectral information was studied, it was found that successful mastitis diagnosis was based on revealing the spectral changes related to the corresponding changes in milk composition. NIRS combined with different ways of spectral data mining can provide a faster and nondestructive alternative to current methods for mastitis diagnosis and a new insight into disease understanding at the molecular level.

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems, v.22 no.4, pp.177-192, 2016
  • Machine learning is a field of artificial intelligence. It refers to an area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include the decision tree model, the naive Bayes model, and the SVM (support vector machine) model. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given information about the data in any two groups, the SVM model judges to which group new data belong based on the hyperplane obtained from the given data set. Thus, the larger the amount of meaningful data, the better the machine learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved to be powerful in describing non-stationary and chaotic stock price dynamics. Many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services, a compound word of Robot and Advisor, which can perform various financial tasks through advanced algorithms using huge amounts of rapidly changing data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically. 
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model, which is one of the machine learning methods, and applying it to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. VKOSPI is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI index in real time. Like volatility in general, VKOSPI affects option prices. VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices). If volatility increases, the premiums of all call and put options increase because the probability that the options will be exercised increases. The investor can see in real time how much the option price rises for a given rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified with real option data that an accurate forecast of VKOSPI can generate a large profit in real option trading. To the best of our knowledge, there have been no studies that predict the direction of VKOSPI with machine learning and apply the prediction to actual option trading. In this study, we predicted daily VKOSPI changes with the SVM model and then entered an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether the SVM's predictions are applicable to real option trading. 
The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the average number of position entries was 43.2, less than half of the benchmark (100 entries). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.
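
The abstract above relies on Vega, the Black-Scholes sensitivity of an option price to volatility, and on the claim that call and put premiums both rise when volatility rises. The following is a minimal sketch of the standard Black-Scholes formulas for a European option on a non-dividend-paying underlying. The function names and the input values are hypothetical, and the sketch is not the pricing setup used in the study.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(spot, strike, rate, vol, t):
    """Black-Scholes d1 and d2 terms (t is time to expiry in years)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    return d1, d1 - vol * math.sqrt(t)


def call_price(spot, strike, rate, vol, t):
    d1, d2 = d1_d2(spot, strike, rate, vol, t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def put_price(spot, strike, rate, vol, t):
    d1, d2 = d1_d2(spot, strike, rate, vol, t)
    return strike * math.exp(-rate * t) * norm_cdf(-d2) - spot * norm_cdf(-d1)


def vega(spot, strike, rate, vol, t):
    """Price change per unit change in volatility; identical for calls and puts."""
    d1, _ = d1_d2(spot, strike, rate, vol, t)
    return spot * norm_pdf(d1) * math.sqrt(t)


# Hypothetical inputs for illustration only.
params = dict(spot=100.0, strike=100.0, rate=0.02, t=0.25)
for vol in (0.20, 0.25):
    print(f"vol={vol:.2f} call={call_price(vol=vol, **params):.4f} "
          f"put={put_price(vol=vol, **params):.4f} vega={vega(vol=vol, **params):.4f}")
```

Because Vega is positive for both option types, a position that has sold both a call and a put loses value for the holder of the options and gains for the seller when volatility falls, which is consistent with entering the strangle only when a decline in VKOSPI is predicted.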

Estimation of the Percent of the Vote by Adjustment of Voter Turnout in Election Polls (선거여론조사에서 투표율 반영을 통한 득표율 추정)

  • Kim, Jeonghoon;Han, Sang-Tae;Kang, Hyuncheol
    • Journal of the Korean Data Analysis Society, v.20 no.6, pp.2873-2881, 2018
  • Obtaining objective and credible information through election polls is very important, both to help voters vote in an informed way and to help candidates and political parties establish appropriate election strategies. Therefore, many related organizations, such as political parties, media organizations, and research institutions, have been making efforts to improve the accuracy of poll results and election predictions. Kim et al. (2017) analyzed the non-response group, that is, respondents who answered in the election survey that they support no candidate, in order to increase the accuracy of vote rate estimation. As a result, it was confirmed that the accuracy of vote rate estimation can be significantly improved by appropriately classifying the non-response stratum. In this study, we propose a method for estimating the turnout of each stratum (sex, age group) when the total turnout rate is given for a specific district (region), and a procedure for predicting the vote rate that reflects the turnout. In addition, case studies were conducted using data gathered through telephone interviews for the 20th National Assembly elections in 2016.
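
The abstract above does not spell out its estimation procedure, so the following is only a sketch of the general idea of reflecting turnout in a vote share estimate: each stratum's support is weighted by its expected number of voters (electorate times turnout) rather than by its electorate alone. The function name, the strata, and all numbers are hypothetical assumptions, not the authors' method or data.

```python
def turnout_adjusted_share(strata):
    """Return (vote_share, overall_turnout) for one candidate.

    Each stratum is a dict with:
      electorate: number of eligible voters in the stratum
      turnout:    estimated turnout rate of the stratum (0..1)
      support:    share of the stratum's voters backing the candidate (0..1)
    """
    expected_voters = [s["electorate"] * s["turnout"] for s in strata]
    total_voters = sum(expected_voters)
    total_electorate = sum(s["electorate"] for s in strata)
    share = sum(v * s["support"] for v, s in zip(expected_voters, strata)) / total_voters
    return share, total_voters / total_electorate


# Hypothetical strata for illustration only.
strata = [
    {"name": "age 20-39", "electorate": 40000, "turnout": 0.50, "support": 0.60},
    {"name": "age 40-59", "electorate": 40000, "turnout": 0.60, "support": 0.45},
    {"name": "age 60+", "electorate": 20000, "turnout": 0.80, "support": 0.30},
]
share, overall_turnout = turnout_adjusted_share(strata)

# Same survey answers weighted by electorate only, i.e. ignoring turnout.
unadjusted = sum(s["electorate"] * s["support"] for s in strata) / sum(
    s["electorate"] for s in strata
)
print(f"turnout-adjusted share={share:.3f}, unadjusted share={unadjusted:.3f}, "
      f"overall turnout={overall_turnout:.3f}")
```

In this toy case the stratum with the highest turnout supports the candidate least, so the turnout-adjusted share is lower than the unadjusted one. The returned overall turnout can be compared with a known district-level turnout to check that the stratum turnout estimates are consistent with it.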

Increasing Accuracy of Stock Price Pattern Prediction through Data Augmentation for Deep Learning (데이터 증강을 통한 딥러닝 기반 주가 패턴 예측 정확도 향상 방안)

  • Kim, Youngjun;Kim, Yeojeong;Lee, Insun;Lee, Hong Joo
    • The Journal of Bigdata, v.4 no.2, pp.1-12, 2019
  • As Artificial Intelligence (AI) technology develops, it is being applied to various fields such as image, voice, and text, and it has shown good results in certain areas. Researchers have also tried to predict the stock market using AI. Predicting the stock market is known to be a difficult problem because the market is affected by various factors such as the economy and politics. In the field of AI, there are attempts to predict the ups and downs of stock prices by studying stock price patterns with various machine learning techniques. This study suggests a way of predicting stock price patterns based on the Convolutional Neural Network (CNN), one such machine learning technique. A CNN uses neural networks to classify images by extracting features from them through convolutional layers. Accordingly, this study classifies candlestick images made from stock data in order to predict patterns. This study has two objectives. The first, referred to as Case 1, is to predict patterns from images made with same-day stock price data. The second, referred to as Case 2, is to predict the next day's stock price patterns from images produced with daily stock price data. In Case 1, two data augmentation methods, random modification and Gaussian noise, are applied to generate more training data, and the generated images are fed into the model for fitting. Given that deep learning requires a large amount of data, this study suggests a method of data augmentation for candlestick images. This study also compares the accuracies obtained for images with Gaussian noise and for different classification problems. All data in this study were collected through the OpenAPI provided by DaiShin Securities. Case 1 has five labels depending on the pattern: up with up closing, up with down closing, down with up closing, down with down closing, and staying. 
The images in Case 1 are created by removing the last candle (-1candle), the last two candles (-2candles), or the last three candles (-3candles) from 60-minute, 30-minute, 10-minute, and 5-minute candle charts. In a 60-minute candle chart, each candle in the image carries 60 minutes of information: an open price, high price, low price, and close price. Case 2 has two labels, up and down. For Case 2, images were generated from 60-minute, 30-minute, 10-minute, and 5-minute candle charts without removing any candles. Considering the nature of stock data, moving the candles in the images is suggested instead of existing data augmentation techniques. How far the candles are moved is defined as the modified value. The average difference in closing prices between candles was 0.0029. Therefore, in this study, 0.003, 0.002, 0.001, and 0.00025 are used as modified values. The number of images was doubled after data augmentation. For the Gaussian noise, the mean was 0 and the variance was 0.01. For both Case 1 and Case 2, the model is based on VGG-Net16, which has 16 layers. As a result, 10-minute -1candle showed the best accuracy among the 60-minute, 30-minute, 10-minute, and 5-minute candle charts. Thus, 10-minute images were used for the rest of the experiment in Case 1. The images with three candles removed were selected for data augmentation and the application of Gaussian noise. 10-minute -3candle resulted in 79.72% accuracy. The accuracy for images with a modified value of 0.00025 and 100% of the candles changed was 79.92%. Applying Gaussian noise raised the accuracy to 80.98%. According to the outcomes of Case 2, 60-minute candle charts predicted the next day's patterns with 82.60% accuracy. To sum up, this study is expected to contribute to further studies on the prediction of stock price patterns using images, and it provides a possible method for data augmentation of stock data.
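
The two augmentations described above, moving candles by a modified value and adding Gaussian noise with mean 0 and variance 0.01, can be sketched as follows. This is an assumption-laden illustration: the abstract does not say how candles are represented or in which direction they are moved, so the sketch treats each candle as a normalized (open, high, low, close) tuple, shifts it up or down at random, and applies noise to a pixel grid scaled to [0, 1]. All function names and sample values are hypothetical.

```python
import random


def shift_candles(candles, modified_value, fraction=1.0, rng=None):
    """Move a fraction of the candles up or down by modified_value.

    candles: list of (open, high, low, close) tuples on a normalized scale.
    The whole candle is shifted, so its body and wick lengths are preserved.
    """
    rng = rng or random.Random(0)
    shifted = []
    for o, h, l, c in candles:
        if rng.random() < fraction:
            delta = modified_value * rng.choice((-1.0, 1.0))
            shifted.append((o + delta, h + delta, l + delta, c + delta))
        else:
            shifted.append((o, h, l, c))
    return shifted


def add_gaussian_noise(pixels, mean=0.0, variance=0.01, rng=None):
    """Add Gaussian noise to a 2D grid of pixel values and clip to [0, 1]."""
    rng = rng or random.Random(0)
    std = variance ** 0.5
    return [
        [min(1.0, max(0.0, p + rng.gauss(mean, std))) for p in row]
        for row in pixels
    ]


# Hypothetical normalized candles and a tiny image, for illustration only.
candles = [(0.500, 0.510, 0.495, 0.505), (0.505, 0.512, 0.500, 0.502)]
augmented = shift_candles(candles, modified_value=0.00025, fraction=1.0)
training_set = [candles, augmented]  # original plus one augmented copy

image = [[0.0, 0.5, 1.0], [0.2, 0.8, 0.4]]
noisy_image = add_gaussian_noise(image, mean=0.0, variance=0.01)
print(augmented)
print(noisy_image)
```

Keeping one augmented copy per original chart matches the abstract's statement that the number of images doubled, and `fraction=1.0` corresponds to the "100% of the candles changed" setting.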

Study for Clinical Indicators of Prediction for Histological Finding of IgA Nephropathy (IgA 신병증의 조직소견을 예측할 수 있는 임상지표에 관한 연구)

  • Han Byong-Mu;Cho Jin-Youl;Chuon Ko-Woon;NamGoong Mee-Kyung
    • Childhood Kidney Diseases, v.7 no.2, pp.150-156, 2003
  • Purpose: Efforts to predict the clinicopathological outcome of IgA nephropathy have been made but have yielded conflicting results and have not helped in deciding the appropriate timing of renal biopsy. In this study, we reviewed the predictive factors of clinicopathological outcome in order to find criteria for the timing of renal biopsy in IgA nephropathy. Methods: Forty children diagnosed with biopsy-proven IgA nephropathy at Wonju Christian Hospital were studied retrospectively, based on medical records. Results: Among 39 patients, 2 children progressed to a higher serum creatinine level. One of them reached end-stage renal disease within 2 years 7 months. According to the WHO histopathological classification, there were 15 cases of class I, 14 cases of class II, 7 cases of class III, and 3 cases of class IV. In the mild histological classes (class I, II), gross hematuria was seen in 23 of 29 children (P=0.02). In the severe histological classes (class III, IV), gross hematuria was noted in 4 of 10 (P>0.05). The tubulointerstitial changes were grade 1 in 24 cases, grade 2 in 4 cases, grade 3 in 8 cases, and grade 4 in 3 cases. With an increase in the tubulointerstitial grade, the 24-hour urine protein/albumin ratio increased. Serum creatinine less than 0.79 mg/dL could predict the lower grades (grade 1 and 2) of tubulointerstitial changes, whereas serum creatinine greater than 1.13 mg/dL could predict the higher grades (grade 3 and 4). In children with gross hematuria (n=27), serum creatinine was lower (0.78 vs 1.09 mg/dL, P=0.027), serum IgA was higher (316.3 vs 198.8 mg/dL), and cases of lower WHO classification (I and II) were more common (23 vs 4, P=0.029) than in children with microscopic hematuria. 
Conclusion: Serum creatinine less than 0.79 mg/dL, macroscopic hematuria, and higher 24-hour urine protein/albumin ratio would predict the lower-grade glomerulo-tubulointerstitial lesion in IgA nephropathy and could be used as criteria for delaying renal biopsy.

Changes Detection of Ice Dimension in Cheonji, Baekdu Mountain Using Sentinel-1 Image Classification (Sentinel-1 위성의 영상 분류 기법을 이용한 백두산 천지의 얼음 면적 변화 탐지)

  • Park, Sungjae;Eom, Jinah;Ko, Bokyun;Park, Jeong-Won;Lee, Chang-Wook
    • Journal of the Korean earth science society, v.41 no.1, pp.31-39, 2020
  • Cheonji, the largest caldera lake in Asia, is located at the summit of Baekdu Mountain. Cheonji is covered with snow and ice for about six months of the year because of its high altitude and surrounding environment. Since most of its water comes from groundwater, the water temperature is closely related to volcanic activity, and much volcanic activity has been monitored on the mountain since the 2000s. In this study, we analyzed the area of ice produced during winter on Baekdu Mountain using Sentinel-1 satellite image data provided by the European Space Agency (ESA). To calculate the ice area from the backscatter images of the Sentinel-1 satellite, 20 Gray-Level Co-occurrence Matrix (GLCM) layers were generated from two polarization images using texture analysis. The area was calculated by classifying the GLCM layers with the Support Vector Machine (SVM) algorithm to identify the ice in the image. The calculated area was also correlated with temperature data obtained from the Samjiyeon weather station. This study could serve as a basis for proposing a new method of calculating ice area before a full-scale, long-term time series analysis is undertaken.
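
The texture layers mentioned above are derived from a Gray-Level Co-occurrence Matrix, which counts how often a pixel with gray level i has a neighbor with gray level j at a fixed offset. A minimal sketch of the standard GLCM and two common texture statistics follows. The toy image, the horizontal offset, and the choice of contrast and homogeneity are assumptions for illustration; the abstract does not state which offsets or features produced its 20 layers.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    image: 2D list of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs whose first pixel has
    level i and whose neighbor at the offset has level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weighted sum of squared gray-level differences; high for rough texture."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def homogeneity(p):
    """Inverse difference moment; high when neighbors have similar levels."""
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(len(p)) for j in range(len(p)))


# Toy 4x4 image quantized to 4 gray levels, for illustration only.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4, dx=1, dy=0)
print(f"contrast={contrast(p):.4f}, homogeneity={homogeneity(p):.4f}")
```

In a pipeline like the one described, such statistics would be computed in a sliding window over each polarization image to build per-pixel texture layers, which then serve as input features for the classifier.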

Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB, v.10B no.3, pp.281-286, 2003
  • A large number of features are collected for problem solving in real life, but utilizing all of the collected features is difficult. It is not easy to collect correct data for every feature. When all collected data are used for learning, the resulting learning model is complicated and good performance cannot be obtained. There are also interrelationships or hierarchical relations among the features. The number of features can be reduced by analyzing the relations among them using heuristic knowledge or statistical methods. A heuristic technique refers to learning through repeated trial and error and experience. Experts can approach the relevant problem domain through a process of collecting opinions based on experience. These properties can be used to reduce the number of features used in learning. Experts generate new, highly abstract features from raw data. This paper describes a machine learning model that reduces the number of features used in learning by means of a heuristic function and uses the abstracted features as the input values of a neural network. We applied this model to win/lose prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also significantly improves classification accuracy compared with using the neural network and the heuristic model separately.

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems, v.19 no.2, pp.39-54, 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect the user's preferences and provide a recommendation list to the user. Thus, the product recommender system in an online shopping store has become known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect the user's preferences cause disappointment and waste the user's time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting the user's preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5759 transactions, of which 3167 remained after records with null values were deleted. In this study, we transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts customers who have a high likelihood of purchasing products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have a high likelihood of purchasing products in each product group. We apply these data mining techniques using SAS E-Miner software. We partition the datasets into two sets, modeling and validation, for the logistic regression and decision trees, and into three sets, training, test, and validation, for the artificial neural network model. The validation dataset is the same for all experiments. 
We then combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging is short for "Bootstrap Aggregation"; it combines the outputs of several machine learning techniques to raise the performance and stability of prediction or classification. This technique is a special form of the averaging method. Bumping is short for "Bootstrap Umbrella of Model Parameter," and it considers only the model that has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for the "Poster" product group. For the "Poster" product group, the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extract thirty-one association rules according to the values of the Lift, Support, and Confidence measures. We set the minimum transaction frequency to support associations at 5%, the maximum number of items in an association at 4, and the minimum confidence for rule generation at 10%. This study also excludes extracted association rules with a lift value below 1. We finally obtain fifteen association rules after excluding duplicate rules. Among the fifteen association rules, eleven contain associations between products in the "Office Supplies" product group, one includes an association between the "Office Supplies" and "Fashion" product groups, and the other three contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype and real-world transaction and profile data. To this end, we construct the prototype system using ASP, JavaScript, and Microsoft Access. 
In addition, we survey user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The survey participants are 173 people who use MSN Messenger, Daum Café, and P2P services. We evaluate user satisfaction using a five-point Likert scale. This study also performs a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% statistical significance level. This means that users were significantly more satisfied with the recommended product list. The results also show that the proposed system may be useful in a real-world online shopping store.
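
The second step of the abstract above filters association rules by Support, Confidence, and Lift, using thresholds of 5% support, 10% confidence, and a lift of at least 1. A minimal sketch of the standard definitions of these three measures follows. The function names and the transactions are hypothetical; only the thresholds and the product group names come from the abstract, and the study itself used SAS E-Miner rather than code like this.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items.
    support    = P(A and B)
    confidence = P(A and B) / P(A)
    lift       = confidence / P(B)
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_b = sum(1 for t in transactions if b <= t)
    count_ab = sum(1 for t in transactions if (a | b) <= t)
    support = count_ab / n
    confidence = count_ab / count_a if count_a else 0.0
    lift = confidence / (count_b / n) if count_b else 0.0
    return support, confidence, lift


def keep_rule(support, confidence, lift,
              min_support=0.05, min_confidence=0.10, min_lift=1.0):
    """Apply the thresholds stated in the abstract."""
    return support >= min_support and confidence >= min_confidence and lift >= min_lift


# Hypothetical baskets for illustration only (not the study's data).
transactions = [
    {"Office Supplies", "Fashion"},
    {"Office Supplies", "Home Decoration"},
    {"Office Supplies", "Home Decoration"},
    {"Poster"},
    {"Office Supplies"},
    {"Home Decoration"},
    {"Fashion", "Poster"},
    {"Office Supplies", "Home Decoration", "Fashion"},
]
support, confidence, lift = rule_metrics(
    transactions, {"Office Supplies"}, {"Home Decoration"}
)
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}, "
      f"kept={keep_rule(support, confidence, lift)}")
```

A lift above 1 means the consequent is bought more often alongside the antecedent than it is bought overall, which is why rules with lift below 1 are discarded before recommendations are generated.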