• Title/Summary/Keyword: Reliability Verification (신뢰도 검증)


A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung;Kim, Kitae;Kim, Jongwoo;Park, Steve
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.93-108 / 2014
  • To support business decision making, interest in and efforts to analyze and use transaction data from different perspectives are increasing. Such efforts are not limited to customer management or marketing; they are also used for monitoring and detecting fraud transactions. Fraud transactions are evolving into various patterns by taking advantage of information technology. To keep up with this evolution, many efforts have been made on fraud detection methods and advanced application systems in order to improve the accuracy and ease of fraud detection. As a case of fraud detection, this study aims to provide effective fraud detection methods for auction exception agricultural products in the largest Korean agricultural wholesale market. The auction exception products policy exists to complement auction-based trades in the agricultural wholesale market. That is, most trades of agricultural products are performed by auction; however, specific products are assigned as auction exception products when total volumes are relatively small, the number of wholesalers is small, or wholesalers have difficulties purchasing the products. However, the auction exception products policy raises several problems regarding fairness and transparency of transactions, which calls for fraud detection. In this study, to generate fraud detection rules, real large-scale agricultural product transaction data from 2008 to 2010 in the market are analyzed, amounting to more than 1 million transactions and over 1 billion US dollars in transaction volume. Agricultural transaction data have unique characteristics such as frequent changes in supply volumes and turbulent time-dependent changes in price. Since this was the first trial to identify fraud transactions in this domain, there was no training data set for supervised learning, so fraud detection rules are generated using an outlier detection approach. We assume that outlier transactions are more likely to be fraudulent than normal transactions. Outlier transactions are identified by comparing the daily, weekly, and quarterly average unit prices of product items. Quarterly average unit prices of product items for specific wholesalers are also used to identify outlier transactions. The reliability of the generated fraud detection rules is confirmed by domain experts. To determine whether a transaction is fraudulent, the normal distribution and the normalized Z-value concept are applied. That is, the unit price of a transaction is transformed into a Z-value to calculate its occurrence probability, approximating the distribution of unit prices by a normal distribution. A modified Z-value of the unit price is used rather than the original Z-value. The reason is that, in the case of auction exception agricultural products, Z-values are influenced by the outlier fraud transactions themselves because the number of wholesalers is small. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction being checked for fraud. To show the usefulness of the proposed approach, a prototype fraud transaction detection system is developed using Delphi. The system consists of five main menus and related submenus. The first functionality of the system is importing transaction databases. The next important functions set up fraud detection parameters. By changing fraud detection parameters, system users can control the number of potential fraud transactions. Execution functions provide fraud detection results found based on the fraud detection parameters. The potential fraud transactions can be viewed on screen or exported as files. The study is an initial trial to identify fraud transactions in auction exception agricultural products, and many research topics remain. First, the scope of the analysis data was limited due to data availability. It is necessary to include more data on transactions, wholesalers, and producers to detect fraud transactions more accurately. Next, we need to extend the scope of fraud transaction detection to fishery products. There are also many possibilities to apply different data mining techniques for fraud detection; for example, a time series approach is a potential technique to apply to the problem. Although outlier transactions are detected here based on the unit prices of transactions, it is also possible to derive fraud detection rules based on transaction volumes.
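The Self-Eliminated Z-score described above is essentially a leave-one-out standardization. Below is a minimal sketch of the idea with hypothetical unit prices and an arbitrary detection threshold; the paper's actual parameters and grouping rules are not given in the abstract.

```python
import numpy as np

def self_eliminated_z_scores(unit_prices):
    """Leave-one-out Z-scores: each transaction's unit price is standardized
    against the mean and standard deviation of the *other* transactions in the
    same group (e.g. same item and period), so an extreme price cannot mask
    itself by inflating the group statistics."""
    prices = np.asarray(unit_prices, dtype=float)
    scores = np.full(len(prices), np.nan)
    for i in range(len(prices)):
        others = np.delete(prices, i)
        mu, sigma = others.mean(), others.std(ddof=1)
        if sigma > 0:
            scores[i] = (prices[i] - mu) / sigma
    return scores

# Hypothetical unit prices for one item traded by a small set of wholesalers
prices = [1200, 1180, 1250, 1190, 3900, 1210]
z = self_eliminated_z_scores(prices)
threshold = 3.0  # detection parameter a user of the prototype might tune
print(z.round(2), [i for i, s in enumerate(z) if abs(s) > threshold])
```

With an ordinary Z-score, the 3,900 transaction would inflate the group's standard deviation and partly hide itself; excluding it from its own reference statistics makes it stand out clearly.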

A Thermal Time-Driven Dormancy Index as a Complementary Criterion for Grape Vine Freeze Risk Evaluation (포도 동해위험 판정기준으로서 온도시간 기반의 휴면심도 이용)

  • Kwon, Eun-Young;Jung, Jea-Eun;Chung, U-Ran;Lee, Seung-Jong;Song, Gi-Cheol;Choi, Dong-Geun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.8 no.1 / pp.1-9 / 2006
  • Despite the recently observed warmer winters in Korea, more freeze injuries and associated economic losses are reported in the fruit industry than ever before. Existing freeze-frost forecasting systems employ only the daily minimum temperature for judging the potential damage to dormant flowering buds, and cannot accommodate biological responses such as the short-term acclimation of plants to severe weather episodes or annual variation in climate. We introduce 'dormancy depth', in addition to daily minimum temperature, as a complementary criterion for judging the potential damage of freezing temperatures to the dormant flowering buds of grape vines. Dormancy depth can be estimated by a phenology model driven by daily maximum and minimum temperature and is expected to be a reasonable proxy for the physiological tolerance of buds to low temperature. Dormancy depth at a selected site was estimated for a climatological normal year by this model, and we found a close similarity in the time-course change pattern between the estimated dormancy depth and the known cold tolerance of fruit trees. Inter-annual and spatial variations in dormancy depth were identified by this method, showing the feasibility of using dormancy depth as a proxy indicator for tolerance to low temperature during the winter season. The model was applied to 10 vineyards which were recently damaged by a cold spell, and a temperature-dormancy depth-freeze injury relationship was formulated into an exponential-saturation model which can be used for judging freeze risk under a given set of temperature and dormancy depth. Based on this model and the expected lowest temperature with a 10-year recurrence interval, a freeze risk probability map was produced for Hwaseong County, Korea. The results seemed to explain why the vineyards in the warmer part of Hwaseong County have been hit by more freeze damage than those in the cooler part of the county. A dormancy depth-minimum temperature dual-engine freeze warning system was designed for vineyards in major production counties in Korea by combining the site-specific dormancy depth and minimum temperature forecasts with the freeze risk model. In this system, the daily accumulation of thermal time since last fall leads to the dormancy state (depth) for today. The regional minimum temperature forecast for tomorrow by the Korea Meteorological Administration is converted to a site-specific forecast at a 30 m resolution. These data are input to the freeze risk model, and the percent damage probability is calculated for each grid cell and mapped for the entire county. Similar approaches may be used to develop freeze warning systems for other deciduous fruit trees.
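The warning-system loop described above (accumulate thermal time, derive today's dormancy depth, combine it with the forecast minimum temperature, output percent damage) can be sketched as below. The degree-day formulation, the dormancy bookkeeping, and the exponential-saturation coefficients are all placeholders, since the abstract does not give the fitted phenology model or its parameters.

```python
import math

def thermal_time(tmax, tmin, base=5.0):
    """Degree-days for one day from daily max/min temperature (a common
    formulation; the paper's phenology model may use a different scheme)."""
    return max((tmax + tmin) / 2.0 - base, 0.0)

def update_dormancy_depth(depth, tt, deepen=1.0, release=0.2):
    """Toy bookkeeping: dormancy deepens on cold days and is released as
    thermal time accumulates. Constants are illustrative only."""
    return max(depth + deepen - release * tt, 0.0)

def percent_damage(tmin_forecast, depth, a=0.5, b=0.05):
    """Exponential-saturation form: damage rises toward 100% as the minimum
    temperature drops, moderated by dormancy depth (a proxy for tolerance).
    The coefficients a and b are hypothetical, not the study's fitted values."""
    stress = max(-tmin_forecast, 0.0) / (1.0 + b * depth)
    return 100.0 * (1.0 - math.exp(-a * stress))

# One hypothetical grid cell: accumulate depth over a few winter days,
# then evaluate tomorrow's forecast minimum of -15 C.
depth = 0.0
for tmax, tmin in [(3, -5), (1, -8), (5, -2), (0, -10)]:
    depth = update_dormancy_depth(depth, thermal_time(tmax, tmin))
print(round(depth, 1), round(percent_damage(-15.0, depth), 1))
```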

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows, consisting of 160 columns including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. Through this, the study was able to address the data imbalance problem caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, as well as the problem of reflecting the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. Through this, the model can provide stable default risk assessment services to unlisted companies whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized companies and startups. Although the prediction of corporate default risk using machine learning has been actively studied recently, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for the calculation of default risk, given that an entity's default risk information is very widely utilized in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation method. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience on credit ratings and changes in future market conditions. This study reduced the bias of individual models by utilizing a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information and to maximize the advantages of machine learning-based default risk prediction models, which take less time to calculate. To calculate the sub-model forecasts used as input data for the stacking ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the stacking ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and then the predictive power of each model was verified on the test set. The analysis showed that the stacking ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the forecasts of the stacking ensemble model and those of each individual model, pairs of the stacking ensemble model and each individual model were constructed. Because the Shapiro-Wilk normality test showed that none of the pairs followed normality, we used the nonparametric Wilcoxon rank sum test to check whether the forecasts of the two models in each pair showed statistically significant differences. The analysis showed that the forecasts of the stacking ensemble model differed statistically significantly from those of the MLP model and the CNN model. In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction methodologies, given that traditional credit rating models can also be incorporated as sub-models to calculate the final default probability. The stacking ensemble technique proposed in this study can also help design models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
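A minimal sketch of the stacking idea described above, not the paper's exact pipeline: sub-models are trained with cross-validation (seven folds here, mirroring the seven-piece split) and a meta-learner combines their out-of-fold forecasts of a continuous default-risk target. The sub-model choices, the meta-learner, and the synthetic features are assumptions, and the study's CNN sub-model is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))              # stand-in for financial ratio columns
y = 1 / (1 + np.exp(-X[:, :3].sum(axis=1)))  # stand-in for a Merton-style default risk in [0, 1]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)),
    ],
    final_estimator=Ridge(),  # meta-learner combining the sub-model forecasts
    cv=7,                     # mirrors the seven-piece split of the training data
)
stack.fit(X_tr, y_tr)
print("held-out R^2:", round(stack.score(X_te, y_te), 3))
```

Because the meta-learner only sees out-of-fold predictions, the ensemble's forecast is less tied to any single sub-model's bias, which is the property the study relies on.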

Risk Analysis of Arsenic in Rice Using by HPLC-ICP-MS (HPLC-ICP-MS를 이용한 쌀의 비소 위해도 평가)

  • An, Jae-Min;Park, Dae-Han;Hwang, Hyang-Ran;Chang, Soon-Young;Kwon, Mi-Jung;Kim, In-Sook;Kim, Ik-Ro;Lee, Hye-Min;Lim, Hyun-Ji;Park, Jae-Ok;Lee, Gwang-Hee
    • Korean Journal of Environmental Agriculture / v.37 no.4 / pp.291-301 / 2018
  • BACKGROUND: Rice is one of the main sources of inorganic arsenic among the crops consumed in the world population's diet. Arsenic is classified in Group 1, carcinogenic to humans, according to the IARC. This study was carried out to assess the dietary exposure risk of inorganic arsenic in husked rice and polished rice for the health of the Korean population. METHODS AND RESULTS: Total arsenic was determined using a microwave device and ICP-MS. Inorganic arsenic was determined by ICP-MS coupled with an HPLC system. For the HPLC-ICP-MS analysis, the limit of detection, limit of quantitation, and recovery were 0.73-1.24 μg/kg, 2.41-4.09 μg/kg, and 96.5-98.9%, respectively. The daily exposure to inorganic arsenic, expressed per body weight, was 4.97×10⁻³ (≥20 years old) to 1.36×10⁻² (≤2 years old) μg/kg b.w./day (0.23-0.63% of the PTWI) for husked rice, and 1.39×10⁻¹ (≥20 years old) to 3.21×10⁻¹ (≤2 years old) μg/kg b.w./day (6.47-15.00% of the PTWI) for polished rice. CONCLUSION: The levels of overall exposure to total and inorganic arsenic from husked and polished rice were far lower than the levels recommended by the Joint FAO/WHO Expert Committee on Food Additives (JECFA), indicating little possibility of risk.
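The PTWI percentages above can be reproduced by converting the daily exposures to weekly values and dividing by the provisional tolerable weekly intake. The sketch below assumes the former JECFA PTWI of 15 μg/kg b.w./week for inorganic arsenic, which closely matches the reported ranges; the small deviations come from rounding of the exposure figures in the abstract.

```python
PTWI = 15.0  # assumed provisional tolerable weekly intake, μg/kg b.w./week

daily_exposures = {                      # μg/kg b.w./day, from the abstract
    "husked rice, >=20 y":   4.97e-3,
    "husked rice, <=2 y":    1.36e-2,
    "polished rice, >=20 y": 1.39e-1,
    "polished rice, <=2 y":  3.21e-1,
}

for label, daily in daily_exposures.items():
    weekly = daily * 7                   # daily exposure -> weekly exposure
    print(f"{label}: {100 * weekly / PTWI:.2f}% of PTWI")
# prints roughly 0.23%, 0.63%, 6.49%, 14.98%
```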

A Study on the Market Structure Analysis for Durable Goods Using Consideration Set:An Exploratory Approach for Automotive Market (고려상표군을 이용한 내구재 시장구조 분석에 관한 연구: 자동차 시장에 대한 탐색적 분석방법)

  • Lee, Seokoo
    • Asia Marketing Journal / v.14 no.2 / pp.157-176 / 2012
  • Brand switching data, frequently used in market structure analysis, are adequate for analyzing non-durable goods because they can capture competition between two specific brands. However, brand switching data sometimes cannot be used to analyze long-lived goods such as automobiles, because a main assumption, that consumer preferences toward brand attributes do not change over time, can be violated. Therefore, a new type of data which can precisely capture competition among durable goods is needed. Another problem with brand switching data collected from actual purchase behavior is that they fall short of explaining why consumers consider different sets of brands. Considering the above problems, the main purpose of this study is to analyze market structure for durable goods with consideration sets. The author uses an exploratory approach and latent class clustering to identify market structure based on heterogeneous consideration sets among consumers. Then the relationship between some factors and consideration set formation is analyzed. Some benefits and two demographic variables - age and income - are selected as factors based on consumer behavior theory. The author analyzed the USA automotive market with the top 11 brands using the exploratory approach and latent class clustering. 2,500 respondents were randomly selected from the total sample and used for analysis. Six models concerning market structure were established to be tested. Model 1 represents a non-structured market and model 6 a market structure composed of six sub-markets. The approach is exploratory because no hypothetical market structure is defined in advance. The results showed that model 1 is insufficient to fit the data, implying that the USA automotive market is a structured market. Model 3, with three sub-markets, is significant and identified as the optimal market structure in the USA automotive market. The three sub-markets are named USA brands, Asian brands, and European brands, which implies that a country-of-origin effect may exist in the USA automotive market. A comparison between the modal classification by the derived market structure and the probabilistic classification by the research model was conducted to test how correctly model 3 classifies respondents; the model classifies 97% of respondents exactly. The results of this study differ from those of previous research, which used a confirmatory approach: car type and price were chosen as criteria for market structuring, and a car type-price structure was revealed as the optimal structure for the USA automotive market. This research, in contrast, used an exploratory approach without hypothetical market structures. It is not yet concluded which approach is superior. For the confirmatory approach, hypothetical market structures should be established exhaustively, because the optimal market structure is selected among the hypothetical structures. On the other hand, the exploratory approach has a potential problem in that the validity of the derived optimal market structure is somewhat difficult to verify. There is also a market boundary difference between this research and previous research: while previous research analyzed seven car brands, this research analyzed eleven. Both studies seemed to represent the entire car market, because the cumulative market shares of the analyzed brands exceed 50%, but the market boundary difference might have affected the differing results. Although the two studies showed different results, it is clear that the country-of-origin effect among brands should be considered an important criterion for analyzing the USA automotive market structure. This research tried to explain the heterogeneity of consideration sets among consumers using benefits and two demographic factors, sex and income. Benefits work as key variables in the consumer decision process and also serve as important criteria in market segmentation. Three factors - trust/safety, image/fun to drive, and economy - were identified among nine benefit-related measures. Then the relationship between market structures and the independent variables was analyzed using multinomial regression, with the three benefit factors and the two demographic factors as independent variables. The results showed that all independent variables can be used to explain why different market structures exist in the USA automotive market. For example, a male consumer who perceives all benefits as important and has a lower income tends to consider domestic brands more than European brands. The results also showed that benefits, sex, and income affect consideration set formation. Although it is generally assumed that a consumer with a higher income is likely to purchase a high-priced car, it is notable that American consumers perceived the benefits of domestic brands positively regardless of income. Male consumers in particular showed higher loyalty to domestic brands. The managerial implications of this research are as follows. Although the implications may be confined to the USA automotive market, the effect of sex on automotive buying behavior should be analyzed. The automotive market is traditionally conceived as a market oriented toward male consumers, but the proportion of female consumers has grown over the years. It is a natural outcome that Volvo and Hyundai Motors recently developed new cars targeted at the women's market. Secondly, the model used in this research can be applied more easily than those of previous studies. The exploratory approach has many advantages, although it can be difficult to apply in practice because it tends to involve complicated models and to require various types of data. The data needed for the model in this research are only a few items, such as purchased brands, consideration sets, some benefits, and some demographic factors, and are easy to collect from consumers.
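A minimal sketch of the two-stage analysis described above: latent class clustering of binary consideration-set vectors (approximated here by a Bernoulli mixture fitted with EM), followed by a multinomial logistic regression of class membership on benefit and demographic covariates. The brand count, number of classes, covariates, and synthetic data are illustrative, not the study's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def bernoulli_mixture_em(X, k, n_iter=200):
    """EM for a k-class Bernoulli mixture over 0/1 consideration indicators."""
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                      # class priors
    theta = rng.uniform(0.25, 0.75, size=(k, d))  # per-class consideration probabilities
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        log_p = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update priors and per-class consideration probabilities
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = np.clip(resp.T @ X / nk[:, None], 1e-4, 1 - 1e-4)
    return resp.argmax(axis=1), theta

# Synthetic data: 2,500 respondents x 11 brands, plus benefit/demographic covariates
X_consider = rng.integers(0, 2, size=(2500, 11)).astype(float)
Z_covariates = rng.normal(size=(2500, 5))  # e.g. 3 benefit factors, sex, income

classes, theta = bernoulli_mixture_em(X_consider, k=3)
profile = LogisticRegression(max_iter=1000).fit(Z_covariates, classes)
print("class sizes:", np.bincount(classes), "coefficient matrix shape:", profile.coef_.shape)
```

The second stage is what lets the study say which benefits and demographics drive membership in each sub-market, rather than only describing the sub-markets themselves.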


Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing clients' business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize flexible storage expansion for processing a massive amount of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored unstructured log data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to handle with the analysis tools and management systems of the existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that the system can continue to operate after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, databases with strict schemas, like relational databases, cannot easily expand across nodes when the stored data must be distributed to various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified into key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance is carried out against a log data processing system that uses only MySQL; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a log data insert performance evaluation of MongoDB for various chunk sizes.
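A minimal sketch of the log-collector routing and the kind of per-type aggregation the log graph generator module could plot, using pymongo against an assumed local MongoDB instance. The connection string, database and collection names, log fields, and the routing rule (real-time logs to MySQL, the rest to MongoDB) are illustrative assumptions, not the paper's implementation.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
logs = client["bank_logs"]["transactions"]         # hypothetical database/collection names

def collect(log: dict) -> None:
    """Classify a raw log record by type and store non-real-time logs in MongoDB.
    Real-time logs would instead be routed to the MySQL module (not shown)."""
    log.setdefault("collected_at", datetime.now(timezone.utc))
    if log.get("type") != "realtime":
        logs.insert_one(log)  # schema-free document insert

collect({"type": "batch", "branch": "HQ", "op": "transfer", "elapsed_ms": 42})

# Per-type aggregation (count and mean latency per operation), the kind of
# summary the log graph generator module could render as a graph.
pipeline = [
    {"$group": {"_id": "$op", "count": {"$sum": 1}, "avg_ms": {"$avg": "$elapsed_ms"}}},
]
print(list(logs.aggregate(pipeline)))
```

The schema-free insert is what lets heterogeneous bank log records land in one collection without predefined columns, which is the property the abstract credits to MongoDB over MySQL.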