• Title/Summary/Keyword: probability distribution functions

Search results: 265

Evaluation of extreme rainfall estimation obtained from NSRP model based on the objective function with statistical third moment (통계적 3차 모멘트 기반의 목적함수를 이용한 NSRP 모형의 극치강우 재현능력 평가)

  • Cho, Hemie; Kim, Yong-Tak; Yu, Jae-Ung; Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.55 no.7 / pp.545-556 / 2022
  • It is recommended to use long-term hydrometeorological records spanning more than the service life of hydraulic structures for water resource planning. To extend rainfall records, stochastic simulation models such as the Modified Bartlett-Lewis Rectangular Pulse (BLRP) and Neyman-Scott Rectangular Pulse (NSRP) models have been widely used. The optimal parameters of such a model can be estimated by repeatedly comparing observed statistical moments with those defined through a combination of the model's probability-distribution parameters in an optimization setting. However, parameter estimation from relatively short observed rainfall records is an ill-posed problem, which increases the uncertainty of the estimation process. In addition, as previous studies have shown, extreme values are underestimated when the objective function is defined only by the first and second statistical moments (i.e., mean and variance). This study therefore estimated the parameters of the NSRP model using an objective function that also includes the third moment, and compared it with the conventional approach based on the first and second moments in terms of extreme rainfall estimation. The first and second moments did not differ significantly depending on whether skewness was included in the objective function, but the proposed model showed significantly improved performance in estimating design rainfalls.
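The abstract does not reproduce the paper's objective function, but the core idea, matching the third standardized moment (skewness) alongside the mean and variance, can be sketched. The snippet below substitutes a two-parameter gamma model with closed-form moments for the five-parameter NSRP moment equations, purely for illustration; the weights and the optimizer are likewise illustrative choices, not the authors' setup.

```python
import numpy as np
from scipy.optimize import minimize

def sample_moments(x):
    """Mean, variance, and skewness (third standardized moment) of a series."""
    x = np.asarray(x, dtype=float)
    m, v = x.mean(), x.var()
    g = ((x - m) ** 3).mean() / v ** 1.5
    return np.array([m, v, g])

def model_moments(params):
    """Closed-form moments of a gamma(k, theta) stand-in; in the paper the
    analogous expressions are the NSRP analytical moment equations."""
    k, theta = params
    return np.array([k * theta, k * theta ** 2, 2.0 / np.sqrt(k)])

def objective(params, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of squared relative moment errors; dropping the third
    term recovers the conventional mean/variance-only objective."""
    if min(params) <= 0:
        return 1e12  # keep the search in the feasible region
    rel_err = (model_moments(params) - target) / target
    return float(np.dot(weights, rel_err ** 2))

rng = np.random.default_rng(0)
obs = rng.gamma(shape=0.8, scale=5.0, size=10_000)  # stand-in rainfall depths
target = sample_moments(obs)
res = minimize(objective, x0=[1.0, 1.0], args=(target,), method="Nelder-Mead")
print("fitted (k, theta):", res.x)
```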

A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung; Kim, Kitae; Kim, Jongwoo; Park, Steve
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.93-108 / 2014
  • To support business decision making, interest in and efforts to analyze and use transaction data from different perspectives are increasing. Such efforts are not limited to customer management or marketing; they are also used to monitor and detect fraudulent transactions. Fraudulent transactions evolve into various patterns by exploiting information technology, and to keep pace with this evolution, much work has gone into fraud detection methods and advanced application systems that improve the accuracy and ease of detection. As a case study in fraud detection, this study aims to provide effective detection methods for auction-exception agricultural products in the largest Korean agricultural wholesale market. The auction-exception policy exists to complement auction-based trades in the agricultural wholesale market: most agricultural products are traded by auction, but specific products are designated as auction exceptions when total volumes are relatively small, the number of wholesalers is small, or wholesalers have difficulty purchasing the products. The policy, however, creates several problems for the fairness and transparency of transactions, which calls for fraud detection. To generate fraud detection rules, this study analyzed real trade transaction data in the market from 2008 to 2010, comprising more than 1 million transactions and over 1 billion US dollars in transaction volume. Agricultural transaction data has unique characteristics such as frequent changes in supply volume and turbulent time-dependent price changes. Since this was the first attempt to identify fraudulent transactions in this domain, no training data set for supervised learning existed, so fraud detection rules were generated using an outlier detection approach, under the assumption that outlier transactions are more likely to be fraudulent than normal ones. Outlier transactions were identified by comparing the daily, weekly, and quarterly average unit prices of product items; quarterly average unit prices of product items for specific wholesalers were also used. The reliability of the generated rules was confirmed by domain experts. To determine whether a transaction is fraudulent, the normal distribution and the normalized Z-value concept are applied: a transaction's unit price is transformed into a Z-value to calculate its occurrence probability when the distribution of unit prices is approximated by a normal distribution. A modified Z-value is used rather than the original one because, for auction-exception agricultural products, Z-values are influenced by the outlier fraud transactions themselves, as the number of wholesalers is small. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction being checked. To show the usefulness of the proposed approach, a prototype fraud-transaction detection system was developed in Delphi. The system consists of five main menus and related submenus: importing transaction databases, setting fraud detection parameters (by changing these, users control the number of potential fraud transactions flagged), and execution functions that produce detection results based on those parameters, which can be viewed on screen or exported as files. This study is an initial attempt to identify fraudulent transactions in auction-exception agricultural products, and many research topics remain. First, the scope of the analyzed data was limited by data availability; more data on transactions, wholesalers, and producers is needed to detect fraud more accurately. Next, the scope of detection should be extended to fishery products. Different data mining techniques could also be applied; for example, a time-series approach is a potential technique for this problem. And although outlier transactions are currently detected from unit prices, it is also possible to derive detection rules based on transaction volumes.
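The abstract leaves the exact formula implicit; a natural reading of the Self-Eliminated Z-score is a leave-one-out standardization, in which each transaction's unit price is scored against the mean and standard deviation of its peer group computed without that transaction, so an extreme price cannot mask itself. A minimal sketch under that assumption (the grouping column and the 2.0 cutoff are hypothetical):

```python
import numpy as np
import pandas as pd

def self_eliminated_z(prices: pd.Series) -> pd.Series:
    """Leave-one-out Z-score: standardize each value against the mean and
    standard deviation of the *other* values in its group."""
    x = prices.to_numpy(dtype=float)
    n = len(x)
    if n < 3:
        return pd.Series(np.nan, index=prices.index)
    total, sq_total = x.sum(), (x ** 2).sum()
    loo_mean = (total - x) / (n - 1)
    loo_var = (sq_total - x ** 2) / (n - 1) - loo_mean ** 2
    loo_std = np.sqrt(np.maximum(loo_var, 0.0) * (n - 1) / (n - 2))
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (x - loo_mean) / loo_std
    return pd.Series(z, index=prices.index)

# Toy data: one inflated unit price among otherwise similar transactions.
df = pd.DataFrame({"item": ["garlic"] * 6,
                   "unit_price": [1000, 1020, 980, 1010, 990, 3000]})
df["sez"] = df.groupby("item")["unit_price"].transform(self_eliminated_z)
df["flag"] = df["sez"].abs() > 2.0  # hypothetical fraud-screening cutoff
print(df)
```

Because the suspect price is excluded from its own baseline, the 3000 transaction is scored against the tight 980-1020 cluster and gets flagged, while the normal prices, whose baselines include the outlier, are not.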

Corporate Credit Rating based on Bankruptcy Probability Using AdaBoost Algorithm-based Support Vector Machine (AdaBoost 알고리즘기반 SVM을 이용한 부실 확률분포 기반의 기업신용평가)

  • Shin, Taek-Soo; Hong, Tae-Ho
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.25-41 / 2011
  • Recently, support vector machines (SVMs) have been recognized as competitive tools compared with other data mining techniques for solving pattern recognition or classification problems. Many studies, in particular, have shown them to be more powerful than traditional artificial neural networks (ANNs) (Amendolia et al., 2003; Huang et al., 2004; Huang et al., 2005; Tay and Cao, 2001; Min and Lee, 2005; Shin et al., 2005; Kim, 2003). Classification decisions, whether binary or multi-class, are highly cost-sensitive in financial problems such as credit rating: if credit ratings are misclassified, investors or financial decision makers may suffer severe economic losses. It is therefore necessary to convert the classifier's outputs into well-calibrated posterior probabilities and derive multi-class credit ratings from the resulting bankruptcy probabilities. However, SVMs do not natively provide such probabilities, so some method is required to create them (Platt, 1999; Drish, 2001). This paper applied AdaBoost algorithm-based SVMs to bankruptcy prediction as a binary classification problem for IT companies in Korea, and then performed multi-class credit rating of the companies by shaping the posterior bankruptcy probabilities, extracted from the SVMs' loss functions, into a normal distribution. The proposed approach also showed that misclassification can be minimized by adjusting the credit-grade interval ranges, given that each credit grade for loan borrowers carries its own credit risk, i.e., bankruptcy probability.
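SVMs output margins, not probabilities, so the probability-construction step can be illustrated with Platt scaling, which fits a sigmoid to the decision scores (Platt, 1999). The sketch below uses a plain SVM with scikit-learn's sigmoid calibration rather than the authors' AdaBoost-boosted variant, and the grade cut points are illustrative, not the tuned interval ranges from the paper:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for financial-ratio features (label 1 = bankrupt).
X, y = make_classification(n_samples=600, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Platt scaling: fit a sigmoid to SVM decision scores to obtain calibrated
# posterior bankruptcy probabilities.
clf = CalibratedClassifierCV(SVC(kernel="rbf", gamma="scale"),
                             method="sigmoid", cv=5)
clf.fit(X_tr, y_tr)
p_bankrupt = clf.predict_proba(X_te)[:, 1]

# Bin the probabilities into multi-class credit grades (cut points illustrative).
grades = np.array(["AAA", "A", "BBB", "BB", "B", "CCC", "D"])
cuts = np.array([0.02, 0.05, 0.10, 0.20, 0.35, 0.60])
for p in p_bankrupt[:5]:
    print(f"P(bankrupt)={p:.3f} -> grade {grades[np.searchsorted(cuts, p)]}")
```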

Characteristics of the Graded Wildlife Dose Assessment Code K-BIOTA and Its Application (단계적 야생동식물 선량평가 코드 K-BIOTA의 특성 및 적용)

  • Keum, Dong-Kwon; Jun, In; Lim, Kwang-Muk; Kim, Byeong-Ho; Choi, Yong-Ho
    • Journal of Radiation Protection and Research / v.40 no.4 / pp.252-260 / 2015
  • This paper describes the technical background of the Korean wildlife radiation dose assessment code K-BIOTA and summarizes its application. K-BIOTA applies a graded approach with three levels: screening assessments (Levels 1 and 2) and a detailed assessment based on site-specific data (Level 3). The screening assessment is a preliminary step that determines whether the detailed assessment is needed; it calculates the dose rate for grouped organisms rather than for individual biota. In the Level 1 assessment, the risk quotient (RQ) is calculated by comparing the actual media concentration with the environmental media concentration limit (EMCL) derived from a benchmark screening reference dose rate. If the Level 1 RQ is less than 1, it can be concluded that the ecosystem maintains its integrity and the assessment ends. If the RQ is greater than 1, the Level 2 assessment, which recalculates the RQ using average values of the concentration ratio (CR) and the equilibrium distribution coefficient (Kd) for the grouped organisms, is carried out for a more realistic, and hence less conservative, result. If the Level 2 RQ is less than 1, the assessment likewise ends; if it is greater than 1, the Level 3 detailed assessment is performed, in which the radiation dose for a site's representative organism is calculated using site-specific data on the occupancy factor, CR, and Kd. In addition, at Level 3 K-BIOTA optionally supports uncertainty analysis of the dose rate over CR, Kd, and the environmental medium concentration, with four probability density functions available: normal, lognormal, uniform, and exponential. The applicability of the code was tested through participation in the IAEA EMRAS II (Environmental Modeling for Radiation Safety) environmental model intercomparison study, and the results showed that K-BIOTA is useful for assessing the radiation risk of wildlife living in various contaminated environments.
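The screening levels reduce to a simple ratio test, RQ = measured medium concentration / EMCL, repeated with progressively less conservative EMCLs. A minimal sketch of that control flow, with made-up placeholder values rather than K-BIOTA's nuclide- and organism-specific data:

```python
def risk_quotient(concentration: float, emcl: float) -> float:
    """RQ = measured medium concentration / environmental media
    concentration limit (EMCL) derived from a benchmark dose rate."""
    return concentration / emcl

def graded_assessment(conc: float, emcl_l1: float, emcl_l2: float) -> str:
    """Level 1 uses conservative limiting parameters; Level 2 re-derives the
    EMCL from average CR/Kd values; Level 3 is a site-specific dose run."""
    if risk_quotient(conc, emcl_l1) < 1.0:
        return "Level 1: RQ < 1, ecosystem integrity maintained; stop."
    if risk_quotient(conc, emcl_l2) < 1.0:
        return "Level 2: RQ < 1 with average CR/Kd; stop."
    return "RQ >= 1 at Level 2: proceed to Level 3 site-specific assessment."

# Illustrative run (all numbers are placeholders, e.g. Bq/kg in soil).
print(graded_assessment(conc=500.0, emcl_l1=300.0, emcl_l2=800.0))
```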

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan; Son, Ji-Eun; Song, Min
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.141-156 / 2013
  • Social media is a representative form of Web 2.0 that reshapes users' information behavior by allowing them to produce their own content without expert skills. As a new communication medium, it has a profound social impact by enabling users to communicate their opinions and thoughts to both the masses and acquaintances. Social media data plays a significant role in the emerging Big Data arena, and research areas such as social network analysis and opinion mining have therefore sought to discover meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of Information Retrieval and Text Mining because it produces massive unstructured textual data in real time and serves as an influential channel for opinion leading. Most previous studies, however, have adopted broad-brush, limited approaches, which makes it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing large streaming Twitter datasets. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. We also conducted a case study on the 2012 Korean presidential election, collecting 1,737,969 tweets containing candidates' names and election-related terms on Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects societal trends effectively. The system retrieves the list of terms co-occurring with given query terms; we compared the co-occurrence results for the influential candidates' names 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as queries. General election-related terms such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll' appear frequently, while specific terms differentiate each candidate: 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for 'Chul Su Ahn'. The system not only extracts 10 topics with related terms but also shows their dynamic changes over time using the multinomial Latent Dirichlet Allocation technique; each topic exhibits a rising or falling tendency depending on changes in its probability distribution. To relate topic trends on Twitter to real-world social issues, we compared topic trends with related news articles and found that Twitter tracks issues faster than other media such as newspapers. The user network in Twitter differs from those of other social media because of Twitter's distinctive way of forming relationships: users build relationships by exchanging mentions. We visualized and analyzed the mention-based networks of 136,754 users, again using the three candidates' names as query terms. The results show that Twitter users mention all candidates' names regardless of their political leanings. This case study suggests that Twitter can be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks reveal aspects of user behavior unique to Twitter.
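The topic-trend component rests on multinomial LDA fitted to tweet text; a minimal scikit-learn sketch of that step (toy English documents standing in for the 1.7 million Korean tweets, and 2 topics instead of the study's 10):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for preprocessed tweets; refitting or re-scoring per day
# yields the rising/falling topic-trend curves described above.
tweets = [
    "park candidate presidential election poll support",
    "moon single candidacy agreement voting time extension",
    "ahn single candidacy agreement down contract",
    "presidential election public opinion poll proclamation support",
]

vec = CountVectorizer()
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # per-document topic mixtures

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
print(doc_topic.round(2))
```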