• Title/Summary/Keyword: Information content (value)


A Study on the Application of Outlier Analysis for Fraud Detection: Focused on Transactions of Auction Exception Agricultural Products (부정 탐지를 위한 이상치 분석 활용방안 연구 : 농수산 상장예외품목 거래를 대상으로)

  • Kim, Dongsung;Kim, Kitae;Kim, Jongwoo;Park, Steve
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.3
    • /
    • pp.93-108
    • /
    • 2014
  • To support business decision making, interest and effort in analyzing and using transaction data from different perspectives are increasing. Such efforts are not limited to customer management or marketing; they also extend to monitoring and detecting fraudulent transactions. Fraud transactions are evolving into various patterns by taking advantage of information technology. To reflect this evolution, many efforts have been made to improve fraud detection methods and to build advanced application systems that raise the accuracy and ease of fraud detection. As a case of fraud detection, this study aims to provide effective fraud detection methods for auction exception agricultural products in the largest Korean agricultural wholesale market. The auction exception products policy exists to complement auction-based trades in the agricultural wholesale market. That is, most trades of agricultural products are performed by auction; however, specific products are assigned as auction exception products when their total volumes are relatively small, the number of wholesalers is small, or it is difficult for wholesalers to purchase the products. However, the auction exception products policy raises several problems regarding the fairness and transparency of transactions, which calls for fraud detection. In this study, to generate fraud detection rules, real transaction data from 2008 to 2010 in the market are analyzed, comprising more than 1 million transactions and over 1 billion US dollars in transaction volume. Agricultural transaction data has unique characteristics such as frequent changes in supply volumes and turbulent time-dependent changes in price. Since this was the first trial to identify fraud transactions in this domain, there was no training data set for supervised learning. So, fraud detection rules are generated using an outlier detection approach. 
We assume that outlier transactions are more likely to be fraudulent than normal transactions. The outlier transactions are identified by comparing daily, weekly, and quarterly average unit prices of product items. Quarterly average unit prices of product items for specific wholesalers are also used to identify outlier transactions. The reliability of the generated fraud detection rules is confirmed by domain experts. To determine whether a transaction is fraudulent or not, the normal distribution and the normalized Z-value concept are applied. That is, the unit price of a transaction is transformed into a Z-value to calculate its occurrence probability when we approximate the distribution of unit prices by a normal distribution. A modified Z-value of the unit price is used rather than the original Z-value, because in the case of auction exception agricultural products, Z-values are influenced by the outlier fraud transactions themselves, as the number of wholesalers is small. The modified Z-values are called Self-Eliminated Z-scores because they are calculated excluding the unit price of the specific transaction that is being checked for fraud. To show the usefulness of the proposed approach, a prototype fraud transaction detection system is developed using Delphi. The system consists of five main menus and related submenus. The first function of the system is to import transaction databases. The next important functions are to set up fraud detection parameters. By changing fraud detection parameters, system users can control the number of potential fraud transactions. Execution functions provide fraud detection results found based on the fraud detection parameters. The potential fraud transactions can be viewed on screen or exported as files. The study is an initial trial to identify fraud transactions in auction exception agricultural products. 
Many research topics on this issue remain. First, the scope of the analyzed data was limited by data availability. It is necessary to include more data on transactions, wholesalers, and producers to detect fraud transactions more accurately. Next, we need to extend the scope of fraud transaction detection to fishery products. There are also many possibilities to apply different data mining techniques for fraud detection; for example, a time series approach is a potential technique for this problem. Although outlier transactions in this study are detected based on unit prices, it is also possible to derive fraud detection rules based on transaction volumes.
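The Self-Eliminated Z-score described in this abstract can be sketched in a few lines. This is an illustrative reconstruction under the abstract's definition (each price is scored against statistics computed without it), not the authors' code, and the sample prices are invented:

```python
import statistics

def self_eliminated_z(prices):
    """For each unit price, compute a Z-score against the mean and sample
    standard deviation of all *other* prices, so an outlier cannot mask
    itself by inflating the statistics it is judged against."""
    scores = []
    for i, p in enumerate(prices):
        rest = prices[:i] + prices[i + 1:]
        mu = statistics.mean(rest)
        sigma = statistics.stdev(rest)
        scores.append((p - mu) / sigma)
    return scores

# Hypothetical unit prices for one item on one day; the last one is suspicious.
prices = [100, 102, 98, 101, 99, 150]
scores = self_eliminated_z(prices)
```

With a small number of wholesalers, excluding the checked transaction keeps the outlier's own price from widening the standard deviation and hiding it.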

Comparison of Two Methods for Estimating the Appearance Probability of Seawater Temperature Difference for the Development of Ocean Thermal Energy (해양온도차에너지 개발을 위한 해수온도차 출현확률 산정 방법 비교)

  • Yoon, Dong-Young;Choi, Hyun-Woo;Lee, Kwang-Soo;Park, Jin-Soon;Kim, Kye-Hyun
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.13 no.2
    • /
    • pp.94-106
    • /
    • 2010
  • Understanding of the amount of energy resources and site selection are required prior to developing Ocean Thermal Energy (OTE). It is necessary to calculate the appearance probability of the difference in seawater temperature (${\Delta}T$) between the sea surface layer and underwater layers. This research mainly aimed to calculate the appearance probability of ${\Delta}T$ using frequency analysis (FA) and harmonic analysis (HA), and to compare the advantages and weaknesses of these methods as applied in the South Sea of Korea. The spatial scale for comparing the two methods was divided into local and global scales, related to the estimation of energy resource amounts and to site selection, respectively. On the global scale, the Probability Differences (PD) of ${\Delta}T$ calculated using both methods were mapped as spatial distributions, and the areas of PD were compared. On the local scale, the two methods were compared using not only the PD results at the region of highest probability but also bimonthly probabilities in the regions of highest and lowest PD. Basically, the strong relationship (Pearson r=0.96, ${\alpha}$=0.05) between the probabilities of the two methods showed the usefulness of both. On the global scale, the area with PD of more than 10% was less than 5% of the whole area, which means both methods can be applied to estimate the amount of OTE resources. In practice, however, HA was considered the more pragmatic method because of its capability of calculating under various ${\Delta}T$ conditions. On the local scale, there was no significant difference between the high-probability areas identified by the two methods, with differences under 5%. However, while FA could detect the whole range of probability, HA had the disadvantage of being unable to detect probabilities of less than 10%. Therefore, HA was judged more suitable for estimating the amount of energy resources, and FA more suitable for selecting the site for OTE development.
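The frequency-analysis (FA) side of this comparison reduces to counting how often the observed temperature difference meets a threshold. A minimal sketch, with an invented daily ΔT series and threshold:

```python
def appearance_probability(delta_t_series, threshold):
    """Frequency-analysis estimate: the fraction of observations in which the
    surface-to-depth temperature difference meets or exceeds the threshold."""
    hits = sum(1 for dt in delta_t_series if dt >= threshold)
    return hits / len(delta_t_series)

# Hypothetical daily ΔT (°C) between the surface and a deep layer over ten days.
series = [16.2, 17.5, 14.8, 18.1, 15.3, 19.0, 13.9, 17.2, 16.8, 15.5]
p = appearance_probability(series, 16.0)
```

The HA method would instead fit harmonic components to the seasonal cycle and evaluate the probability under arbitrary ΔT conditions, which is why the paper finds it more flexible.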

New Fast Block-Matching Motion Estimation using Temporal and Spatial Correlation of Motion Vectors (움직임 벡터의 시공간 상관성을 이용한 새로운 고속 블럭 정합 움직임 추정 방식)

  • 남재열;서재수;곽진석;이명호;송근원
    • Journal of Broadcast Engineering
    • /
    • v.5 no.2
    • /
    • pp.247-259
    • /
    • 2000
  • This paper introduces a new technique that reduces the number of search steps and improves the accuracy of motion estimation by exploiting the high temporal and spatial correlation of motion vectors. Instead of using the fixed first search point of previously proposed search algorithms, the proposed method finds a more accurate first search point, compensating the search area by using the high temporal and spatial correlation of motion vectors. The main idea of the proposed method is thus to find a first search point that improves the performance of motion estimation and reduces the number of search steps. The proposed method utilizes the direction of the block at the same coordinates in the previous frame, compared with the block of the current frame, to exploit temporal correlation, and the directions of the adjacent blocks of the current frame to exploit spatial correlation. Based on these directions, we compute the first search point, and then search for the motion vector around the computed first search point using two fixed search patterns. Using this idea, an efficient adaptive predicted direction search algorithm (APDSA) for block matching motion estimation is proposed. Experimental results show that PSNR values are improved by up to 3.6 dB depending on the image sequence, and by about 1.7 dB on average. The comparison shows that the performance of the proposed APDSA algorithm is better than that of other fast search algorithms, whether the image sequence contains fast or slow motion, and is similar to the performance of the FS (Full Search) algorithm. Simulation results also show that the APDSA scheme gives better subjective picture quality than the other fast search algorithms and is closer to that of the FS algorithm.
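The core idea — predict the first search point from temporally and spatially correlated motion vectors, then search only a small window around it — can be sketched as follows. This is a simplified stand-in (component-wise median prediction plus an exhaustive local SAD search), not the APDSA algorithm itself, and the frames are synthetic:

```python
import statistics

def sad(cur, ref, cx, cy, rx, ry, n=2):
    """Sum of absolute differences between an n x n block of the current
    frame at (cx, cy) and a candidate reference block at (rx, ry)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def predicted_search(cur, ref, cx, cy, candidates, radius=1):
    """Start from the component-wise median of candidate motion vectors
    (temporal and spatial neighbours), then search a small window around it."""
    px = round(statistics.median(v[0] for v in candidates))
    py = round(statistics.median(v[1] for v in candidates))
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = sad(cur, ref, cx, cy, cx + px + dx, cy + py + dy)
            if best is None or cost < best[0]:
                best = (cost, (px + dx, py + dy))
    return best[1], best[0]

# Synthetic frames: cur is ref shifted right by 2 and down by 1.
ref = [[(7 * x + 13 * y) % 50 for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]
mv, cost = predicted_search(cur, ref, 4, 4, [(-2, -1), (-2, -1), (0, 0)])
```

Because the neighbour vectors already point near the true displacement, a radius-1 window (9 checks) suffices where a blind search would need a much larger one.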


Incremental Ensemble Learning for The Combination of Multiple Models of Locally Weighted Regression Using Genetic Algorithm (유전 알고리즘을 이용한 국소가중회귀의 다중모델 결합을 위한 점진적 앙상블 학습)

  • Kim, Sang Hun;Chung, Byung Hee;Lee, Gun Ho
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.9
    • /
    • pp.351-360
    • /
    • 2018
  • The LWR (Locally Weighted Regression) model, traditionally a lazy learning model, is designed to obtain a prediction for a given input variable, the query point; it is a kind of regression equation over a short interval, obtained through learning that gives higher weights to samples closer to the query point. We study an incremental ensemble learning approach for LWR, a form of lazy, memory-based learning. The proposed incremental ensemble learning method sequentially generates and integrates LWR models over time, using a genetic algorithm to obtain the solution at a specific query point. A weakness of existing LWR models is that multiple LWR models can be generated depending on the indicator function and data sample selection, and the quality of the predictions can vary with the model; however, no research has been conducted on selecting or combining multiple LWR models. In this study, after generating the initial LWR model according to the indicator function and the sample data set, we iterate an evolutionary learning process to obtain a proper indicator function, and assess the LWR models on other sample data sets to overcome data set bias. We adopt an eager learning approach to gradually generate and store LWR models as data are generated for all sections. To obtain a prediction at a specific point in time, an LWR model is generated based on newly generated data within a predetermined interval and then combined, using a genetic algorithm, with the existing LWR models of that section. The proposed method shows better results than selecting among multiple LWR models by simple averaging. The results of this study are compared with predictions from multiple regression analysis using real data such as hourly traffic volume in a specific area and hourly sales of a highway rest area.
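The building block that the proposed ensemble combines — a single LWR prediction at a query point — can be sketched as a Gaussian-kernel weighted least-squares line. This is a generic one-dimensional LWR sketch with invented data; the genetic-algorithm combination step is omitted:

```python
import math

def lwr_predict(xs, ys, query, bandwidth=1.0):
    """Locally weighted linear regression: fit a weighted least-squares line
    around the query point, weighting nearby samples more heavily."""
    w = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    cov = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    var = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    slope = cov / var
    return my + slope * (query - mx)

# Hypothetical hourly traffic counts with a locally linear trend.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2 * x for x in xs]
pred = lwr_predict(xs, ys, query=1.25)
```

Because the fit is recomputed per query, LWR is "lazy"; the paper's contribution is to generate such models incrementally per time section and combine them with a genetic algorithm instead of recomputing from scratch.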

Usefulness of Region Cut Subtraction in Fusion & MIP 3D Reconstruction Image (Fusion & Maximum Intensity Projection 3D 재구성 영상에서 Region Cut Subtraction의 유용성)

  • Moon, A-Reum;Chi, Yong-Gi;Choi, Sung-Wook;Lee, Hyuk;Lee, Kyoo-Bok;Seok, Jae-Dong
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.14 no.1
    • /
    • pp.18-23
    • /
    • 2010
  • Purpose: PET/CT combines functional and morphologic data and increases diagnostic accuracy in a variety of malignancies. In particular, Fusion PET/CT images or MIP (Maximum Intensity Projection) images reconstructed from 2-dimensional images into 3-dimensional ones are useful for visualizing lesions. However, in Fusion & MIP 3D reconstruction images, hot uptake from urine or a urostomy bag can overlap the lesion, making it difficult to distinguish the lesion with the naked eye. This research tries to improve lesion distinction by removing such regions of hot uptake. Materials and Methods: This research was conducted on patients who visited our hospital from September 2008 to March 2009 and who showed a large residual urine volume due to diseases of the uterus, bladder, or rectum on PET/CT examination. We used GE's Advantage Workstation AW4.3 05 version Volume Viewer program. As the analysis method, we set up an ROI over the region to be removed in the axial volume image, selected Cut Outside, and applied the same method to the coronal volume image. Next, we adjusted the minimum value in Threshold of 3D Tools and selected Subtraction in Advanced Processing. Fusion & MIP images produced this way were compared with images not using Region Cut Definition. Results: In Fusion & MIP 3D reconstruction images created and compared using the Advantage Workstation AW4.3 05's Region Cut Subtraction, regions of hot uptake caused by the patient's urine could be removed. The distinction of lesions was clearly improved in images using Region Cut Definition. Conclusion: For patients showing hot uptake due to the volume of urine in the bladder, removing the regions of hot uptake during image reconstruction can offer much better diagnostic information than conventional image subtraction. Especially in cases of disease of the uterus, bladder, and rectum, it will help improve the qualitative quality of the image.
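The underlying idea — zero out a hot ROI before computing the maximum intensity projection so it cannot dominate the projection — can be illustrated on a toy volume. This is a conceptual sketch, not the AW4.3 workstation workflow, and the voxel values are invented:

```python
def region_cut(volume, z0, z1, y0, y1, x0, x1):
    """Zero out a rectangular ROI (e.g. bladder hot uptake) in a 3-D volume."""
    for z in range(z0, z1):
        for y in range(y0, y1):
            for x in range(x0, x1):
                volume[z][y][x] = 0
    return volume

def mip(volume):
    """Maximum intensity projection along the depth (z) axis."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(cols)]
            for y in range(rows)]

# Toy 4x4x4 volume: background 10, a lesion of 50, and hot urine uptake of 200.
vol = [[[10] * 4 for _ in range(4)] for _ in range(4)]
vol[1][1][1] = 50    # lesion we want to keep visible
vol[2][3][3] = 200   # hot uptake that would dominate the projection
region_cut(vol, 2, 3, 3, 4, 3, 4)
projection = mip(vol)
```

Without the cut, the 200-valued voxel would saturate its projection column; after the cut, the lesion is the brightest structure in the MIP.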


Defining Homogeneous Weather Forecasting Regions in Southern Parts of Korea (남부지방의 일기예보구역 설정에 관한 연구)

  • Kim, Il-Kon;Park, Hyun-Wook
    • Journal of the Korean Geographical Society
    • /
    • v.31 no.3
    • /
    • pp.469-488
    • /
    • 1996
  • The defining of weather forecasting regions is possible, since the representativeness of regional weather can be reasonably clarified in terms of weather entropy and the information ratio. In this paper, the weather entropy and information ratio were derived numerically using information theory. The typical weather characteristics were clarified, and homogeneous weather forecasting regions of the southern parts of Korea were defined. The data used for this study are the daily precipitation and cloudiness over five recent years (1990-1994) at 42 stations in the southern parts of Korea, divided into four classes: fine, clear, cloudy, and rainy. The results are summarized as follows: 1. The maximum value of weather entropy in the study area is 2.009 bits, at Yosu in July, and the minimum is 1.624 bits, at Kohung in October. The mean value of weather entropy is maximal in July and minimal in October across the four seasons. The smaller the entropy, the more stable the weather; the larger the entropy, the more changeable the weather. 2. The deviation from the mean value of weather entropy in the southern parts of Korea, with positive and negative parts, shows a marked east (positive) versus west (negative) distribution in January, but south (positive) versus north (negative) in July. It also clearly shows an east (positive) versus west (negative) tendency in the coastal region in April, and an X-type distribution (southwest and northeast: negative) around Mt. Chiri in October. 3. In the southern parts, the average information ratio reaches a maximum of 0.618 in the Taegu area in July, and a minimum of 0.550 in Kwangju in October. In particular, the average information ratio of the Pusan area is greatest in April and smallest in October; in Taegu, Kwangju, and Kunsan, it is greatest in April, January, and July, and smallest in July, July, and April, respectively. 
4. The narrowest extent of weather representativeness occurs in July, when Kwangju is the center of weather forecasting, and the broadest in April, when Taegu is the center. 5. Weather forecasting regions defined in terms of the difference of information ratio show up most broadly in July in Pusan, including the whole Honam area and the southern parts of Youngnam, when Pusan-Taegu is the basis for applying the information ratio. Meanwhile, they appear most broadly in January in Taegu, including the whole southern parts except the southern coastal area.
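The weather entropy in this abstract is the Shannon entropy (in bits) of the four weather classes at a station. The sketch below also includes an information ratio computed as the mutual information between a representative station and a target station, normalized by the target's entropy — an assumed reading of the paper's definition, with invented daily records:

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (bits) of a sequence of weather classes."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def information_ratio(rep, target):
    """Mutual information between representative-station and target-station
    weather, normalized by the target's entropy (assumed definition)."""
    n = len(rep)
    joint = Counter(zip(rep, target))
    pr, pt = Counter(rep), Counter(target)
    mi = sum((c / n) * math.log2((c / n) / ((pr[a] / n) * (pt[b] / n)))
             for (a, b), c in joint.items())
    return mi / entropy(target)

# Four classes: fine, clear, cloudy, rainy (hypothetical daily records).
days = ['fine', 'rainy', 'fine', 'cloudy', 'clear', 'fine', 'rainy', 'cloudy']
h = entropy(days)
r = information_ratio(days, days)  # a station perfectly represents itself
```

With four classes the entropy is at most 2 bits (uniform weather); a high information ratio means the representative station's weather tells you most of what happens at the target, which is what makes a forecasting region homogeneous.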


Sugar Contents Analysis of Retort Foods (레토르트식품에 함유되어 있는 당 함량 분석)

  • Jeong, Da-Un;Im, Jun;Kim, Cheon-Hoe;Kim, Young-Kyoung;Park, Yoon-Jin;Jeong, Yoon-Hwa;Om, Ae-Son
    • Journal of the Korean Society of Food Science and Nutrition
    • /
    • v.44 no.11
    • /
    • pp.1666-1671
    • /
    • 2015
  • The purpose of this study was to provide trustworthy nutritional information by analyzing the sugar contents of commercial retort foods. A total of 70 retort food samples were collected from markets in Seoul and Gyeonggi-do, comprising curry (n=21), black-bean-sauce (n=16), sauce (n=17), and meat (n=16). The contents of sugars such as glucose, fructose, sucrose, maltose, and lactose were analyzed using a high-performance liquid chromatography-refractive index detector and compared to the values assigned on nutrition information labels. The analyzed sugar contents of curries, black-bean-sauces, sauces, and meats ranged from 1.05~4.63 g/100 g, 1.76~5.16 g/100 g, 0.35~25.44 g/100 g, and 1.98~11.07 g/100 g, respectively. Sauces were found to contain the highest amounts of total sugar. The analyzed values corresponded to 40~119.5% of the labeled reference values for curries, 29~118% for black-bean-sauces, 18~118% for sauces, and 70~119.8% for meats. Therefore, this study provides reliable analytical values for sugar contents in retort foods.
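The label-comparison step reported above reduces to expressing each analyzed sugar content as a percentage of its labeled value; the numbers below are invented, not from the study's samples:

```python
def percent_of_label(analyzed, labeled):
    """Analyzed sugar content (g/100 g) as a percentage of the nutrition-label value."""
    return 100.0 * analyzed / labeled

# Hypothetical curry sample: 3.2 g/100 g analyzed vs. 4.0 g/100 g on the label.
p = percent_of_label(3.2, 4.0)
```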

ATM Cell Encipherment Method using Rijndael Algorithm in Physical Layer (Rijndael 알고리즘을 이용한 물리 계층 ATM 셀 보안 기법)

  • Im Sung-Yeal;Chung Ki-Dong
    • The KIPS Transactions:PartC
    • /
    • v.13C no.1 s.104
    • /
    • pp.83-94
    • /
    • 2006
  • This paper describes an ATM cell encipherment method using the Rijndael algorithm, adopted as the AES (Advanced Encryption Standard) by NIST in 2001. ISO 9160 describes the requirements for physical layer data processing in encryption/decryption. To demonstrate the ATM cell encipherment method, we implemented ATM data encipherment equipment which satisfies the requirements of ISO 9160, and verified encipherment/decipherment processing at the ATM STM-1 rate (155.52 Mbps). The DES algorithm processes data in a block size of 64 bits with a 64-bit key, whereas the Rijndael algorithm processes data in a block size of 128 bits with a selectable key length of 128, 192, or 256 bits. It is therefore more flexible for high-bit-rate data processing and stronger in encryption strength than DES. For the real-time encryption of a high-bit-rate data stream, the Rijndael algorithm was implemented in an FPGA in this experiment. The boundary of a serial UNI cell is detected by the CRC method, and in the case of a user data cell, the 48-octet (384-bit) payload is converted in parallel and transferred to three Rijndael encipherment modules in blocks of 128 bits each. After encryption is complete, the header stored in the buffer is attached to the enciphered payload and retransmitted in cell format. At the receiving end, the boundary of the cell is detected by the CRC method and the payload type is determined. If the payload type is user data, the payload of the cell is transferred to the three Rijndael decryption modules in blocks of 128 bits for decryption. In the case of a maintenance cell, the payload is extracted without decryption processing.
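The cell-boundary CRC check and the payload partitioning can be sketched as follows. The HEC details (CRC-8 with polynomial x^8+x^2+x+1, XORed with the coset 0x55) come from the ITU-T I.432 standard rather than this paper, the header bytes are hypothetical, and the Rijndael encryption itself is left out:

```python
def hec(header4):
    """CRC-8 over the first four header octets, polynomial x^8+x^2+x+1,
    XORed with 0x55 (ITU-T I.432 HEC coset)."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55

def split_payload(payload48):
    """Split the 48-octet cell payload into three 128-bit (16-octet) blocks,
    one per parallel Rijndael encipherment module."""
    assert len(payload48) == 48
    return [payload48[i:i + 16] for i in range(0, 48, 16)]

header = bytes([0x00, 0x00, 0x00, 0x10])     # hypothetical GFC/VPI/VCI/PTI fields
cell_header = header + bytes([hec(header)])  # 5-octet header ending in HEC
blocks = split_payload(bytes(range(48)))     # dummy payload
```

Each 16-octet block would then be fed to one of the three parallel Rijndael-128 modules, which is what lets the design keep up with the 155.52 Mbps line rate.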

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • At one time, the anomaly detection field was dominated by methods that determined whether an abnormality existed based on statistics derived from specific data. This methodology worked because the dimensionality of past data was simple, so classical statistical methods could be effective. However, as the characteristics of data have grown complex in the era of big data, it has become more difficult to accurately analyze and predict data generated throughout industry in the conventional way. Therefore, supervised learning algorithms such as SVM and decision trees came into use. However, supervised learning based models can predict test data accurately only when the class distribution resembles that of the training data, and most data generated in industry has unbalanced classes, so the predicted results are not always valid when a supervised learning model is applied. To overcome these drawbacks, many studies now use unsupervised learning based models that are not influenced by class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a classification model that performs anomaly detection on medical images; it is composed of convolutional neural networks. By contrast, research on sequence-data anomaly detection using generative adversarial networks is scarce compared to that on image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor has the feature matching method of Salimans et al. (2016). 
This suggests that many studies remain to be attempted on anomaly classification of sequence data with generative adversarial networks. To learn the sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-stacked LSTM with 32-dimensional and 64-dimensional hidden unit layers, and the discriminator is an LSTM with a 64-dimensional hidden unit layer. Whereas an existing paper on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, in this paper, as mentioned earlier, anomaly scores are derived using the feature matching technique. In addition, the process of optimizing the latent variables was designed with an LSTM to improve model performance. The modified generative adversarial model was more accurate than the autoencoder in all experiments in terms of precision, and approximately 7% higher in accuracy. In terms of robustness, the generative adversarial network also performed better than the autoencoder: because generative adversarial networks can learn the data distribution from real categorical sequence data, they are not skewed by a single normal pattern, whereas the autoencoder is. The robustness test showed that the accuracy of the autoencoder was 92% and that of the generative adversarial network 96%; in terms of sensitivity, the autoencoder reached 40% and the generative adversarial network 51%. Experiments were also conducted to show how much performance changes with differences in the latent variable optimization structure; as a result, sensitivity improved by about 1%. These results suggest a new perspective on optimizing latent variables, which had previously received relatively little attention.
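The feature-matching anomaly score used here reduces to the distance between a sample's discriminator features and the mean features of generated (normal-like) data. The sketch below uses toy feature vectors in place of LSTM discriminator activations; the numbers are invented:

```python
import math

def feature_matching_score(sample_feat, generated_feats):
    """Anomaly score: Euclidean distance between the sample's discriminator
    features and the mean feature vector of generated (normal-like) data."""
    dim = len(sample_feat)
    mean = [sum(f[i] for f in generated_feats) / len(generated_feats)
            for i in range(dim)]
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(sample_feat, mean)))

# Toy discriminator features for generated sequences and two test samples.
gen = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0]]
normal_score = feature_matching_score([1.0, 1.05], gen)
anomaly_score = feature_matching_score([4.0, -2.0], gen)
```

A sample whose intermediate features sit far from the generator's feature distribution scores high, flagging it as anomalous without needing class labels.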

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.21-41
    • /
    • 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable than in any other type of media. However, there are limitations to the service quality improvements that can be made based on opinions from social media platforms. Users on social media represent their opinions as text, images, and so on, so raw review data sets are unstructured; moreover, these data sets are too big for new information and hidden knowledge to be extracted by human effort alone. To use them for business intelligence and analytics applications, proper big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach to directly yield insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining, to extract the topics contained in the reviews, and decision tree modeling, to explain the relationship between topics and ratings. Topic mining refers to a method for finding, from a collection of documents, a group of words that represents a document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, a kind of decision tree technique. 
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore aims to demonstrate an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we find the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain for service quality; for example, compared with the other hotels, one hotel has a good location and room condition, as extracted from its positive reviews. In contrast, we also find what they should modify in their services from negative reviews; for example, one hotel should improve room conditions related to soundproofing. These results mean that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies for improving service quality relied on surveys or interviews of customers; however, these methods are often costly and time-consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis, so it is a more useful tool for overcoming the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will be better if other structured and unstructured data sources are added.
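The CART step linking topics to ratings can be illustrated with a depth-1 regression split on one topic's proportion: pick the threshold that best separates ratings by squared error. The real pipeline would use LDA-derived topic proportions and a full tree (e.g. via scikit-learn); the topic weights and star ratings below are invented:

```python
def best_stump(topic_weights, ratings):
    """Depth-1 CART-style split: find the threshold on a topic's proportion
    that minimizes the squared error of predicting ratings by side means."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best = None
    for t in sorted(set(topic_weights)):
        left = [r for w, r in zip(topic_weights, ratings) if w <= t]
        right = [r for w, r in zip(topic_weights, ratings) if w > t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

# Hypothetical "room condition" topic proportion per review, with its rating.
weights = [0.05, 0.10, 0.12, 0.55, 0.60, 0.70]
stars = [2, 2, 3, 5, 5, 4]
threshold = best_stump(weights, stars)
```

Reviews rich in this topic cluster at high ratings, so the split surfaces "room condition" as a strength; a topic concentrated on the low-rating side would flag something to fix, such as soundproofing.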