• Title/Summary/Keyword: Statistical classification

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.53-69 / 2011
  • Vision and voice-based technologies are commonly utilized for human-robot interaction. But it is widely recognized that the performance of vision and voice-based interaction systems is deteriorated by a large margin in the real-world situations due to environmental and user variances. Human users need to be very cooperative to get reasonable performance, which significantly limits the usability of the vision and voice-based human-robot interaction technologies. As a result, touch screens are still the major medium of human-robot interaction for the real-world applications. To empower the usability of robots for various services, alternative interaction technologies should be developed to complement the problems of vision and voice-based technologies. In this paper, we propose the use of accelerometer-based gesture interface as one of the alternative technologies, because accelerometers are effective in detecting the movements of human body, while their performance is not limited by environmental contexts such as lighting conditions or camera's field-of-view. Moreover, accelerometers are widely available nowadays in many mobile devices. We tackle the problem of classifying acceleration signal patterns of 26 English alphabets, which is one of the essential repertoires for the realization of education services based on robots. Recognizing 26 English handwriting patterns based on accelerometers is a very difficult task to take over because of its large scale of pattern classes and the complexity of each pattern. The most difficult problem that has been undertaken which is similar to our problem was recognizing acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with pattern sets of 8~10 simple and easily distinguishable gestures that are useful for controlling home appliances, computer applications, robots etc. Good features are essential for the success of pattern recognition. 
To increase discriminative power on the complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that trajectory-based classifiers performed 3%~5% better than those using raw features, e.g. the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing filters and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To tackle this problem, our system applies online incremental learning so that it adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), where each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed that as the number of reference patterns grows, some reference patterns contribute more to false positive classifications. Thus, we devised an algorithm that optimizes the reference pattern set based on the positive and negative contribution of each reference pattern. The algorithm runs periodically to remove reference patterns that have a very low positive contribution or a high negative contribution. Experiments were performed on 6,500 gesture patterns collected from 50 adults aged 30~50. Each letter was performed 5 times per participant using a Nintendo® Wii™ remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters recorded a very low recall rate and exhibited a very high pairwise confusion rate. The major confusion pairs were D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%).
Though W was recalled perfectly, it contributed heavily to the false positive classification of N. Compared with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures) and Samsung Electronics (97% for 10 digits and a control gesture), the performance of our system is superior considering the number of pattern classes and the complexity of the patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children showed improved concentration and responded actively to the service with our gesture interface. To prove the effectiveness of our gesture interface, the children took a test after experiencing an English teaching service. The test results showed that those who played with the gesture interface-based robot content scored 10% higher than those who received conventional teaching. We conclude that the accelerometer-based gesture interface is a promising technology for real-world robot-based services and content, because it complements the limits of today's conventional interfaces, e.g. touch screen, vision and voice.
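
The reference-pattern pruning described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring rule, thresholds, or distance measure, so everything below is an assumption made for illustration: a 1-nearest-neighbour classifier with Euclidean distance, simple hit and miss counters per reference pattern, and the hypothetical parameters `min_positive`, `max_negative` and `grace`.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    label: str
    features: tuple
    positive: int = 0  # times it was the nearest pattern and its label was right
    negative: int = 0  # times it was the nearest pattern and its label was wrong
    seen: int = 0      # queries observed since this pattern was stored


def nearest(references, features):
    """Return the stored pattern closest to the query (1-nearest-neighbour)."""
    return min(references, key=lambda r: math.dist(r.features, features))


def learn(references, features, true_label):
    """Classify a labelled sample, update contributions, then memorise it."""
    if references:
        for r in references:
            r.seen += 1
        hit = nearest(references, features)
        if hit.label == true_label:
            hit.positive += 1
        else:
            hit.negative += 1
    references.append(Reference(true_label, tuple(features)))


def prune(references, min_positive=1, max_negative=2, grace=5):
    """Drop patterns that caused too many errors, or that have been
    stored for at least `grace` queries without enough correct hits.
    All three thresholds are hypothetical values."""
    kept = []
    for r in references:
        too_harmful = r.negative >= max_negative
        useless = r.seen >= grace and r.positive < min_positive
        if not (too_harmful or useless):
            kept.append(r)
    return kept
```

Calling `learn` for each labelled gesture and `prune` at intervals keeps the reference set small; the grace period stops freshly stored patterns from being removed before they have had a chance to contribute.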

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40{\times}40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
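
The data preparation steps in this abstract (5-day intervals, 40×40 graph images, 80/20 split) can be sketched as follows. This is not the authors' code. It assumes a sliding window, a label derived from the next day's close, and a one-channel image that plots points only rather than a connected coloured line; the function names `make_windows`, `rasterize` and `split` are hypothetical.

```python
import random

WINDOW = 5  # days per graph, as stated in the abstract
SIZE = 40   # image side in pixels, as stated in the abstract


def make_windows(series, closes):
    """Slide a 5-day window over one indicator series. Each window is
    labelled 1 if the close on the following day is higher than the
    close on the window's last day, else 0 (assumed labelling rule)."""
    samples = []
    for start in range(len(series) - WINDOW):
        end = start + WINDOW
        label = 1 if closes[end] > closes[end - 1] else 0
        samples.append((series[start:end], label))
    return samples


def rasterize(window):
    """Plot one window as a SIZE x SIZE 0/1 matrix (a single channel).
    Row 0 is the top of the image, so larger values land in upper rows."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat window
    image = [[0] * SIZE for _ in range(SIZE)]
    for i, value in enumerate(window):
        col = round(i * (SIZE - 1) / (WINDOW - 1))
        row = (SIZE - 1) - round((value - lo) / span * (SIZE - 1))
        image[row][col] = 1
    return image


def split(samples, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and split into training and validation sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With 1,950 samples, `split` yields 1,560 training and 390 validation samples, matching the sizes reported in the abstract. A fuller version would draw one such matrix per colour channel and stack them before feeding a CNN.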

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used to identify emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of it to analyze customer-generated opinions. Previous studies regarded the star rating of a review as equivalent to the sentiment embedded in its text. However, the star rating does not always correspond to the sentiment polarity, and this assumption limits the accuracy of previous studies. To solve this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and the Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and then analyzed for statistical correlation with box-office success. Movie reviews are collected along with their star ratings. The dataset used in this study consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms the Naive Bayes (NB) classifier, with accuracy about 6% higher than NB. Furthermore, the results indicate a positive correlation between the star rating and the number of viewers, which can be regarded as the box-office success of a movie. The study also shows a mild positive correlation between the sentiment scores estimated by the classifier and the number of viewers. To verify the applicability of the sentiment scores, an independent-samples t-test was conducted.
For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.
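
The group comparison described at the end of this abstract can be sketched with the standard library. The abstract does not say which t-test variant was used, so this sketch assumes the pooled-variance (Student) form; the function names and the `(sentiment_score, star_rating)` input layout are hypothetical.

```python
import math
import statistics


def split_by_mean(movies):
    """movies: list of (sentiment_score, star_rating) pairs.
    Returns the star ratings of the movies at or above the mean
    sentiment score and of those below it."""
    mean_score = statistics.fmean(score for score, _ in movies)
    high = [stars for score, stars in movies if score >= mean_score]
    low = [stars for score, stars in movies if score < mean_score]
    return high, low


def student_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))
    return t, df
```

The resulting `t` and `df` would then be looked up against a t distribution to obtain a p-value; that step is left out because the standard library has no t-distribution CDF.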

Study on the Chemical Management - 2. Comparison of Classification and Health Index of Chemicals Regulated by the Ministry of Environment and the Ministry of the Employment and Labor (화학물질 관리 연구-2. 환경부와 고용노동부의 관리 화학물질의 구분, 노출기준 및 독성 지표 등의 특성 비교)

  • Kim, Sunju;Yoon, Chungsik;Ham, Seunghon;Park, Jihoon;Kim, Songha;Kim, Yuna;Lee, Jieun;Lee, Sangah;Park, Donguk;Lee, Kwonseob;Ha, Kwonchul
    • Journal of Korean Society of Occupational and Environmental Hygiene / v.25 no.1 / pp.58-71 / 2015
  • Objectives: The aims of this study were to investigate the classification systems for chemical substances in the Occupational Safety and Health Act (OSHA) and the Chemical Substances Control Act (CSCA) and to compare several health indices (i.e., Time Weighted Average (TWA), Lethal Dose ($LD_{50}$), and Lethal Concentration ($LC_{50}$)) of chemical substances by category in each law. Methods: The chemicals regulated by each law were classified by the specific categories provided in the respective law: seven categories for the OSHA (chemicals with OELs, chemicals prohibited from manufacturing, etc., chemicals requiring approval, chemicals kept below permissible limits, chemicals requiring workplace monitoring, chemicals requiring special management, and chemicals requiring special health diagnosis) and five categories for the CSCA (poisonous substances, permitted substances, restricted substances, prohibited substances, and substances requiring preparation for accidents). Information on physicochemical properties and health indices, including CMR characteristics, $LD_{50}$ and $LC_{50}$, was searched from the homepages of the Korea Occupational Safety and Health Agency and the National Institute of Environmental Research, etc. Statistical analysis was conducted for comparison between TWA and the health indices for each category. Results: The number of chemicals based on CAS numbers differed from the number of serial entries listed in each law because of repeat listings under different names (e.g., glycol monoethylether vs. 2-ethoxy ethanol) and grouping of different chemicals under the same serial number (e.g., five different benzidine-related chemicals were categorized under one serial number (06-4-13) as prohibited substances under the CSCA). A total of 722 chemicals and 995 chemicals were listed under the OSHA and its sub-regulations and the CSCA and its sub-regulations, respectively.
Among these, 36.8% of the OSHA chemicals and 26.7% of the CSCA chemicals were regulated simultaneously under both laws. The correlation coefficients between TWA and $LC_{50}$ and between TWA and $LD_{50}$ were 0.641 and 0.506, respectively. The geometric mean (GM) values of TWA calculated for each category in both laws showed no trend across categories. The cumulative distribution patterns of TWA, $LD_{50}$ and $LC_{50}$ were similar for chemicals regulated by the OSHA and the CSCA, but the median values were lower for CSCA-regulated chemicals than for OSHA-regulated chemicals. The GM for carcinogenic chemicals under the OSHA was significantly lower than that for non-CMR chemicals ($2.21mg/m^3$ vs $5.69mg/m^3$, p=0.006), while there was no significant difference among CSCA chemicals ($0.85mg/m^3$ vs $1.04mg/m^3$, p=0.448). $LC_{50}$ showed no significant difference between carcinogens, mutagens, reproductive toxic chemicals and non-CMR chemicals among the chemicals regulated by either law, while $LD_{50}$ differed between carcinogens and non-CMR chemicals under the CSCA. Conclusions: This study found no specific tendency or significant difference in health indices such as TWA, $LD_{50}$ and $LC_{50}$ across the subcategories of chemicals as classified by the Ministry of Employment and Labor and the Ministry of Environment. Considering the background and the purpose of each law, collaboration to harmonize chemical categorization and regulation is necessary.

Analysis of the Naemorhedus caudatus Population in Odaesan National Park - The Goral Individually Identification and Statistical Analysis Using the Sensor Camera - (오대산국립공원 산양(Naemorhedus caudatus) 개체 수 분석 - 무인센서카메라 분석을 이용한 개체 구분 및 통계 분석 -)

  • Kim, Gyu-cheol;Lee, Yong-hak;Lee, Dong-un;Son, Jang-ick;Kang, Jae-gu;Cho, Chea-un
    • Korean Journal of Environment and Ecology / v.34 no.1 / pp.1-8 / 2020
  • This study conducted a full survey of the goral population using sensor cameras to identify the exact habitat of the gorals that inhabit Odaesan National Park and to support restoration and habitat management-focused conservation projects following the population growth. We surveyed Odaesan National Park for a year in 2018 and first selected 18 grids (2km×2km) based on the survey results. We then divided each grid into four small grids (1km×1km) and installed a total of 62 sensor cameras in 38 of the small grids. The survey resulted in a total of 5,096 photographed wild animals, 2,268 of which were gorals, and analysis using the goral classification table (horn shape (Ⓐ), ring pattern (Ⓑ), ring formation ratio (Ⓒ), and facial color (Ⓓ)) identified a total of 95 individuals. By sex, there were 35 males (36.8%), 46 females (48.4%), and 14 of unknown sex (14.7%), and the male-to-female ratio excluding those of unknown sex was 4:6. Horn shape (Ⓐ) and facial color (Ⓓ) were the important factors for distinguishing males from females and for identifying individuals. The correlation analysis of the 81 individuals, excluding the 14 of unknown sex, showed a significant correlation (r=-0.635, p<0.01). Since the goral population in Odaesan National Park has reached a minimum viable population, it is necessary to change the focus of the park's management policy from restoration to conservation.

A study of quality of working life to dental hygienist's (치과위생사의 근로생활의 질(QWL)에 관한 연구)

  • Oh, Hye-Seung;Kim, Eun-Hee
    • Journal of Korean society of Dental Hygiene / v.10 no.2 / pp.375-392 / 2010
  • Objectives : Dental hygienist's work satisfaction and stress affect the overall quality of work life(QWL). Therefore, this research is intended to suggest fundamental data to improve QWL by finding out characteristics of each work satisfaction and stress element. To this end, a total of 327 dental hygienists working at general hospitals, university hospitals, dental hospitals and dental clinics across Seoul, Gyeonggi and Incheon were surveyed. Results of survey are as follows. Methods : The collected data were analyzed by using an SPSS 12.0 statistical program, obtaining the following results. The collected data conducted a questionnaire survey for 327 dental hygienists who work at the hospitals, university hospitals, dental hospitals, and dental clinics located at Seoul, Gyeonggi-do, and Incheon district from January until March, 2009, and drew the conclusions as follows. Result : 1. Demographic characteristics, income from 1.5 to 1.99 million were the whole lot, more than 2 million to less than 1.5 million was similar. Marital status Married Unmarried higher than the atheist religion, Christianity, Catholicism, Buddhism, and other, respectively. Classification by level of education in the college graduate, university graduate, graduate diploma, respectively. 2. Are working in a job-related characteristics of dentistry, dental hospital, general and university hospital, respectively. The making in position, Mount, contractor, responsible, senior, was an intern in the order. The five-day workweek whether working at night and is not going to care whether the conduct was similar. Classification of working hours and 8 hours, 8 hours, 8 hours or less orderly, and total of less than 1-3 years of clinical experience, 5 years, less than one year, less than 3-5 years, respectively. 3. 
In job stability, there was a significant difference in quality of working life according to age, income, position, total clinical experience, and whether night shifts were worked (p<.05). 4. In work environment and benefits package, there was a significant difference according to marital status, place of work, position, and whether a five-day workweek was in place (p<.05). 5. In education & training and benefits packages, there was a significant difference according to age, marital status, income, position, and total clinical experience (p<.05). 6. In social usefulness, there was a significant difference according to whether night-time treatment was provided (p<.05). 7. In leisure activity, there was a significant difference according to marital status, income, place of work, total clinical experience, work hours, and whether a five-day workweek was in place (p<.05). 8. In wage level, there was a significant difference according to income, place of work, and position (p<.05). 9. There was no significant difference according to general characteristics in any item related to human relations and free communication (p>.05). Conclusions : It is necessary to analyze factors related to work satisfaction and stress in order to improve dental hygienists' quality of work life. Hospitals must support them systematically and institutionally, and related organizations must conduct practical research.

The influence of wearing helmet and cervical spine injury in skiers and snowboarders (스키와 스노우 보드에서 헬멧의 착용이 경추부 손상에 미치는 영향)

  • Kim, Sung Hun;Kim, Tae Kyun;Chun, Keun Churl;Hwang, Jae Sun
    • Journal of Korean Orthopaedic Sports Medicine / v.10 no.2 / pp.94-99 / 2011
  • Purpose: As the number of people enjoying skiing and snowboarding, two popular winter sports, has increased, wearing a helmet during these sports has become necessary for safety. The rates of head and face injury have decreased with helmet use. However, the effect of wearing a helmet on cervical injury is not yet known. Through this research we intend to contribute to the development of effective programs and safety equipment. Materials and Methods: During two seasons from December 2009 to March 2011, 658 cervical injuries were identified among 14,538 admissions to the medical center of a major resort for skiing and snowboarding injuries. A pilot study was conducted one year before the study period to establish the survey and research model. The patients were 432 males and 226 females, 273 advanced and 385 novice. We divided them into two groups according to helmet use, measured the cervical injury ratio and injury mechanism, and examined the severity of injury and the diagnosis. Data for each group were processed statistically using SPSS 12.0 (SPSS Inc., Chicago, IL, USA). Results: There were 312 skiers and 346 snowboarders among the patients. Of these, 146 skiers and 127 snowboarders were wearing a helmet. By injury type, there were 292 cases of simple sprain, 359 of bruising, 6 cervical fractures and 1 dislocation. By injury mechanism, there were 287 cases of collision with a person, 212 of collision with an object, 108 of slipping down by oneself, 39 of falling and 12 other cases. Among helmet wearers, simple sprain occurred in 78 skiers and 70 snowboarders, bruising in 64 skiers and 68 snowboarders, and cervical fracture or dislocation in 1 skier and 2 snowboarders. The ratio of cervical sprain was higher in helmet wearers than in non-wearers, with statistical significance (p<0.001). The ratio of cervical contusion was significantly higher in non-wearers (p<0.05).
However, there was no significant difference in fracture and dislocation between helmet users and non-users (p>0.05). Conclusion: In this study, wearing a helmet had no relation to additional cervical injury occurrence or severity among skiers and snowboarders. The ratio of cervical sprain increased significantly in helmet users involved in person-to-person accidents. However, cervical contusion decreased. On this basis, further biomechanical studies are required and a modified helmet will be necessary.

A Retrospective Analysis of the Relationship Between Rotator Cuff Tear and Biceps Lesion (후하방 회전근 개 파열과 상완이두박근 장두건 병변과의 연관 관계에 대한 후향적 분석)

  • Seo, Seung-Suk;Kim, Jung-Han;Choi, Jang-Seok;Kim, Jeon-Gyo
    • Clinics in Shoulder and Elbow / v.14 no.1 / pp.13-19 / 2011
  • Purpose: Not much is known about the relationship between posteroinferior rotator cuff tear and biceps lesion. The purpose of this study is to analyze the effect of posteroinferior rotator cuff tear on biceps lesions by comparing rotator cuff tears and biceps lesions in terms of the number of cuff tears and the degree of degeneration of the rotator cuff. Materials and Methods: 65 patients who underwent surgery for a posteroinferior rotator cuff tear from 2002 to 2009 were included as subjects. The study determined the factors (the number of cuff tears and the degree of degeneration as assessed by MRI) that affected biceps lesions and the kinematic stability of the rotator cuff. Results: Biceps lesion was noted in 11 of the 51 patients with supraspinatus tendon tears and in 8 of the 14 patients with supraspinatus, infraspinatus or teres minor tendon tears, and there was a statistically significant difference between the two groups (p=0.0095). The number of cuff tears was proportional to biceps lesion with statistical significance (p=0.0095). Among the biceps lesions, SLAP II lesion showed a statistically different distribution according to the number of cuff tears (p=0.0073). The degeneration factors (Goutallier's classification and the tangent sign) had no correlation with biceps lesion. Conclusion: Posteroinferior cuff tear may affect biceps lesion. In particular, the number of cuff tears is closely related to biceps lesion, whereas the degenerative indicators are not.
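
As a worked check of the group comparison in this abstract (11 of 51 versus 8 of 14 patients with a biceps lesion), the sketch below runs a Pearson chi-square test on the 2×2 table. The abstract does not name the test the authors used, so the choice of an uncorrected chi-square test is an assumption; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for the table
    [[a, b], [c, d]]. Returns the statistic and the p-value for one
    degree of freedom, using P(chi2 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: the two tear groups; columns: lesion present, lesion absent.
chi2, p = chi_square_2x2(11, 51 - 11, 8, 14 - 8)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under this assumption the p-value comes out close to the reported p=0.0095, which suggests, but does not confirm, that an uncorrected chi-square test was used.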

Research about feature selection that use heuristic function (휴리스틱 함수를 이용한 feature selection에 관한 연구)

  • Hong, Seok-Mi;Jung, Kyung-Sook;Chung, Tae-Choong
    • The KIPS Transactions:PartB / v.10B no.3 / pp.281-286 / 2003
  • A large number of features are collected for problem solving in real life, but utilizing all the collected features would be difficult. It is not easy to collect correct data for every feature. When all collected data are used for learning, a complicated learning model is created and good performance cannot be obtained. Interrelationships or hierarchical relations also exist among the features. We can reduce the number of features by analyzing the relations among them using heuristic knowledge or statistical methods. A heuristic technique refers to learning through repetitive trial and error and experience. Experts can approach the relevant problem domain through an experience-based opinion collection process. These properties can be utilized to reduce the number of features used in learning. Experts generate new, highly abstract features from raw data. This paper describes a machine learning model that reduces the number of features used in learning by means of a heuristic function and uses the abstracted features as the input values of a neural network. We have applied this model to win/lose prediction in professional baseball games. The results show that the model combining the two techniques not only reduces the complexity of the neural network model but also significantly improves classification accuracy compared with using the neural network and the heuristic model separately.

Detection of Phantom Transaction using Data Mining: The Case of Agricultural Product Wholesale Market (데이터마이닝을 이용한 허위거래 예측 모형: 농산물 도매시장 사례)

  • Lee, Seon Ah;Chang, Namsik
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.161-177 / 2015
  • With the rapid evolution of technology, the size, number, and types of databases have increased concomitantly, so data mining approaches face many challenging applications. One such application is the discovery of fraud patterns in agricultural product wholesale transactions. The agricultural product wholesale market in Korea is huge, and vast numbers of transactions are made every day. The demand for agricultural products continues to grow, and the use of electronic auction systems raises the operational efficiency of the wholesale market. The number of unusual transactions is also assumed to increase in proportion to the trading amount, and an unusual transaction is often the first sign of fraud. However, it is very difficult to identify and detect these transactions and the corresponding fraud in the agricultural product wholesale market because the types of fraud are more sophisticated than ever before. Fraud can be detected by verifying the overall transaction records manually, but this requires a significant amount of human resources and ultimately is not a practical approach. Fraud can also be revealed by a victim's report or complaint. But there are usually no victims in agricultural product wholesale fraud, because it is committed through collusion between an auction company and an intermediary wholesaler. Nevertheless, it is necessary to monitor transaction records continuously and to make an effort to prevent any fraud, because fraud not only disturbs the fair trade order of the market but also rapidly reduces the credibility of the market. Applying data mining to such an environment is very useful since it can discover unknown fraud patterns or features from a large volume of transaction data.
The objective of this research is to empirically investigate the factors necessary to detect fraudulent transactions in an agricultural product wholesale market by developing a data mining-based fraud detection model. One of the major frauds is the phantom transaction, in which the seller (auction company or forwarder) and the buyer (intermediary wholesaler) collude to commit a fraudulent transaction. They pretend to fulfill the transaction by recording false data in the online transaction processing system without actually selling products, and the seller receives money from the buyer. This leads to the overstatement of sales performance and illegal money transfers, which reduces the credibility of the market. This paper reviews the environment of the wholesale market, such as the types of transactions, the roles of market participants, and the various types and characteristics of fraud, and introduces the whole process of developing the phantom transaction detection model. The process consists of the following 4 modules: (1) data cleaning and standardization, (2) statistical data analysis such as distribution and correlation analysis, (3) construction of a classification model using a decision-tree induction approach, and (4) verification of the model in terms of hit ratio. We collected real data from 6 associations of agricultural producers in metropolitan markets. The final model, built with a decision-tree induction approach, revealed that the monthly average trading price of items offered by forwarders is a key variable in detecting the phantom transaction. The verification procedure also confirmed the suitability of the results. However, even though the performance of the results of this research is satisfactory, sensitive issues still remain in improving classification accuracy and the conciseness of rules. One such issue is the robustness of the data mining model. Data mining is very much data-oriented, so data mining models tend to be very sensitive to changes in data or situations.
Thus, it is evident that this non-robustness of data mining models requires continuous remodeling as data or situations change. We hope that this paper offers valuable guidelines to organizations and companies that consider introducing or constructing a fraud detection model in the future.