• Title/Summary/Keyword: One-Class Classification

Search Result 353, Processing Time 0.043 seconds

Applying Meta-model Formalization of Part-Whole Relationship to UML: Experiment on Classification of Aggregation and Composition (UML의 부분-전체 관계에 대한 메타모델 형식화 이론의 적용: 집합연관 및 복합연관 판별 실험)

  • Kim, Taekyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.99-118
    • /
    • 2015
  • Object-oriented programming languages have been widely selected for developing modern information systems. The use of concepts relating to object-oriented (OO, in short) programming has reduced efforts of reusing pre-existing codes, and the OO concepts have been proved to be a useful in interpreting system requirements. In line with this, we have witnessed that a modern conceptual modeling approach supports features of object-oriented programming. Unified Modeling Language or UML becomes one of de-facto standards for information system designers since the language provides a set of visual diagrams, comprehensive frameworks and flexible expressions. In a modeling process, UML users need to consider relationships between classes. Based on an explicit and clear representation of classes, the conceptual model from UML garners necessarily attributes and methods for guiding software engineers. Especially, identifying an association between a class of part and a class of whole is included in the standard grammar of UML. The representation of part-whole relationship is natural in a real world domain since many physical objects are perceived as part-whole relationship. In addition, even abstract concepts such as roles are easily identified by part-whole perception. It seems that a representation of part-whole in UML is reasonable and useful. However, it should be admitted that the use of UML is limited due to the lack of practical guidelines on how to identify a part-whole relationship and how to classify it into an aggregate- or a composite-association. Research efforts on developing the procedure knowledge is meaningful and timely in that misleading perception to part-whole relationship is hard to be filtered out in an initial conceptual modeling thus resulting in deterioration of system usability. The current method on identifying and classifying part-whole relationships is mainly counting on linguistic expression. This simple approach is rooted in the idea that a phrase of representing has-a constructs a par-whole perception between objects. If the relationship is strong, the association is classified as a composite association of part-whole relationship. In other cases, the relationship is an aggregate association. Admittedly, linguistic expressions contain clues for part-whole relationships; therefore, the approach is reasonable and cost-effective in general. Nevertheless, it does not cover concerns on accuracy and theoretical legitimacy. Research efforts on developing guidelines for part-whole identification and classification has not been accumulated sufficient achievements to solve this issue. The purpose of this study is to provide step-by-step guidelines for identifying and classifying part-whole relationships in the context of UML use. Based on the theoretical work on Meta-model Formalization, self-check forms that help conceptual modelers work on part-whole classes are developed. To evaluate the performance of suggested idea, an experiment approach was adopted. The findings show that UML users obtain better results with the guidelines based on Meta-model Formalization compared to a natural language classification scheme conventionally recommended by UML theorists. This study contributed to the stream of research effort about part-whole relationships by extending applicability of Meta-model Formalization. Compared to traditional approaches that target to establish criterion for evaluating a result of conceptual modeling, this study expands the scope to a process of modeling. Traditional theories on evaluation of part-whole relationship in the context of conceptual modeling aim to rule out incomplete or wrong representations. It is posed that qualification is still important; but, the lack of consideration on providing a practical alternative may reduce appropriateness of posterior inspection for modelers who want to reduce errors or misperceptions about part-whole identification and classification. The findings of this study can be further developed by introducing more comprehensive variables and real-world settings. In addition, it is highly recommended to replicate and extend the suggested idea of utilizing Meta-model formalization by creating different alternative forms of guidelines including plugins for integrated development environments.

Factors Affecting Health Promotion Behavior of Apheresis Blood-Donors (성분헌혈자의 건강증진행위에 영향을 미치는 요인)

  • Hong Kyong Hee;Park Ho Ran
    • Journal of Korean Public Health Nursing
    • /
    • v.19 no.1
    • /
    • pp.41-52
    • /
    • 2005
  • This study was designed to provide a base for nursing intervention to help apheresis blood-donors to perform health promotion behavior effectively by surveying their health promotion behavior and by analyzing the critical factors. The study subjects were 468 participants in platelet donation at a university hospital apheresis unit in Seoul. The data for this study were collected between May and June. 2002. by questionnaire. Data were analyzed by t-test, ANOVA. Scheffe test, Pearson correlation coefficient. and stepwise multiple regression. The results were as follows. 1. The degree of performance of health promotion behavior of the subjects was a total average score of $152.9\pm21.5$ points and a mean score of 2.7 points. The highest score was 'I have a good relationship with others' in the factor of self-actualization and interpersonal support. The lowest score was 'I have my blood pressure checked regularly' in the factor of health responsibility. 2. Considering the classification according to the subjects' general characteristics. the health promotion behavior score was significantly higher for soldiers than high school students, for religious believers than atheists. and for high class economic status than mid and low class economic status. Also the health promotion behavior score was higher for those who had made more than five blood donations than those who had made zero or one donation. and for those who had made more than four blood donations than for those who had made less than four blood donations in the previous times of apheresis blood donation. The score was also higher for those not having a relationship with recipient than those having a relationship. 3. The self-efficacy related to donation. general self-efficacy and self-esteem had a significant correlation with the performance in health promotion behavior. 4. The critical factors that influenced the health promotion behavior were explained by $35.6\%$ of the general self-efficacy and by $40.2\%$ of the total of self-efficacy related to donation, and previous times of apheresis blood donation. The health promotion behavior score of apheresis blood-donors differed according to job, religion, economic status, previous times of whole blood donation, previous times of apheresis blood donation, and relationship with recipient. The health promotion behavior and self-efficacy related to donation, general self-efficacy, and self-esteem showed significant positive correlation with one another. The general self-efficacy, self-efficacy related to donation, and previous times of apheresis blood donation appeared to be the significant predictive factors of health promotion behavior. Therefore, from these study results, it is necessary to establish more effective and organized nursing intervention strategies for the health promotion behavior of apheresis blood-donors.

  • PDF

Comparison Between Methods for Suitability Classification of Wild Edible Greens (산채류 재배적지 기준설정 방법 간의 비교 분석)

  • Hyun, Byung-Keun;Jung, Sug-Jae;Sonn, Yeon-Kyu;Park, Chan-Won;Zhang, Young-Seon;Song, Kwan-Cheol;Kim, Lee-Hyun;Choi, Eun-Young;Hong, Suk-Young;Kwon, Sun-Ik;Jang, Byoung-Choon
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.43 no.5
    • /
    • pp.696-704
    • /
    • 2010
  • The objective of this study was analysis of two methods of land suitability classification for wild edible green. One method was Maximum limiting factor method (MLFM) and the other was Multi-regression method (MRM) for land suitability classification for wild edible green. The investigation was carried out in Pyeongchang, Hongcheong, Hoeingseong, and Yanggu regions in Korea. The obtained results showed that factors related to the decision classification of the land suitability for wild edible green cultivation were land slope, altitude, soil morphology and gravel contents so on. The classification of the best suitability soil for wild edible greens were fine loamy (silty), valley or fan of soil morphology, well drainage class, B-slope (2~7%), available soil depth deeper than 100cm, and altitude higher than 501m. Contribution of soil that influence to crop yields using Multi-regression method were slope 0.30, altitude 0.22, soil morphology 0.13, drainage classes 0.09, available soil depth 0.07, and soil texture 0.01 orders. Using MLFM, area of best suitable land was 0.2%, suitable soil 15.0%, possible soil 16.7%, and low productive soil 68.0% in Hongcheon region of Gangwon province. But, area of best suitable land was 35.1%, suitable soil 30.7%, possible soil 10.3%, and low productive soil 23.9% by MRM. There was big difference of suitable soil area between two methods (MLFM and MRM). When decision classificatin of the land suitability for wild edible green cultivation should consider enough analysis methods. Furthermore, to establishment of land suitability classification for crop would be better use MRM than MLFM.

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.123-132
    • /
    • 2013
  • As the smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyros, ambient light sensor, proximity sensor, and so on, there have been many research works on making use of these sensors to create valuable applications. Human activity recognition is one such application that is motivated by various welfare applications such as the support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using the smartphone sensors for activity recognition is that the number of sensors used should be minimized to save the battery power. When the number of sensors used are restricted, it is difficult to realize a highly accurate activity recognizer or a classifier because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty gets especially severe when the number of different activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that can distinguish ten different activities by using only a single sensor data, i.e., the smartphone accelerometer data. The approach that we take to dealing with this ten-class problem is to use the ensemble of nested dichotomy (END) method that transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all the classes are split into two subsets of classes by using a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by using another binary classifier. Continuing in this way, we can obtain a binary tree where each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. Depending on how a set of classes are split into two subsets at each node, the final tree that we obtain can be different. Since there can be some classes that are correlated, a particular tree may perform better than the others. However, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning, and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries As the base classifier at each node of the dichotomy, we have used another ensemble classifier called the random forest. A random forest is built by repeatedly generating a decision tree each time with a different random subset of features using a bootstrap sample. By combining bagging with random feature subset selection, a random forest enjoys the advantage of having more diverse ensemble members than a simple bagging. As an overall result, our ensemble of nested dichotomy can actually be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten classes of activities that we distinguish in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of vector magnitude within a time window of the last 2 seconds, etc. For experiments to compare the performance of END with those of other methods, the accelerometer data has been collected at every 0.1 second for 2 minutes for each activity from 5 volunteers. Among these 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data collected for each activity (the data for the first 2 seconds are trashed because they do not have time window data), 4,700 have been used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END has been found to classify all of the ten activities with a fairly high accuracy of 98.4%. On the other hand, the accuracies achieved by a decision tree, a k-nearest neighbor, and a one-versus-rest support vector machine have been observed as 97.6%, 96.5%, and 97.6%, respectively.

Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.95-108
    • /
    • 2017
  • Recently, AlphaGo which is Bakuk (Go) artificial intelligence program by Google DeepMind, had a huge victory against Lee Sedol. Many people thought that machines would not be able to win a man in Go games because the number of paths to make a one move is more than the number of atoms in the universe unlike chess, but the result was the opposite to what people predicted. After the match, artificial intelligence technology was focused as a core technology of the fourth industrial revolution and attracted attentions from various application domains. Especially, deep learning technique have been attracted as a core artificial intelligence technology used in the AlphaGo algorithm. The deep learning technique is already being applied to many problems. Especially, it shows good performance in image recognition field. In addition, it shows good performance in high dimensional data area such as voice, image and natural language, which was difficult to get good performance using existing machine learning techniques. However, in contrast, it is difficult to find deep leaning researches on traditional business data and structured data analysis. In this study, we tried to find out whether the deep learning techniques have been studied so far can be used not only for the recognition of high dimensional data but also for the binary classification problem of traditional business data analysis such as customer churn analysis, marketing response prediction, and default prediction. And we compare the performance of the deep learning techniques with that of traditional artificial neural network models. The experimental data in the paper is the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing and has a binary target variable that records whether the customer intends to open an account or not. In this study, to evaluate the possibility of utilization of deep learning algorithms and techniques in binary classification problem, we compared the performance of various models using CNN, LSTM algorithm and dropout, which are widely used algorithms and techniques in deep learning, with that of MLP models which is a traditional artificial neural network model. However, since all the network design alternatives can not be tested due to the nature of the artificial neural network, the experiment was conducted based on restricted settings on the number of hidden layers, the number of neurons in the hidden layer, the number of output data (filters), and the application conditions of the dropout technique. The F1 Score was used to evaluate the performance of models to show how well the models work to classify the interesting class instead of the overall accuracy. The detail methods for applying each deep learning technique in the experiment is as follows. The CNN algorithm is a method that reads adjacent values from a specific value and recognizes the features, but it does not matter how close the distance of each business data field is because each field is usually independent. In this experiment, we set the filter size of the CNN algorithm as the number of fields to learn the whole characteristics of the data at once, and added a hidden layer to make decision based on the additional features. For the model having two LSTM layers, the input direction of the second layer is put in reversed position with first layer in order to reduce the influence from the position of each field. In the case of the dropout technique, we set the neurons to disappear with a probability of 0.5 for each hidden layer. The experimental results show that the predicted model with the highest F1 score was the CNN model using the dropout technique, and the next best model was the MLP model with two hidden layers using the dropout technique. In this study, we were able to get some findings as the experiment had proceeded. First, models using dropout techniques have a slightly more conservative prediction than those without dropout techniques, and it generally shows better performance in classification. Second, CNN models show better classification performance than MLP models. This is interesting because it has shown good performance in binary classification problems which it rarely have been applied to, as well as in the fields where it's effectiveness has been proven. Third, the LSTM algorithm seems to be unsuitable for binary classification problems because the training time is too long compared to the performance improvement. From these results, we can confirm that some of the deep learning algorithms can be applied to solve business binary classification problems.

Detection of Traffic Anomalities using Mining : An Empirical Approach (마이닝을 이용한 이상트래픽 탐지: 사례 분석을 통한 접근)

  • Kim Jung-Hyun;Ahn Soo-Han;Won You-Jip;Lee Jong-Moon;Lee Eun-Young
    • Journal of KIISE:Information Networking
    • /
    • v.33 no.3
    • /
    • pp.201-217
    • /
    • 2006
  • In this paper, we collected the physical traces from high speed Internet backbone traffic and analyze the various characteristics of the underlying packet traces. Particularly, our work is focused on analyzing the characteristics of an anomalous traffic. It is found that in our data, the anomalous traffic is caused by UDP session traffic and we determined that it was one of the Denial of Service attacks. In this work, we adopted the unsupervised machine learning algorithm to classify the network flows. We apply the k-means clustering algorithm to train the learner. Via the Cramer-Yon-Misses test, we confirmed that the proposed classification method which is able to detect anomalous traffic within 1 second can accurately predict the class of a flow and can be effectively used in determining the anomalous flows.

Genetic Association Analysis of Fasting and 1- and 2-Hour Glucose Tolerance Test Data Using a Generalized Index of Dissimilarity Measure for the Korean Population

  • Yee, Jaeyong;Kim, Yongkang;Park, Taesung;Park, Mira
    • Genomics & Informatics
    • /
    • v.14 no.4
    • /
    • pp.181-186
    • /
    • 2016
  • Glucose tolerance tests have been devised to determine the speed of blood glucose clearance. Diabetes is often tested with the standard oral glucose tolerance test (OGTT), along with fasting glucose level. However, no single test may be sufficient for the diagnosis, and the World Health Organization (WHO)/International Diabetes Federation (IDF) has suggested composite criteria. Accordingly, a single multi-class trait was constructed with three of the fasting phenotypes and 1- and 2-hour OGTT phenotypes from the Korean Association Resource (KARE) project, and the genetic association was investigated. All of the 18 possible combinations made out of the 3 sets of classification for the individual phenotypes were taken into our analysis. These were possible due to a method that was recently developed by us for estimating genomic associations using a generalized index of dissimilarity. Eight single-nucleotide polymorphisms (SNPs) that were found to have the strongest main effect are reported with the corresponding genes. Four of them conform to previous reports, located in the CDKAL1 gene, while the other 4 SNPs are new findings. Two-order interacting SNP pairs of are also presented. One pair (rs2328549 and rs6486740) has a prominent association, where the two single-nucleotide polymorphism locations are CDKAL1 and GLT1D1. The latter has not been found to have a strong main effect. New findings may result from the proper construction and analysis of a composite trait.

Constructing Gene Regulatory Networks using Frequent Gene Expression Pattern and Chain Rules (빈발 유전자 발현 패턴과 연쇄 규칙을 이용한 유전자 조절 네트워크 구축)

  • Lee, Heon-Gyu;Ryu, Keun-Ho;Joung, Doo-Young
    • The KIPS Transactions:PartD
    • /
    • v.14D no.1 s.111
    • /
    • pp.9-20
    • /
    • 2007
  • Groups of genes control the functioning of a cell by complex interactions. Such interactions of gene groups are tailed Gene Regulatory Networks(GRNs). Two previous data mining approaches, clustering and classification, have been used to analyze gene expression data. Though these mining tools are useful for determining membership of genes by homology, they don't identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand the mechanism of how genes relate and how they regulate one another. In order to detect regulatory relationships among genes from time-series Microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make suitable for frequent pattern mining, and gene expression patterns we detected by applying the FP-growth algorithm. Next, we construct a gene regulatory network from frequent gene patterns using chain rules. Finally, we validate our proposed method through our experimental results, which are consistent with published results.

Strength Assessment of 8m-class High-Speed Planing Leisure Boat (8m급 고속 활주선형 레저보트의 구조강도 평가)

  • Ko, Dae-Eun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.10
    • /
    • pp.418-423
    • /
    • 2018
  • Recently, research and development of high-value leisure vessels has been carried out in Korea to revitalize the marine leisure industry and tap into the global maritime leisure market. FRP composite materials, which have excellent physical properties and are available for the manufacture of light hulls, are used widely. One of the most important design technologies is to secure structural safety of leisure vessels made from FRP composite materials. In this study, the structural strength was assessed for the design of an 8-meter high-speed planing leisure boat made from FRP composite materials. The design loads to verify the structural safety were calculated according to the rules for the classification of high speed light craft (KR, 2015), and structural analysis was conducted using a finite element model composed of an isotropic shell element, which has equivalent bending rigidity with the FRP sandwich panel. The analysis results were compared with the results of the strength test for fabricated specimens, and all internal structural components are sufficiently satisfied with the structural strength.

A Classifier Capable of Handling Incomplete Data Set (불완전한 데이터를 처리할수 있는 분류기)

  • Lee, Jong-Chan;Lee, Won-Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.1
    • /
    • pp.53-62
    • /
    • 2010
  • This paper introduces a classification algorithm which can be applied to a learning problem with incomplete data sets, missing variable values or a class value. This algorithm uses a data expansion method which utilizes weighted values and probability techniques. It operates by extending a classifier which are considered to be in the optimal projection plane based on Fisher's formula. To do this, some equations are derived from the procedure to be applied to the data expansion. To evaluate the performance of the proposed algorithm, results of different measurements are iteratively compared by choosing one variable in the data set and then modifying the rate of missing and non-missing values in this selected variable. And objective evaluation of data sets can be achieved by comparing, the result of a data set with non-missing variable with that of C4.5 which is a known knowledge acquisition tool in machine learning.