• Title/Summary/Keyword: One-Class Classification

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. The rating values in the user-item matrix may be distorted depending on the popularity of the product, or there may be new users who have not yet rated any items. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased toward high scores, resulting in severe imbalances. One cause of this imbalanced distribution is purchasing bias: only users who rate a product highly tend to purchase it, so those who would rate it low are less likely to purchase it and thus do not leave negative reviews. Because of this, reviews by users who purchase products are more likely to be positive than the actual preferences of most users would suggest. 
Therefore, because of its biased characteristics, the actual rating data is over-learned in the classes with high incidence, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, most existing pre-processing methods for data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class characteristics, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert multi-class problems into binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies distributions for minority classes so that data reflecting their characteristics can be generated. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. 
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model demonstrated the highest prediction accuracy on the RMSE and MAE evaluation metrics. Through this study, oversampling based on deep learning can further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
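
The abstract compares models on RMSE and MAE. As a reminder of what those two metrics measure, here is a minimal sketch in plain Python. The rating values are hypothetical and chosen only for illustration; they are not data from the paper, and the function names are assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large rating errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: average size of the rating error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical ratings on a 1-5 scale (illustrative only).
actual = [5, 4, 3, 1]
predicted = [4, 4, 5, 2]

print(rmse(actual, predicted))  # square root of 6/4
print(mae(actual, predicted))   # (1 + 0 + 2 + 1) / 4
```

Lower values are better for both metrics. Because RMSE squares each error, a model that makes a few large mistakes scores worse on RMSE than on MAE.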

Investigating Non-Laboratory Variables to Predict Diabetic and Prediabetic Patients from Electronic Medical Records Using Machine Learning

  • Mukhtar, Hamid;Al Azwari, Sana
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.9
    • /
    • pp.19-30
    • /
    • 2021
  • Diabetes Mellitus (DM) is one of the common chronic diseases leading to severe health complications that may cause death. The disease affects individuals, communities, and the government due to the continuous monitoring, lifelong commitment, and cost of treatment. The World Health Organization (WHO) considers Saudi Arabia one of the top 10 countries in diabetes prevalence across the world. Since most medical services are provided by the government, the cost of treatment in terms of hospital and clinic visits and lab tests represents a real burden due to the large scale of the disease. The ability to predict the diabetic status of a patient without laboratory tests, by screening based on some personal features, can lessen the health and economic burden caused by diabetes alone. The goal of this paper is to investigate the prediction of diabetic and prediabetic patients by considering factors other than the laboratory tests generally required by physicians. With data obtained from local hospitals, medical records were processed to obtain a dataset that classified patients into three classes: diabetic, prediabetic, and non-diabetic. After applying three machine learning algorithms, we established good performance for accuracy, precision, and recall of the models on the dataset. Further analysis was performed on the data to identify important non-laboratory variables related to the patients for diabetes classification. The importance of five variables (gender, physical activity level, hypertension, BMI, and age) from the person's basic health data was investigated to find their contribution to the state of a patient being diabetic, prediabetic, or normal. Our analysis showed strong agreement with the risk factors of diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. 
We conclude that by performing class-specific analysis of the disease, important factors specific to the Saudi population can be identified, whose management can help control the disease. We also provide some recommendations learned from this research.

A Quantitative Analysis of Classification Classes and Classified Information Resources of Directory (디렉터리 서비스 분류항목 및 정보자원의 계량적 분석)

  • Kim, Sung-Won
    • Journal of Information Management
    • /
    • v.37 no.1
    • /
    • pp.83-103
    • /
    • 2006
  • This study analyzes the classification schemes and classified information resources of the directory services provided by major web portals to complement keyword-based retrieval. Specifically, this study intends to quantitatively analyze the topic categories, the information resources by subject, and the information resources classified by the topic categories of three directories, Yahoo, Naver, and Empas. The result of this analysis reveals some differences among directory services. Overall, these directories show different ratios of referred categories to original categories depending on the subject area, and the categories regarded as format-based show the highest proportion of referred categories. In terms of the total amount of classified information resources, Yahoo has the largest number of resources. The directories compared have different amounts of resources depending on the subject area. The quantitative analysis of resources classified by the specific category is performed on the class of 'News & Media'. The result reveals that Naver and Empas contain overly specified categories compared to Yahoo, as far as the number of information resources categorized is concerned. Comparing the depth of the categories assigned by the three directories to the same information resources, it is found that, on average, Yahoo assigns one-step further segmented divisions than the other two directories to the identical resources.

Data mining Algorithms for the Development of Sasang Type Diagnosis (사상체질 진단검사를 위한 데이터마이닝 알고리즘 연구)

  • Hong, Jin-Woo;Kim, Young-In;Park, So-Jung;Kim, Byoung-Chul;Eom, Il-Kyu;Hwang, Min-Woo;Shin, Sang-Woo;Kim, Byung-Joo;Kwon, Young-Kyu;Chae, Han
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.23 no.6
    • /
    • pp.1234-1240
    • /
    • 2009
  • This study compared the effectiveness and validity of various data-mining algorithms for the Sasang type diagnostic test. We compared the sensitivity and specificity indices of nine attribute selection and eleven class classification algorithms on 31 data sets characterizing Sasang typology, using the 10-fold validation methods installed in the Waikato Environment for Knowledge Analysis (WEKA). The highest classification validity scores were as follows: 69.9 as the Percentage Correctly Predicted (PCP) index with the Naive Bayes Classifier, 80 as the sensitivity index with LWL for the Tae-Eum type, and 93.5 as the specificity index with the Naive Bayes Classifier for the So-Eum type. The classification algorithm with the highest PCP index after attribute selection, 69.62, was the Naive Bayes Classifier. This study finds that the best-fit algorithm for traditional medicine depends on the case, and that the characteristics of clinical circumstances, the data-mining algorithms, and the study purpose should be considered to get the highest validity even with well-defined data sets. It also confirms that there is no one-size-fits-all algorithm and that many studies involving trial and error are needed. This study will serve as a pivotal foundation for the development of medical instruments for Pattern Identification and Sasang type diagnosis on the basis of traditional Korean Medicine.
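
The abstract reports sensitivity and specificity per Sasang type. For a multi-class problem these are usually computed one-vs-rest: one type is treated as positive and all other types as negative. The sketch below assumes that convention and uses hypothetical labels; it is not the WEKA procedure from the study, and the function name is an assumption.

```python
def sensitivity_specificity(actual, predicted, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    sensitivity = tp / (tp + fn)  # share of true positives that were found
    specificity = tn / (tn + fp)  # share of true negatives that were found
    return sensitivity, specificity

# Hypothetical diagnoses (illustrative only, not data from the study).
actual = ["Tae-Eum", "Tae-Eum", "So-Eum", "So-Yang", "So-Eum", "Tae-Eum"]
predicted = ["Tae-Eum", "So-Eum", "So-Eum", "So-Yang", "Tae-Eum", "Tae-Eum"]

sens, spec = sensitivity_specificity(actual, predicted, "Tae-Eum")
print(sens, spec)
```

A classifier can score high on one index and low on the other for the same type, which is one reason the abstract reports the best algorithm separately for each index.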

Machine Classification in Ship Engine Rooms Using Transfer Learning (전이 학습을 이용한 선박 기관실 기기의 분류에 관한 연구)

  • Park, Kyung-Min
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.27 no.2
    • /
    • pp.363-368
    • /
    • 2021
  • Ship engine rooms have improved automation systems owing to the advancement of technology. However, there are many variables at sea, such as wind, waves, vibration, and equipment aging, which cause loosening, cutting, and leakage that are not measured by automated systems. There are cases in which only one engineer is available for patrolling. This entails many risk factors in the engine room, where rotating equipment is operating at high temperature and high pressure. When the engineer patrols, he uses his five senses, with a particularly high dependence on vision. We hereby present a preliminary study toward implementing an engine-room patrol robot that detects engine-room machinery and reports on it while patrolling. Images of ship engine-room equipment were classified using a convolutional neural network (CNN). After constructing the image dataset of the ship engine room, the network was trained with a pre-trained CNN model. Classification performance of the trained model showed high reproducibility. Images were visualized with a class activation map. Although the results cannot be generalized because the amount of data was limited, it is thought that if the data of each ship were learned through transfer learning, a model suitable for the characteristics of each ship could be constructed with little time and cost.

Mitral Valve Repair for Mitral Regurgitation (승모판막폐쇄부전에 대한 승모판막재건술)

  • 최세영;유영선;박기성;최대융;박창권;이광숙
    • Journal of Chest Surgery
    • /
    • v.31 no.3
    • /
    • pp.221-225
    • /
    • 1998
  • From February 1996 to May 1997, 18 patients underwent mitral valve repair for mitral regurgitation. There were 9 male and 9 female patients aged from 19 to 68 years (mean, 53). Thirteen patients were in New York Heart Association (NYHA) class III and IV. The cause of mitral regurgitation was degenerative in 12 patients, rheumatic in 5 patients, and infective in 1 patient. Fifteen patients were in Carpentier's functional classification II, 2 patients in Carpentier's class III, and 1 patient in Carpentier's class I. Surgical procedures included prosthetic ring annuloplasty (16 cases), rectangular resection of the posterior leaflet (15 cases), chordal shortening (5 cases), triangular resection of the anterior leaflet (2 cases), commissurotomy (2 cases), and partial transposition of the posterior leaflet (1 case). These procedures were combined in most patients. There was no operative death. These patients have been followed for 1 to 15 months, with a mean of 6.7 months. There was one late death resulting from low cardiac output following mitral valve replacement. The function of the repaired valve in the other 17 patients has remained satisfactory during the observed interval. We consider that mitral valve repair is highly satisfactory in patients with mitral regurgitation.

Effects of an Inflowing Urban Stream (Wonju stream) on Epilithic Diatom Assemblages in the Lower Seom River (도시 하천(원주천) 유입이 섬강 하류 부착규조 군집에 미치는 영향)

  • Yoon, Sung-Ae;Kim, Nan-Young;Kim, Baik-Ho;Hwang, Soon-Jin
    • Korean Journal of Ecology and Environment
    • /
    • v.43 no.2
    • /
    • pp.232-241
    • /
    • 2010
  • Epilithic diatom communities and water quality were monitored to evaluate the ecological impact of the inflow of Wonju Stream, which passes through an urban area, in the Seom River watershed. We selected 14 sampling stations (5 main stream sites and 9 tributary sites) and collected diatom and water samples between October 2007 and September 2008 on a seasonal basis. The results indicate that most water quality parameters, except for water temperature and dissolved oxygen, showed site-specific patterns over the study period. The levels of water quality parameters were highest at the Wonju Stream site, lowest at the upstream sites, and intermediate or gradually decreasing at the downstream sites of the Seom River. One species, Achnanthes convergens, showed the highest biomass and frequency across the sites, while three saprophilous species (Navicula goeppertiana, Navicula subminuscula, and Nitzschia palea) appeared only in Wonju Stream and other polluted sites. According to trophic diatom index (TDI) values, which were highly correlated with nutrients and EC, the study sites were classified into three groups: upstream and tributary (Class A and B), Wonju Stream (Class D), and mixed zone and downstream (Class C). A cluster analysis supported the result of the TDI classification. Therefore, Wonju Stream, located in a populated urban area, exerted adverse ecological effects on the epilithic diatom community and water quality of the lower Seom River system, although the severity gradually decreased downstream.

Applying Meta-model Formalization of Part-Whole Relationship to UML: Experiment on Classification of Aggregation and Composition (UML의 부분-전체 관계에 대한 메타모델 형식화 이론의 적용: 집합연관 및 복합연관 판별 실험)

  • Kim, Taekyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.99-118
    • /
    • 2015
  • Object-oriented programming languages have been widely selected for developing modern information systems. The use of concepts relating to object-oriented (OO) programming has reduced the effort of reusing pre-existing code, and the OO concepts have proved useful in interpreting system requirements. In line with this, modern conceptual modeling approaches have come to support features of object-oriented programming. The Unified Modeling Language (UML) has become one of the de facto standards for information system designers since the language provides a set of visual diagrams, comprehensive frameworks, and flexible expressions. In a modeling process, UML users need to consider relationships between classes. Based on an explicit and clear representation of classes, the conceptual model from UML gathers the necessary attributes and methods for guiding software engineers. In particular, identifying an association between a class of part and a class of whole is included in the standard grammar of UML. The representation of a part-whole relationship is natural in a real-world domain since many physical objects are perceived as part-whole relationships. In addition, even abstract concepts such as roles are easily identified by part-whole perception. A representation of part-whole in UML therefore seems reasonable and useful. However, it should be admitted that the use of UML is limited by the lack of practical guidelines on how to identify a part-whole relationship and how to classify it as an aggregate or a composite association. Research on developing this procedural knowledge is meaningful and timely in that misperceptions of part-whole relationships are hard to filter out in initial conceptual modeling, resulting in deterioration of system usability. The current method of identifying and classifying part-whole relationships relies mainly on linguistic expressions. 
This simple approach is rooted in the idea that a phrase representing has-a constructs a part-whole perception between objects. If the relationship is strong, the association is classified as a composite association of part-whole relationship. In other cases, the relationship is an aggregate association. Admittedly, linguistic expressions contain clues for part-whole relationships; therefore, the approach is reasonable and cost-effective in general. Nevertheless, it does not address concerns about accuracy and theoretical legitimacy. Research on developing guidelines for part-whole identification and classification has not yet accumulated sufficient results to solve this issue. The purpose of this study is to provide step-by-step guidelines for identifying and classifying part-whole relationships in the context of UML use. Based on the theoretical work on Meta-model Formalization, self-check forms that help conceptual modelers work on part-whole classes are developed. To evaluate the performance of the suggested idea, an experimental approach was adopted. The findings show that UML users obtain better results with the guidelines based on Meta-model Formalization compared to a natural language classification scheme conventionally recommended by UML theorists. This study contributes to the stream of research on part-whole relationships by extending the applicability of Meta-model Formalization. Compared to traditional approaches that aim to establish criteria for evaluating a result of conceptual modeling, this study expands the scope to the process of modeling. Traditional theories on the evaluation of part-whole relationships in the context of conceptual modeling aim to rule out incomplete or wrong representations. 
Such qualification is still important; however, without a practical alternative, posterior inspection may be less useful to modelers who want to reduce errors or misperceptions in part-whole identification and classification. The findings of this study can be further developed by introducing more comprehensive variables and real-world settings. In addition, it is highly recommended to replicate and extend the suggested idea of utilizing Meta-model Formalization by creating alternative forms of guidelines, including plugins for integrated development environments.

Factors Affecting Health Promotion Behavior of Apheresis Blood-Donors (성분헌혈자의 건강증진행위에 영향을 미치는 요인)

  • Hong Kyong Hee;Park Ho Ran
    • Journal of Korean Public Health Nursing
    • /
    • v.19 no.1
    • /
    • pp.41-52
    • /
    • 2005
  • This study was designed to provide a base for nursing intervention to help apheresis blood-donors perform health promotion behavior effectively by surveying their health promotion behavior and analyzing the critical factors. The study subjects were 468 participants in platelet donation at a university hospital apheresis unit in Seoul. The data for this study were collected between May and June 2002 by questionnaire. Data were analyzed by t-test, ANOVA, Scheffe test, Pearson correlation coefficient, and stepwise multiple regression. The results were as follows. 1. The degree of performance of health promotion behavior of the subjects was a total average score of $152.9\pm21.5$ points and a mean score of 2.7 points. The highest score was 'I have a good relationship with others' in the factor of self-actualization and interpersonal support. The lowest score was 'I have my blood pressure checked regularly' in the factor of health responsibility. 2. Considering the classification according to the subjects' general characteristics, the health promotion behavior score was significantly higher for soldiers than for high school students, for religious believers than for atheists, and for high class economic status than for mid and low class economic status. The health promotion behavior score was also higher for those who had made more than five blood donations than for those who had made zero or one donation, and, in terms of previous apheresis blood donations, for those who had made more than four than for those who had made fewer than four. The score was also higher for those not having a relationship with the recipient than for those having a relationship. 3. Self-efficacy related to donation, general self-efficacy, and self-esteem had a significant correlation with the performance of health promotion behavior. 4. 
The critical factors that influenced the health promotion behavior were general self-efficacy, which explained $35.6\%$, and general self-efficacy together with self-efficacy related to donation and previous times of apheresis blood donation, which explained $40.2\%$ in total. The health promotion behavior score of apheresis blood-donors differed according to job, religion, economic status, previous times of whole blood donation, previous times of apheresis blood donation, and relationship with recipient. The health promotion behavior and self-efficacy related to donation, general self-efficacy, and self-esteem showed significant positive correlation with one another. The general self-efficacy, self-efficacy related to donation, and previous times of apheresis blood donation appeared to be the significant predictive factors of health promotion behavior. Therefore, based on these results, it is necessary to establish more effective and organized nursing intervention strategies for the health promotion behavior of apheresis blood-donors.

Comparison Between Methods for Suitability Classification of Wild Edible Greens (산채류 재배적지 기준설정 방법 간의 비교 분석)

  • Hyun, Byung-Keun;Jung, Sug-Jae;Sonn, Yeon-Kyu;Park, Chan-Won;Zhang, Young-Seon;Song, Kwan-Cheol;Kim, Lee-Hyun;Choi, Eun-Young;Hong, Suk-Young;Kwon, Sun-Ik;Jang, Byoung-Choon
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.43 no.5
    • /
    • pp.696-704
    • /
    • 2010
  • The objective of this study was to compare two methods of land suitability classification for wild edible greens: the maximum limiting factor method (MLFM) and the multi-regression method (MRM). The investigation was carried out in the Pyeongchang, Hongcheon, Hoengseong, and Yanggu regions of Korea. The results showed that the factors determining land suitability for wild edible green cultivation included land slope, altitude, soil morphology, and gravel content. The best-suited soils for wild edible greens were fine loamy (silty) in texture, on valley or fan morphology, in the well-drained drainage class, on B-slope (2~7%), with available soil depth greater than 100 cm and altitude higher than 501 m. The contributions of soil factors to crop yield estimated by the multi-regression method were, in order, slope 0.30, altitude 0.22, soil morphology 0.13, drainage class 0.09, available soil depth 0.07, and soil texture 0.01. Using MLFM, the area of best suitable land was 0.2%, suitable soil 15.0%, possible soil 16.7%, and low productive soil 68.0% in the Hongcheon region of Gangwon province. By MRM, however, the area of best suitable land was 35.1%, suitable soil 30.7%, possible soil 10.3%, and low productive soil 23.9%. There was a large difference in suitable soil area between the two methods. Decisions on land suitability classification for wild edible green cultivation should therefore consider a sufficient range of analysis methods. Furthermore, for establishing land suitability classification for crops, it would be better to use MRM than MLFM.
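
The large gap between the two methods' results is easier to see with a small example. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes each factor is graded 0 (best suitable) to 3 (low productive), that MLFM takes the worst factor grade as the overall class, and that MRM can be approximated by a weighted mean of grades using the contribution values reported in the abstract. The plot data and function names are hypothetical.

```python
CLASSES = ["best suitable", "suitable", "possible", "low productive"]

# Contribution values as reported in the abstract.
WEIGHTS = {
    "slope": 0.30,
    "altitude": 0.22,
    "soil morphology": 0.13,
    "drainage class": 0.09,
    "available soil depth": 0.07,
    "soil texture": 0.01,
}

def classify_mlfm(grades):
    """Assumed MLFM rule: the single worst factor sets the overall class."""
    return CLASSES[max(grades.values())]

def classify_weighted(grades):
    """Assumed stand-in for MRM: weighted mean grade, rounded to a class."""
    total = sum(WEIGHTS.values())
    mean = sum(WEIGHTS[name] * grade for name, grade in grades.items()) / total
    return CLASSES[round(mean)]

# Hypothetical plot: good on every factor except soil texture.
plot = {
    "slope": 0,
    "altitude": 0,
    "soil morphology": 1,
    "drainage class": 0,
    "available soil depth": 0,
    "soil texture": 3,
}

print(classify_mlfm(plot))      # the worst factor dominates
print(classify_weighted(plot))  # a low-weight factor barely matters
```

Under these assumptions, one poor grade on a low-contribution factor is enough to push a plot into the lowest class with MLFM, while the weighted approach keeps it in a higher class. This is consistent with the much larger low productive area that the abstract reports for MLFM.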