• Title/Summary/Keyword: classification learning


Convergence Analysis of Risk factors for Readmission in Cardiovascular Disease: A Machine Learning Approach (의사결정나무분석을 이용한 심혈관질환자의 재입원 위험 요인에 대한 융합적 분석)

  • Kim, Hyun-Su
    • Journal of Convergence for Information Technology / v.9 no.12 / pp.115-123 / 2019
  • This descriptive study is a secondary analysis of KNHANES IV-VI data on risk factors for readmission among patients with cardiovascular disease. Of 65,973 adults in total, 1,037 with angina or myocardial infarction were analyzed. The analysis was conducted with SPSS for Windows 21, and a CHAID decision tree was used for the classification analysis. The root node was split on economic activity (χ²=12.063, p=.001); the child nodes were personal income (χ²=6.575, p=.031), weight change (χ²=12.758, p=.001), residential area (χ²=4.025, p=.045), direct smoking (χ²=3.884, p=.049), and level of education (χ²=9.630, p=.024). The terminal nodes were hypertension (χ²=3.854, p=.050), diabetes mellitus (χ²=6.056, p=.014), and occupation type (χ²=7.799, p=.037). We suggest that readmission management for cardiovascular patients requires developing and operating programs that take an integrated approach to these various factors.
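CHAID chooses each split by a chi-square test of association between a candidate predictor and the outcome. A minimal Python sketch of that selection step, assuming binary (0/1) predictors and a binary readmission outcome: it computes the Pearson χ² statistic for each 2x2 table and picks the predictor with the smallest p-value. Real CHAID (as in SPSS) also merges categories of multi-level predictors and uses Bonferroni-adjusted p-values, which this sketch omits; the variable names (`employed`, `smoker`, `readmitted`) are hypothetical.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        return 0.0  # a constant predictor or outcome carries no association
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def best_split(records, predictors, target):
    """Return (predictor, chi2, p) for the binary predictor most associated with the target."""
    best = None
    for name in predictors:
        table = [[0, 0], [0, 0]]
        for r in records:
            table[r[name]][r[target]] += 1  # rows: predictor value, columns: outcome
        stat = chi2_2x2(table)
        p = p_value_df1(stat)
        if best is None or p < best[2]:
            best = (name, stat, p)
    return best
```

For 1 degree of freedom the upper-tail p-value has the closed form erfc(√(χ²/2)), so no statistics library is needed; predictors with more categories would require the general chi-square survival function.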

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.435-442 / 2018
  • Outlier detection is the task of detecting data samples that deviate significantly from the distribution of normal data. Most outlier detection methods compute an outlier score indicating how far a data sample departs from the normal state and flag the sample as an outlier when its score exceeds a given threshold. However, because the range of outlier scores differs from dataset to dataset and outliers are far rarer than normal data, choosing the threshold is very difficult. Furthermore, in practice it is not easy to obtain training data containing enough outliers. In this paper, we propose a clustering-based outlier detection method that builds a model of the normal data region using only normal data and then performs binary classification (outlier vs. normal) on new data samples. We then extend it to an ensemble method: the given normal data are divided into chunks, a clustering model is built for each chunk, and the models' decisions are combined. The ensemble is applied to streaming data with dynamic changes. Experimental results on real and synthetic data show high performance of the proposed method.
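One way to read the chunk-ensemble method, as a minimal sketch: each chunk of normal data gets its own k-means model, a model flags a point if it falls outside the radius of its nearest cluster, and the ensemble takes a majority vote over the most recent models. k-means, the "max training distance times a margin" radius, majority voting, and the sliding window of `max_models` are all assumptions for illustration; the abstract does not specify them.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers

class ClusterModel:
    """Describes the normal region of one chunk as k clusters, each with a radius."""

    def __init__(self, normal_points, k=2, margin=1.1, seed=0):
        self.centers = kmeans(normal_points, k, seed=seed)
        self.radii = [0.0] * k
        for p in normal_points:
            idx, d = self._nearest(p)
            self.radii[idx] = max(self.radii[idx], d * margin)

    def _nearest(self, p):
        dists = [math.dist(p, c) for c in self.centers]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def is_outlier(self, p):
        idx, d = self._nearest(p)
        return d > self.radii[idx]

class ChunkEnsemble:
    """Keeps one ClusterModel per recent chunk of normal data and combines them by vote."""

    def __init__(self, max_models=3, **model_args):
        self.max_models = max_models
        self.model_args = model_args
        self.models = []

    def update(self, normal_chunk):
        self.models.append(ClusterModel(normal_chunk, **self.model_args))
        if len(self.models) > self.max_models:
            self.models.pop(0)  # forget the oldest chunk to follow drift in the stream

    def is_outlier(self, p):
        votes = sum(m.is_outlier(p) for m in self.models)
        return votes > len(self.models) / 2
```

Dropping the oldest model whenever a new chunk arrives is one simple way to let the normal region follow a changing stream; weighting recent models more heavily would be an alternative.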

A Study on Similar Trademark Search Model Using Convolutional Neural Networks (합성곱 신경망(Convolutional Neural Network)을 활용한 지능형 유사상표 검색 모형 개발)

  • Yoon, Jae-Woong;Lee, Suk-Jun;Song, Chil-Yong;Kim, Yeon-Sik;Jung, Mi-Young;Jeong, Sang-Il
    • Management & Information Systems Review / v.38 no.3 / pp.55-80 / 2019
  • Recently, many companies have improved their management performance by building strong brand value protected by trademark rights. However, as the online commerce market grows, trademark infringement is increasing. According to various studies and reports, cases of trademark infringement involving foreign and domestic companies are on the rise. Because protecting trademarks requires enormous manpower and cost, small and medium enterprises (SMEs) have been unable to conduct preliminary investigations to protect their trademark rights. In addition, because no trademark image search service exists, many domestic companies must manually examine huge numbers of trademarks when conducting such preliminary investigations. Therefore, we developed an intelligent similar-trademark search model to reduce the manpower and cost of preliminary investigation. The model's performance was measured on test data selected by intellectual property experts, and ResNet V1 101 performed best. The significance of this study is as follows. First, the experimental results empirically demonstrate that image classification algorithms perform well not only in object recognition but also in image retrieval. Second, because the model was trained on actual trademark image data, it is expected to be applicable in real industrial environments.
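The abstract does not say how retrieval is performed. A common pattern, assumed here, is to take feature vectors from a trained classification CNN (for example the penultimate layer of a ResNet) and rank stored trademarks by cosine similarity to the query image's vector. The sketch covers only the ranking step; the feature extraction and IDs such as `TM-001` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def search_similar(query_vec, index, top_k=5):
    """Rank stored trademarks by similarity to the query.

    index maps a trademark ID to its feature vector, assumed to come from a
    trained CNN such as a ResNet (feature extraction not shown)."""
    scored = [(tid, cosine_similarity(query_vec, vec)) for tid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A linear scan like this is fine for small collections; a large trademark registry would usually call for an approximate nearest-neighbor index.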

A Study on the Correlations between the Physical Characteristics of Rock Types by Multiple Regression Analysis and Artificial Neural Network (다중회귀분석 및 인공신경망을 통한 암종별 물리적 특성간의 상관관계에 대한 연구)

  • Kim, Byong-Kuk;Lee, Byok-Kyu;Jang, Seung-Jin;Lee, Su-Gon
    • The Journal of Engineering Geology / v.28 no.4 / pp.673-686 / 2018
  • The physical properties of the rocks that make up rock masses were analyzed using about 2,400 data records covering 7 kinds of physical properties. Correlation equations for the dependent variables were derived by multiple regression analysis, with the independent variables screened by significance level. To verify the reliability of these equations, they were compared with actual data using artificial neural network learning. Analyses by petrogenesis and strength confirmed that elastic wave velocity (compressional wave) and elastic modulus are the main independent variables affecting the dependent variables. This is consistent with the fact that most correlation equations in existing studies use these items. This study also examined whether rock classifications in various standards are based on these items. In addition, for representative rocks, the equations for estimating unconfined compressive strength and elastic modulus showed high correlation, with coefficients of determination exceeding 0.8.
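To make the regression step concrete, here is a minimal sketch that fits a multiple linear regression by ordinary least squares (solving the normal equations) and reports the coefficient of determination R² = 1 - SS_res/SS_tot, the quantity behind the "exceeds 0.8" result. The data and predictors are hypothetical stand-ins (for example, x1 as P-wave velocity and x2 as another index property), and the significance-based screening of predictors is not implemented.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by solving the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]  # [b0, b1, ..., bp]

def predict(coefs, x):
    return coefs[0] + sum(c * xi for c, xi in zip(coefs[1:], x))

def r_squared(coefs, X, y):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, x)) ** 2 for x, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Solving the normal equations directly is adequate for a handful of predictors; for ill-conditioned data a QR-based solver is numerically safer.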

Predicting The Direction of The Daily KOSPI Movement Using Neural Networks For ETF Trades (신경회로망을 이용한 일별 KOSPI 이동 방향 예측에 의한 ETF 매매)

  • Hwang, Heesoo
    • Journal of the Korea Convergence Society / v.10 no.4 / pp.1-6 / 2019
  • Neural networks have been used to predict the direction of stock index movement from past data. Conventional studies that predict upward or downward movement of the stock index forecast a rise or fall even for small changes in the index, so trading ETFs on such predictions is likely to incur losses. In this paper, we present a neural network model that predicts the movement direction of the daily Korea Composite Stock Price Index (KOSPI) in order to reduce ETF trading losses and earn more than a certain amount per trade. The proposed model has outputs that represent rising (index change rate $\geq \alpha$), falling (change rate $\leq -\alpha$), and neutral ($-\alpha <$ change rate $< \alpha$). If the forecast is rising, a leveraged exchange-traded fund (ETF) is bought; if it is falling, an inverse ETF is bought. The hit ratio (HR) of PNN1, implemented in this paper, is 0.720 on the training data and 0.616 on the evaluation data. ETF trading yields returns of 8.386% to 16.324%. The proposed models achieve a higher ETF trading success rate and yield than conventional neural network models that predict KOSPI movement.
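The three-class labeling and the trading rule can be sketched as follows. The threshold value α, the label strings, and the definition of hit ratio as the share of days on which the predicted class matches the realized class are assumptions; the abstract does not define HR precisely.

```python
def label_move(change_rate, alpha):
    """Three-class target: rise if change >= alpha, fall if change <= -alpha, else neutral."""
    if change_rate >= alpha:
        return "rise"
    if change_rate <= -alpha:
        return "fall"
    return "neutral"

def trade_action(predicted):
    """Map a predicted class to the ETF trade described in the abstract."""
    return {"rise": "buy leveraged ETF", "fall": "buy inverse ETF"}.get(predicted, "no trade")

def hit_ratio(predicted, actual):
    """Share of days on which the predicted class equals the realized class (assumed definition)."""
    if not actual:
        raise ValueError("no observations")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

The neutral band is what keeps the strategy out of the market on small moves, which is the source of the reduced losses the abstract describes.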

Construction of a Bark Dataset for Automatic Tree Identification and Developing a Convolutional Neural Network-based Tree Species Identification Model (수목 동정을 위한 수피 분류 데이터셋 구축과 합성곱 신경망 기반 53개 수종의 동정 모델 개발)

  • Kim, Tae Kyung;Baek, Gyu Heon;Kim, Hyun Seok
    • Journal of Korean Society of Forest Science / v.110 no.2 / pp.155-164 / 2021
  • Many studies have developed automatic plant identification algorithms by applying machine learning to plant features such as leaves and flowers. Unlike other plant characteristics, bark changes little across seasons and persists for a long period. However, bark has a complex shape that varies greatly with the environment, and there is insufficient material for training algorithms. Here, we collected bark images to supplement the previously published bark image dataset BarkNet v.1.0 and present a dataset of 53 tree species that are easily observed in Korea. A convolutional neural network (CNN) was trained and tested on the dataset, and the factors that degrade the model's performance were identified. VGG-16 and VGG-19 were used as CNN architectures and achieved accuracies of 90.41% and 92.62%, respectively. When tested on images of trees that are not in the dataset but belong to a genus or family it covers, the model correctly identified the genus or family in more than 80% of cases. However, the model tended to misclassify images containing distracting features such as leaves, mosses, and knots. For these cases, we propose random cropping and majority-vote classification as effective ways to reduce errors in training and inference.
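The random-crop and majority-vote remedy can be sketched at inference time as follows. `classify` is a hypothetical stand-in for the trained VGG model, images are plain lists of pixel rows, and the crop size and number of crops are assumed values.

```python
import random
from collections import Counter

def random_crop(image, size, rng):
    """Return a size x size crop of a 2-D image given as a list of pixel rows."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop larger than image")
    top = rng.randint(0, height - size)  # randint includes both endpoints
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

def predict_by_majority(image, classify, n_crops=9, size=224, seed=0):
    """Classify several random crops and return the most frequent label."""
    rng = random.Random(seed)
    votes = Counter(classify(random_crop(image, size, rng)) for _ in range(n_crops))
    label, _count = votes.most_common(1)[0]
    return label
```

Voting over several crops limits the influence of any single crop dominated by a leaf, moss patch, or knot.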

Self-archiving Motivations across Academic Disciplines on an Academic Social Networking Service (학술 소셜 네트워킹 서비스에서의 학문 분야별 연구자의 셀프 아카이빙 동기 분석)

  • Lee, Jongwook;Oh, Sanghee;Dong, Hang
    • Journal of Korean Library and Information Science Society / v.51 no.4 / pp.313-332 / 2020
  • The purpose of this study is to compare motivations for self-archiving across disciplines on an academic social networking site. We conducted an online survey of ResearchGate (RG) users, testing 18 motivational factors that we developed from a previous study: enjoyment, personal/professional gain, reputation, learning, self-efficacy, altruism, reciprocity, trust, community interest, social engagement, publicity, accessibility, self-archiving culture, influence of external actors, credibility, system stability, copyright concerns, and additional time and effort. We adapted Biglan's classification of academic disciplines and compared motivations across discipline categories. First, we compared motivations across the four categories formed by combining the two dimensions: hard-pure, hard-applied, soft-pure, and soft-applied. We also compared motivations along each dimension separately, between soft and hard disciplines and between pure and applied disciplines. In addition, we examined statistical differences in motivations by participants' demographic characteristics and RG usage across categories. The findings showed that several motivations for self-archiving, including enjoyment, accessibility, influence of external actors, additional time and effort, and personal/professional gain, differed across disciplines. For example, RG users in hard-applied disciplines were more strongly motivated by enjoyment than others, while RG users in soft-pure disciplines were more strongly motivated by personal/professional gain than others. These findings could be used to develop strategies that encourage researchers in various disciplines to share their data and publications on academic social networking services (ASNSs).

Abnormal Crowd Behavior Detection via H.264 Compression and SVDD in Video Surveillance System (H.264 압축과 SVDD를 이용한 영상 감시 시스템에서의 비정상 집단행동 탐지)

  • Oh, Seung-Geun;Lee, Jong-Uk;Chung, Yongw-Ha;Park, Dai-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.6 / pp.183-190 / 2011
  • In this paper, we propose a prototype system for abnormal sound detection and identification that detects and recognizes abnormal situations by analyzing audio information arriving in real time from CCTV cameras in a surveillance environment. The proposed system has two layers. The first layer is a one-class support vector machine, namely support vector data description (SVDD), which rapidly detects abnormal situations and alerts the manager. The second layer classifies the detected abnormal sound into predefined classes such as 'gun', 'scream', 'siren', 'crash', and 'bomb' using a sparse representation classifier (SRC), so that emergency situations can be handled. Because the system combines SVDD and SRC hierarchically, it has the following desirable characteristics: 1) because the SVDD, trained only on normal sound, detects abnormal sound quickly, no unnecessary classification is performed on normal sound; 2) reliable performance is ensured by SRC, which has been applied successfully in face recognition; and 3) owing to the intrinsic incremental learning capability of SRC, the system can actively adapt to changes in the sound database. Experimental results with qualitative analysis illustrate the efficiency of the proposed method.
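The two-layer design (detect first, classify only what is abnormal) can be sketched as follows. Both layers are deliberately simplified stand-ins, not the paper's components: the first layer is a hypersphere centered on the mean of normal feature vectors with a quantile radius, not true SVDD (which solves a quadratic program for the minimal enclosing sphere, usually in a kernel space), and the second layer is a nearest-centroid classifier, not SRC. The feature vectors and the quantile are assumed.

```python
import math

def _mean(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

class HypersphereGate:
    """Simplified stand-in for SVDD: a sphere around the mean of normal feature vectors."""

    def __init__(self, normal_vectors, quantile=0.95):
        self.center = _mean(normal_vectors)
        dists = sorted(math.dist(v, self.center) for v in normal_vectors)
        self.radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]

    def is_abnormal(self, x):
        return math.dist(x, self.center) > self.radius

class NearestCentroidClassifier:
    """Stand-in for the SRC layer: assigns the class whose mean feature vector is closest."""

    def __init__(self, labelled):
        self.centroids = {label: _mean(vectors) for label, vectors in labelled.items()}

    def add_class(self, label, vectors):
        """New sound classes can be added without retraining the others."""
        self.centroids[label] = _mean(vectors)

    def classify(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

def process(x, gate, classifier):
    """Layer 1 filters out normal input; only abnormal input reaches layer 2."""
    if not gate.is_abnormal(x):
        return "normal"
    return classifier.classify(x)
```

The gate keeps the more expensive second layer idle for the common case of normal audio, which is the efficiency argument the abstract makes for the hierarchical design.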

Survival network based Android Authorship Attribution considering overlapping tolerance (중복 허용 범위를 고려한 서바이벌 네트워크 기반 안드로이드 저자 식별)

  • Hwang, Cheol-hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.21 no.6 / pp.13-21 / 2020
  • In a narrow sense, Android authorship identification is a method for revealing the source of an app; in a broader sense, it is a way to gain insight for identifying similar works from known works. A problem in Android authorship identification is that code which is important to the Android system but meaningless for attribution makes it difficult to find the features that characterize an author. As a result, legitimate code or behavior has also been incorrectly classified as malicious. To solve this, we introduce the concept of a survival network, which removes features found across many Android apps and lets the unique features defined by each author survive. We conducted an experiment comparing the proposed framework with a previous study. In experiments on apps from 440 identified authors, we obtained a classification accuracy of up to 92.10%, a difference of up to 3.47% from the previous study. Although the method used a small amount of training data, we attribute the difference from previous studies to its use of unique, non-duplicated features for each author. In addition, comparative experiments with previous studies under different feature definition methods showed that the same accuracy can be achieved with fewer features, indicating that persistently overlapping, meaningless features can be managed through the survival network concept.
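The survival idea (remove features shared across many apps so that author-specific features survive) might look like this under stated assumptions: features are strings (for example API or method names, hypothetical here), a feature survives if at most `tolerance` authors use it, and attribution picks the author whose surviving features overlap most with an unknown app. The paper's actual feature definition and overlap-tolerance rule may differ.

```python
from collections import Counter

def surviving_features(author_features, tolerance=1):
    """Keep, for each author, only features used by at most `tolerance` authors.

    author_features maps an author to the set of features extracted from their apps.
    Widely shared features (framework boilerplate, common libraries) are removed."""
    usage = Counter(f for feats in author_features.values() for f in set(feats))
    return {author: {f for f in feats if usage[f] <= tolerance}
            for author, feats in author_features.items()}

def attribute(app_features, profiles):
    """Return the author whose surviving features overlap most with an unknown app."""
    return max(profiles, key=lambda author: len(profiles[author] & app_features))
```

Raising `tolerance` keeps more partially shared features, trading distinctiveness for coverage.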

Domain Knowledge Incorporated Counterfactual Example-Based Explanation for Bankruptcy Prediction Model (부도예측모형에서 도메인 지식을 통합한 반사실적 예시 기반 설명력 증진 방법)

  • Cho, Soo Hyun;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.307-332 / 2022
  • Bankruptcy prediction, a representative classification problem related to loan lending, investment decision-making, and the profitability of financial institutions, is one of the most intensively studied areas in business applications. Many studies have demonstrated outstanding performance of bankruptcy prediction models built with artificial intelligence techniques. However, because most machine learning algorithms are "black boxes," explainable AI (XAI), which provides users with explanations, has become a prominent research topic. Among the many approaches to explanation, this study focuses on explaining a bankruptcy prediction model with counterfactual examples. A counterfactual explanation provides an alternative case that shows users how to obtain a desired output from the model. This study introduces a counterfactual generation technique based on a genetic algorithm (GA) that leverages both domain knowledge (i.e., causal feasibility) and feature importance from the black-box model, along with other critical counterfactual properties, including proximity, distribution, and sparsity. The proposed method was evaluated quantitatively and qualitatively to assess its quality and validity.
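A minimal sketch of GA-based counterfactual search in the spirit described above. Assumptions, none taken from the paper: the black-box model is replaced by a toy linear scorer, causal feasibility is reduced to a set of immutable features (here a hypothetical firm age), feature importance only biases which feature gets mutated, fitness combines a validity penalty, scaled L1 proximity, and a sparsity term with hypothetical weights, and the distribution criterion is not modeled.

```python
import random

def predict(x, w, b):
    """Toy stand-in for the black-box model: 1 = bankrupt, 0 = healthy."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fitness(cand, x0, target, w, b, scales):
    """Lower is better: validity penalty + scaled L1 proximity + small sparsity term."""
    invalid_penalty = 0.0 if predict(cand, w, b) == target else 10.0
    proximity = sum(abs(c - o) / s for c, o, s in zip(cand, x0, scales))
    sparsity = sum(1 for c, o in zip(cand, x0) if c != o)
    return invalid_penalty + proximity + 0.1 * sparsity

def generate_counterfactual(x0, w, b, target, mutable, importance, scales,
                            pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    mutable_idx = [i for i, ok in enumerate(mutable) if ok]
    bias = [importance[i] for i in mutable_idx]  # important features mutate more often

    def mutate(cand):
        child = list(cand)
        i = rng.choices(mutable_idx, weights=bias)[0]  # immutable features are never touched
        child[i] += rng.gauss(0, scales[i])
        return child

    def score(cand):
        return fitness(cand, x0, target, w, b, scales)

    population = [mutate(x0) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            children.append(mutate(child))
        population = parents + children
    best = min(population, key=score)
    return best if predict(best, w, b) == target else None
```

Because every candidate starts from the original firm and only mutable features are ever perturbed, immutable features are also preserved through crossover; the function returns `None` if no candidate flips the prediction.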