• Title/Summary/Keyword: variable feature

Analysis of Hypertension Risk Factors by Life Cycle Based on Machine Learning (머신러닝 기반 생애주기별 고혈압 위험 요인 분석)

  • Kang, SeongAn;Kim, SoHui;Ryu, Min Ho
    • Journal of Korea Society of Industrial Information Systems / v.27 no.5 / pp.73-82 / 2022
  • Chronic diseases such as hypertension require differentiated management according to age and life cycle. Hypertension is also known to be caused by a combination of various factors. This study uses machine learning prediction techniques to analyze the factors affecting hypertension by life cycle. To this end, a total of 35 variables were obtained through preprocessing and variable selection from the National Health and Nutrition Survey data of the Korea Centers for Disease Control and Prevention. Among the tree-based machine learning models, XGBoost showed high predictive performance in both middle and old age. As for risk factors by life cycle, individual characteristic factors, genetic factors, and nutritional intake factors were found to be risk factors for hypertension in middle age, while nutritional intake factors, dietary factors, and lifestyle factors were derived as risk factors in old age. The results of this study are expected to serve as useful basic data for hypertension management by life cycle.

Investigation of AI-based dual-model strategy for monitoring cyanobacterial blooms from Sentinel-3 in Korean inland waters

  • Hoang Hai Nguyen;Dalgeun Lee;Sunghwa Choi;Daeyun Shin
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.168-168 / 2023
  • The frequent occurrence of cyanobacterial harmful algal blooms (CHABs) in inland waters under climate change seriously damages ecosystems and human health and is becoming a major problem in South Korea. Satellite remote sensing has been suggested for effectively monitoring CHABs over larger water bodies, since the traditional method based on sparse in-situ networks is limited in spatial coverage. However, using satellite reflectances as the sole input variable in common CHAB dual-models, which rely on both chlorophyll-a (Chl-a) and phycocyanin or cyanobacteria cells (Cyano-cell), is not fully effective because the seasonal variation of these parameters is strongly affected by surrounding meteorological and bio-environmental factors. With the development of Artificial Intelligence (AI), monitoring CHABs from space while accounting for the effects of environmental factors has become feasible. This study aimed to investigate the potential application of AI in the dual-model strategy (with Chl-a and Cyano-cell as output parameters) for monitoring the seasonal dynamics of CHABs from satellites over Korean inland waters. The Sentinel-3 satellite was selected because of its variety of spectral bands and its unique band (620 nm), which is sensitive to cyanobacteria. Using AI-based feature selection, we analyzed the relationships between the two output parameters and the major parameters (satellite water-leaving reflectances at different spectral bands), together with auxiliary (meteorological and bio-environmental) parameters, to select the most important ones. Several AI models were then employed to model Chl-a and Cyano-cell concentrations from the selected parameters. The AI models were evaluated and compared with traditional semi-analytical models to determine whether AI models (using water-leaving reflectances and environmental variables) outperform traditional models (using water-leaving reflectances only) and which AI models are superior for monitoring CHABs from the Sentinel-3 satellite over a Korean inland water body.

Environmental variable selection and synthetic sampling methods for improving the accuracy of algal alert level prediction model (변수 선택 및 샘플링 기법을 적용한 조류 경보 단계 예측 모델의 정확도 개선)

  • Jin Hwi Kim;Hankyu Lee;Seohyun Byeon;Jae-Ki Shin;Yongeun Park
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.517-517 / 2023
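  • In Korea, the algal alert system is currently operated at 29 sites on the four major rivers and major lakes and reservoirs, and the alert level to be issued is determined from the cell concentration of harmful algae measured at real-time monitoring sites. Drinking-water source sections have four levels in total (Attention, Warning, Algal Bloom Outbreak, and Lifted or Not Issued), while recreational-water sections have three levels, excluding Algal Bloom Outbreak. The current algal alert system focuses mainly on preparing after-the-fact responses once harmful algae occur, and in particular it serves as a decision-making tool for matters such as whether to increase the monitoring frequency, how to manage pollution sources, and whether to remove algae. However, if alert levels can be predicted in advance, the growth of harmful algae can be suppressed, which helps secure safe and clean water resources. In this study, to predict algal alert levels in advance, we built a prediction model using the nationwide weir monitoring comprehensive information data provided by the national real-time monitoring network, meteorological monitoring network data, and real-time weir status data. In addition, to improve the accuracy of level prediction, we used variable selection techniques to select the environmental variables that affect the alert level, and, to minimize the prediction errors that arise during model training because of data imbalance, we applied various sampling techniques and evaluated model performance. The model built on the raw data, without variable selection or sampling, showed prediction accuracies of 50% and 62.5% for the Attention level (Level-1) and the Warning level (Level-2), respectively, whereas the model built with a nonlinear variable selection technique and Synthetic Minority Over-sampling Technique-Edited Nearest Neighbor (SMOTE-ENN) sampling showed prediction accuracies of 85.7% for Level-1 and 75.0% for Level-2.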

Short-Term Water Quality Prediction of the Paldang Reservoir Using Recurrent Neural Network Models (순환신경망 모델을 활용한 팔당호의 단기 수질 예측)

  • Jiwoo Han;Yong-Chul Cho;Soyoung Lee;Sanghun Kim;Taegu Kang
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.46-60 / 2023
  • Climate change causes fluctuations in water quality in the aquatic environment, which can cause changes in water circulation patterns and severe adverse effects on aquatic ecosystems in the future. Therefore, research is needed to predict and respond to water quality changes caused by climate change in advance. In this study, we tried to predict the dissolved oxygen (DO), chlorophyll-a, and turbidity of the Paldang reservoir for about two weeks using long short-term memory (LSTM) and gated recurrent units (GRU), which are deep learning algorithms based on recurrent neural networks. The model was built based on real-time water quality data and meteorological data. The observation period was set from July to September in the summer of 2021 (Period 1) and from March to May in the spring of 2022 (Period 2). We tried to select an algorithm with optimal predictive power for each water quality parameter. In addition, to improve the predictive power of the model, an important variable extraction technique using random forest was used to select only the important variables as input variables. In both Periods 1 and 2, the predictive power after extracting important variables was further improved. Except for DO in Period 2, GRU was selected as the best model in all water quality parameters. This methodology can be useful for preventive water quality management by identifying the variability of water quality in advance and predicting water quality in a short period.

Classification of Fall Crops Using Unmanned Aerial Vehicle Based Image and Support Vector Machine Model - Focusing on Idam-ri, Goesan-gun, Chungcheongbuk-do - (무인기 기반 영상과 SVM 모델을 이용한 가을수확 작물 분류 - 충북 괴산군 이담리 지역을 중심으로 -)

  • Jeong, Chan-Hee;Go, Seung-Hwan;Park, Jong-Hwa
    • Journal of Korean Society of Rural Planning / v.28 no.1 / pp.57-69 / 2022
  • Crop classification is very important for estimating crop yield and determining accurate cultivation areas. The purpose of this study is to classify crops harvested in fall in Idam-ri, Goesan-gun, Chungcheongbuk-do using unmanned aerial vehicle (UAV) images and a support vector machine (SVM) model. The study proceeded in the order of image acquisition, variable extraction, model building, and evaluation. First, RGB and multispectral images were acquired on September 13, 2021. The independent variables, which were applied to Farm-Map, consisted of gray level co-occurrence matrix (GLCM)-based texture characteristics derived from the RGB images and multispectral reflectance data. The crop classification model was built using the texture characteristics and reflectance data, and accuracy was finally evaluated using the error matrix. Four types of classification model were built to compare classification accuracy according to the combination of independent variables. Among the four model types, the recursive feature elimination (RFE) model showed the highest accuracy, with an overall accuracy (OA) of 88.64% and a Kappa coefficient of 0.84. UAV-based RGB and multispectral images effectively classified cabbage, rice, and soybean when the SVM model was applied. The results demonstrate that single-period images are useful for classifying crops. These technologies are expected to improve the accuracy and efficiency of crop cultivation area surveys when supplemented with additional training data, and to provide basic data for estimating crop yields.
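
As a rough illustration of the GLCM-based texture characteristics mentioned in the abstract above, the following sketch builds a co-occurrence matrix for a tiny image and derives one texture feature (contrast). The abstract does not state which GLCM features, pixel offsets, or gray-level quantization were used, so the horizontal offset, the non-symmetric matrix, the 4-level toy image, and the function names `glcm` and `contrast` are all assumptions made for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for one pixel offset.

    image  : list of rows of integer gray levels in [0, levels)
    dx, dy : offset to the neighbor pixel (assumed: right-hand neighbor)
    The matrix is NOT symmetrized; only (pixel, neighbor) pairs are counted.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4x4 image with 4 gray levels (not data from the study).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy_image, levels=4)
print(contrast(p))  # 7/12, about 0.583
```

In a pipeline like the one described, such features would presumably be computed per field parcel or per moving window and then combined with reflectance values as SVM inputs; that step is not shown here.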

Research on Selecting Influential Climatic Factors and Optimal Timing Exploration for a Rice Production Forecast Model Using Weather Data

  • Jin-Kyeong Seo;Da-Jeong Choi;Juryon Paik
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.57-65 / 2023
  • Various studies to enhance the accuracy of rice production forecasting are focused on improving the accuracy of the models. In contrast, there is a relative lack of research regarding the data itself, which the prediction models are applied to. When applying the same dependent variable and prediction model to two different sets of rice production data composed of distinct features, discrepancies in results can occur. It is challenging to determine which dataset yields superior results under such circumstances. To address this issue, by identifying potential influential features within the data before applying the prediction model and centering the modeling around these, it is possible to achieve stable prediction results regardless of the composition of the data. In this study, we propose a method to adjust the composition of the data's features in order to select optimal base variables, aiding in achieving stable and consistent predictions for rice production. This method makes use of the Korea Meteorological Administration's ASOS data. The findings of this study are expected to make a substantial contribution towards enhancing the utility of performance evaluations in future research endeavors.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are correspondingly diverse. To secure competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most of them do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace IR (Imbalanced Ratio). We also developed a new index called Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The class of each dataset was predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross validation. 
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected were HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant positive effect on classification performance, unlike the number of classes. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables did not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant. (2) The effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows: (1) the results can be used to develop a classification algorithm recommendation system based on the characteristics of datasets. 
(2) Many data scientists search for the optimal algorithm for a given situation by repeatedly adjusting algorithm parameters, because the characteristics of the data differ. In this process, hardware, cost, time, and manpower are wasted excessively. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The study consists of an introduction, related research, research model, experiment, and conclusion and discussion.
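
To make the HHI meta-feature in the abstract above more concrete, here is a minimal sketch that applies the standard Herfindahl-Hirschman definition (the sum of squared shares) to class labels. The abstract does not say how the authors scaled or normalized the index, so treating each class's share of samples as a "market share" on a 0-to-1 scale is an assumption, and the function name `class_hhi` is hypothetical.

```python
from collections import Counter


def class_hhi(labels):
    """Herfindahl-Hirschman Index of a class distribution.

    Each class's share of the samples is squared and the squares are summed.
    With k classes the value ranges from 1/k (perfectly balanced) to 1.0
    (all samples in one class), so higher values mean stronger imbalance.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return sum((c / n) ** 2 for c in counts.values())


# Hypothetical label vectors (not taken from the UCI datasets in the study).
print(class_hhi(["a", "a", "b", "b"]))  # balanced, 2 classes: 0.5
print(class_hhi(["a", "a", "a", "b"]))  # 3:1 imbalance: 0.625
```

Unlike a single majority-to-minority ratio, this value takes every class into account, which is one plausible reason to prefer it for multi-class data.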

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.147-168 / 2017
  • Accurate stock market forecasting has long been studied in academia, and there are now many forecasting models using a variety of techniques. Recently, many attempts have been made to predict the stock index using machine learning methods, including Deep Learning. Although both fundamental analysis and technical analysis are used in traditional stock investment, technical analysis is more useful for short-term transaction prediction and for applying statistical and mathematical techniques. Most studies using technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market fluctuations (usually the next trading day). However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we try to predict the stock index by extending the existing binary approach to a multi-class system of stock index trends (upward trend, boxed, downward trend). To solve this multi-classification problem, rather than techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN), we propose an optimization model that uses a Genetic Algorithm as a wrapper to improve the performance of Multi-classification Support Vector Machines (MSVM), which have proved to be superior in prediction performance. In particular, the proposed model, named GA-MSVM, is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM but also the selection of input variables (feature selection) and instance selection. 
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method is more effective than the conventional MSVM, which has been known to show the best prediction performance so far, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was confirmed to play a very important role in predicting the stock index trend, contributing more to the improvement of the model than other factors. To verify the usefulness of GA-MSVM, we applied it to trend forecasting of Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental data set includes technical indicators such as the price and volatility index of the KOSPI200 stock index (2004 ~ 2017) and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, took one of three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for verification. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
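
The per-class 70/30 split described in the abstract above can be sketched as follows. The abstract does not say whether samples were shuffled or split chronologically within each class, so the seeded random shuffle here is an assumption (a chronological cut would be a reasonable alternative for time series), and `stratified_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices so each class contributes train_frac to training.

    Returns (train_indices, holdout_indices), both sorted.
    Assumption: samples are shuffled within each class before cutting.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, holdout = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_frac)
        train.extend(idxs[:cut])
        holdout.extend(idxs[cut:])
    return sorted(train), sorted(holdout)


# Hypothetical trend labels: 1 (upward), 0 (boxed), -1 (downward).
labels = [1] * 10 + [0] * 20 + [-1] * 10
train_idx, holdout_idx = stratified_split(labels)
print(len(train_idx), len(holdout_idx))  # 28 12
```

Splitting within each class keeps the three trend states represented in the same proportions in both subsets, which matters when one state (for example, boxed) dominates the data.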

The Effect of Attributes of Innovation and Perceived Risk on Product Attitudes and Intention to Adopt Smart Wear (스마트 의류의 혁신속성과 지각된 위험이 제품 태도 및 수용의도에 미치는 영향)

  • Ko, Eun-Ju;Sung, Hee-Won;Yoon, Hye-Rim
    • Journal of Global Scholars of Marketing Science / v.18 no.2 / pp.89-111 / 2008
  • With the development of digital technology, studies on smart wear integrated into daily life have rapidly increased. However, consumer research on perceptions of and attitudes toward smart clothing is scarce. The purpose of this study was to identify the innovative characteristics and perceived risks of smart clothing and to analyze the influence of these factors on product attitudes and intention to adopt. Specifically, five hypotheses were established. H1: Perceived attributes of smart clothing except for complexity would have positive relations to product attitude or purchase intention, while complexity would have the opposite relation. H2: Product attitude would have a positive relation to purchase intention. H3: Product attitude would have a mediating effect between perceived attributes and purchase intention. H4: Perceived risks of smart clothing would have negative relations to perceived attributes except for complexity, and positive relations to complexity. H5: Product attitude would have a mediating effect between perceived risks and purchase intention. A self-administered questionnaire was developed based on previous studies. After a pretest, the data were collected during September 2006 from university students in Korea, who are relatively sensitive to innovative products. A total of 300 usable questionnaires were analyzed using SPSS 13.0. About 60.3% of respondents were male, and the mean age was 21.3 years. About 59.3% reported that they were aware of smart clothing, but only 9 respondents had purchased it. The means of attitude toward smart clothing and purchase intention were 2.96 (SD=.56) and 2.63 (SD=.65), respectively. Factor analysis using principal components with varimax rotation was conducted to identify the dimensions of perceived attributes and perceived risk. Perceived attributes of smart wear were categorized into relative advantage (including compatibility), observability (including trialability), and complexity. 
Perceived risks were categorized into physical/performance risk, social psychological risk, time loss risk, and economic risk. Regression analysis was conducted to test the five hypotheses. Relative advantage and observability were significant predictors of product attitude (adj $R^2$=.223) and purchase intention (adj $R^2$=.221). Complexity had a negative influence on product attitude. Product attitude was significantly related to purchase intention (adj $R^2$=.692) and had a partial mediating effect between perceived attributes and purchase intention (adj $R^2$=.698). Therefore, hypotheses 1 to 3 were accepted. To test hypothesis 4, the four dimensions of perceived risk and demographic variables (age, gender, monthly household income, awareness of smart clothing, and purchase experience) were entered as independent variables in the regression models. Social psychological risk, economic risk, and gender (female) were significant predictors of relative advantage (adj $R^2$=.276). When perceived observability was the dependent variable, social psychological risk, time loss risk, physical/performance risk, and age (younger) were significant, in that order (adj $R^2$=.144). However, physical/performance risk was positively related to observability: the more observable smart clothing appeared to respondents, the higher the perceived likelihood of physical harm or performance problems. Complexity was predicted by product awareness, social psychological risk, economic risk, and purchase experience, in that order (adj $R^2$=.114). Product awareness was negatively related to complexity, meaning that a high level of product awareness would reduce the perceived complexity of smart clothing. However, purchase experience was positively related to complexity. It appears that consumers can perceive a high level of complexity when they actually use smart clothing in real life. Risk variables were positively related to complexity. 
That is, to decrease complexity, it is also necessary to minimize anxiety about social psychological harm or loss of money. Thus, hypothesis 4 was partially accepted. Finally, in testing hypothesis 5, social psychological risk and economic risk were significant predictors of product attitude (adj $R^2$=.122) and purchase intention (adj $R^2$=.099), respectively. When the attitude variable was included with the risk variables as independent variables in the regression model predicting purchase intention, only the attitude variable was significant (adj $R^2$=.691). Thus, the attitude variable had a full mediating effect between perceived risks and purchase intention, and hypothesis 5 was accepted. The findings provide guidelines for fashion and electronics businesses that aim to create and strengthen positive attitudes toward smart clothing. Marketers need to consider not only the functional features of smart clothing but also its practical and aesthetic attributes, since appropriateness for social norms or self-image would reduce the uncertainty of psychological or social risk, which in turn increases the relative advantage of smart clothing. Indeed, social psychological risk was significantly associated with relative advantage. Economic risk is negatively associated with product attitudes as well as purchase intention, suggesting that smart-wear developers have to consider the price ranges acceptable to potential adopters. It will be effective to utilize the findings associated with complexity when marketers in the US plan communication strategy.
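
For readers comparing the adj $R^2$ values reported above, the usual definition of adjusted $R^2$ is given below as a reminder. Here $n$ is the number of observations and $p$ the number of predictors in a given regression model; the abstract reports 300 questionnaires but not $p$ for each model, so the unadjusted $R^2$ values cannot be recovered from it.

$$
\text{adj } R^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
$$

Because the correction factor grows with $p$, adding the attitude variable to a model raises adj $R^2$ only if it explains more variance than the extra degree of freedom costs.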

Fast Mode Decision using Block Size Activity for H.264/AVC (블록 크기 활동도를 이용한 H.264/AVC 부호화 고속 모드 결정)

  • Jung, Bong-Soo;Jeon, Byeung-Woo;Choi, Kwang-Pyo;Oh, Yun-Je
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.2 s.314 / pp.1-11 / 2007
  • H.264/AVC uses variable block sizes to achieve significant coding gain. It has 7 different coding modes having different motion compensation block sizes in Inter slice, and 2 different intra prediction modes in Intra slice. This fine-tuned new coding feature has achieved far more significant coding gain compared with previous video coding standards. However, extremely high computational complexity is required when rate-distortion optimization (RDO) algorithm is used. This computational complexity is a major problem in implementing real-time H.264/AVC encoder on computationally constrained devices. Therefore, there is a clear need for complexity reduction algorithm of H.264/AVC such as fast mode decision. In this paper, we propose a fast mode decision with early $P8\times8$ mode rejection based on block size activity using large block history map (LBHM). Simulation results show that without any meaningful degradation, the proposed method reduces whole encoding time on average by 53%. Also the hybrid usage of the proposed method and the early SKIP mode decision in H.264/AVC reference model reduces whole encoding time by 63% on average.