• Title/Summary/Keyword: feature selection


Assessment of Landslide Susceptibility in Jecheon Using Deep Learning Based on Exploratory Data Analysis (데이터 탐색을 활용한 딥러닝 기반 제천 지역 산사태 취약성 분석)

  • Sang-A Ahn;Jung-Hyun Lee;Hyuck-Jin Park
    • The Journal of Engineering Geology
    • /
    • v.33 no.4
    • /
    • pp.673-687
    • /
    • 2023
  • Exploratory data analysis is the process of observing and understanding data collected from various sources in order to identify their distributions and correlations from their structure and characteristics. This process can be used to identify correlations among conditioning factors and to select the most effective factors for analysis. It can help in assessing landslide susceptibility, because landslides are usually triggered by multiple factors whose impacts vary by region. This study compared two stages of exploratory data analysis to examine how the data exploration procedure affects the performance of a landslide prediction model with respect to factor selection. The deep-learning-based landslide susceptibility analysis used either a combination of selected factors or all 23 factors. During the data exploration phase, we used a Pearson correlation coefficient heat map and a histogram of random forest feature importance. We then assessed the accuracy of the deep-learning-based analysis using a confusion matrix. Finally, a landslide susceptibility map was generated using the landslide susceptibility index derived from the proposed analysis. Using all 23 factors resulted in low accuracy (55.90%), whereas using the 13 factors selected in one step of exploration improved the accuracy to 81.25%. Accuracy improved further to 92.80% when only the nine conditioning factors selected during both steps of the data exploration were used. Exploratory data analysis therefore selected the conditioning factors most suitable for landslide susceptibility analysis, thereby improving the performance of the analysis.
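The correlation-based part of the factor screening described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the factor names, the toy values, and the 0.8 cut-off are assumptions made only for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(factors, threshold=0.8):
    """Keep a factor only if |r| with every already-kept factor is below threshold.

    `factors` maps a factor name to its list of values. The default threshold
    is an illustrative assumption, not a value taken from the study.
    """
    kept = []
    for name in factors:
        if all(abs(pearson(factors[name], factors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: slope in degrees and slope in percent are nearly collinear.
factors = {
    "slope_deg": [5.0, 10.0, 15.0, 20.0, 25.0],
    "slope_pct": [8.7, 17.6, 26.8, 36.4, 46.6],
    "aspect": [90.0, 10.0, 270.0, 180.0, 45.0],
}
selected = drop_correlated(factors)
```

Here `selected` keeps `slope_deg` and `aspect` and drops the redundant `slope_pct`. A greedy filter like this depends on the order in which factors are visited, which is one reason a second criterion such as feature importance is useful alongside it.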

Efficient Data Representation of Stereo Images Using Edge-based Mesh Optimization (윤곽선 기반 메쉬 최적화를 이용한 효율적인 스테레오 영상 데이터 표현)

  • Park, Il-Kwon;Byun, Hye-Ran
    • Journal of Broadcast Engineering
    • /
    • v.14 no.3
    • /
    • pp.322-331
    • /
    • 2009
  • This paper proposes an efficient data representation of stereo images using edge-based mesh optimization. Mesh-based two-dimensional warping of stereo images depends mainly on the performance of node selection and of disparity estimation at the selected nodes. The proposed method therefore first constructs a feature map consisting of both strong edges and object boundary lines for node selection, and then generates a grid-based mesh structure from the initial nodes. The displacement of each nodal position is estimated iteratively by minimizing the prediction error between the target image and the predicted image after two-dimensional warping of the local area. In general, iterative two-dimensional warping to optimize nodal positions has high time complexity. To overcome this problem, we assume that the input stereo images have only horizontal disparity and that the optimal nodal position lies on an edge, including object boundary lines. The proposed iterative warping method therefore searches for the optimal nodal position only on edge lines along horizontal lines. In the experiments, we compare the proposed method with other mesh-based methods in terms of quality, measured by Peak Signal-to-Noise Ratio (PSNR) as a function of the number of nodes. The computational complexity of optimal mesh generation is also estimated. The experimental results show that the proposed method provides an efficient stereo image representation, offering both fast optimal mesh generation and reduced quality deterioration despite a small number of nodes.
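The quality measure named in this abstract, PSNR, can be sketched as follows. The sketch assumes 8-bit grayscale images stored as lists of rows, and the pixel values are made up for illustration.

```python
import math

def psnr(target, predicted, max_value=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0
    count = 0
    for row_t, row_p in zip(target, predicted):
        for t, p in zip(row_t, row_p):
            squared_error += (t - p) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 target image and its warped prediction.
target = [[100, 100], [100, 100]]
predicted = [[110, 90], [100, 100]]
value = psnr(target, predicted)  # MSE is 50, so this is 10 * log10(65025 / 50)
```

A higher PSNR means the predicted image is closer to the target, so plotting it against the number of mesh nodes shows how quickly quality degrades as the representation becomes more compact.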

Construction of a artificial levee line in river zones using LiDAR Data (라이다 자료를 이용한 하천지역 인공 제방선 추출)

  • Choung, Yun-Jae;Park, Hyeon-Cheol;Jo, Myung-Hee
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2011.05a
    • /
    • pp.185-185
    • /
    • 2011
  • Mapping artificial levee lines, one of the major tasks in river zone mapping, is critical to preventing river floods and protecting the environment and ecosystems in river zones, and is therefore essential for the management and development of river zones. Coastal mapping, including river zone mapping, has historically been carried out using surveying technologies. Photogrammetry, one such technology, has recently been used for national river zone mapping in Korea. Airborne laser scanning has been used in most advanced countries for coastal mapping because of its ability to penetrate shallow water and its high vertical accuracy. Owing to these advantages, LiDAR data are an efficient means of monitoring and predicting significant topographic change in river zones. This paper introduces a method for constructing a 3D artificial levee line from a set of LiDAR points using normal vectors. The method involves multiple steps. First, a 2.5-dimensional Delaunay triangle mesh is generated based on three nearest-neighbor points in the LiDAR data. Second, median filtering is applied to minimize noise. Third, edge selection algorithms are applied to extract break edges from the Delaunay triangle mesh using two normal vectors; in this research, two edge selection methods based on hypothesis testing are used. Fourth, edges extracted by both methods in the same range are selected as the intersection edge group. Fifth, linear feature edges in the intersection edge group that are not suitable for composing a levee line are removed as far as possible, considering the vertical distance, slope, and connectivity of each edge. Sixth, among the line segments suitable for constituting a levee line, each river levee line segment is connected to the segment whose end point is nearest to its own end point horizontally and vertically. Linking all the river levee line segments generates the initial river levee line. Because the initial river levee line consists of LiDAR points, it follows a zigzag pattern along the river levee. In the last step, therefore, a smoothing algorithm is applied to fit the initial river levee line to the reference line, and the final 3D river levee line is constructed. The proposed algorithm is then applied to construct the 3D river levee line at the Zng-San levee near Ham-Ahn Bo on the Nak-Dong river. Statistical results show that the river levee line constructed using the proposed method is highly accurate in comparison with the ground truth. This paper shows that using LiDAR data to construct the 3D river levee line for river zone mapping is useful and efficient and, as a result, that it can replace ground surveying methods for constructing the 3D river levee line.
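The idea of finding break edges from the normal vectors of two adjacent triangles can be sketched as follows. The abstract describes hypothesis-testing-based selection, whose details are not given here, so this sketch swaps in a plain angle threshold. The 20-degree threshold and the sample coordinates are assumptions.

```python
import math

def unit_normal(p0, p1, p2):
    """Upward-pointing unit normal of a non-degenerate triangle; points are (x, y, z)."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Cross product of the two edge vectors.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if nz < 0:  # flip so that normals of a 2.5D surface all point upward
        length = -length
    return (nx / length, ny / length, nz / length)

def normal_angle_deg(tri_a, tri_b):
    """Angle in degrees between the normals of two triangles."""
    na, nb = unit_normal(*tri_a), unit_normal(*tri_b)
    dot = sum(a * b for a, b in zip(na, nb))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def is_break_edge(tri_a, tri_b, threshold_deg=20.0):
    """Flag the shared edge when the surface bends by more than the threshold."""
    return normal_angle_deg(tri_a, tri_b) > threshold_deg

# Hypothetical triangles sharing the edge (1,0,0)-(0,1,0).
flat = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
sloped = ((1, 0, 0), (0, 1, 0), (1, 1, 1))   # rises away from the shared edge
flat_neighbor = ((1, 0, 0), (0, 1, 0), (1, 1, 0))
```

With these inputs, the edge between `flat` and `sloped` is flagged as a break edge, while the edge between `flat` and `flat_neighbor` is not. A statistical test, as in the paper, would additionally account for noise in the LiDAR elevations instead of relying on a fixed angle.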


The Design of Feature Selection Classifier based on Physiological Signal for Emotion Detection (감성판별을 위한 생체신호기반 특징선택 분류기 설계)

  • Lee, JeeEun;Yoo, Sun K.
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.11
    • /
    • pp.206-216
    • /
    • 2013
  • Emotion plays a critical role in daily human life, including learning, action, decision-making, and communication. In this paper, an emotion discrimination classifier is designed that reduces system complexity by selecting a reduced set of dominant features from biosignals. Photoplethysmography (PPG), skin temperature, skin conductance, and frontal and parietal electroencephalography (EEG) signals were measured during the viewing of four types of movies associated with inducing neutral, sad, fear, and joy emotions. A genetic algorithm with a support vector machine (SVM)-based fitness function was designed to determine the dominant features among 24 parameters extracted from the measured biosignals. It achieves a maximum classification accuracy of 96.4%, which is 17% higher than that of the SVM alone. The features selected for minimum error are the mean and NN50 of heart rate variability from the PPG signal, the mean of the PPG-induced pulse transit time, the mean of skin resistance, and the $\delta$ and $\beta$ frequency band powers of the parietal EEG. The combination of parietal EEG, PPG, and skin resistance is recommended for high-accuracy instrumentation, while the combined use of PPG and skin conductance (79% accuracy) is a viable option for simplified instrumentation.
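Two of the selected features, the mean and NN50 of heart rate variability, are simple to compute once beat-to-beat (NN) intervals are available. The sketch below assumes that "mean" refers to the mean NN interval and uses the standard NN50 definition (the number of successive interval pairs differing by more than 50 ms). The interval values are made up.

```python
def hrv_features(nn_intervals_ms):
    """Return (mean NN interval in ms, NN50 count) for a list of NN intervals."""
    mean_nn = sum(nn_intervals_ms) / len(nn_intervals_ms)
    nn50 = sum(
        1
        for prev, curr in zip(nn_intervals_ms, nn_intervals_ms[1:])
        if abs(curr - prev) > 50
    )
    return mean_nn, nn50

# Hypothetical NN intervals (ms), e.g. derived from PPG pulse peaks.
intervals = [800, 810, 870, 860, 790, 795]
mean_nn, nn50 = hrv_features(intervals)  # successive differences: 10, 60, 10, 70, 5
```

Here `nn50` is 2, because only the 60 ms and 70 ms jumps exceed the 50 ms limit.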

Study on the Preferences of Horticulture According to MBTI Personality Type in College Students (대학생의 MBTI 성격유형에 따른 원예선호도에 관한 연구)

  • Jeong, Seon-Hee;Huh, Moo-Ryong
    • Journal of agriculture & life science
    • /
    • v.45 no.6
    • /
    • pp.65-72
    • /
    • 2011
  • ENFP the flowers 66.7%, INFP 50% of the leaves was the first that showed interest. ISTP in the fragrance of the plants, the 55.6% interest in the first feeling, ENTJ's 60%, INFJ's 75%, INTJ 83.3% of the entire plant in the form of the first was said to have interest. MBTI personality types, depending on the interest portion of the plant selection for mean difference (p=0.004). Favorite types of plants, flowers, 53.7%, 32.7% fruits, vegetables, 5.4% have been selected, depending on the features your favorite plants, types of hearing means the difference (p=0.022). Select any of four kinds of flowers, plants, plants that NF 40.7%, NT 58.8% were preferred, SF a fragrant plants that were preferred by 41.8%. ST flowering plants in the 37.5%, 29.2% of the seasonal green leafy foliage was preferred. Depending on the psychological features a selection for your favorite flora also indicates a difference (p=0.038). MBTI personality types based on the four indicators in the form of leaves and flowers preferred type for accident analysis determined a feature based on the emotional type, meaning that it leaves the choice of form, and (p=0.036), determined according to the type-aware select a flower type refers to the difference (p=0.025).

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.2
    • /
    • pp.241-265
    • /
    • 2023
  • A corporate insolvency prediction model serves as a vital tool for objectively monitoring the financial condition of companies. It enables timely warnings, facilitates responsive actions, and supports the formulation of effective management strategies to mitigate bankruptcy risks and enhance performance. Investors and financial institutions utilize default prediction models to minimize financial losses. As the interest in utilizing artificial intelligence (AI) technology for corporate insolvency prediction grows, extensive research has been conducted in this domain. However, there is an increasing demand for explainable AI models in corporate insolvency prediction, emphasizing interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has demonstrated strong performance in various applications. Nonetheless, it has limitations such as computational cost, processing time, and scalability concerns based on the number of variables. This study introduces a novel approach to variable selection that reduces the number of variables by averaging SHAP values from bootstrapped data subsets instead of using the entire dataset. This technique aims to improve computational efficiency while maintaining excellent predictive performance. To obtain classification results, we aim to train random forest, XGBoost, and C5.0 models using carefully selected variables with high interpretability. The classification accuracy of the ensemble model, generated through soft voting as the goal of high-performance model design, is compared with the individual models. The study leverages data from 1,698 Korean light industrial companies and employs bootstrapping to create distinct data groups. Logistic Regression is employed to calculate SHAP values for each data group, and their averages are computed to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
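The variable-selection step described in this abstract, averaging SHAP-based importance over bootstrapped data groups and then keeping the strongest variables, can be sketched as follows. The sketch does not compute SHAP values. It assumes each group has already been reduced to a per-variable importance score (for example, mean absolute SHAP value). The variable names, scores, and the choice of `k` are hypothetical.

```python
def average_importance(group_importances):
    """Average per-variable importance over bootstrap groups.

    `group_importances` is a list of dicts, one per bootstrap group, each
    mapping a variable name to its importance score in that group.
    """
    names = group_importances[0].keys()
    n_groups = len(group_importances)
    return {
        name: sum(group[name] for group in group_importances) / n_groups
        for name in names
    }

def select_top_k(importance, k):
    """Names of the k variables with the highest averaged importance."""
    return sorted(importance, key=importance.get, reverse=True)[:k]

# Hypothetical importance scores from three bootstrap groups.
groups = [
    {"debt_ratio": 0.30, "roa": 0.20, "firm_age": 0.05},
    {"debt_ratio": 0.26, "roa": 0.24, "firm_age": 0.02},
    {"debt_ratio": 0.34, "roa": 0.16, "firm_age": 0.08},
]
averaged = average_importance(groups)
selected = select_top_k(averaged, k=2)
```

Averaging across groups smooths out the variation of any single resample, so a variable that looks important in only one group is less likely to be selected.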

An Experimental Study on the a Light Device which Adopt Safety Ultra Constant Dischange Lamp (초정압 방전램프(UCD)를 적용한 안전 조명 장치에 관한 연구)

  • Jeong, Poong-Gi;Kim, Young-Chul
    • Proceedings of the Safety Management and Science Conference
    • /
    • 2010.11a
    • /
    • pp.63-80
    • /
    • 2010
  • This paper describes the development of various lighting equipment adopting the Ultra Constant Discharge (UCD) lamp, which has recently become commercially available. Various types of UCD lamp equipment with excellent performance were successfully developed while meeting the required conditions for lighting equipment. To provide a guideline for economical selection of lighting products, the paper compares analyzed data for the UCD lamp with those for the high-pressure sodium lamp, which has been the most popular lamp for street lighting. The conclusions of the study are as follows. (1) Performance measurements of the UCD lamp show an excellent luminous efficacy of 108 lm/W, a daylight-like color rendering index of 90 Ra, and an excellent operating temperature range of $-50^{\circ}C$ to $+85^{\circ}C$. Compared with the high-pressure sodium lamp, the UCD lamp can be evaluated as a far superior product. (2) Assembled with the lighting fixture (Type STB-60W), the UCD lamp passed a one-hour test over the temperature range from $-50^{\circ}C$ to $+85^{\circ}C$ at a humidity of 98%. Operation at extremely low temperatures is an excellent feature that can enable export to cold regions such as Northern Europe and Russia, as well as specific applications in defense systems and special industries. (3) Because the UCD lamp is a genuinely Korean-made product that follows energy-saving and eco-friendly policy, it deserves recognition as one of the best green products for $CO_2$ reduction.


Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.5
    • /
    • pp.79-89
    • /
    • 2016
  • In this paper, we propose an object classification method using a genetic algorithm and a dynamic random forest consisting of an optimal combination of unit trees. The random forest is an ensemble classification model that combines unit decision trees based on bagging; by randomizing the training samples and the feature selection assigned to each decision tree, it can ensure good generalization performance when a large number of trees are combined. However, because a random forest is composed of randomly built unit trees, it shows excellent classification performance only when a sufficient number of trees are combined. There is no quantitative method for determining the number of trees, so there is no choice but to generate random trees repeatedly. The proposed algorithm builds a random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, the problem of improving classification performance is cast as an optimization problem of finding the optimal tree combination, which is solved with a genetic algorithm. Experiments show that the proposed algorithm improves classification performance by about 3~5% over the existing random forest in specific cases such as a common database and our own infrared database. In addition, the optimal tree combination was found at about 55~60% of the maximum number of trees.
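Treating tree selection as an optimization problem can be illustrated with a small genetic algorithm over bit masks, where each bit switches one tree on or off and fitness is the majority-vote accuracy on validation data. This is a generic sketch, not the paper's algorithm. The tree predictions, population size, and crossover and mutation scheme are assumptions.

```python
import random

def ensemble_accuracy(mask, tree_votes, labels):
    """Accuracy of majority voting over the trees switched on in `mask`."""
    active = [votes for bit, votes in zip(mask, tree_votes) if bit]
    if not active:
        return 0.0
    correct = 0
    for i, label in enumerate(labels):
        ones = sum(votes[i] for votes in active)
        prediction = 1 if 2 * ones > len(active) else 0
        correct += prediction == label
    return correct / len(labels)

def select_trees(tree_votes, labels, generations=30, population_size=20, seed=0):
    """Search for a tree subset (bit mask) with high validation accuracy."""
    rng = random.Random(seed)
    n_trees = len(tree_votes)

    def fitness(mask):
        return ensemble_accuracy(mask, tree_votes, labels)

    # Start from the full forest plus random subsets.
    population = [[1] * n_trees] + [
        [rng.randint(0, 1) for _ in range(n_trees)]
        for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # elitism: keep the best half
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_trees)   # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_trees)     # single-bit mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Hypothetical binary predictions of five trees on six validation samples.
labels = [1, 0, 1, 1, 0, 0]
tree_votes = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 1],
]
best_mask = select_trees(tree_votes, labels)
```

Because the full forest is in the initial population and the best half always survives, the returned subset is never worse on the validation data than using every tree. In practice the fitness would be measured on held-out data so that the selection does not overfit.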

Design of Automatic Document Classifier for IT documents based on SVM (SVM을 이용한 디렉토리 기반 기술정보 문서 자동 분류시스템 설계)

  • Kang, Yun-Hee;Park, Young-B.
    • Journal of IKEEE
    • /
    • v.8 no.2 s.15
    • /
    • pp.186-194
    • /
    • 2004
  • Due to the exponential growth of information on the internet, it is becoming difficult to find and organize relevant information. To reduce the heavy burden of accessing information, automatic text classification that can handle enormous numbers of documents is necessary. In this paper, we describe the structure and implementation of a document classification system for web documents. We use an SVM as the document classification model, which is constructed from a training set and its representative terms in a directory. In our system, the SVM is trained and used for document classification with a word set extracted from web documents related to information and communication. In addition, we use a vector-space model based on TF-IDF to represent document characteristics, and the training data consist of positive and negative classes represented by weighted characteristic sets. Experiments show the categorization results and their correlation with vector length.
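The TF-IDF vector-space representation mentioned in this abstract can be sketched as follows. Several TF-IDF variants exist, and the abstract does not say which one the system uses. This sketch assumes term frequency normalized by document length and `log(N / df)` as the inverse document frequency. The documents are made up.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return the sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors

# Hypothetical tokenized web documents.
docs = [
    ["network", "router", "network"],
    ["router", "protocol"],
    ["database", "query", "protocol"],
]
vocab, vectors = tfidf_vectors(docs)
```

Each vector has one weight per vocabulary term, so the vectors can be passed directly to a classifier such as an SVM as the weighted characteristic sets the abstract describes.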


Methodology for Prioritizing Sidewalk Construction among 100 Candidate Sites on Rural National Highways (지방부 국도에서의 보도설치 우선순위 결정을 위한 방법론 개발 (일반국도 적용사례 중심으로))

  • Jeon, Woo Hoon;Yang, Choong Heon;Yoon, Jung Eun;Yang, Inchul
    • International Journal of Highway Engineering
    • /
    • v.17 no.4
    • /
    • pp.127-133
    • /
    • 2015
  • PURPOSES: The purpose of this study is to develop a methodology for prioritizing sidewalk construction on rural national highways. METHODS: To determine an appropriate prioritization for sidewalk construction, we developed a specific methodology. The proposed methodology includes three main steps: 1) the Analytic Hierarchy Process (AHP), 2) subjective evaluation by relevant road agencies of the candidate sidewalks along rural national highways, and 3) a field study. Each step has four phases. The primary feature of this methodology is the addition of expert consultation and survey data, as well as a field study. In addition, the method allows flexibility in the selection of evaluation criteria. As a result, the proposed methodology could be used as a general procedure for other roadway classifications when considering sidewalk construction. RESULTS: To demonstrate the reasonableness of the proposed methodology, a case study was performed for exactly 100 candidate sites for sidewalk construction on rural national highways. All required evaluation scores were properly produced for each candidate site. Decision-makers can thus determine the priority for sidewalk construction at these sites by reviewing quantitatively and qualitatively considered data. CONCLUSIONS: The results of the case study can be applied to a long-term fundamental plan for sidewalk construction on rural national highways. Furthermore, this methodology could be employed to prioritize small-scale SOC projects (e.g., bicycle or pedestrian roads).
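The AHP step of such a methodology turns pairwise comparisons of criteria into weights, which can then be combined with per-site scores. The sketch below uses the geometric-mean approximation of AHP priorities. The three criteria, the comparison values, and the site scores are hypothetical and do not come from the study.

```python
import math

def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise-comparison matrix
    using the geometric-mean (row) method."""
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]

def site_score(weights, criterion_scores):
    """Weighted sum of a candidate site's scores on each criterion."""
    return sum(w * s for w, s in zip(weights, criterion_scores))

# Hypothetical criteria: pedestrian demand, accident history, construction cost.
# Entry [i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
weights = ahp_weights(pairwise)        # 4/7, 2/7, 1/7 for this consistent matrix
score = site_score(weights, [3, 5, 2])  # hypothetical scores for one candidate site
```

Ranking the candidate sites by `score` gives one possible priority order. A full AHP application would also check the consistency ratio of the comparison matrix before trusting the weights.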