• Title/Summary/Keyword: Network Feature Selection

Nonlinear Time Series Prediction Modeling by Weighted Average Defuzzification Based on NEWFM (NEWFM 기반 가중평균 역퍼지화에 의한 비선형 시계열 예측 모델링)

  • Chai, Soo-Han;Lim, Joon-Shik
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.4 / pp.563-568 / 2007
  • This paper presents a methodology for predicting nonlinear time series based on the neural network with weighted fuzzy membership functions (NEWFM). The degree of classification intensity is obtained by a bounded sum of the weighted fuzzy membership functions extracted by NEWFM, and weighted average defuzzification is then used to predict the nonlinear time series. The experimental results demonstrate that NEWFM achieves a classification rate of 92.22% against the target class of GDP. The time series created by the NEWFM model closely approximates GDP, a typical business cycle indicator, and proves to be a useful indicator for turning-point forecasting, leading the peak by an average of 12 months and the trough by an average of 6 months during the 5th to 8th cyclical periods. In addition, NEWFM measures the efficiency of the economic indexes by feature selection and enables users to forecast with only 7 of the 10 leading indexes while improving the classification rate from 90% to 92.22%.
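
The weighted average defuzzification step described above is simple enough to illustrate directly. The following is a minimal sketch, not the paper's implementation: the bounded sum and the defuzzifier follow their standard fuzzy-logic definitions, and the membership-function centers and firing strengths are hypothetical stand-ins for the values NEWFM would extract.

```python
import numpy as np

def bounded_sum(a, b):
    # Bounded-sum t-conorm for combining membership degrees: min(1, a + b).
    return np.minimum(1.0, np.asarray(a, dtype=float) + np.asarray(b, dtype=float))

def weighted_average_defuzzify(centers, weights):
    # Crisp output = weighted mean of the membership-function centers.
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * centers) / np.sum(weights))

# Hypothetical centers and firing strengths of three membership functions.
strengths = bounded_sum([0.05, 0.4, 0.2], [0.05, 0.2, 0.1])
print(weighted_average_defuzzify([0.2, 0.5, 0.9], strengths))
```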

Mortality Prediction of Older Adults Using Random Forest and Deep Learning (랜덤 포레스트와 딥러닝을 이용한 노인환자의 사망률 예측)

  • Park, Junhyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.9 no.10 / pp.309-316 / 2020
  • We predict the mortality of elderly patients over 65 years old visiting the emergency department using a Feed Forward Neural Network (FFNN) and a Convolutional Neural Network (CNN), respectively. The medical data consist of 99 features, including basic information such as sex, age, temperature, and heart rate, as well as past history and various blood and culture tests. Among these, we used a random forest to select features by measuring their importance for mortality prediction. As a result, using the top 80 features with the highest importance works best for mortality prediction. The performance of the FFNN and the CNN is compared by training each network on the selected features. To train the CNN on images, we convert the medical data into fixed-size images. We obtain better results with the CNN than with the FFNN: for mortality prediction with the CNN, the F1 score and the AUC on the test data are 56.9 and 92.1, respectively.
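
The feature-selection step described above can be sketched with scikit-learn's random forest. A minimal sketch under assumed data: the matrix X and labels y are random stand-ins for the 99-feature medical dataset, which is not public.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-ins: 1,000 patients x 99 clinical features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 99))
y = rng.integers(0, 2, size=1000)

# Rank features by random-forest importance and keep the top 80,
# mirroring the selection step described in the abstract.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top80 = np.argsort(rf.feature_importances_)[::-1][:80]
X_selected = X[:, top80]   # training input for the downstream FFNN/CNN
print(X_selected.shape)    # (1000, 80)
```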

Enhancing the Quality of Service by GBSO Splay Tree Routing Framework in Wireless Sensor Network

  • Majidha Fathima K. M.;M. Suganthi;N. Santhiyakumari
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2188-2208 / 2023
  • Quality of Service (QoS) is a critical concern for routing algorithms in Wireless Sensor Networks (WSNs). Data packets are moved between cluster heads with QoS guarantees using a number of energy-efficient routing techniques. However, sustaining high scalability while extending the lifetime of a WSN remains a challenging task. Thus, this research aims to develop an energy-balancing component that ensures equal energy consumption across all network sensors while offering flexible, congestion-free routing even at peak hours. This work proposes a Gravitational Blackhole Search Optimised (GBSO) splay tree routing framework. Based on the splay tree topology, the routing procedure is carried out in three distinct steps. In the initiation phase, GBSO determines the optimal route by choosing the node with optimum energy as the splay tree root. In the selection stage, the energy-update and trust-update steps are completed by evaluating a novel reliance function that combines Parent Reliance (PR) and Grand Parent Reliance (GPR). Finally, in the routing phase, the GBSO algorithm determines the best route for data broadcast using the fitness measure and the minimal distance. The results demonstrated the efficacy of the proposed technique with a 99.52% packet delivery ratio, a minimum delay of 0.19 s, and a network lifetime of 1750 rounds with 200 nodes. The comparative analysis also showed that the proposed algorithm surpasses existing algorithms in all aspects and guarantees end-to-end delivery of packets.
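
As a rough illustration of the routing-phase idea above, the following hedged sketch scores candidate next hops by a fitness that trades off residual energy against distance. The weighting and the candidate values are assumptions; the actual GBSO optimiser and the PR/GPR reliance functions are not reproduced here.

```python
# Hedged sketch: rank candidate next hops by a fitness that favours
# high residual energy and short distance. alpha and the candidate
# (energy, distance) pairs are illustrative assumptions.
def route_fitness(energy, distance, alpha=0.7):
    return alpha * energy - (1 - alpha) * distance

candidates = {"n1": (0.72, 3.0), "n2": (0.91, 5.0), "n3": (0.88, 2.5)}
best = max(candidates, key=lambda n: route_fitness(*candidates[n]))
print("next hop:", best)
```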

An Efficient Data Collection Method for Deep Learning-based Wireless Signal Identification in Unlicensed Spectrum (딥 러닝 기반의 이기종 무선 신호 구분을 위한 데이터 수집 효율화 기법)

  • Choi, Jaehyuk
    • Journal of IKEEE / v.26 no.1 / pp.62-66 / 2022
  • Recently, there have been many research efforts based on data-driven deep learning technologies to deal with the interference problem between heterogeneous wireless communication devices in unlicensed frequency bands. However, existing approaches commonly rely on complex neural network models that require high computational power, limiting their efficiency on resource-constrained network interfaces and Internet of Things (IoT) devices. In this study, we address the problem of classifying heterogeneous wireless technologies, including Wi-Fi and ZigBee, in unlicensed spectrum bands. We focus on a data-driven approach that employs supervised learning on received signal strength indicator (RSSI) data to train deep Convolutional Neural Networks (CNNs). We propose a simple measurement methodology for collecting RSSI training data that preserves the temporal and spectral properties of the target signal. Real experimental results using the open-source 2.4 GHz wireless development platform Ubertooth show that the proposed sampling method maintains the same accuracy using only 10% of the sampled data for the same neural network architecture.
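
The reduced-data collection idea above can be sketched as a uniform subsample that keeps the temporal ordering of an RSSI trace. This is only an assumed approximation of the paper's measurement methodology, with a synthetic trace standing in for real Ubertooth captures.

```python
import numpy as np

# Synthetic RSSI trace (dBm) standing in for a real Ubertooth capture.
rng = np.random.default_rng(1)
rssi = rng.normal(-60, 5, size=10_000)

# Keep 10% of the samples, uniformly spaced, so the temporal ordering
# (and hence coarse spectral structure) of the trace is preserved.
keep = np.linspace(0, len(rssi) - 1, len(rssi) // 10).astype(int)
rssi_small = rssi[keep]
print(rssi_small.shape)   # (1000,) -> training input for the CNN
```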

Research on damage detection and assessment of civil engineering structures based on DeepLabV3+ deep learning model

  • Chengyan Song
    • Structural Engineering and Mechanics / v.91 no.5 / pp.443-457 / 2024
  • At present, traditional concrete surface inspection methods based on human vision are costly and unsafe, while conventional computer vision methods rely on manually selected features, are sensitive to environmental changes, and are difficult to generalize. To solve these problems, this paper introduces deep learning technology from the field of computer vision to achieve automatic feature extraction of structural damage, with excellent detection speed and strong generalization ability. The main contents of this study are as follows: (1) A method based on the DeepLabV3+ convolutional neural network model is proposed for surface detection of post-earthquake structural damage, including concrete cracks, spalling, and exposed steel bars. Key semantic information is extracted by different backbone networks, and datasets containing various types of surface damage are used for training, testing, and evaluation. Intersection-over-union ratios of 54.4%, 44.2%, and 89.9% on the test set demonstrate the network's capability to accurately identify different types of structural surface damage in pixel-level segmentation, highlighting its effectiveness in varied testing scenarios. (2) A semantic segmentation model based on the DeepLabV3+ convolutional neural network is proposed for the detection and evaluation of post-earthquake structural components. Using a dataset that includes building structural components and their damage degrees for training, testing, and evaluation, semantic segmentation detection accuracies of 98.5% and 56.9% were recorded. To provide a comprehensive assessment that considers both false positives and false negatives, the Mean Intersection over Union (Mean IoU) was employed as the primary evaluation metric. This choice ensures that the network's performance in detecting and evaluating pixel-level damage in post-earthquake structural components is evaluated uniformly across all experiments. By incorporating deep learning technology, this study not only offers an innovative solution for accurately identifying post-earthquake damage in civil engineering structures but also contributes significantly to empirical research in automated detection and evaluation within the field of structural health monitoring.
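
The Mean IoU metric named above has a standard definition that is easy to state in code. A minimal sketch with toy masks; the class layout and mask values are hypothetical.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # Mean Intersection over Union across classes; the union term
    # penalises both false positives and false negatives.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Hypothetical 4x4 segmentation masks with three damage classes.
pred   = np.array([[0, 0, 1, 1], [0, 1, 1, 2], [2, 2, 1, 0], [2, 2, 0, 0]])
target = np.array([[0, 0, 1, 1], [0, 1, 2, 2], [2, 2, 1, 0], [2, 0, 0, 0]])
print(mean_iou(pred, target, num_classes=3))
```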

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs, and the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As the tool for optimizing the kernel parameters and the feature subset selection, we adopt the genetic algorithm (GA). GA is known as an efficient and effective search method that simulates biological evolution: by applying genetic operations such as selection, crossover, and mutation, it gradually improves the search results. In particular, the mutation operator prevents GA from falling into local optima, so a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as our optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea and contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as candidate independent variables. The dependent variable, i.e. the credit rating, was labeled as four classes: 1 (A1); 2 (A2); 3 (A3); 4 (B and C). Eighty percent of the data for each class was used for training and the remaining 20 percent for validation, and to mitigate the small sample size we applied five-fold cross-validation. To examine the competitiveness of the proposed model, we also experimented with several comparative models including MDA, MLOGIT, CBR, ANN, and MSVM. In the case of MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source library, and Evolver 5.5, a commercial package that provides GA. The other comparative models were implemented using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (an index related to cash flows from operating activities), were found to be the most important factors in predicting corporate credit ratings. The values of the finally selected kernel parameters, however, were almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. The results showed that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
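
The core of GAMSVM, a chromosome that jointly encodes the kernel parameters and a binary feature mask, can be sketched as follows. This shows only the fitness evaluation of one candidate (the GA loop of selection, crossover, and mutation is omitted), uses scikit-learn's SVC (which applies one-against-one multiclass classification internally) rather than LIBSVM directly, and substitutes synthetic data for the proprietary credit-rating dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the 14 candidate financial
# ratios and the four rating classes (the real dataset is proprietary).
X, y = make_classification(n_samples=300, n_features=14, n_informative=8,
                           n_classes=4, random_state=0)

def fitness(chromosome):
    # chromosome = [C, gamma, 14-bit feature mask]
    C, gamma = chromosome[0], chromosome[1]
    mask = chromosome[2:].astype(bool)
    if not mask.any():
        return 0.0   # degenerate candidate: no features selected
    clf = SVC(C=C, gamma=gamma)   # RBF kernel, one-against-one internally
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(0)
candidate = np.concatenate(([10.0, 0.1], rng.integers(0, 2, size=14)))
print(f"CV accuracy of one candidate: {fitness(candidate):.3f}")
```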

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. High-dimensional data require heavy computation and can lead to high computational cost and overfitting, so the density of the data, which has a significant influence on sentence-classification performance, must be considered, and a dimension-reduction step is necessary to improve model performance. Diverse methods have been proposed, from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. Moreover, the representation and selection of text features affect the performance of classifiers for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods that modify the word dictionary according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, we assume that words similar to those words also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification, which perform selective word elimination under specific rules and construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and build the word embedding. Second, we additionally remove words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are fed to the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Since Yelp shows only the number of helpful votes, we extracted 100,000 reviews with more than five helpful votes from 750,000 reviews using random sampling. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance against Word2Vec and GloVe embeddings that used all the words. One of the proposed methods outperformed the embeddings that used all the words: removing unimportant words improved performance, although removing too many words lowered it.
For future research, diverse preprocessing approaches and in-depth analysis of word co-occurrence should be considered when measuring similarity between words. Also, we applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to explore the combinations of word embedding methods and elimination methods.
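
The two-step elimination described above can be sketched as follows. The corpus, labels, thresholds, and embeddings are toy assumptions (random vectors stand in for trained Word2Vec vectors); information gain is approximated with scikit-learn's mutual information estimator.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus and labels (1 = helpful review, 0 = not helpful).
docs = ["great battery life", "terrible battery", "great screen",
        "awful screen glare", "decent battery life"]
labels = np.array([1, 0, 1, 0, 1])

vec = CountVectorizer()
X = vec.fit_transform(docs)
vocab = np.array(vec.get_feature_names_out())

# Step 1: drop words with comparatively low information gain
# (approximated here by mutual information with the class label).
ig = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
dropped = set(vocab[ig < np.median(ig)])

# Step 2: also drop words whose embedding is very similar to an
# already-dropped word. Random vectors stand in for Word2Vec here.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in vocab}
for w in vocab:
    if w in dropped:
        continue
    sims = [cosine_similarity([emb[w]], [emb[d]])[0, 0] for d in dropped]
    if sims and max(sims) > 0.95:   # assumed similarity threshold
        dropped.add(w)

print("kept:", [w for w in vocab if w not in dropped])
```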

Comparison of Detection Performance of Intrusion Detection System Using Fuzzy and Artificial Neural Network (퍼지와 인공 신경망을 이용한 침입탐지시스템의 탐지 성능 비교 연구)

  • Yang, Eun-Mok;Lee, Hak-Jae;Seo, Chang-Ho
    • Journal of Digital Convergence / v.15 no.6 / pp.391-398 / 2017
  • In this paper, we compare the performance of "Network Intrusion Detection System based on attack feature selection using fuzzy control language" [1] and "Intelligent Intrusion Detection System Model for attack classification using RNN" [2]. We compare the intrusion detection performance of the two techniques using the KDD CUP 99 dataset. The KDD 99 dataset provides a training set and a test set: the test set contains both intrusion types present in the training data and intrusion types absent from it, so it can assess whether previously unseen intrusions are detected. Both compared papers show good intrusion detection performance on the training and test data, but lack the ability to detect intrusions that appear in the test data yet were unseen during training. Among the attack types, DoS, Probe, and R2L show high detection rates with the fuzzy approach, while U2R shows a high detection rate with the RNN.
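
For reference, the per-category detection rate being compared above is simply the recall within each attack category. A minimal sketch with toy labels standing in for KDD CUP 99 predictions:

```python
import numpy as np

# Toy ground-truth and predicted labels standing in for KDD CUP 99
# output; index i corresponds to categories[i].
categories = ["Normal", "DoS", "Probe", "R2L", "U2R"]
y_true = np.array([0, 1, 1, 2, 3, 4, 1, 2, 3, 4])
y_pred = np.array([0, 1, 1, 2, 3, 0, 1, 2, 0, 4])

# Detection rate per category = recall restricted to that category.
for i, name in enumerate(categories):
    mask = y_true == i
    print(f"{name}: detection rate {(y_pred[mask] == i).mean():.2f}")
```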

Development of the KOSPI (Korea Composite Stock Price Index) forecast model using neural network and statistical methods (신경 회로망과 통계적 기법을 이용한 종합주가지수 예측 모형의 개발)

  • Lee, Eun-Jin;Min, Chul-Hong;Kim, Tae-Seon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.95-101 / 2008
  • Forecasting stock prices accurately has been considered one of the most difficult modeling problems, since stock prices are highly correlated with various environmental conditions including economic and political situations. In this paper, we propose an agent-system approach to predict the Korea Composite Stock Price Index (KOSPI) using neural networks and statistical methods. To minimize both the mean and the variance of the prediction error, the agent system includes sub-agent modules for feature extraction, variable selection, forecast-engine selection, and analysis of the forecasting results. As a first step in developing the agent system for KOSPI forecasting, twelve economic indices are selected from twenty-two standard economic indices using principal component analysis. From the selected twelve economic indices, the prediction-model input variables are chosen again using the best-subsets regression method. Two different types of data are tested for KOSPI forecasting, and the prediction results show a root mean squared error of 11.92 points over thirty consecutive days of prediction. The results also show that the proposed agent-system approach is effective: since the required types and numbers of prediction variables vary over time, adaptive selection of the modeling inputs and the prediction engine is essential for a reliable and accurate forecast model.
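
The two-stage variable selection described above can be sketched as follows. The data are random stand-ins for the economic indices, the PC1-loading shortlist rule and the subset-size cap are assumptions for illustration, and in-sample RMSE replaces whatever criterion the paper actually used.

```python
import numpy as np
from itertools import combinations
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 22))   # 22 hypothetical monthly economic indices
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=120)  # toy KOSPI target

# Stage 1: shortlist the 12 indices loading most heavily on the first
# principal component (an assumed selection rule, for illustration).
pca = PCA(n_components=1).fit(X)
shortlist = np.argsort(np.abs(pca.components_[0]))[::-1][:12]

# Stage 2: exhaustive best-subsets regression over the shortlist,
# capped at subsets of size 3 to keep the search small.
best_rmse, best_subset = np.inf, None
for k in range(1, 4):
    for subset in combinations(shortlist, k):
        cols = list(subset)
        pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
        rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
        if rmse < best_rmse:
            best_rmse, best_subset = rmse, subset
print(best_subset, round(best_rmse, 3))
```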

Library Services in Information Society (정보사회의 도서관봉사)

  • Chun Myung-Sook
    • Journal of the Korean Society for Library and Information Science / v.27 / pp.161-181 / 1994
  • As information technologies are applied to libraries in the information society, library services have been changing in character. The purpose of this paper is therefore to explore and establish a paradigm of library services in the information society. It is hypothesized that the application of information technologies leads to changes in library services. To test the hypothesis, data were collected from various research results in developed countries and by observing libraries where information technologies are extensively applied. The findings are as follows: 1. As information technologies are applied to the library, many new library services emerge for society. 2. As electronic data replace paper data, the collection of a library becomes the collection of the libraries of the world. Therefore, access to the information network is more important than owning information in the library. Librarians select various electronic data according to a library policy that distinguishes their own collection from others; the policy also addresses problems related to weeding and preserving the collection. The use of CD-ROM selection tools enables library users to select their own data, so censorship becomes the concern of library users rather than of the library. 3. Catalogs are reorganized for electronic data and for international use. The most important information in the catalog is the location of the data, and multiple access points to the data are necessary. 4. As information technologies are applied to book selection, cataloguing, information retrieval, and circulation, library users are able to serve themselves in the library, and most of the routine work related to information services is taken over by library staff. Professional librarians engage in user education, information marketing, and fund raising. 5. Public libraries in the information society serve those who have no access to information: they help the illiterate, hospital patients, prisoners, and the homeless in the city. Information technologies thus enhance the role of librarians in professional work both in the library and in the information society.
