• Title/Summary/Keyword: F-Measure

Search Result 1,400, Processing Time 0.028 seconds

Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
    • Journal of Information Processing Systems
    • /
    • v.10 no.2
    • /
    • pp.314-333
    • /
    • 2014
  • When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. In order to avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. The logistic regression, support vector machine (SVM) classification, and neural network (NN) classification use an imbalanced dataset. They were used as benchmarks with a single prediction model that used a balanced dataset corrected by the minority samples generation approach. Instead of using prediction-oriented tests and the overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models for imbalanced dataset. In this paper, we describe an empirical experiment that used a sampling of 14 default and 315 non-default listed real estate companies in China and report that most results using single prediction models with a balanced dataset generated better results than an imbalanced dataset.

A Multi-Stage Approach to Secure Digital Image Search over Public Cloud using Speeded-Up Robust Features (SURF) Algorithm

  • AL-Omari, Ahmad H.;Otair, Mohammed A.;Alzwahreh, Bayan N.
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.12
    • /
    • pp.65-74
    • /
    • 2021
  • Digital image processing and retrieving have increasingly become very popular on the Internet and getting more attention from various multimedia fields. That results in additional privacy requirements placed on efficient image matching techniques in various applications. Hence, several searching methods have been developed when confidential images are used in image matching between pairs of security agencies, most of these search methods either limited by its cost or precision. This study proposes a secure and efficient method that preserves image privacy and confidentially between two communicating parties. To retrieve an image, feature vector is extracted from the given query image, and then the similarities with the stored database images features vector are calculated to retrieve the matched images based on an indexing scheme and matching strategy. We used a secure content-based image retrieval features detector algorithm called Speeded-Up Robust Features (SURF) algorithm over public cloud to extract the features and the Honey Encryption algorithm. The purpose of using the encrypted images database is to provide an accurate searching through encrypted documents without needing decryption. Progress in this area helps protect the privacy of sensitive data stored on the cloud. The experimental results (conducted on a well-known image-set) show that the performance of the proposed methodology achieved a noticeable enhancement level in terms of precision, recall, F-Measure, and execution time.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.4028-4042
    • /
    • 2021
  • Aiming at the problem of software defect prediction difficulty caused by insufficient software defect marker samples and unbalanced classification, a semi-supervised software defect prediction model based on a tri-training algorithm was proposed by combining feature normalization, over-sampling technology, and a Tri-training algorithm. First, the feature normalization method is used to smooth the feature data to eliminate the influence of too large or too small feature values on the model's classification performance. Secondly, the oversampling method is used to expand and sample the data, which solves the unbalanced classification of labelled samples. Finally, the Tri-training algorithm performs machine learning on the training samples and establishes a defect prediction model. The novelty of this model is that it can effectively combine feature normalization, oversampling techniques, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning in terms of Precision, Recall, and F-Measure values.

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems
    • /
    • v.15 no.3
    • /
    • pp.682-693
    • /
    • 2019
  • The traditional classification methods mostly assume that the data for class distribution is balanced, while imbalanced data is widely found in the real world. So it is important to solve the problem of classification with imbalanced data. In Mahalanobis-Taguchi system (MTS) algorithm, data classification model is constructed with the reference space and measurement reference scale which is come from a single normal group, and thus it is suitable to handle the imbalanced data problem. In this paper, an improved method of MTS-CBPSO is constructed by introducing the chaotic mapping and binary particle swarm optimization algorithm instead of orthogonal array and signal-to-noise ratio (SNR) to select the valid variables, in which G-means, F-measure, dimensionality reduction are regarded as the classification optimization target. This proposed method is also applied to the financial distress prediction of Chinese listed companies. Compared with the traditional MTS and the common classification methods such as SVM, C4.5, k-NN, it is showed that the MTS-CBPSO method has better result of prediction accuracy and dimensionality reduction.

Optical Image Encryption Technique Based on Hybrid-pattern Phase Keys

  • Sun, Wenqing;Wang, Lei;Wang, Jun;Li, Hua;Wu, Quanying
    • Current Optics and Photonics
    • /
    • v.2 no.6
    • /
    • pp.540-546
    • /
    • 2018
  • We propose an implementation scheme for an optical encryption system with hybrid-pattern random keys. In the encryption process, a pair of random phase keys composed of a white-noise phase key and a structured phase key are positioned in the input plane and Fourier-spectrum plane respectively. The output image is recoverable by digital reconstruction, using the conjugate of the encryption key in the Fourier-spectrum plane. We discuss the system encryption performance when different combinations of phase-key pairs are used. To measure the effectiveness of the proposed method, we calculate the statistical indicators between original and encrypted images. The results are compared to those generated from a classical double random phase encoding. Computer simulations are presented to show the validity of the method.

PMCN: Combining PDF-modified Similarity and Complex Network in Multi-document Summarization

  • Tu, Yi-Ning;Hsu, Wei-Tse
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.9 no.3
    • /
    • pp.23-41
    • /
    • 2019
  • This study combines the concept of degree centrality in complex network with the Term Frequency $^*$ Proportional Document Frequency ($TF^*PDF$) algorithm; the combined method, called PMCN (PDF-Modified similarity and Complex Network), constructs relationship networks among sentences for writing news summaries. The PMCN method is a multi-document summarization extension of the ideas of Bun and Ishizuka (2002), who first published the $TF^*PDF$ algorithm for detecting hot topics. In their $TF^*PDF$ algorithm, Bun and Ishizuka defined the publisher of a news item as its channel. If the PDF weight of a term is higher than the weights of other terms, then the term is hotter than the other terms. However, this study attempts to develop summaries for news items. Because the $TF^*PDF$ algorithm summarizes daily news, PMCN replaces the concept of "channel" with "the date of the news event", and uses the resulting chronicle ordering for a multi-document summarization algorithm, of which the F-measure scores were 0.042 and 0.051 higher than LexRank for the famous d30001t and d30003t tasks, respectively.

Flow based Sequential Grouping System for Malicious Traffic Detection

  • Park, Jee-Tae;Baek, Ui-Jun;Lee, Min-Seong;Goo, Young-Hoon;Lee, Sung-Ho;Kim, Myung-Sup
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3771-3792
    • /
    • 2021
  • With the rapid development of science and technology, several high-performance networks have emerged with various new applications. Consequently, financially or socially motivated attacks on specific networks have also steadily become more complicated and sophisticated. To reduce the damage caused by such attacks, administration of network traffic flow in real-time and precise analysis of past attack traffic have become imperative. Although various traffic analysis methods have been studied recently, they continue to suffer from performance limitations and are generally too complicated to apply in existing systems. To address this problem, we propose a method to calculate the correlation between the malicious and normal flows and classify attack traffics based on the corresponding correlation values. In order to evaluate the performance of the proposed method, we conducted several experiments using examples of real malicious traffic and normal traffic. The evaluation was performed with respect to three metrics: recall, precision, and f-measure. The experimental results verified high performance of the proposed method with respect to first two metrics.

Determinant Factors of the Performance of Higher Institutions in Indonesia

  • YUMHI, Yumhi;MARTOYO, Dwi;TUNNUFUS, Zakiyya;TIMOTIUS, Elkana
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.8 no.2
    • /
    • pp.667-673
    • /
    • 2021
  • This causal quantitative research aims to investigate the influence of factors that determine the performance of employees in Indonesian universities. The factors are crucial for organizations in the achievement of their goals. Based on theoretical studies, three independent variables, namely, training, personality, and work motivation were tested for their influence on employee performance, which was the dependent variable. Primary data were obtained from 94 respondents of a total population of 122 individuals at the Education Quality Assurance Institute (LPMP) in Banten Province, Indonesia. They were tested by the normality test using the Kolmogorov-Smirnov approach to ensure their normally distributed population and the linearity test to measure the significant linear relationship between the two variables. There are five hypotheses in this study. Each hypothesis tested by the F-test to determine the significant effect of all independent variables on the dependent variable, and t-test to analyze the effect. The results of this study answered all hypotheses of the research model. There is a positive direct effect of training and personality on work motivation. Both training and personality also affect positively employee performance. Another finding of this study is that employee performance is positively and directly affected by work motivation.

Analysis of EEG Signal Differences in Gender according to Textile Attachments (섬유 애착물의 종류에 따른 남녀 뇌파 신호 차이 분석)

  • Lee, Okkyung;Lee, Yejin
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.44 no.5
    • /
    • pp.824-836
    • /
    • 2020
  • This study investigated the effects of textile attachments on electroencephalogram using 20 persons (10 males and 10 females). Four types of attachment cushions were manufactured by changing the shell fabric (cotton and microfiber) and interlining (synthetic loose fiber and buckwheat). This was done using BIOS-S8 (BioBrain Inc., Korea), an 8-channel polygraph for multi-body signal measurement, to measure EEG. Data were analyzed using the SPSS 24.0 statistical program. EEG values were significantly activated according to gender, particularly when the subjects' eyes were open. For the male cases, 'RT', 'RAHB' values were highly activated and for the female cases, 'RA', 'RB', 'RG', 'RFA', 'RST', 'RLB', 'RMB', 'RST', 'RMT' values were highly activated. Examining the differences in EEG according to type of attachment indicated no significant difference in both sexes. However, in cases of females with their eyes closed, the 'RSA' index was quite different in the left occipital lobe (O1), and when their eyes were open, the 'RFA' in the right frontal lobe (F4) showed a significant difference. However, there was no obvious correlation between the activation of EEG and the subjective preference of textile attachments.

Improved Feature Selection Techniques for Image Retrieval based on Metaheuristic Optimization

  • Johari, Punit Kumar;Gupta, Rajendra Kumar
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.40-48
    • /
    • 2021
  • Content-Based Image Retrieval (CBIR) system plays a vital role to retrieve the relevant images as per the user perception from the huge database is a challenging task. Images are represented is to employ a combination of low-level features as per their visual content to form a feature vector. To reduce the search time of a large database while retrieving images, a novel image retrieval technique based on feature dimensionality reduction is being proposed with the exploit of metaheuristic optimization techniques based on Genetic Algorithm (GA), Extended Binary Cuckoo Search (EBCS) and Whale Optimization Algorithm (WOA). Each image in the database is indexed using a feature vector comprising of fuzzified based color histogram descriptor for color and Median binary pattern were derived in the color space from HSI for texture feature variants respectively. Finally, results are being compared in terms of Precision, Recall, F-measure, Accuracy, and error rate with benchmark classification algorithms (Linear discriminant analysis, CatBoost, Extra Trees, Random Forest, Naive Bayes, light gradient boosting, Extreme gradient boosting, k-NN, and Ridge) to validate the efficiency of the proposed approach. Finally, a ranking of the techniques using TOPSIS has been considered choosing the best feature selection technique based on different model parameters.