• Title/Summary/Keyword: Predictive Analysis


Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

  • Malhotra, Ruchika;Sharma, Anjali
    • Journal of Information Processing Systems
    • /
    • v.14 no.3
    • /
    • pp.751-770
    • /
    • 2018
  • Web applications are indispensable in the software industry and continuously evolve, either to meet newer criteria or to include new functionalities. However, despite quality assurance through testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and minimizing them is often expensive in terms of man-hours. Thus, detecting fault proneness in the early phases of software development is important, and a fault prediction model for identifying fault-prone classes in a web application is highly desirable. In this work, we compare 14 machine learning techniques to analyse the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. En route to the predictive analysis, the input basis set for each release is first optimized using the filter-based correlation feature selection (CFS) method. The LCOM3, WMC, NPM and DAM metrics are found to be the most significant predictors. The statistical analysis of these metrics conforms well with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of the different fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction in the Apache datasets. Further, we also derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
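
The Friedman ranking and Nemenyi comparison named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the model names and AUC scores are hypothetical, and the critical value `q_alpha = 2.343` is the commonly tabulated Nemenyi value for three models at a significance level of 0.05.

```python
import math

# Hypothetical AUC scores (not from the paper): rows are datasets, columns are models.
MODELS = ["model_a", "model_b", "model_c"]
SCORES = [
    [0.80, 0.70, 0.60],
    [0.75, 0.72, 0.65],
    [0.90, 0.88, 0.70],
    [0.85, 0.80, 0.75],
]

def average_ranks(row):
    """Rank one dataset's scores: rank 1 is the best (highest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def friedman(scores):
    """Return the mean rank of each model and the Friedman chi-square statistic."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(r) for r in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2

def nemenyi_cd(n, k, q_alpha):
    """Critical difference: two models differ if their mean ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

mean_ranks, chi2 = friedman(SCORES)
cd = nemenyi_cd(len(SCORES), len(MODELS), q_alpha=2.343)
```

With these made-up scores, only pairs of models whose mean ranks differ by more than `cd` would be reported as significantly different.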

Predicting Economic Activity via the Yield Spread: Literature Survey and Empirical Evidence in Korea (이자율 스프레드의 경기 예측력: 문헌 서베이 및 한국의 사례 분석)

  • Yun, Jaeho
    • Economic Analysis
    • /
    • v.26 no.3
    • /
    • pp.1-47
    • /
    • 2020
  • This paper surveys research since the 1990s on the ability of the yield spread and its components (i.e., the expectation spread and term premium components) to predict future economic activity, and also conducts an empirical analysis of their forecasting ability using yield data for Korean government bonds. The survey, particularly for the US, shows that the yield spread has significant predictive power for some macroeconomic variables, but that this power seems to have declined since the mid-1980s, possibly due to stronger inflation targeting. Next, the empirical analysis using Korean data indicates that the yield spread, and the term premium component in particular, has significant predictive power for industrial production (IP) growth, consumer price index growth, and the IP gap. An out-of-sample analysis shows that the prediction equations are unstable over time, and that the yield spread decomposition makes a significant contribution to the prediction of IP growth.

Landslide Risk Assessment of Cropland and Man-made Infrastructures using Bayesian Predictive Model (베이지안 예측모델을 활용한 농업 및 인공 인프라의 산사태 재해 위험 평가)

  • Al, Mamun;Jang, Dong-Ho
    • Journal of The Geomorphological Association of Korea
    • /
    • v.27 no.3
    • /
    • pp.87-103
    • /
    • 2020
  • The purpose of this study is to evaluate the risk to cropland and man-made infrastructure in a landslide-prone area using a GIS-based method. To achieve this goal, a landslide inventory map was prepared based on aerial photograph analysis as well as field observations. A total of 550 landslides were counted in the entire study area. For model analysis and validation, the extracted landslides were randomly divided into two groups. Landslide causative factors such as slope, aspect, curvature, topographic wetness index, elevation, forest type, forest crown density, geology, land use, soil drainage, and soil texture were used in the analysis. To identify the correlation between landslides and causative factors, pixels were divided into several classes and the frequency ratio was extracted. A landslide susceptibility map was constructed using a Bayesian predictive model (BPM) based on all of the events. In the cross-validation process, the landslide susceptibility map and the observation data were plotted as a receiver operating characteristic (ROC) curve, the area under the curve (AUC) was calculated, and an attempt was made to extract a success rate curve. The results showed that the BPM achieved 85.8% accuracy, which we consider acceptable for the landslide susceptibility analysis of the study area. In addition, for risk assessment, a local monetary value and a vulnerability scale were added to each social thematic data layer and then converted into US dollars, taking the time of landslide occurrence into account. The total number of pixels in the study area and the number of pixels predicted to be affected by landslides were used to build a probability table. To match the affected number, 5,000 landslide pixels were assumed for the final calculation. Based on the result, the estimated total risk was US $35.4 million for cropland and US $39.3 million for man-made infrastructure.
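
The frequency ratio mentioned in the abstract above is commonly defined, per class of a causative factor, as the share of landslide pixels in that class divided by the share of all pixels in that class. A minimal sketch under that assumption, with hypothetical slope classes and pixel counts that are not taken from the paper:

```python
# Hypothetical counts per slope class: (total pixels in class, landslide pixels in class).
SLOPE_CLASSES = {
    "0-15 deg": (5000, 50),
    "15-30 deg": (3000, 150),
    ">30 deg": (2000, 300),
}

def frequency_ratio(classes):
    """Return {class: FR}, where FR = (landslide share of class) / (area share of class)."""
    total_pixels = sum(pixels for pixels, _ in classes.values())
    total_landslides = sum(slides for _, slides in classes.values())
    ratios = {}
    for name, (pixels, slides) in classes.items():
        landslide_share = slides / total_landslides
        area_share = pixels / total_pixels
        ratios[name] = landslide_share / area_share
    return ratios

fr = frequency_ratio(SLOPE_CLASSES)
```

A ratio above 1 indicates that a class contains more landslides than its area share alone would suggest, and a ratio below 1 indicates fewer.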

Comparison of Machine Learning Analysis on Predictive Factors of Children's Planning-Organizing Executive Function by Income Level: Through Home Environment Quality and Wealth Factors

  • Lim, Hye-Kyung;Kim, Hyun-Ok;Park, Hae-Seon
    • Journal of People, Plants, and Environment
    • /
    • v.24 no.6
    • /
    • pp.651-662
    • /
    • 2021
  • Background and objective: This study examines whether children's planning-organizing executive function can be significantly classified and predicted by home environment quality and wealth factors. Methods: For empirical analysis, we used the data collected from the 10th Panel Study on Korean Children in 2017. Using machine learning tools such as support vector machine (SVM) and random forest (RF), we evaluated the accuracy of the model in which home environment factors classify and predict children's planning-organizing executive functions, and extracted the relative importance of the variables that determine these executive functions by income group. Results: First, SVM analysis shows that home environment quality and wealth factors classify and predict with high accuracy in all three groups. Second, RF analysis shows that estate had the highest predictive power in the high-income group, followed by income, asset, learning, reinforcement, and emotional environment. In the middle-income group, emotional environment showed the highest score, followed by estate, asset, reinforcement, and income. In the low-income group, estate showed the highest score, followed by income, asset, learning, reinforcement, and emotional environment. Conclusion: This study confirmed that home environment quality and wealth factors are significant factors in predicting children's planning-organizing executive functions.

Comparative Molecular Field Analysis of Caspase-3 Inhibitors

  • Sathya, B.;Madhavan, Thirumurthy
    • Journal of Integrative Natural Science
    • /
    • v.7 no.3
    • /
    • pp.166-172
    • /
    • 2014
  • Caspases, a family of cysteinyl aspartate-specific proteases, play a central role in the regulation and execution of apoptotic cell death. Activation of caspase-3 stimulates a signaling pathway that ultimately leads to the death of the cell. Hence, caspase-3 has been proven to be an effective target for reducing the amount of cellular and tissue damage. In this work, comparative molecular field analysis (CoMFA) was performed on a series of 3,4-dihydropyrimidoindolone derivatives that are inhibitors of caspase-3. The best predictions were obtained for the CoMFA model ($q^2=0.676$, $r^2=0.990$). The predictive ability for the test set ($r^2_{pred}$) was 0.688. Statistical parameters from the generated QSAR models indicated that the data are well fitted and that the models have high predictive ability. Our theoretical results could be useful for designing novel and more potent caspase-3 inhibitors.

Prediction of Paroxysmal Atrial Fibrillation using Time-domain Analysis and Random Forest

  • Lee, Seung-Hwan;Kang, Dong-Won;Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research
    • /
    • v.39 no.2
    • /
    • pp.69-79
    • /
    • 2018
  • The present study proposes an algorithm that discriminates between normal subjects and paroxysmal atrial fibrillation (PAF) patients using electrocardiograms (ECG) recorded without PAF events. For this, time-domain features and a random forest classifier are used. The time-domain features are obtained from the Poincaré plot, the Lorenz plot of the ${\delta}RR$ interval, and morphology analysis. Three features in total are then chosen through feature selection, and PAF patients and normal subjects are classified using random forest. The classification results showed a sensitivity of 81.82% and a specificity of 95.24%, a positive predictive value of 96.43% and a negative predictive value of 76.92%, and an accuracy of 87.04%. The proposed algorithm requires less computation than existing algorithms, which suggests that it is applicable to more efficient prediction of PAF.
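
The five figures reported in the abstract above all follow from a single confusion matrix. The abstract does not state the underlying counts; the counts below (27 true positives, 6 false negatives, 20 true negatives, 1 false positive) are an assumption, chosen only because they reproduce the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of PAF patients correctly detected
        "specificity": tn / (tn + fp),   # share of normal subjects correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Assumed counts (not given in the abstract), consistent with the reported values.
metrics = classification_metrics(tp=27, fn=6, tn=20, fp=1)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

Under these assumed counts, the gap between the high positive predictive value and the lower negative predictive value comes from the six missed PAF patients against a single false alarm.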

VHDL Implementation of an LPC Analysis Algorithm (LPC 분석 알고리즘의 VHDL 구현)

  • 선우명훈;조위덕
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.1
    • /
    • pp.96-102
    • /
    • 1995
  • This paper presents the VHSIC Hardware Description Language (VHDL) implementation of the Fixed Point Covariance Lattice (FLAT) algorithm for Linear Predictive Coding (LPC) analysis and its related algorithms, such as the fourth-order high-pass Infinite Impulse Response (IIR) filter, covariance matrix calculation, and the Spectral Smoothing Technique (SST), in the Vector Sum Excited Linear Predictive (VSELP) speech coder, which has been selected as the standard speech coder for North American and Japanese digital cellular systems. Existing Digital Signal Processor (DSP) chips used in digital cellular phones are derived from general-purpose DSP chips and thus may not be optimal, so effective architectures need to be designed for the above-mentioned algorithms. We first developed C language code to investigate the correctness of the algorithms and to compare the C code results with the VHDL code results block by block. We then implemented the VHDL code based on the C code. Finally, we verified that the VHDL results are the same as the C code results for real speech data. The implemented VHDL code can be used for logic synthesis and for designing LPC Application Specific Integrated Circuit (ASIC) chips and DSP chips.


Racial and Social Economic Factors Impact on the Cause Specific Survival of Pancreatic Cancer: A SEER Survey

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.14 no.1
    • /
    • pp.159-163
    • /
    • 2013
  • Background: This study used Surveillance, Epidemiology and End Results (SEER) pancreatic cancer data to identify predictive models and potential socio-economic disparities in pancreatic cancer outcome. Materials and Methods: For risk modeling, the Kaplan-Meier method was used for cause-specific survival analysis. The Kolmogorov-Smirnov test was used to compare survival curves. The Cox proportional hazards method was applied for multivariate analysis. The area under the ROC curve was computed for predictors of absolute risk of death, optimized to improve efficiency. Results: This study included 58,747 patients. The mean follow-up time (S.D.) was 7.6 (10.6) months. SEER stage and grade were strongly predictive univariates. Sex, race, and three socio-economic factors (county-level family income, rural-urban residence status, and county-level education attainment) were independent multivariate predictors. Racial and socio-economic factors were associated with about a 2% difference in absolute cause-specific survival. Conclusions: This study found significant effects of socio-economic factors on pancreatic cancer outcome. These data may generate hypotheses for trials to eliminate these outcome disparities.

Speech Quality Measure in a Mobile Communication System using PLP Cepstral Distance with CMS (심리 음향 겝스트럼 평균 차감법을 이용한 이동 전화망에서의 음질 평가)

  • 윤종진;박상욱;박영철;안동순;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.12B
    • /
    • pp.2046-2051
    • /
    • 2000
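  • This paper proposes PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), a new speech quality measure that not only outperforms existing speech quality measures but also performs consistently on speech signals from diverse channel paths. PLP-CMS, which can effectively predict the subjective quality of speech signals in a CDMA PCS mobile telephone environment, uses perceptual linear predictive (PLP) analysis to raise the correlation with subjective quality, and its cepstral mean subtraction (CMS) step was confirmed to give consistent performance regardless of the PSTN path.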
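
Cepstral mean subtraction (CMS), the second half of the PLP-CMS measure in the abstract above, removes the per-coefficient mean over all frames of an utterance. A time-invariant channel adds a constant offset in the cepstral domain, so subtracting the mean cancels it. The sketch below shows only this CMS step, not the PLP analysis or the paper's distance measure; the cepstral frames and the channel offset are hypothetical values.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the mean of each cepstral coefficient, taken over all frames."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    return [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]

# Hypothetical cepstral frames (3 frames, 2 coefficients each).
clean = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]

# A fixed channel shows up as the same additive offset in every frame.
channel_offset = [0.7, -0.2]
shifted = [[c + h for c, h in zip(frame, channel_offset)] for frame in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_shifted = cepstral_mean_subtraction(shifted)

# Largest coefficient-wise difference between the two normalized sequences.
max_diff = max(
    abs(a - b)
    for frame_a, frame_b in zip(norm_clean, norm_shifted)
    for a, b in zip(frame_a, frame_b)
)
```

After normalization the two sequences agree up to floating-point error, which is the property that makes a CMS-based measure insensitive to the transmission path.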


DEVELOPMENT OF A HYBRID CFD FRAMEWORK FOR MULTI-PHENOMENA FLOW ANALYSIS AND DESIGN (다중현상 유동 해석 및 설계를 위한 융복합 프레임웍 개발)

  • Hur, Nahm-Keon
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2010.05a
    • /
    • pp.517-523
    • /
    • 2010
  • Recently, the rapid evolution of computational fluid dynamics (CFD) has given it a key role in industry and the predictive sciences. Diverse research disciplines, however, have a strong need for integrated analytical tools that handle multi-phenomena beyond simple flow simulation. Based on the concurrent simulation of multi-dynamics, multi-physics and multi-scale phenomena, multi-phenomena CFD technology enables flow simulation for integrated and complex systems. The high-precision analytical and predictive capacity of multi-phenomena CFD analysis can accelerate the development of industrial technologies. It is also expected to further extend the applicability of the simulation technique to medical and bio technology, new and renewable energy, nanotechnology, and scientific computing, among others.
