Software Fault Prediction Using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization

S. Kaliraj, A. M. Kishoore, V. Sivakumar

Research output: Contribution to journal, Article, peer-review


Software fault prediction aims to improve software quality and reliability, but it faces two significant challenges: class imbalance in fault data and the need for predictive models that generalize across projects. In this research, we investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three research questions. First, we examine class imbalance, which is a significant obstacle to accurate model performance. Through extensive experimentation with various classifiers on datasets from different software projects, we highlight the variation in classifier performance and the need to address class imbalance for reliable predictions. Second, we evaluate the reliability of cross-project prediction, that is, how effectively models trained on one project predict faults in other projects. We demonstrate that training on datasets whose characteristics are similar to those of the target project is important for reliable cross-project prediction. Third, we analyze how increasing the number of training samples drawn from different projects affects prediction accuracy, emphasizing the benefits of cross-project analysis for predictive model performance. We also provide a comprehensive comparison of classifier performance in terms of accuracy, precision, recall, and F1 score. Our findings shed light on the challenges and opportunities in software fault prediction and emphasize the importance of considering class imbalance and model generalization when developing robust and reliable fault prediction models. This research contributes to the field by providing insights into effective modeling approaches and the motivation for addressing these challenges.
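The abstract's point about class imbalance and the four reported metrics can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the label vectors are synthetic, the function name `binary_metrics` is hypothetical, and none of the numbers come from the paper. It compares a trivial predictor that labels every module as clean with a hypothetical model that flags some faulty modules.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (faulty = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # faulty, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # faulty, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or labelled faulty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic, imbalanced target project (assumption): 10 faulty modules out of 100.
y_true = [1] * 10 + [0] * 90

# Trivial baseline: predict "clean" for every module.
always_clean = [0] * 100

# Hypothetical model: finds 6 of the 10 faulty modules and raises 9 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 9 + [0] * 81

baseline = binary_metrics(y_true, always_clean)
model = binary_metrics(y_true, some_model)

print("baseline:", baseline)
print("model:   ", model)
```

In this synthetic setting the trivial baseline scores higher on accuracy than the hypothetical model while detecting no faults at all, which is one reason precision, recall, and F1 score are reported alongside accuracy when fault data is imbalanced.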

Original language: English
Pages (from-to): 64212-64227
Number of pages: 16
Journal: IEEE Access
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering

