Software Quality Modeling and Estimation with Missing Data
Volume 4, Number 1, January 2008 - Paper 1 - pp. 5 - 18
NAEEM SELIYA1, TAGHI M. KHOSHGOFTAAR2
1Computer and Information Science, University of Michigan – Dearborn, 4901 Evergreen Road, Dearborn, MI 48128, USA
2Computer Science and Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
(Received on March 15, 2007)
Software quality estimation models generally rest on the software engineering measurement hypothesis that software metrics encapsulate the underlying quality of the software system. A typical model is trained using software measurements and fault data from a similar, previously developed project. Such a strategy requires that fault data be known for every training module. In practice, however, various software engineering issues limit the availability of fault data for some modules in the training data. We present a semi-supervised learning scheme for software defect modeling when prior knowledge of software quality is limited. The commonly used expectation-maximization (EM) algorithm for estimating missing data values is applied in conjunction with k-means clustering. An empirical investigation using software measurement and defect data from real-world projects demonstrates the effectiveness and viability of the proposed method. The estimation accuracy of the defect prediction model after the semi-supervised learning process is generally better than that of a defect prediction model trained only on the program modules whose number of faults is known.
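To make the idea of combining missing-value estimation with k-means clustering concrete, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes each module is described by a list of numeric metrics, that unknown fault counts are marked with None, and that at least one fault count is known. The names kmeans and impute_faults, the choice of k, and the toy data are all hypothetical. The loop alternates between clustering the modules (using the metrics together with the current fault estimates) and re-estimating each missing fault count as the mean of its cluster, in the spirit of EM-style imputation.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; initial centroids are evenly spaced input points."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def impute_faults(metrics, faults, k=2, max_rounds=20, tol=1e-6):
    """Estimate missing fault counts (None) by iterating clustering and averaging.

    Assumption: at least one fault count is known. Metrics are used unscaled,
    which is only reasonable for toy data like the example below.
    """
    known = [f for f in faults if f is not None]
    start = sum(known) / len(known)
    est = [start if f is None else float(f) for f in faults]
    for _ in range(max_rounds):
        # Cluster on the metrics plus the current fault estimate.
        points = [list(m) + [e] for m, e in zip(metrics, est)]
        labels = kmeans(points, k)
        # Mean fault count per cluster, using the current estimates.
        means = {}
        for c in set(labels):
            vals = [e for e, l in zip(est, labels) if l == c]
            means[c] = sum(vals) / len(vals)
        # Known values stay fixed; only missing ones are re-estimated.
        new_est = [
            means[l] if f is None else float(f) for f, l in zip(faults, labels)
        ]
        change = max(abs(a - b) for a, b in zip(new_est, est))
        est = new_est
        if change < tol:
            break
    return est


# Hypothetical toy data: one size metric per module, two modules unlabeled.
metrics = [[10], [12], [11], [100], [110], [105]]
faults = [0, 1, None, 9, 11, None]
estimates = impute_faults(metrics, faults)
print(estimates)
```

In this toy run the two unlabeled modules drift toward the average fault count of the labeled modules in their own cluster, while the known fault counts are never changed. A predictive model could then be trained on the completed dataset; how well that works on real project data is the question the paper investigates empirically.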