Defect Prediction on the Hardware Repository - A Case Study on the OpenRISC1000 Project
MetadataShow full item record
Software defect prediction is one of the most active research topics in the area of mining software engineering data. The software engineering data sources like the code repositories and the bug databases contain rich information about software development history. Mining these data can guide software developers for future development activities and help managers to improve the development process. Nowadays, the computer-engineering field has rapidly evolved from 1972 until present times to the modern chip design, which looks superficially and very much like software design. Hence, the main objective of this thesis is to check whether it would be possible to apply software defect prediction techniques on hardware repositories. In this thesis, we have applied various data mining methods (e.g., linear regression, logistic regression, random forests, and entropy) to predict the post-release bugs of OpenRISC 1000 projects. We have conducted two types of studies: classification (predicting buggy and non-buggy files) and ranking (predicting the buggiest files). In particular, the classification studies show promising results with an average precision and recall of up to 74% and 70% for projects written in Verilog and close to 100% for projects written in C.