On the consistency of principal component analysis in software metrics
- Gal Lalouche, Ph.D. Thesis Seminar
- Wednesday, 11.1.2017, 13:30
- Taub 601
- Prof. J. Gil
Software metrics are used by software engineers to help gauge the health of their projects. Researchers hope to correlate easy-to-measure properties, such as lines of code, cyclomatic complexity, and the number of operators and operations, with external, harder-to-measure properties such as maintainability and proneness to bugs. Over the years, hundreds of metrics have been proposed; unsurprisingly, most of them are correlated with the size of the code module. It is therefore unclear whether these metrics offer any substantial benefit over a simple size measurement, such as the number of lines of code. Principal Component Analysis (PCA) makes it possible to extract orthogonal features from a large corpus of metrics. Previous research applied this process and identified features such as size, coupling, and complexity. This work focuses on the reliability of these principal components, that is, whether components extracted from one code corpus are applicable to another. We show that, other than the principal component that can be described as size, none of the extracted components is consistent with its counterparts in other projects. Furthermore, the same holds when consistency is measured between different versions of the same project. This result casts doubt on the validity of features other than size, since a feature that is not reliable cannot be valid.
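As a rough illustration of the kind of comparison described above, the following sketch extracts principal components from two metric matrices and measures how well matching components agree. It is not the procedure used in the thesis: the function names (`principal_components`, `component_similarity`, `synthetic_corpus`), the use of absolute cosine similarity as the consistency measure, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def principal_components(metrics, k):
    """Return the top-k principal axes (as rows) of a modules-by-metrics matrix."""
    # Standardize each metric so that differences in scale do not dominate.
    centered = metrics - metrics.mean(axis=0)
    scaled = centered / centered.std(axis=0)
    # Rows of vt are orthonormal directions in metric space,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return vt[:k]


def component_similarity(a, b):
    """Absolute cosine similarity between the i-th components of two corpora."""
    # The rows are unit vectors, so the dot product is the cosine.
    # The sign of a principal axis is arbitrary, hence the absolute value.
    return np.abs(np.sum(a * b, axis=1))


def synthetic_corpus(rng, n_modules, n_metrics=6):
    """Hypothetical data: every metric is a noisy function of module size."""
    log_size = rng.normal(loc=4.0, scale=1.0, size=n_modules)
    weights = rng.uniform(0.5, 2.0, size=n_metrics)
    noise = rng.normal(size=(n_modules, n_metrics))
    return log_size[:, None] * weights + noise


rng = np.random.default_rng(0)
pcs_a = principal_components(synthetic_corpus(rng, 500), k=3)
pcs_b = principal_components(synthetic_corpus(rng, 500), k=3)
similarity = component_similarity(pcs_a, pcs_b)
print(similarity)
```

A value near 1 means that the corresponding component points in nearly the same direction in both corpora, and a value near 0 means that the two components are unrelated. With real data, each row of the input matrix would be a code module and each column a metric measured on it.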