Background: Code refactoring is widely recognized as an essential software engineering practice for improving the understandability and maintainability of source code. Several studies have attempted to detect refactoring activities by mining software repositories, which makes it possible to collect and analyze refactoring data and to derive actionable, data-driven insights into refactoring practices within software projects.

Aim: Among the various quality models presented in the literature, we aim to identify those that align with developers' intent to eliminate duplicate code when they explicitly state that they refactor code to remove such duplication.

Method: We extract a large corpus of 2,164,797 refactoring commits, applied and documented by developers during their daily changes, from 128 open-source Java projects. In particular, we extract 12 structural metrics, from which we identify code duplicate removal commits and their corresponding refactoring operations, as perceived by software engineers. We then empirically analyze the impact of these refactoring operations on a set of common state-of-the-art design quality metrics.

Results: The statistical analysis of the results shows that (i) some state-of-the-art metrics are able to capture developers' intention of removing code duplication; and (ii) some metrics are emphasized more than others.

Conclusions: We confirm that various structural metrics can effectively represent code duplication, leading to different impacts on software quality. Some metrics contribute to improvements, while others may lead to degradation. Most of the mapped metrics associated with the main quality attributes successfully capture developers' intentions to remove code duplicates, as evidenced by the commit messages. However, certain metrics do not fully capture these intentions.
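The two analysis steps in the method (selecting commits whose messages document duplicate removal, then checking whether each quality metric improves or degrades) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the keyword pattern, the metric names (borrowed from common CK-style and size metrics), and each metric's "higher is better" polarity are hypothetical choices, and the study's own statistical tests are not reproduced.

```python
import re
from collections import Counter

# Hypothetical keyword pattern for commit messages that document
# duplicate removal; the study's actual selection criteria may differ.
DUPLICATION_PATTERN = re.compile(
    r"\b(duplicat\w*|clone[sd]?|copy[- ]paste\w*|DRY)\b", re.IGNORECASE
)

# Hypothetical metric polarity: True if a higher value means better quality.
# These are example metrics, not necessarily the ones used in the study.
HIGHER_IS_BETTER = {
    "CBO": False,   # coupling between objects
    "LCOM": False,  # lack of cohesion in methods
    "WMC": False,   # weighted methods per class
    "LOC": False,   # lines of code
}


def mentions_duplicate_removal(message: str) -> bool:
    """Return True if the commit message matches the duplication keywords."""
    return DUPLICATION_PATTERN.search(message) is not None


def classify_change(metric: str, before: float, after: float) -> str:
    """Label one metric's change as improved, degraded, or unchanged."""
    if after == before:
        return "unchanged"
    improved = (after > before) == HIGHER_IS_BETTER[metric]
    return "improved" if improved else "degraded"


def summarize(commits: list[dict]) -> dict[str, Counter]:
    """Count improvements and degradations per metric over the
    duplicate-removal commits; all other commits are skipped."""
    summary = {metric: Counter() for metric in HIGHER_IS_BETTER}
    for commit in commits:
        if not mentions_duplicate_removal(commit["message"]):
            continue
        for metric in HIGHER_IS_BETTER:
            label = classify_change(
                metric, commit["before"][metric], commit["after"][metric]
            )
            summary[metric][label] += 1
    return summary


if __name__ == "__main__":
    commits = [
        {
            "message": "Extract method to remove duplicated code",
            "before": {"CBO": 5, "LCOM": 10, "WMC": 20, "LOC": 300},
            "after": {"CBO": 5, "LCOM": 8, "WMC": 22, "LOC": 280},
        },
        {
            "message": "Rename variable for clarity",
            "before": {"CBO": 3, "LCOM": 4, "WMC": 6, "LOC": 90},
            "after": {"CBO": 3, "LCOM": 4, "WMC": 6, "LOC": 90},
        },
    ]
    for metric, counts in summarize(commits).items():
        print(metric, dict(counts))
```

In this toy input only the first commit is selected, and it shows the kind of mixed outcome the abstract describes: LCOM and LOC improve, CBO is unchanged, and WMC degrades. A real analysis would compare these counts across many commits with an appropriate statistical test rather than inspecting single cases.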
If you are interested in learning more about the process we followed, please refer to our paper.