How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation


Refactoring, as coined by William Opdyke in 1992, is the art of optimizing the syntactic design of a software system without altering its external behavior. Refactoring was also cataloged by Martin Fowler as a response to the existence of defects that negatively impact software design. Since then, a significant amount of research has presumed that refactoring is primarily motivated by the need to improve system structures. However, recent studies have shown that developers may incorporate refactoring strategies in other development-related activities that go beyond improving the design. Unfortunately, these studies are limited to developer interviews and a reduced set of projects. To cope with these limitations, we aim to better understand what motivates developers to apply a refactoring by mining and automatically classifying 111,884 commits containing refactoring activities, extracted from 800 open source Java projects. We trained a multi-class classifier to categorize these commits into three novel categories, namely Internal Quality Attributes, External Quality Attributes, and Code Smells, along with the traditional Bug Fix and Functional categories. Such classification makes it possible to quantify how much each of these categories triggers refactoring activities in general, and challenges the original definition of refactoring as being exclusive to improving software design and fixing code smells. Furthermore, to better understand our classification results, we qualitatively analyzed commit messages to extract textual patterns that developers regularly use to describe their refactoring activities. The results of our empirical investigation show that (1) fixing code smells is not the main driver for developers to refactor their code bases.
As explicitly mentioned by the developers in their commit messages, refactoring is solicited for a wide variety of reasons going beyond its traditional definition, such as reducing the software’s proneness to bugs, easing the addition of functionality, resolving lexical ambiguity, enforcing code styling, and improving the design’s testability and reusability; (2) the distribution of refactoring operations differs between production and test files: the number of operations applied to production files is significantly larger than the number applied to test files; (3) developers use a variety of patterns to purposefully target refactoring-related activities; (4) developers occasionally explicitly mention the motivation behind their refactoring strategies; (5) the textual patterns extracted in this paper provide better coverage of how developers document their refactorings.


More specifically, the research questions that we investigated are:

RQ1. What is the purpose of the applied refactorings?

While previous surveys analyze how developers apply refactorings in varying development contexts, none of them have measured the ubiquity of these varying contexts in practice. Therefore, we quantify the distribution of refactorings performed in varying development contexts to augment our understanding of refactorings in theory vs. practice.

RQ2. Do software developers perform different types of refactoring operations on test code and production code between categories?

This question further explores the findings of the classification to see to what extent developers refactor production files differently from test files.

RQ3. What patterns do developers use to describe their refactoring activities?

Since there is no consensus on how to formally document the act of refactoring code, we intend to extract (from commit messages) words and phrases commonly used to document refactoring. Such information is useful from many perspectives: it helps explain the rationale behind the applied refactorings, e.g., fixing code smells or improving specific quality attributes. It also reveals which refactoring operations tend to be documented, and whether developers explicitly mention them as part of their documentation. Little is known about how developers document refactoring, as previous studies mainly rely on the keyword “refactor” to identify such documentation.
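As a rough illustration of what extracting commonly used words and phrases from commit messages can look like, here is a minimal sketch. It is not the procedure used in the paper: the function names (`ngrams`, `frequent_patterns`), the tokenization rule, the `min_count` threshold, and the sample messages are all assumptions made for this example.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-word phrases found in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_patterns(messages, n=1, min_count=2):
    """Count in how many messages each n-word phrase appears.

    Hypothetical helper: each phrase is counted at most once per message,
    and only phrases seen in at least `min_count` messages are returned.
    """
    counts = Counter()
    for message in messages:
        # Naive tokenization (an assumption): lowercase alphabetic words only.
        tokens = re.findall(r"[a-z]+", message.lower())
        counts.update(set(ngrams(tokens, n)))
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


# Toy commit messages, invented for illustration only.
sample_messages = [
    "Refactor parser to remove duplicate code",
    "Clean up parser and remove duplicate code",
    "Fix typo in README",
]

top_bigrams = frequent_patterns(sample_messages, n=2)
for phrase, count in top_bigrams:
    print(f"{phrase}: {count}")
```

On real data, a manual review of the candidate phrases would likely still be needed, since frequency alone does not tell whether a phrase actually describes a refactoring.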

RQ4. Do commits containing the label Refactor indicate more refactoring activity than those without the label?

We revisit the hypothesis raised by Murphy-Hill et al. about whether developers use a specific pattern, i.e., “refactor”, when describing their refactoring activities.
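The comparison behind this question can be sketched as follows, assuming each commit is available as a record with its message and the number of refactoring operations detected in it. The record layout, the `compare_by_label` helper, and the toy data are assumptions for this example and do not reflect the paper's dataset or statistical analysis.

```python
from statistics import mean


def compare_by_label(commits, keyword="refactor"):
    """Average refactoring count for commits with and without the keyword.

    Hypothetical helper: `commits` is a list of dicts with a "message"
    string and a "refactorings" integer. A group with no commits maps to None.
    """
    labelled = [c["refactorings"] for c in commits
                if keyword in c["message"].lower()]
    unlabelled = [c["refactorings"] for c in commits
                  if keyword not in c["message"].lower()]
    return {
        "labelled_mean": mean(labelled) if labelled else None,
        "unlabelled_mean": mean(unlabelled) if unlabelled else None,
    }


# Toy records, invented for illustration only (not results from the study).
sample_commits = [
    {"message": "Refactor session handling", "refactorings": 3},
    {"message": "Refactoring of the export module", "refactorings": 5},
    {"message": "Add CSV export", "refactorings": 1},
    {"message": "Fix null check in login", "refactorings": 2},
]

summary = compare_by_label(sample_commits)
print(summary)
```

A descriptive comparison like this only shows the direction of a difference; deciding whether the difference is meaningful would require an appropriate statistical test on the full dataset.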


If you are interested in learning more about the process we followed, please refer to our paper.


Related Paper

Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian Newman, Ali Ouni, and Marouane Kessentini, "How we refactor and how we document it? On the use of supervised machine learning algorithms to classify refactoring documentation", Expert Systems with Applications (ESWA), 2020. [preprint]