In this study, we propose an approach that challenges the ability of refactoring documentation written in commit messages to adequately predict the refactoring types performed in code changes at the commit level. Specifically, we combine N-Gram TF-IDF feature selection with multiclass classifiers to build our model. We evaluate our model on a total of 5004 commit messages extracted from well-engineered open-source projects.
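To make the idea of pairing N-Gram TF-IDF features with a multiclass classifier concrete, here is a minimal sketch in plain Python. It is not the pipeline used in the study: the nearest-centroid classifier, the unigram-plus-bigram range, the smoothed IDF formula, the function names, the toy commit messages, and the three labels (Extract, Move, Rename) are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased message."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(messages):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    doc_freq = Counter()
    for msg in messages:
        doc_freq.update(set(ngrams(msg)))
    n_docs = len(messages)
    return {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}


def tfidf(message, idf):
    """L2-normalised TF-IDF vector (stored as a dict) for one message."""
    counts = Counter(g for g in ngrams(message) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def train(messages, labels):
    """Sum the TF-IDF vectors of each class into one centroid per class."""
    idf = fit_idf(messages)
    centroids = defaultdict(Counter)
    for msg, label in zip(messages, labels):
        for g, v in tfidf(msg, idf).items():
            centroids[label][g] += v
    return idf, centroids


def predict(message, idf, centroids):
    """Return the class whose centroid has the highest cosine similarity."""
    vec = tfidf(message, idf)

    def cosine(centroid):
        dot = sum(v * centroid.get(g, 0.0) for g, v in vec.items())
        norm = math.sqrt(sum(c * c for c in centroid.values())) or 1.0
        return dot / norm

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy data; the real dataset contains 5004 commit messages.
messages = [
    "extract method to simplify parser",
    "rename class Foo to Bar",
    "move method into helper class",
    "rename variable for clarity",
    "extract method from long function",
    "move class to util package",
]
labels = ["Extract", "Rename", "Move", "Rename", "Extract", "Move"]

idf, centroids = train(messages, labels)
print(predict("rename field to match style", idf, centroids))
```

In practice a library implementation of TF-IDF and a stronger multiclass classifier would likely replace these hand-written pieces; the sketch only shows how n-gram weights turn a commit message into a feature vector that a classifier can label.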
More specifically, the research questions that we investigated are:
RQ1. How effective is our supervised learning in predicting the type of refactoring?
We apply an automated approach to predict refactoring types and determine whether classification with machine learning techniques can achieve high accuracy.
RQ2. How does our model compare with keyword-based classification?
Answering this research question sheds light on whether the prediction of refactoring types is a learning problem. We hypothesize that if learning algorithms cannot outperform a string-matching algorithm, then there is no need to propose such a framework.
RQ3. What are the frequent terms utilized by developers when documenting refactoring types?
This research question examines the textual content of the commit messages to determine the frequent terminology, related to refactoring types, that developers use when documenting their refactoring activity.
RQ4. How useful is our approach in analyzing the inconsistency types between source code and documentation?
Because inconsistencies between source code and its documentation can affect software comprehensibility and maintainability, this research question explores the frequency of different inconsistency types. Knowing these frequencies might help in the early reporting of any inconsistency between the refactoring types detected by refactoring detection tools and their documentation.
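The keyword-based baseline from RQ2 and the consistency check from RQ4 can be sketched together as follows. This is an illustration under stated assumptions, not the procedure from the paper: the keyword lists, the record layout (`id`, `message`, `detected`), and the function names are hypothetical, and the check only flags disagreement without assigning the inconsistency categories studied in the paper.

```python
import re

# Hypothetical keyword lists; the terms actually used in the study are not given here.
KEYWORDS = {
    "Extract": ("extract", "extracted"),
    "Move": ("move", "moved"),
    "Rename": ("rename", "renamed"),
}


def keyword_classify(message):
    """Return the first type whose keyword appears as a whole word, else None."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    for rtype, words in KEYWORDS.items():
        if any(word in tokens for word in words):
            return rtype
    return None


def find_inconsistencies(commits):
    """List (id, documented, detected) for commits where the two types disagree."""
    flagged = []
    for commit in commits:
        documented = keyword_classify(commit["message"])
        if documented != commit["detected"]:
            flagged.append((commit["id"], documented, commit["detected"]))
    return flagged


# Hypothetical records: "detected" stands for the type reported by a detection tool.
commits = [
    {"id": "c1", "message": "Rename class Foo to Bar", "detected": "Rename"},
    {"id": "c2", "message": "Moved helper into utils", "detected": "Extract"},
    {"id": "c3", "message": "Remove dead code", "detected": "Move"},
]

for commit_id, documented, detected in find_inconsistencies(commits):
    print(commit_id, documented, detected)
```

Matching on whole words rather than substrings avoids treating "remove" as evidence of a Move refactoring. The same `find_inconsistencies` function could take a learned classifier in place of `keyword_classify`, which is the comparison RQ2 is about.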
If you are interested in learning more about the process we followed, please refer to our paper.
Eman Abdullah AlOmar, Jiaqian Liu, Kenneth Addo, Mohamed Wiem Mkaouer, Christian Newman, and Ali Ouni, "On the documentation of refactoring types", the Automated Software Engineering journal (ASEj'2021). [preprint]