Toward the Automatic Classification of Self-Affirmed Refactoring


In this study, we propose a two-step approach that first identifies whether a commit message describes a developer-related refactoring event, then classifies it according to common refactoring categories. Specifically, we combine N-gram TF-IDF feature selection with binary and multiclass classifiers to build our model. We evaluate our model on a total of 2867 commit messages extracted from well-engineered open-source projects.
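To make the two-step idea concrete, here is a minimal, standard-library-only sketch. It is not the model from the paper: the nearest-centroid decision rule is a stand-in chosen for brevity, and the toy commit messages, the category names (`code_smell`, `internal_quality`, `external_quality`), and all function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length; empty if it is all zeros."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


class TfidfCentroidClassifier:
    """N-gram TF-IDF features with a nearest-centroid (cosine) decision rule.

    The centroid rule is an assumption made for this sketch, not the paper's classifier.
    """

    def fit(self, texts, labels):
        docs = [Counter(ngrams(t)) for t in texts]
        doc_freq = Counter(g for d in docs for g in d)  # each doc counts a gram once
        n = len(docs)
        # Smoothed inverse document frequency.
        self.idf = {g: math.log((1 + n) / (1 + c)) + 1 for g, c in doc_freq.items()}
        sums = defaultdict(lambda: defaultdict(float))
        for d, y in zip(docs, labels):
            for g, w in self._vector(d).items():
                sums[y][g] += w
        self.centroids = {y: l2_normalize(v) for y, v in sums.items()}
        return self

    def _vector(self, counts):
        # Grams unseen during training are ignored.
        return l2_normalize({g: c * self.idf[g] for g, c in counts.items() if g in self.idf})

    def predict(self, text):
        vec = self._vector(Counter(ngrams(text)))

        def score(label):
            centroid = self.centroids[label]
            return sum(w * centroid.get(g, 0.0) for g, w in vec.items())

        return max(self.centroids, key=score)


# Toy training data (hypothetical): message, SAR label, refactoring category.
TRAIN = [
    ("Refactor parser to remove duplicated code", "sar", "code_smell"),
    ("Refactoring: extract method to improve readability", "sar", "external_quality"),
    ("Refactor to reduce coupling between modules", "sar", "internal_quality"),
    ("Fix null pointer crash on startup", "non_sar", None),
    ("Add login page", "non_sar", None),
    ("Update version number in changelog", "non_sar", None),
]

# Step 1: binary detector trained on every message.
detector = TfidfCentroidClassifier().fit([m for m, _, _ in TRAIN], [s for _, s, _ in TRAIN])
# Step 2: multiclass categorizer trained on the SAR messages only.
sar_rows = [(m, c) for m, s, c in TRAIN if s == "sar"]
categorizer = TfidfCentroidClassifier().fit([m for m, _ in sar_rows], [c for _, c in sar_rows])


def classify_commit(message, detector, categorizer):
    """Return a refactoring category, or None if the message is not SAR."""
    if detector.predict(message) != "sar":
        return None
    return categorizer.predict(message)


print(classify_commit("Refactor service to remove duplicated code", detector, categorizer))
print(classify_commit("Fix crash on login page", detector, categorizer))
```

In practice a library such as scikit-learn would replace the hand-written vectorizer and decision rule; the point here is only the shape of the pipeline, where the second classifier sees a message only after the first one has flagged it as SAR.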


More specifically, the research questions that we investigated are:

RQ1. Is it possible to accurately perform two-class and multiclass self-affirmed refactoring (SAR) classification using our machine learning technique?

We apply an automatic approach to classify SAR and determine whether machine learning techniques can achieve high classification accuracy.

RQ2. How effective is our machine learning approach in classifying SAR?

Answering this research question sheds light on whether the classification of SAR is a learning problem. We hypothesize that if learning algorithms cannot outperform a string-matching algorithm, then there is no need to propose such a framework.
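For illustration, a string-matching baseline of the kind mentioned above could look like the sketch below. The keyword list and the function name are assumptions made for this example; the patterns used in the paper may differ.

```python
import re

# Hypothetical keyword list; not taken from the paper.
SAR_PATTERNS = ["refactor", "restructur", "clean up", "cleanup", "simplif", "rename"]
_SAR_REGEX = re.compile("|".join(re.escape(p) for p in SAR_PATTERNS), re.IGNORECASE)


def keyword_baseline(message: str) -> bool:
    """Flag a commit message as SAR if it contains any known keyword."""
    return _SAR_REGEX.search(message) is not None


print(keyword_baseline("Refactored the parser"))                        # keyword present
print(keyword_baseline("Tidy the parser by splitting long functions"))  # no keyword present
```

The second message arguably describes a refactoring but contains none of the listed keywords, which is the kind of case where a learned classifier would need to beat the baseline to justify itself.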

RQ3. How much training data is needed to effectively classify self-affirmed refactoring?

After assessing the accuracy of our approach in classifying SAR commits, we want to investigate the amount of training data that is needed to effectively classify SAR. Our approach can be easily extended if a small dataset suffices for SAR identification. On the other hand, if a large number of commits is required, then our approach requires considerable time and effort.
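One way to probe this question is to train on growing fractions of the training set and track accuracy on a fixed test set. The sketch below assumes that setup; the word-voting classifier is a toy stand-in for the actual model, and the data, fractions, and function names are all hypothetical.

```python
import random
from collections import Counter


def train_word_voter(examples):
    """Toy stand-in classifier: each word votes for the labels it co-occurred with."""
    votes = {}
    for text, label in examples:
        for word in set(text.lower().split()):
            votes.setdefault(word, Counter())[label] += 1
    # Fall back to the majority label when no word of a message is known.
    default = Counter(label for _, label in examples).most_common(1)[0][0]

    def predict(text):
        tally = Counter()
        for word in text.lower().split():
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else default

    return predict


def learning_curve(train, test, fractions, seed=0):
    """Return (training size, test accuracy) for each fraction of the training set."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    curve = []
    for frac in fractions:
        size = max(1, round(frac * len(shuffled)))
        predict = train_word_voter(shuffled[:size])
        correct = sum(predict(text) == label for text, label in test)
        curve.append((size, correct / len(test)))
    return curve


# Hypothetical toy data.
TRAIN = [
    ("refactor parser module", "sar"), ("fix crash on startup", "non_sar"),
    ("refactor database layer", "sar"), ("add login page", "non_sar"),
    ("clean up unused imports", "sar"), ("update changelog", "non_sar"),
    ("extract method from handler", "sar"), ("bump version number", "non_sar"),
]
TEST = [("refactor handler", "sar"), ("fix login crash", "non_sar")]

for size, acc in learning_curve(TRAIN, TEST, fractions=[0.25, 0.5, 1.0]):
    print(f"{size} training commits -> accuracy {acc:.2f}")
```

If accuracy flattens early as the training size grows, a small labeled set is enough; if it keeps climbing, more labeled commits are needed.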


How to use our deployed Azure web service?


If you are interested in learning more about the process we followed, please refer to our paper.


Related Paper

Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, and Ali Ouni, "Toward the Automatic Classification of Self-Affirmed Refactoring", the Journal of Systems and Software (JSS'2020). [preprint]