Toward the Automatic Classification of Self-Affirmed Refactoring

Research Questions and Findings

RQ1. Is it possible to accurately perform two-class and multiclass SAR classification using our machine learning technique?

We find that our approach accurately identifies the SAR patterns and the three common quality improvements, with an F1-measure of 98% and 93% for the two-class and multiclass classification problems, respectively.
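
As a reminder of what the reported metric measures, the F1-measure is the harmonic mean of precision and recall, computed per class. The sketch below is a minimal standard-library illustration, not the evaluation code used in the study. The label names (`sar`, `non_sar`) and the toy predictions are made up, and averaging the per-class scores with an unweighted (macro) mean is an assumption, since the summary above does not state how the multiclass score is aggregated.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed aggregation)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only.
y_true = ["sar", "sar", "non_sar", "non_sar"]
y_pred = ["sar", "non_sar", "non_sar", "non_sar"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

The same functions apply unchanged to the multiclass setting, because labels are treated as opaque values.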

RQ2. How effective is our machine learning approach in classifying SAR?

We find that our approach outperforms the current state-of-the-art baselines. We achieved an F1-measure of 98% when identifying SAR commits (an average improvement of 1.6x and 1.84x over the state-of-the-art approaches), and an F1-measure of 93% when identifying the common quality improvement categories (an average improvement of 1.10x and 22.14x over the state-of-the-art approaches). Additionally, our approach identifies further patterns that complement the list of 87 manually identified SAR patterns.

RQ3. How much training data is needed to effectively classify self-affirmed refactoring?

We find that only one fold of commit messages is required to reach 90% of the highest F1-measure for both the two-class and multiclass classification problems.
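
The training-size question can be framed as finding the smallest number of training folds whose F1-measure reaches a chosen fraction of the best observed score. The sketch below shows that selection step only, under stated assumptions: the function name `folds_needed`, the dictionary layout, and the scores in the usage example are all hypothetical and are not results from the study.

```python
def folds_needed(f1_by_folds, fraction=0.9):
    """Return the smallest fold count whose F1 reaches fraction * best F1.

    f1_by_folds maps a number of training folds to the F1-measure
    obtained with that much training data (hypothetical layout).
    """
    if not f1_by_folds:
        raise ValueError("f1_by_folds must not be empty")
    target = fraction * max(f1_by_folds.values())
    for n_folds in sorted(f1_by_folds):
        if f1_by_folds[n_folds] >= target:
            return n_folds
    # Only reachable if fraction > 1, so no fold count meets the target.
    raise ValueError("no fold count reaches the requested fraction")


# Made-up scores, for illustration only.
example_scores = {1: 0.90, 2: 0.93, 3: 0.95}
print(folds_needed(example_scores))
```

With these made-up scores the target is 0.9 * 0.95 = 0.855, which the one-fold score already exceeds, so the function returns 1.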

Binary Classification

Multiclass Classification