Welcome to the Self-Affirmed Refactoring (SAR) Project webpage!

SAR refers to developers' documentation of their refactoring activities. SAR is key to understanding various aspects of refactoring, including the motivation behind it, its procedure, and the consequences of the performed code change, all as documented by the code authors themselves.
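To illustrate what a self-affirmed refactoring looks like in a commit history, the sketch below flags commit messages that explicitly mention refactoring. It is a minimal keyword-matching sketch, assuming a small hand-picked keyword list; `SAR_KEYWORDS`, `is_self_affirmed`, and `matched_keywords` are hypothetical names, and the list is not the pattern catalog reported in our papers.

```python
import re

# Hypothetical keyword prefixes, for illustration only (not the published SAR patterns).
SAR_KEYWORDS = ["refactor", "restructur", "clean up", "cleanup", "simplif", "extract", "rename"]

# One case-insensitive pattern. The word boundary sits only at the start, so
# prefixes such as "refactor" also match "refactored" and "refactoring".
SAR_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SAR_KEYWORDS) + ")", re.IGNORECASE
)


def is_self_affirmed(message: str) -> bool:
    """Return True if the commit message explicitly mentions a refactoring keyword."""
    return SAR_PATTERN.search(message) is not None


def matched_keywords(message: str) -> list[str]:
    """Return the lowercased keyword prefixes found in the message, in order of appearance."""
    return [m.group(1).lower() for m in SAR_PATTERN.finditer(message)]


if __name__ == "__main__":
    commits = [
        "Refactored the parser to extract a helper method",
        "Fix NPE when config file is missing",
        "Clean up unused imports",
    ]
    for msg in commits:
        print(is_self_affirmed(msg), matched_keywords(msg), msg)
```

Plain keyword matching misses refactorings described without these words and can flag unrelated uses of them, which is one reason the studies below move from patterns toward learned classifiers.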

This website contains all the research related to SAR. You will find all the papers, artifacts, datasets, and scripts we publicly share for replication and extension. If you have questions, please email us at ealomar@stevens.edu or mwmvse@rit.edu.

Feel free to visit our other research or projects: https://mkaouer.net/project/

Can Refactoring be Self-Affirmed? An Exploratory Study on How Developers Document their Refactoring Activities in Commit Messages

Published at the 3rd International Workshop on Refactoring (IWoR'2019)

In this empirical study, we explored how developers document their refactoring activities during the software life-cycle by examining the history of 3,795 well-engineered, open-source projects.

More specifically, we investigated the following research questions:

RQ1. What patterns do developers use to describe their refactoring activities?

RQ2. What are the quality issues that drive developers to refactor?

RQ3. What are the top-10 patterns developers use to describe quality issues in their commits?

RQ4. Do commits containing the label "Refactor" indicate more refactoring activity than those without the label?
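At its core, RQ4 compares refactoring activity between two groups of commits. The sketch below shows that comparison on made-up numbers, assuming each commit already carries a count of refactoring operations (for example, from a refactoring-detection tool); the `Commit` record, its fields, and `compare_groups` are hypothetical, and a real analysis would apply a statistical test rather than compare raw means.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Commit:
    message: str
    refactoring_ops: int  # hypothetical field: refactoring operations detected in this commit


def has_refactor_label(message: str) -> bool:
    """True if the message contains 'refactor' in any inflection, case-insensitively."""
    return "refactor" in message.lower()


def compare_groups(commits: list[Commit]) -> tuple[float, float]:
    """Return (mean ops in labeled commits, mean ops in unlabeled commits).

    Raises statistics.StatisticsError if either group is empty.
    """
    labeled = [c.refactoring_ops for c in commits if has_refactor_label(c.message)]
    unlabeled = [c.refactoring_ops for c in commits if not has_refactor_label(c.message)]
    return mean(labeled), mean(unlabeled)


if __name__ == "__main__":
    sample = [
        Commit("Refactor service layer", 6),
        Commit("refactoring: rename fields", 4),
        Commit("Add login endpoint", 1),
        Commit("Fix typo in README", 0),
        Commit("Update dependencies", 2),
    ]
    with_label, without_label = compare_groups(sample)
    print(f"with label: {with_label:.2f}, without label: {without_label:.2f}")
```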

On the Impact of Refactoring on the Relationship between Quality Attributes and Design Metrics

Published at the International Symposium on Empirical Software Engineering and Measurement (ESEM'2019)

Our empirical study investigated whether developers' perception of quality improvement (as expected by developers) aligns with the actual quality improvement (as assessed by quality metrics).

In particular, we addressed the following research question:

RQ. Does developers' perception of quality improvement align with the quantitative assessment of code quality?

Toward the Automatic Classification of Self-Affirmed Refactoring

Published at the Journal of Systems and Software (JSS'2020)

Our study focused on automating the detection and classification of refactoring documentation in commit messages. We evaluated our model on a total of 2,867 commit messages extracted from well-engineered open-source projects.

In particular, we addressed the following research questions:

RQ1. Is it possible to accurately perform two-class and multiclass SAR classification using our machine learning technique?

RQ2. How effective is our machine learning approach in classifying SAR?

RQ3. How much training data is needed to effectively classify self-affirmed refactoring?
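To make the two-class setting concrete, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing that labels a commit message as SAR or non-SAR. It is an illustrative stand-in written from scratch, not the model, features, or data used in the paper; `NaiveBayes`, `tokenize`, and the tiny `TRAIN` set are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, hand-written training examples (not from the paper's dataset).
TRAIN = [
    ("refactor the parser and extract helper methods", "SAR"),
    ("clean up and simplify the config loader", "SAR"),
    ("rename variables for readability", "SAR"),
    ("fix crash when input file is empty", "non-SAR"),
    ("add support for json export", "non-SAR"),
    ("bump version to release", "non-SAR"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(docs)
        return self

    def _log_posterior(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum of log P(token | label), smoothed with add-one
        logp = math.log(self.class_counts[label] / self.total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                logp += math.log((counts[tok] + 1) / denom)
        return logp

    def predict(self, text: str) -> str:
        """Return the class with the highest (unnormalized) log posterior."""
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_posterior(tokens, label))


if __name__ == "__main__":
    model = NaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
    print(model.predict("refactor and simplify the service"))
    print(model.predict("fix crash on export"))
```

The same `fit`/`predict` shape extends to the multiclass case simply by using more than two labels; RQ3's question about training-set size corresponds to how many labeled examples `fit` needs before predictions stabilize.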

How Do Developers Refactor Code to Improve Code Reusability?

Published at the International Conference on Software and Systems Reuse (ICSR'2020)

Our study investigated how developers use refactoring when they state they are improving code reusability. To better understand how developers perceive reusability and apply it in real-world scenarios, we examine how these refactorings manifest in the code by analyzing their impact on code quality.

In particular, we addressed the following research questions:

RQ1. Do developers refactor code differently for the purpose of improving reusability?

RQ2. What is the impact of reusability refactorings on structural metrics?

How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation

Published at Expert Systems with Applications journal (ESWA'2020)

In this paper, we performed a large-scale empirical study to explore the motivation driving refactorings, the documentation of refactoring activities, and the proportion of refactoring operations performed on production and test code.

In particular, we addressed the following research questions:

RQ1. What is the purpose of the applied refactorings?

RQ2. Do software developers perform different types of refactoring operations on test code and production code across categories?

RQ3. What patterns do developers use to describe their refactoring activities?

RQ4. Do commits containing the label "Refactor" indicate more refactoring activity than those without the label?

Mining and Managing Big Data Refactoring for Design Improvement: Are We There Yet?

Published at Knowledge Management in the Development of Data-Intensive Systems (KMDDIS'2020)

In this book chapter, we dive into how refactoring can be mined and preprocessed. We discuss the design concepts and structural metrics that can be mined along with refactoring operations to better understand their impact. We further investigate the many practical challenges of such extraction.

Refactoring Practices in the Context of Modern Code Review: An Industrial Case Study at Xerox

Published at International Conference on Software Engineering (ICSE'2021)

The survey utilized in this study is available for download here.

Behind the Scenes: On the Relationship Between Developer Experience and Refactoring

Published at Journal of Software: Evolution and Process

In our study, we associate an experience score with each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations, 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity.

In particular, we addressed the following research questions:

RQ1. What is the distribution of experience among developers that perform refactorings?

RQ2. Do developers with more contributions refactor code more often?

RQ3. What triggers developers to refactor the code?

RQ4. Does developer's experience influence the quality of refactoring documentation?

The dataset utilized in this study is available for download here.

Refactoring for Reuse: An Empirical Study

Published at Innovations in Systems and Software Engineering

In particular, we addressed the following research questions:

RQ1. Do developers refactor code differently for the purpose of improving reusability?

RQ2. What is the impact of reusability refactorings on structural metrics?

RQ3. What triggers developers to refactor the code for the purpose of code reuse?

On the Documentation of Refactoring Types

Published at Automated Software Engineering Journal

Our study explored the ability of refactoring documentation written in commit messages to adequately predict the refactoring types performed in code changes at the commit level. We evaluated our model on a total of 5,004 commit messages extracted from well-engineered open-source projects.

In particular, we addressed the following research questions:

RQ1. How effective is our supervised learning in predicting the type of refactoring?

RQ2. How does our model compare with keyword-based classification?

RQ3. What are the frequent terms utilized by developers when documenting refactoring types?

RQ4. How useful is our approach in flagging inconsistencies in refactoring types between source code and documentation?
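RQ4 concerns mismatches between what a commit message says and what the code change actually contains. The sketch below shows one simple way such a check could work, assuming a hypothetical keyword-to-type mapping (`KEYWORD_TO_TYPE`) and a set of refactoring types already detected in the code by a refactoring-detection tool. It is closer to the keyword-based baseline of RQ2 than to the supervised model evaluated in the paper.

```python
import re

# Hypothetical mapping from message keywords to refactoring types (illustration only).
KEYWORD_TO_TYPE = {
    "rename": "Rename",
    "extract": "Extract Method",
    "move": "Move",
    "inline": "Inline Method",
    "pull up": "Pull Up",
    "push down": "Push Down",
}


def documented_types(message: str) -> set[str]:
    """Refactoring types whose keyword starts a word in the (lowercased) commit message.

    The leading word boundary keeps e.g. "remove" from matching "move",
    while still letting "renamed" or "moved" match.
    """
    text = message.lower()
    return {
        rtype
        for keyword, rtype in KEYWORD_TO_TYPE.items()
        if re.search(r"\b" + re.escape(keyword), text)
    }


def inconsistencies(message: str, detected: set[str]) -> dict[str, set[str]]:
    """Compare documented refactoring types with those detected in the code change.

    'undocumented': detected in the code but not mentioned in the message.
    'unperformed':  mentioned in the message but not detected in the code.
    """
    documented = documented_types(message)
    return {
        "undocumented": detected - documented,
        "unperformed": documented - detected,
    }


if __name__ == "__main__":
    msg = "Rename fields and extract validation logic"
    detected_in_code = {"Rename", "Move"}  # e.g., as reported by a detection tool
    print(inconsistencies(msg, detected_in_code))
```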

Code Review Practices for Refactoring Changes: An Empirical Study on OpenStack

Published at the International Conference on Mining Software Repositories, Technical Track (MSR'2022)

Understanding how refactoring changes are reviewed in practice is of paramount importance to both the research community and industry. Although Modern Code Review (MCR) is widely adopted in open-source and industrial projects, the relationship between code review and refactoring practice remains largely unexplored. In this study, we performed a quantitative and qualitative study to investigate the challenges developers face when reviewing refactorings.

The dataset utilized in this study is available for download here.

An Exploratory Study on Refactoring Documentation in Issues Handling

Published at International Conference on Mining Software Repositories Mining Challenge (MSR'2022)

In this study, we aim to explore developer-reported refactoring changes in issues to better understand what developers consider problematic in their code and how they handle it. Our approach relies on text mining 45,477 refactoring-related issues and identifying refactoring patterns in a diverse corpus of 77 Java projects, by investigating issues associated with 15,833 refactoring operations and with developers’ explicit refactoring intention.

In particular, we addressed the following research questions:

RQ1. What textual patterns do developers use to describe their refactoring needs in issues?

RQ2. What are the quality attributes developers care about when documenting in issues?

The dataset utilized in this study is available for download here.

On the Use of Static Analysis to Engage Students with Software Quality Improvement: An Experience with PMD

Published at International Conference on Software Engineering (ICSE'2023)

To raise awareness of potential coding issues that violate coding standards, in this paper we reflect on our experience teaching the use of static analysis, with the aim of evaluating its effectiveness in helping students improve software quality. The paper discusses the results of a classroom experiment conducted over 3 academic semesters, involving 65 submissions that carried out code review activities on 690 rules using PMD.

In particular, we addressed the following research questions:

RQ1. What problems are typically perceived by students as true positives versus false positives?

RQ2. What category of problems typically takes longer to be fixed?

RQ3. What is the perceived usefulness of PMD?

The dataset utilized in this study is available for download here.

How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations

Published at International Conference on Mining Software Repositories (MSR'2024)

In this paper, our goal is to explore conversations between developers and ChatGPT related to refactoring, to better understand how developers identify areas for improvement in code and how ChatGPT addresses developers’ needs.

In particular, we addressed the following research questions:

RQ1. What textual patterns do developers use to describe their refactoring needs using ChatGPT?

RQ2. What quality attributes does ChatGPT consider when describing refactoring?

RQ3. How do developers typically initiate conversations with ChatGPT when seeking guidance on refactoring tasks?

The dataset utilized in this study is available for download here.

Deciphering Refactoring Branch Dynamics in Modern Code Review: An Empirical Study on Qt

Published at Information and Software Technology (IST'2024)

The main goal of our study is to understand the practice of refactoring in the context of Modern Code Review (MCR) to characterize the criteria that influence decision making when reviewing refactoring changes.

In particular, we addressed the following research questions:

RQ1. How do refactoring reviews compare to non-refactoring reviews in terms of code review efforts?

RQ2. What textual patterns do developers use to describe their refactoring needs in the ‘Refactor’ branch?

RQ3. What quality attributes do developers consider when describing refactoring in the ‘Refactor’ branch?

RQ4. What topics do developers discuss when reviewing refactoring tasks?

The dataset utilized in this study is available for download here.

An Empirical Study on the Impact of Code Duplication-aware Refactoring Practices on Quality Metrics

Under review at Information and Software Technology (IST'2025)

The main goal of our study is to explore the alignment between developers’ perceptions of code duplicate removal (as anticipated by developers) and the actual improvement in software quality (as evaluated by quality metrics).

In particular, we addressed the following research questions:

RQ1. What is the quantitative code quality assessment of code duplications that have been intentionally removed by developers?

RQ2. What are the refactoring operations associated with code duplicate removal?

The dataset utilized in this study is available for download here.

PhD defended!

MSR 2022
