This website is dedicated to Expansive Insights: Analyzing Developer Conversations with ChatGPT.
Our work focuses on the dynamic interactions between developers and ChatGPT, examining how developers use this AI tool in real-world software engineering scenarios. The insights gathered from our study serve as a foundation for understanding developer behaviors, prompt patterns, and multi-turn conversational strategies with ChatGPT.
This project curates a rich dataset of developer conversations with ChatGPT, designed to help researchers and practitioners understand and leverage AI in software engineering. This website was designed by Matheus Paixao, with content contributions from Pulkit Saxena, Nikhil Harkangi and Utkarsh Dabholkar as part of their MS Capstone project at Rochester Institute of Technology, under the guidance of Dr. Mohamed Wiem Mkaouer.
By exploring the data and findings in Expansive Insights, software engineering researchers and AI practitioners can gain a deeper understanding of how ChatGPT assists developers. The insights derived from our study can aid in advancing AI-driven development tools, improving prompt engineering practices, and refining conversational models for software engineering tasks. Here, you will find a comprehensive overview of our research questions, data, and findings. You can also download the dataset and explore our methodology.
EMRS contains links to tools for multiple Extract refactoring techniques. All tools can be accessed through the links in the table below, which also details the artifacts used by each project.
Tool | Language | No. of Metrics | Interface | Usage Guide? | Tool Link | Last Update |
---|---|---|---|---|---|---|
Tuck | Unknown | Unknown | Unknown | No | No | Unknown |
CloRT | Java | N/A | Unknown | No | No | Unknown |
Nate | Java | Unknown | Eclipse | No | No | Unknown |
CCshaper | Java | 6 | Command line | No | No | Unknown |
Aries | Java | 6 | GUI-based | No | No | Unknown |
SDAR | Java | N/A | Eclipse | No | No | Unknown |
Unnamed | Java | N/A | Eclipse | No | No | Unknown |
Xrefactory | C++ | N/A | Unknown | Yes | Yes | 2007 |
Unnamed | Ruby | N/A | Eclipse | Yes | Yes | 2012 |
Refactoring Annotation | Java | Unknown | Eclipse | No | No | Unknown |
JDeodorant | Java | 3 | IntelliJ / Eclipse | Yes | Yes | 2019 |
AutoMed | Java | 10 | Unknown | No | No | Unknown |
Wrangler | Erlang/OTP | N/A | GUI-based / Command line | Yes | Yes | 2023 |
HaRe | Haskell 98 | N/A | GUI-based / Command line | Yes | Yes | 2017 |
ReAF | Java | Unknown | Unknown | No | No | Unknown |
Unnamed | C# | Unknown | Visual Studio extension | No | No | Unknown |
CeDAR | Java | 2 | Eclipse | No | No | Unknown |
FTMPAT | Java | 3 | Eclipse | No | No | Unknown |
SPAPE | Procedural / Java | Unknown | Unknown | No | No | Unknown |
JExtract | Java | Unknown | Eclipse | Yes | Yes | 2016 |
DCRA | Java | 1 | Unknown | No | No | Unknown |
RASE | Java | N/A | Eclipse | Yes | Yes | 2015 |
SEMI | Java | 5 | GUI-based / Command line | Yes | Yes | 2017 |
GEMS | Java | 48 | Eclipse | Yes | No | 2017 |
PostponableRefactoring | Java | N/A | Eclipse | Yes | Yes | 2018 |
LLPM | Java | 4 | Unknown | No | No | Unknown |
PRI | Java | N/A | Eclipse | No | No | Unknown |
LMR | Java | 5 | Eclipse | No | No | Unknown |
CREC | Java | N/A | Eclipse | Yes | Yes | 2018 |
Bandago | Java | 4 | Eclipse | No | No | Unknown |
Unnamed | Java | N/A | Eclipse | No | Yes | 2019 |
Unnamed | Java | N/A | Unknown | No | No | Unknown |
CloneRefactor | Java | N/A | Command line | No | Yes | 2020 |
TOAD | Pharo | N/A | Pharo | Yes | Yes | 2019 |
Segmentation | Java | 2 | Eclipse | No | Yes | 2022 |
LiveRef | Java | 20 | IntelliJ | Yes | Yes | 2022 |
AntiCopyPaster | Java | 78 | IntelliJ | Yes | Yes | 2023 |
REM | Java | N/A | IntelliJ | Yes | Yes | 2023 |
In the above table, you will find links to directories containing CSV, JSONL, and ZIP files for the tools, raw data, and datasets. For more information on the use of tools for Extract Method Refactoring, please review the relevant papers in the Paper Reviews section.
Expansive Insights examines how developers interact with ChatGPT for various tasks in software engineering. This study combines literature review, data collection, and an extensive analysis of developer prompts to uncover patterns and behaviors when using AI in development environments.
Aim: This research aims to bridge the gap in understanding how developers utilize ChatGPT in their workflows. We focus on identifying the most common types of inquiries, patterns in multi-turn conversations, and the reasons developers engage in iterative exchanges with ChatGPT. By categorizing developer prompts and interactions, we provide insights into ChatGPT’s role in enhancing productivity, code generation, and problem-solving.
Method: We curated a dataset from two sources, DevGPT and Kaggle, yielding 1,000 developer-ChatGPT conversations. Each conversation was labeled for inquiry type, and multi-turn exchanges were analyzed to understand conversational dynamics. Two primary research questions (RQs) guide our analysis: identifying initial prompt types (RQ1) and characterizing the structure and intent of multi-turn interactions (RQ2). Our analysis categorizes initial prompts into 16 prompt types and further breaks multi-turn interactions down into seven conversation types.
Results: The findings reveal that code generation (28.5%) is the most frequent inquiry type, highlighting developers' need for code assistance. Conceptual inquiries (17.8%) and issue resolution (14.8%) follow as the next most common use cases. In multi-turn conversations, developers frequently engage in iterative follow-ups (45.78%) to refine or expand responses, showing a need for continuous dialogue to clarify or enhance initial solutions. These insights emphasize the collaborative nature of developer-AI interactions, where developers rely on ChatGPT not only for answers but also for refinement and troubleshooting.
Conclusions: Our study sheds light on the nuanced ways developers interact with ChatGPT, offering practical insights for improving AI support in software development. By categorizing prompt types and conversational patterns, our findings serve as a resource for researchers and tool developers aiming to refine AI conversational models and better understand developer needs in real-world contexts.
Our study is driven by the following research questions:
RQ1: Investigate if developers’ interactions with ChatGPT, as explored in the original paper, can be replicated and validated using an expanded dataset.
RQ2: Examine the reproducibility of the characteristics and prevalence of multi-turn conversations between developers and ChatGPT, by employing an expanded dataset.
RQ1. Investigate if developers’ interactions with ChatGPT, as explored in the original paper, can be replicated and validated using an expanded dataset.
Our analysis categorized developers' initial prompts into 16 types, highlighting Code Generation as the most frequent (28.5%), followed by Conceptual inquiries (17.8%) and Issue Resolving (14.8%). This indicates a high demand for code assistance and theoretical understanding. Additionally, our study identified patterns within code generation requests, with Feature Implementation and Integration as dominant subcategories, suggesting that developers frequently turn to ChatGPT for code that integrates smoothly into existing workflows. In the realm of issue resolution, prompts were often aimed at troubleshooting specific code errors and understanding why certain implementations failed. These findings emphasize ChatGPT’s primary role in helping developers with code generation, technical clarification, and debugging in software projects.
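The RQ1 percentages are shares of labeled initial prompts. As a minimal sketch, assuming the dataset is stored as JSONL with one conversation per line and a hypothetical `prompt_type` field holding the label of the conversation's first prompt (the real file layout and field names may differ), the distribution could be tallied as below. Under the further assumption that each of the 1,000 conversations contributes exactly one labeled initial prompt, 28.5% corresponds to 285 conversations.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable


def load_conversations(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines (e.g. an open file) into records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def prompt_type_distribution(conversations: list[dict]) -> dict[str, float]:
    """Percentage of conversations per initial prompt type, most frequent first.

    Assumes a hypothetical "prompt_type" field on every record.
    """
    counts = Counter(conv["prompt_type"] for conv in conversations)
    total = sum(counts.values())
    # One decimal place, matching how the RQ1 percentages are reported.
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


if __name__ == "__main__":
    sample = [
        '{"source": "DevGPT", "prompt_type": "Code Generation"}',
        '{"source": "Kaggle", "prompt_type": "Conceptual"}',
        '{"source": "DevGPT", "prompt_type": "Code Generation"}',
        '{"source": "Kaggle", "prompt_type": "Issue Resolving"}',
    ]
    print(prompt_type_distribution(load_conversations(sample)))
```

Because `load_conversations` accepts any iterable of strings, a downloaded file could be passed directly, e.g. `with open(path, encoding="utf-8") as f: load_conversations(f)`, where `path` is whatever local file name you saved the data under.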
RQ2. Examine the reproducibility of the characteristics and prevalence of multi-turn conversations between developers and ChatGPT, by employing an expanded dataset.
Multi-turn conversations offer developers the opportunity for iterative refinement and clarification. Our findings reveal that Iterative Follow-up is the most common pattern in multi-turn interactions (45.78%), as developers often refine their initial prompts or build on ChatGPT’s responses to address complex challenges. Information Giving (10.87%) and Asking for Clarification (5.22%) were also prevalent, with developers providing additional context or seeking deeper understanding of ChatGPT's responses. This highlights ChatGPT's value not just as a static answer provider but as an interactive tool for refining and developing solutions through back-and-forth dialogue. The high frequency of follow-ups reflects a strong preference among developers for using ChatGPT as a collaborative partner in resolving intricate technical issues.
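The text does not state whether the multi-turn percentages are computed per follow-up prompt or per conversation. The sketch below assumes the former: it uses a hypothetical `Conversation` structure in which every prompt after the first carries exactly one conversation-type label, keeps only multi-turn conversations, and reports each label's share of all follow-up prompts. The class, field, and function names are illustrative, not part of the published dataset.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical record: the developer's prompts in order, plus one
    conversation-type label (e.g. "Iterative Follow-up") per prompt after the first."""

    prompts: list[str]
    followup_labels: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every follow-up prompt must have exactly one label.
        expected = max(len(self.prompts) - 1, 0)
        if len(self.followup_labels) != expected:
            raise ValueError(
                f"expected {expected} follow-up labels, got {len(self.followup_labels)}"
            )

    @property
    def is_multi_turn(self) -> bool:
        return len(self.prompts) > 1


def followup_distribution(conversations: list[Conversation]) -> dict[str, float]:
    """Share (in %) of each label among all follow-up prompts of multi-turn conversations."""
    counts = Counter()
    for conv in conversations:
        if conv.is_multi_turn:
            counts.update(conv.followup_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Two decimal places, matching how the RQ2 percentages are reported.
    return {label: round(100 * n / total, 2) for label, n in counts.most_common()}


if __name__ == "__main__":
    sample = [
        Conversation(["Write a CSV parser in Python"]),  # single-turn: ignored
        Conversation(
            ["Fix this NullPointerException", "It still fails", "Here is the stack trace"],
            ["Iterative Follow-up", "Information Giving"],
        ),
        Conversation(
            ["Explain Java generics", "What do you mean by type erasure?"],
            ["Asking for Clarification"],
        ),
    ]
    print(followup_distribution(sample))
```

If the study instead counted each multi-turn conversation once, the `counts.update(...)` line would need to be replaced by a per-conversation rule, which the text does not specify.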
These findings underscore ChatGPT’s evolving role in software development, where developers rely on it for code generation, continuous engagement, and conversational adaptability to solve practical, real-world programming problems.
The following list describes the papers surveyed for this study.
This is a list of publications that use the CROP dataset. If you have published work that uses CROP and it is not listed here, feel free to contact us, and we will add your publication to the list.
Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, Mark Harman. 2019. In IEEE Transactions on Software Engineering (TSE). Preprint
Luca Pascarella, Davide Spadini, Fabio Palomba, Alberto Bacchelli. 2020. On The Effect Of Code Review On Code Smells. In IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Preprint
Matheus Paixao, Paulo Henrique Maia. 2019. Rebasing in Code Review Considered Harmful: A Large-scale Empirical Investigation. In International Conference on Source Code Analysis and Manipulation (SCAM). Preprint
Matheus Paixao, Jens Krinke, DongGyun Han, and Mark Harman. 2018. CROP: Linking Code Reviews to Source Code Changes. In International Conference on Mining Software Repositories (MSR). Preprint
You can contact the EMRS team through the following channel: