Duplicate Pull Request Detection

Projects

List of open-source projects whose pull requests are used in this study.

Which classifier should be used when fine-tuning BERT to better detect duplicate PRs?

How does BERT perform when trained on (1) titles only, (2) descriptions only, and (3) titles and descriptions combined?

How does the fine-tuned BERT model compare to other models, namely Siamese-BERT, DC-CNN, and Word2Vec?
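
To make the three input configurations in the second question concrete, the following is a minimal sketch of how the text for a pair of PRs might be assembled before tokenization. The names `build_pr_text` and `build_pair`, the `mode` values, and the whitespace normalization are illustrative assumptions, not taken from the study's code.

```python
from typing import Literal, Tuple

# Hypothetical names for the three configurations: titles only,
# descriptions only, and titles and descriptions combined.
InputMode = Literal["title", "description", "both"]


def build_pr_text(title: str, description: str, mode: InputMode = "both") -> str:
    """Return the text of one PR for the chosen input configuration."""
    # Collapse runs of whitespace and newlines (an assumed preprocessing step).
    title = " ".join(title.split())
    description = " ".join(description.split())
    if mode == "title":
        return title
    if mode == "description":
        return description
    if mode == "both":
        return f"{title} {description}".strip()
    raise ValueError(f"unknown mode: {mode!r}")


def build_pair(
    pr_a: Tuple[str, str], pr_b: Tuple[str, str], mode: InputMode = "both"
) -> Tuple[str, str]:
    """Build the two text segments for a candidate duplicate pair.

    Each PR is given as a (title, description) tuple.
    """
    return build_pr_text(*pr_a, mode), build_pr_text(*pr_b, mode)
```

The two strings returned by `build_pair` could then be passed as a sentence pair to a BERT tokenizer, so that switching between the three configurations only changes the `mode` argument.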