Cross-project Reopened Pull Request Prediction in GitHub
EasyChair Preprint 2992
6 pages • Date: March 19, 2020
Abstract
In GitHub, pull requests may be reopened for further modification and code review. Within-project prediction of reopened pull requests works well when there is enough training data to build the prediction model. New projects, however, have only a limited number of pull requests, and training data from other projects can help predict their reopened pull requests. It is therefore important to study cross-project reopened pull request prediction to help integrators in new projects. In this paper, we propose a cross-project approach that builds a decision tree model on an external source project and uses it to predict the reopened pull requests in another (target) project. We evaluate the effectiveness of cross-project prediction on 7 open source projects containing 100,622 pull requests. Experimental results show that cross-project prediction achieves accuracy from 78.76% to 96.52% and F1-measure from 53.34% to 90.58% across the 7 projects. We examine feature importance using the decision tree predictor and find that the number of commits is the most important feature in the majority of projects.
Keyphrases: GitHub, Reopened pull request prediction, cross-project
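
The cross-project setup described in the abstract can be illustrated with a minimal sketch: fit a decision tree on pull requests from one (source) project, then score its predictions on a different (target) project using accuracy and F1-measure. This is not the authors' implementation. The small Gini-based tree, the two features (number of commits, number of comments), and all data values below are hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) of the best split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must separate the data into two parts
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; a leaf is a label, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-measure for the positive (reopened) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, b in pairs if a == positive and b == positive)
    fp = sum(1 for a, b in pairs if a != positive and b == positive)
    fn = sum(1 for a, b in pairs if a == positive and b != positive)
    acc = sum(1 for a, b in pairs if a == b) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

# Hypothetical source project: [num_commits, num_comments]; label 1 = reopened.
source_X = [[1, 0], [2, 1], [1, 3], [6, 2], [8, 5], [7, 1], [2, 4], [9, 3]]
source_y = [0, 0, 0, 1, 1, 1, 0, 1]

# Hypothetical target project; its labels are used only for evaluation.
target_X = [[1, 2], [5, 0], [3, 1], [10, 4], [2, 2]]
target_y = [0, 1, 0, 1, 0]

tree = build_tree(source_X, source_y)          # train on the source project
preds = [predict(tree, row) for row in target_X]  # predict on the target project
acc, f1 = accuracy_and_f1(target_y, preds)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

In this toy data the root split falls on the commit count, which only loosely echoes the feature-importance observation in the abstract and carries no evidential weight. A real replication would more likely use an established library implementation of decision trees and the features defined in the paper.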