
\begin{abstract}
Open-source software (OSS) projects are developed by globally distributed contributors, who today often collaborate through the pull-based model.
While this model lowers the barrier to entry for OSS developers by synthesizing, automating and optimizing the contribution process,
coordination among an increasing number of contributors remains a challenge
due to the asynchronous and self-organized nature of distributed development.
In particular, duplicate contributions, where multiple different contributors unintentionally submit duplicate pull requests to achieve the same goal,
are an elusive problem that may waste effort in automated testing, code review and software maintenance.
While the issue of duplicate pull requests has been highlighted, the extent to which they affect development in OSS communities has not been well investigated.
In this paper, we conduct a mixed-approach study to bridge this gap.
Based on a comprehensive dataset constructed from 26 popular GitHub projects, we obtain the following findings:
(a)~Duplicate pull requests consume redundant human and computing resources, exerting a significant impact on the contribution and evaluation process.
(b)~Contributors' inappropriate working patterns and lack of awareness of other contributors and their activities might result in duplicate pull requests.
(c)~Compared to non-duplicate pull requests, duplicate pull requests have significantly different features,
\textit{i.e.}, being submitted by inexperienced contributors, fixing bugs,
 introducing more code modifications, touching cold files, solving tracked issues, and changing source code files.
(d)~Integrators choosing between duplicate pull requests prefer to accept those with accurate and high-quality implementations, broad coverage, test code, deep discussion, active responses, and early submission times.
Finally, actionable suggestions and implications are proposed for OSS practitioners.

\end{abstract}