
\begin{abstract}
% importance of pull-requests
The widespread use of pull-requests boosts the development
and evolution of many open source software projects.
% duplicate pull-requests
However, due to the parallel and uncoordinated nature of the development process on GitHub,
duplicate pull-requests may be submitted by different contributors to solve the same problem.
% harm of duplicate pull-requests
Duplicate pull-requests increase the maintenance cost of GitHub projects,
waste reviewers' time on redundant code review,
and can even discourage developers from continuing to contribute.
% our approach
In this paper,
we investigate using textual information
to automatically detect duplicate pull-requests on GitHub.
For a newly arrived pull-request,
we compute its textual similarity to the existing pull-requests
and return a candidate list of the most similar ones.
% results
We evaluate our approach on three popular projects hosted on GitHub,
namely Rails, Elasticsearch and Angular.JS.
%!!!! mention how we constructed the test set
The evaluation shows that about 55.3\%--71.0\% of the duplicates can be found
when we combine title similarity and description similarity.
\end{abstract}