\section{Conclusion}
\label{sec:Concl}
% Method
In this paper, we proposed an approach to automatically
detect duplicate pull-requests in GitHub.
Our method uses natural language text and diff information
to calculate the similarity between two pull-requests
and returns a candidate list of the pull-requests most similar to the given one.
% Results
We semi-automatically constructed a test dataset of duplicates
from three popular projects hosted on GitHub: Rails, Elasticsearch and Angular.JS.
The evaluation results show that
combining textual similarity and line-level diff similarity achieves the best performance,
finding about **\% - **\% of the duplicates,
compared to **\% - **\% using only natural language text
and **\% - **\% using only diff information.

% Future work
In the future, we plan to enrich our test dataset
and evaluate our method on datasets from more software projects.
In addition,
we plan to develop better techniques to improve
the detection effectiveness of our method.

\section*{Acknowledgment}
% We would like to thank the anonymous reviewers for their comments.
This research is supported by the National Natural Science Foundation of China
(Grant Nos. 61432020, 61303064, 61472430, 61502512)
and the National Grand R\&D Plan (Grant No. 2016YFB1000805).