some improvement according to review comments

starlee 2018-03-10 10:30:47 +08:00
parent dd5849d702
commit 189e4e52be
5 changed files with 28 additions and 37 deletions


@ -34,7 +34,7 @@ tend to lack information of others progress.
% Harm: platform
Duplicate PRs increase the maintenance cost of GitHub
and result in the waste of time spent on
the redundant effort of reviewing each of them separately.
the redundant effort of reviewing each of them separately~\cite{Gousios:2014,Gousios:2016}.
Moreover,
contributors may iteratively update and improve their PRs
in several rounds of code reviews~\cite{Yu:2015}
@ -70,14 +70,18 @@ Each pair of duplicate PRs in \textit{DupPR}
has been manually verified after automatic identification
which guarantees the quality of this dataset.
The dataset and the source code used to recreate it are available
online~\footnote{\url{https://github.com/whystar/MSR2018-DupPR}}.
online.~\footnote{\url{https://github.com/whystar/MSR2018-DupPR}}
Based on this dataset,
the following research directions become more feasible.
\begin{itemize}
% \item Analyzing how much redundant effort would be wasted by duplicate PRs.
% This would give researchers a clear idea of the issues that duplicate PRs have introduced.
\item Analyzing how much redundant effort would be wasted by duplicate PRs.
This would give researchers a clear idea of the issues that duplicate PRs have introduced.
This would give researchers a clear picture
of how duplicate PRs negatively impact the software development process.
\item Investigating how reviewers make the choice between two duplicate PRs.
This is necessary for building automatic tools that make more targeted comparisons between duplicates


@ -28,12 +28,12 @@ and the fields in these tables are defined as follows.
\begin{itemize}[leftmargin=0em,itemindent=2em]
\item Table \texttt{Project} stores the basic information of studied projects.
Field \texttt{user\_name} is the name of the user or the organization which owns the project in GitHub,
Field \texttt{user\_name} is the name of the user owning the project in GitHub,
and field \texttt{repo\_name} is the name of the project.
These two fields, together with the host of GitHub (\ie ``https://github.com/''),
These two fields, together with the domain name of GitHub,
can be used to compose the resource locator of the project in GitHub.
Other fields in table \texttt{Project} present some statistical characteristics of a project
such as \texttt{fork\_count} (the number of forks) and \texttt{star\_count} (the number of stars).
Other fields in table \texttt{Project} present some statistical characteristics of a project,
for example, \texttt{fork\_count} is the number of forks.
\item For each project,
all the PRs belonging to it are stored in table \texttt{Pull-request}.


@ -21,21 +21,12 @@ and the property \texttt{created\_at} of $idn\_cmt$ in table \texttt{Comment} is
We calculated the detection latency of all the duplicates in our dataset
and the statistical result is shown in Figure~\ref{fig:delay_time_bar}.
The figure shows that
1,474 (63\%) duplicates are identified less than one day,
while 865 (37.0\%) duplicates are detected after longer latency which is more than one day.
% \hl{This .....}
% Find some notable data points to describe
% Among them, 39 (1.7\%) duplicates are detected less than one minute,
% 681 (29.1\%) between one minute and one hour,
% and 754 (32.2\%) between one hour and one day.
% [39, 681, 754, 379, 239, 231, 16]
37.0\% (865) of the duplicates are detected after a long latency of more than one day.
% 1,474 (63\%) duplicates are identified less than one day,
% while 865 (37.0\%) duplicates are detected after longer latency which is more than one day.
% \hl{This .....}
% \begin{figure}[ht]
% \centering
% \includegraphics[width=0.5\textwidth]{figs/delay_time.png}
% \caption{Distribution of detection latency of duplicate pull-requests}
% \label{fig:delay_time}
% \end{figure}
\begin{figure}[ht]

Binary file not shown.


@ -1,4 +1,4 @@
\documentclass[sigconf, anonymous]{acmart}
\documentclass[sigconf]{acmart}
\usepackage{booktabs} % For formal tables
\usepackage{setspace}
@ -27,20 +27,16 @@
\begin{document}
\title{A Dataset of Duplicate Pull-requests in GitHub}
% \author{Zhixing Li, Yue Yu$^*$, Gang Yin, Tao Wang, Huaimin Wang}
% \affiliation{%
% \institution{College of Computer, National University of Defense Technology}
% \city{Changsha, China}
% \postcode{410073}
% }
% \email{{lizhixing15, yuyue, yingang, taowang2005, hmwang}@nudt.edu.cn}
% \renewcommand{\shortauthors}{Z. Li et al.}
\author{Yue Yu$^*$, Zhixing Li$^*$, Gang Yin, Tao Wang, Huaimin Wang}
\affiliation{%
\institution{College of Computer, National University of Defense Technology}
\city{Changsha, China}
\postcode{410073}
}
\email{{yuyue, lizhixing15, yingang, taowang2005, hmwang}@nudt.edu.cn}
\renewcommand{\shortauthors}{Z. Li et al.}
\author{Anonymous authors}
\affiliation{Institutions}
\email{Emails}
\input{0-abstract}
%\begin{CCSXML}
%<ccs2012>
@ -61,11 +57,11 @@
%\ccsdesc[500]{Software and its engineering~Collaboration in software development}
\keywords{duplicate pull-request, GitHub, dataset}
\maketitle
\renewcommand{\thefootnote}{}
%\footnotetext{$^*$Corresponding author}
\footnotetext{
$^*$ Yue Yu and Zhixing Li are both first authors,
and contributed equally to the work.}
\renewcommand{\thefootnote}{\arabic{footnote}}