\section{RELATED WORK}
\label{sec:RelatedW}
\subsection{Duplicate Detection}
Although few studies have examined duplicate detection for pull-requests,
many have investigated how to recognize duplicate bug reports.
The work of Runeson \etal~\cite{Runeson2007Detection} is one of the first such studies.
They evaluated how NLP techniques support the identification of duplicates and
found that these techniques can detect 40\% of the marked duplicates.
Wang \etal~\cite{Wang2008} proposed an approach to detect duplicate bug reports
by comparing the natural language information and execution information between
the new report and the existing reports.
Sun \etal~\cite{Sun2010A} used discriminative models to detect duplicates,
and their evaluation on three large software bug repositories showed that their method
achieved improvements over methods relying on natural language information alone.
Later, Sun \etal~\cite{Sun2011Towards} proposed a retrieval function
that fully utilizes the information available in a bug report to measure
the similarity between two bug reports.
Nguyen \etal~\cite{Nguyen2012Duplicate} modeled each bug report as a textual document
and took advantage of both IR-based features and topic-based features
to learn the sets of different terms used to describe the same problems.
Thung \etal~\cite{Thung2014DupFinder} developed a tool
implementing the approach proposed by Runeson \etal~\cite{Runeson2007Detection}
and integrated it into an existing bug tracking system.
Lazar \etal~\cite{Lazar2014Improving} made use of a set of new textual features
and trained several binary classification models to improve the detection performance.
Moreover, Zhang \etal~\cite{Zhang2015Multi} investigated the detection of
duplicate questions on Stack Overflow.
They measured the similarity of two questions by comparing observable factors, including the questions' titles, descriptions, and tags, as well as latent factors corresponding to the topic distributions learned from the questions' descriptions.
\subsection{Pull-requests \& Code Review}
Although research on pull-requests is in its early stages,
several relevant studies have been conducted.
Gousios \etal~\cite{Gousios:2014,Rigby:2014} conducted a statistical analysis
of millions of pull-requests from GitHub and analyzed the popularity of pull-requests,
the factors affecting the decision to merge or reject a pull-request,
and the time to merge a pull-request.
Tsay \etal~\cite{Tsay:2014b} examined how social
and technical information is used to evaluate pull-requests.
Yu \etal~\cite{Yu:2016} conducted a quantitative study of pull-request evaluation in the context of continuous integration (CI).
Moreover, Yu \etal~\cite{Yu:2015} proposed an approach that combines information retrieval
and social network analysis to recommend potential reviewers.
Van der Veen \etal~\cite{Veen:2015} presented PRioritizer,
a prototype pull-request prioritization tool
that recommends the top pull-requests a project owner should focus on.
Code review is employed by many software projects to
examine changes made by others to the source code,
find potential defects,
and ensure software quality before the changes are merged~\cite{Baysal:2015,Mcintosh:***}.
Traditional code review,
also known as the code inspection proposed by Fagan~\cite{Fagan:1976},
has been practiced since the 1970s.
However, its cumbersome and synchronous nature has hampered
its widespread adoption in practice~\cite{Bacchelli:2013}.
With the emergence and development of version control systems and collaboration tools,
Modern Code Review (MCR)~\cite{Rigby:2011} has been adopted by many software companies and teams.
Unlike formal code inspections,
MCR is a lightweight mechanism~\cite{t2015cq,Thongtanunam:2015}
that is less time-consuming and is supported by various tools.
Code review has been widely studied from several perspectives,
including the automation of review tasks~\cite{Thongtanunam:2015,jiang:2015,rahman2016correct,Lima},
factors influencing review outcomes~\cite{Tsay:2014b,Baysal:2015,Baum:2016}, and
problems found in code review~\cite{Bacchelli:2013,Beller:2014}.
The impact of code review on software quality~\cite{Mcintosh:***,Morales2015Do} has also been
investigated by many studies in terms of code review coverage and participation~\cite{Mcintosh:2014}
and code ownership~\cite{T2016Revisiting}.
While the main motivation for code review was believed
to be finding defects to control software quality,
recent research has revealed that defect elimination is not the sole motivation.
Bacchelli \etal~\cite{Bacchelli:2013} reported additional expectations,
including knowledge transfer, increased team awareness,
and creation of alternative solutions to problems.