starlee 2017-10-20 00:37:52 +08:00
parent 10cf715612
commit 8c63872db9
4 changed files with 39 additions and 54 deletions

@ -3,8 +3,8 @@
Code reviews in pull-based model are open to community users on GitHub.
Various participants are taking part in the review discussions and
the review topics are not only limited to improvement of code contributions
but also project evolution and social interaction.
the review topics are not only about improvement of code contributions
but also about project evolution and social interaction.
A comprehensive understanding of the review topics in pull-based model
would be useful to better organize the code review process and optimize review tasks
such as reviewer recommendation and pull-request prioritization.
@ -17,7 +17,7 @@ Second, we did preliminary quantitative analysis on a large set of review commen
that were labeled by TSHC (a two-stage hybrid classification algorithm ),
which is able to automatically classify review comments by combining rule-based and machine-learning techniques.
Through the quantitative study, we explored the typical review patterns.
We found that the three projects represent similar comments distribution on each category.
We found that the three projects present similar comment distributions across the categories.
Pull-requests submitted by inexperienced contributors
tend to contain potential issues even though they have passed the tests.
Furthermore, external contributors are more likely to break project conventions

@ -30,24 +30,24 @@ These external developers are interested in these projects and
are concerned about the development of them.
After receiving the review comments, the contributor usually responds positively and
updates the pull-request for another round of review.
Thereafter, the responsible integrator makes a decision to accept the pull-request
or reject it by taking all judgments and changes into consideration.
Thereafter, the responsible integrator makes a decision to accept or reject the pull-request
by taking all judgments and changes into consideration.
Previous studies~\cite{Tsay:2014b,Yu:2016,t2015cq} have shown that
code review, as a well-established software quality practice,
is one of the most significant stages in software development.
It ensures that only high-quality code changes are accepted,
based on the in-depth discussion among reviewers.
based on the in-depth discussions among reviewers.
With the evolution of collaboration tools and environment~\cite{Storey2014The,Zhu2016Effectiveness},
development activities in the communities like GitHub are more transparent and social.
{\color{red}Contribution evaluation in the social coding communities is more complex
than simple acceptance or rejection~\cite{Tsay:2014a}. }
Various participants are taking part in the discussion and
what they are talking about are not only limited to improvement of code contributions
but also project evolution and social interaction.
A comprehensive understanding of the topics of code review would be useful to
better organize the code review process and optimize review tasks
Various participants are taking part in the discussions and
the discussion topics are not only about improvement of code contributions
but also about project evolution and social interaction.
A comprehensive understanding of the review topics in pull-based model would be useful to
better organize the social code review process and optimize review tasks
such as reviewer recommendation~\cite{Lima,Yu:2015}
and pull-request prioritization~\cite{Veen:2015}.
@ -57,7 +57,7 @@ Several similar studies
have been conducted on analyzing reviewers' motivation and issues raised in code reviews.
Bacchelli and Bird~\cite{Bacchelli:2013} explored the motivation, outcome and challenge in code reviews
based on an investigation of developers and projects at Microsoft.
They found that although defect finding is still the most motivation in code reviews,
They found that although defect finding is still the major motivation in code reviews,
defect related comments comprise a small proportion.
Their study focused on commercial projects and
the reviewers of these projects are mainly the employees at Microsoft.
@ -72,7 +72,7 @@ They reported that reviewers discuss on both
the appropriateness of the contribution proposal and the correctness of the implemented solution.
They only analyzed comments of highly discussed pull-requests which have extended discussions.
According to the statistics in their dataset, however,
each pull-request gets 2.6 comments in average and only about 16\% of the pull-request have extended discussions.
each pull-request gets 2.6 comments on average and only about 16\% of the pull-requests have extended discussions.
The small proportion of explored pull-requests may result in bias to the experiment results
and constrain the generalization of their findings.
Therefore, we tried to revisit the review topics in pull-based development model
@ -116,6 +116,6 @@ The rest of this paper is organized as follows.
Section~\ref{sec:bg} provides the research background
and introduces the research questions.
In Section~\ref{sec:approach}, we elaborate the approach of our study.
Section~\ref{sec:result} lists the research result,
while Section~\ref{sec:Threats} discusses several threats to validity of our study.
Section~\ref{sec:Concl} concludes this paper and gives an outlook to future work.
Section~\ref{sec:result} lists the research results,
while Section~\ref{sec:Threats} discusses several threats to the validity of our study.
Section~\ref{sec:Concl} concludes this paper and gives an outlook on future work.

@ -36,7 +36,7 @@ Test:
Several reviewers play the role of testers
to ensure that the pull-request does not break the current runnable state.
They check the submitted changes by manually running the patches locally
or through an automated manner with the help of continuous integration (CI) services.
or through an automatic manner with the help of continuous integration (CI) services.
Review:
All developers in the community have the chance
@ -44,7 +44,7 @@ to discuss that pull-request in the issue tracker,
with the existence of pull-request description, changed files, and test result.
After receiving the feedback from reviewers,
the contributor updates his pull-request by attaching new commits
for another round review.
for another round of code review.
Decide:
A responsible manager of the core team considers all the opinions of reviewers
@ -65,14 +65,14 @@ reviewer recommendation, \etc.
%Pull-request determination
Gousios \etal~\cite{Gousios:2014,Rigby:2014} conducted a statistical analysis
of millions of pull-requests from GitHub and analyzed the popularity of pull-requests,
the factors affecting the decision to merge or reject a pull-request,
the factors affecting the decision on a pull-request,
and the time to merge a pull-request.
Their research result implies that
there is no difference between core members and outside contributors
in getting their pull-requests accepted and
only 13\% of the pull-requests are rejected for technical reasons.
Tsay \etal~\cite{Tsay:2014b} examined how social
and technical information are used to evaluate pull-requests.
and technical information is used to evaluate pull-requests.
They found that reviewers take into account both
the contributors' technical practices
and the social connection between contributors and reviewers.
@ -82,7 +82,7 @@ and CI is a dominant factor for the process,
which even changes the effects of some traditional predictors.
Yu \etal~\cite{Lima,Yu:2015} proposed approaches to automatically
recommend potential pull-request reviewers,
which applied Random Forest algorithm and Vector Sapace Model respectively.
which applied Random Forest algorithm and Vector Space Model respectively.
Veen \etal~\cite{Veen:2015} presented PRioritizer,
a prototype pull-request prioritization tool,
which works to recognize the top pull-requests the core members should focus on.
@ -98,28 +98,27 @@ Traditional code review,
which is also well known as code inspection proposed by Fagan~\cite{Fagan:1976},
has been performed since the 1970s~\cite{Bacchelli:2013}.
The inspection process consists of well-defined steps
which are executed one-by-one in group meeting\cite{Fagan:1976}.
Many evidence~\cite{Aurum:2002} have proved the value of code inspection
which are executed one-by-one in group meetings~\cite{Fagan:1976}.
Many studies~\cite{Aurum:2002} have proven the value of code inspection
in software development.
However, its cumbersome and synchronous characteristics have hampered
its universal application in practice~\cite{Bacchelli:2013}.
On the other hand,
with the evolution of software development model,
this approach is also usually misapplied and
result in bad outcomes according to Riby's exploration\cite{Rigby:**}.
this approach is also usually misapplied according to Rigby's exploration~\cite{Rigby:**}.
With the occurrence and development of version control systems
and collaboration tools,
Modern Code Review (MCR)~\cite{Rigby:2011} is adopted by many software companies and teams.
Different from formal code inspections,
modern code review (MCR)~\cite{Rigby:2011} is adopted by many software companies and teams.
Different from the formal code inspection,
MCR is a lightweight mechanism that is less time-consuming and supported by various tools.
Rigby and Storey~\cite{Rigby:2011} explored MCR process
in open source communities.
They performed case studies on GCC, Linux, Mozilla and Apache
They performed case studies on GCC, Linux, and other projects
and found several code review patterns and
a broadcast-based code review style used by Apache.
Baum \etal~\cite{Baum:2016} analyzed MCR practice
in commercial software development teams inorder to
in commercial software development teams in order to
improve the use of code reviews in industry.
While the main motivation for code review was believed
to be finding defects to control software quality,
@ -127,10 +126,7 @@ recent research~\cite{Bacchelli:2013} has revealed
additional expectations,
including knowledge transfer, increased team awareness,
and creation of alternative solutions to problems.
Moreover, participation rate in code review reported in the study
conducted by Ciolkowski \etal~\cite{Ciolkowski:2003}
shown that the figure has increased in theses years.
Other studies~\cite{Rigby:2011,Baum:2016,Mcintosh:2014} also
Moreover, other studies~\cite{Rigby:2011,Baum:2016,Mcintosh:2014} also
investigated factors that influence the outcomes of code review process.
@ -140,12 +136,12 @@ and makes it more transparent and social~\cite{Storey2014The}.
The evaluation of pull-request in GitHub is a form of MCR~\cite{Bacchelli:2013}.
The way to give voice on pull-request evaluation is to
participate in the discussion and leave comments.
Tsay \etal~\cite{Tsay:2014a} analyzed highly discussed pull-request
Tsay \etal~\cite{Tsay:2014a} analyzed highly discussed pull-requests
which have extended discussions and explored evaluating code contributions through discussion in GitHub.
Gousios \etal~\cite{Gousios:2014b} investigated the challenges faced by the integrators in GitHub
by conducting a survey which involves 749 integrators.
by conducting a survey which involved 749 integrators.
Two kinds of comments can be identified by their position in the discussion:
Two kinds of comments can be identified by their positions in the discussions:
issue comments (\ie general comments) as a whole about the overall contribution
and inline comments for specific lines of code in the changes ~\cite{Yu:2015,Thongtanunam:2016}.
Actually, there is another kind of review comments in GitHub: commit comments
@ -170,23 +166,23 @@ As can be seen from Figure~\ref{fig:comment},
all the review comments corresponding to a pull-request
are displayed and ordered primarily by creation time.
Issue comments are directly visible,
while inline comments default to be folded and is shown when the toggle button is clicked.
while inline comments are folded by default and are shown when the toggle button is clicked.
% \emph{Participant:}
Due to the transparent environment,
a large number of external developers,
who are concerned about the development of the corresponding project,
who are concerned about the development of an open-source project,
are allowed to participate in the evaluation of any pull-request
of a public repository.
of the public repository.
All the participants can ask the submitter to bring out the pull-request more clearly
or improve the solution.
Prior study~\cite{Gousios:2014} also reveals that, in most projects,
The prior study~\cite{Gousios:2014} also reveals that, in most projects,
more than half of the participants are external developers,
while the majority of the comments come from core team members.
The diversity of participants and their different concerns
in pull-request evaluation
produce various types of review comments
which cover a wide range of motivations
produces various types of review comments
which cover a wide range of topics
from solution detail to contribution appropriateness.
\subsection{Research Questions}
@ -194,7 +190,7 @@ from solution detail to contribution appropriateness.
In this paper, we focus on analyzing the review comments in GitHub.
Although prior work~\cite{Bacchelli:2013,Gousios:2014b,Tsay:2014a}
{\color{red}
has identified several motivations and issues raised in code review,
has identified several motivations and issues raised in code reviews,
we believe that the review topics involved in pull-based development model
are not yet well understood,
especially the underlying taxonomy of review topics has not been identified

@ -219,17 +219,6 @@
}
@article{Ciolkowski:2003,
title={Software Reviews: The State of the Practice},
author={Ciolkowski, Marcus and Laitenberger, Oliver and Biffl, Stefan},
journal={Software IEEE},
volume={20},
number={20},
pages={46-51},
year={2003},
}
@article{Yu:2016,
title={Determinants of pull-based development in the context of continuous integration},
author={Yu, Yue and Yin, Gang and Wang, Tao and Yang, Cheng and Wang, Huaimin},