starlee 2017-10-15 14:14:26 +08:00
commit bff9e41d23
56 changed files with 6854 additions and 0 deletions

11
.gitignore vendored Executable file

@@ -0,0 +1,11 @@
*.fls
*.fdb_latexmk
*.gz
*.aux
*.pdf
*.log
*.bbl
*.blg
*(busy)
*DS_Store
~*

31
PRCCE/0_abstract.tex Executable file

@@ -0,0 +1,31 @@
\begin{abstract}
Code reviews in the pull-based model are open to community users on GitHub.
Various participants take part in the review discussions,
and their conversations are not limited to the improvement of code contributions
but also cover project evolution and social interaction.
A comprehensive understanding of the topics of code reviews in the pull-based model
would be useful for better organizing the code review process and optimizing review tasks
such as reviewer recommendation and pull-request prioritization.
In this paper, we first conducted a qualitative study on
three popular open-source software projects hosted on GitHub
and constructed a fine-grained two-level taxonomy covering
4 Level-1 categories (Code Correctness, PR Decision-making, Project Management, and Social Interaction)
and 11 Level-2 sub-categories (\eg defect detecting, reviewer assigning, contribution encouraging).
Second, we conducted a preliminary quantitative analysis on a large set of review comments
that were labeled by TSHC, a two-stage hybrid classification algorithm
that automatically classifies review comments by combining rule-based and machine-learning techniques.
Through the quantitative study, we explored typical review patterns.
We found that the three projects exhibit similar comment distributions across the categories.
Pull-requests submitted by inexperienced contributors
tend to contain potential issues even though they have passed the tests.
Furthermore, external contributors are more likely to break project conventions
in their early contributions.
\end{abstract}

36
PRCCE/10_authors.tex Normal file

@@ -0,0 +1,36 @@
\biography{resources/authors/1-ZhixingLi}%No empty line here
{\bf Zhixing Li}
is a Master's student in the College of Computer at National University of Defense Technology, Changsha.
His work interests include open source software engineering, data mining, and knowledge discovery in open source software.
\biography{resources/authors/2-YueYu}%No empty line here
{\bf Yue Yu}
is an associate professor in the College of Computer at National University of Defense Technology, Changsha.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2016.
He has visited UC Davis supported by CSC scholarship.
His research findings have been published in MSR, FSE, IST, and ICSME.
His current research interests include software engineering,
spanning mining software repositories and analyzing social coding networks.
\biography{resources/authors/3-GangYin}%No empty line here
{\bf Gang Yin}
is an associate professor in the College of Computer at National University of Defense Technology, Changsha.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2006.
He has worked on several major research projects, including National 973 and 863 projects.
He has published more than 60 research papers in international conferences and journals.
His current research interests include distributed computing, information security, software engineering, and machine learning.
\biography{resources/authors/4-TaoWang}%No empty line here
{\bf Tao Wang}
is an associate professor in the College of Computer at National University of Defense Technology, Changsha.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2015.
His work interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.
\biography{resources/authors/5-HuaiminWang}%No empty line here
{\bf Huaimin Wang}
is a professor in the College of Computer at National University of Defense Technology, Changsha.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 1992.
He has been awarded the ``Chang Jiang Scholars Program'' professorship and the ``Distinguished Young Scholar'' title, among other honors.
He has published more than 100 research papers in peer-reviewed international conferences and journals.
His current research interests include middleware, software agent, and trustworthy computing.

129
PRCCE/1_introduction.tex Executable file

@@ -0,0 +1,129 @@
\section{Introduction}
%Pull request
The pull-based development model is becoming increasingly popular in distributed collaboration
for open-source software (OSS) development~\cite{Barr:2012,Gousios:2014,Gousios:2014b,gousios2016work}.
On GitHub~\footnote{\url{https://github.com/}, October 2017.} alone,
the largest social-coding community, nearly half of collaborative projects
(already over 1 million~\cite{gousios2016work} in January 2016) have adopted this model.
In pull-based development,
any contributor can freely fork (\ie clone) an interesting public project
and modify the forked repository locally (\eg fixing bugs and adding new features)
without asking for access to the central repository.
When the changes are ready to be merged back to the master branch,
the contributor submits a pull-request,
and then a rigorous code review process is performed
before the pull-request gets accepted.
%Review
Code review is a communication channel where integrators,
who are core members of a project, can express their concerns
about the contribution~\cite{Tsay:2014a,Gousios:2014b,Marlow:2013,yu2015wait}.
If they doubt the quality of a submitted pull-request,
integrators leave comments that
ask the contributors to improve the implementation.
As pull-requests become increasingly popular,
most large OSS projects crowdsource pull-request reviews
to a large number of external developers~\cite{Yu:2015,yu2014reviewer}
to reduce the workload of integrators.
These external developers are interested in these projects and
are concerned about their development.
After receiving the review comments, the contributor usually responds positively and
updates the pull-request for another round of review.
Thereafter, the responsible integrator makes a decision to accept or reject the pull-request
by taking all judgments and changes into consideration.
Previous studies~\cite{Tsay:2014b,Yu:2016,yu2015wait,t2015cq} have shown that
code review, as a well-established software quality practice,
is one of the most significant stages in software development.
It ensures that only high-quality code changes are accepted,
based on the in-depth discussion among reviewers.
With the evolution of collaboration tools and environment~\cite{Storey2014The,Zhu2016Effectiveness},
development activities in communities like GitHub are more transparent and social.
{\color{red}Contribution evaluation in the social coding communities is more complex
than simple acceptance or rejection~\cite{Tsay:2014a}. }
Various participants take part in the discussion,
and their conversations are not limited to the improvement of code contributions
but also cover project evolution and social interaction.
A comprehensive understanding of the topics of code review would be useful to
better organize the code review process and optimize review tasks
such as reviewer recommendation~\cite{Yu:2015,Thongtanunam:2015,jiang:2015,rahman2016correct}
and pull-request prioritization~\cite{Veen:2015}.
{\color{red}
Several similar studies
have been conducted to analyze reviewers' motivations and the issues raised in code reviews.
Bacchelli \etal~\cite{Bacchelli:2013} explored the motivations, outcomes, and challenges of code reviews
based on an investigation of developers and projects at Microsoft.
They found that although finding defects is still the primary motivation for code reviews,
defect-related comments comprise only a small proportion.
Their study focused on commercial projects,
whose reviewers are mainly employees at Microsoft.
However, projects using the pull-based model in GitHub are developed in a more transparent and open environment,
and code reviews are performed by community users.
Community reviewers may serve different organizations, use the project for various purposes,
and bring specific considerations to the code review process.
Therefore, the review topics in the pull-based development model
may differ from those in the commercial development model.
Tsay \etal~\cite{Tsay:2014a} explored issues raised around code contributions in GitHub.
They reported that reviewers discuss both
the appropriateness of the contribution proposal and the correctness of the implemented solution.
However, they only analyzed comments of highly discussed pull-requests, which have extended discussions.
According to the statistics in their dataset,
each pull-request receives 2.6 comments on average, and only about 16\% of the pull-requests have extended discussions.
The small proportion of explored pull-requests may bias the experimental results
and constrain the generalizability of their findings.
Therefore, we revisit the review topics in the pull-based development model
and conduct qualitative and quantitative analyses
based on a large-scale dataset of review comments.
}
In this paper, we first conducted a qualitative study on three popular
open-source software projects hosted on GitHub
and constructed a fine-grained taxonomy covering 11 categories
(\eg defect detecting, reviewer assigning, contribution encouraging)
for the review comments generated in the discussions.
{\color{red}
Second, we conducted a preliminary quantitative analysis on a large set of review comments
that were labeled by \TSHC, a two-stage hybrid classification algorithm
that automatically classifies review comments
by combining rule-based and machine-learning (ML) techniques.
}
The main contributions of this study include the following:
\begin{itemize}
\item A fine-grained and multilevel taxonomy for review comments
in the pull-based development model is provided
in relation to technical, management, and social aspects.
\item A high-quality manually labeled dataset of review comments,
which contains more than 5,600 items, can be accessed via
a web page~\footnote{\url{https://www.trustie.net/projects/2455}, October 2017.}
and used in further studies.
\item A high-performance automatic classification model for
review comments of pull-requests is proposed.
The model leads to a significant improvement in terms of the weighted average F-measure,
\ie 9.2\% in Rails, 5.3\% in Elasticsearch, and 7.2\% in Angular.js,
compared with the text-based method.
\item Typical review patterns are explored which may provide implications
for reviewers and collaborative development platforms
to better organize the code review process.
\end{itemize}
The rest of this paper is organized as follows.
Section~\ref{sec:bg} provides the research background
and introduces the research questions.
In Section~\ref{sec:approach}, we elaborate on the approach of our study.
Section~\ref{sec:result} presents the research results,
while Section~\ref{sec:Threats} discusses several threats to the validity of our study.
Section~\ref{sec:Concl} concludes this paper and gives an outlook on future work.
% Several studies have been conducted to explore how the code review process influences
% pull-request acceptance~\cite{Tsay:2014b,Yu:2016} and latency~\cite{yu2015wait},
% and software release quality~\cite{t2015cq}.
% They found that code review, as a well-established software quality practice,
% is one of the most significant stages in pull-based development.

227
PRCCE/2_background.tex Executable file

@@ -0,0 +1,227 @@
\section{Background and Related Work}
\label{sec:bg}
\subsection{Pull-based development model}
In GitHub, a growing number of developers contribute to open source projects
by using the pull-request mechanism~\cite{Gousios:2014,Yu:2014}.
As illustrated in Figure~\ref{fig:github_wf},
a typical contribution process based on the pull-based development model in GitHub
involves the following steps.
Fork:
A contributor can find an interesting project
by following several well-known developers and watching their projects.
Before contributing, the contributor has to fork the original project.
Edit:
After forking, the contributor can edit locally
without disturbing the main branch in the original repository.
He is free to do whatever he wants,
such as implementing a new feature
or fixing bugs in the cloned repository.
Pull Request:
When his work is finished,
the contributor submits the changed code from the forked repository
to its source through a pull-request.
In addition to commits, the submitter needs to provide a title and a description
to elaborate on the objective of the pull-request.
Test:
Several reviewers play the role of testers
to ensure that the pull-request does not break the current runnable state.
They check the submitted changes by manually running the patches locally
or in an automated manner with the help of continuous integration (CI) services.
Review:
All developers in the community have the chance
to discuss the pull-request in the issue tracker,
with access to the pull-request description, changed files, and test results.
After receiving the feedback from reviewers,
the contributor updates his pull-request by attaching new commits
for another round of review.
Decide:
A responsible manager of the core team considers all the opinions of reviewers
and merges or rejects the pull-request.
\begin{figure}[tbp]
\centering
\includegraphics[width=8cm]{resources/Fig-1.pdf}
\caption{Pull-based work-flow on GitHub}
\label{fig:github_wf}
\end{figure}
Although research on pull-requests is in its early stage,
several relevant studies have been conducted
in terms of exploratory analysis, priority determination,
reviewer recommendation, \etc.
%Pull-request determination
Gousios \etal~\cite{Gousios:2014,Rigby:2014} conducted a statistical analysis
of millions of pull-requests from GitHub and analyzed the popularity of pull-requests,
the factors affecting the decision to merge or reject a pull-request,
and the time to merge a pull-request.
Their results imply that
there is no difference between core members and outside contributors
in getting their pull-requests accepted and
only 13\% of the pull-requests are rejected for technical reasons.
Tsay \etal~\cite{Tsay:2014b} examined how social
and technical information are used to evaluate pull-requests.
They found that reviewers take into account both
the contributors' technical practices
and the social connection between contributors and reviewers.
Yu \etal~\cite{Vasilescu:B,Yu:2016} conducted a quantitative study
and discovered that latency is a complex issue that is difficult to explain adequately,
and that CI is a dominant factor in the process,
one that even changes the effects of some traditional predictors.
Lima \etal~\cite{Lima,Yu:2015,Thongtanunam:2015,jiang:2015,rahman2016correct} proposed approaches to automatically
recommend potential pull-request reviewers,
which apply the Random Forest algorithm and the Vector Space Model, respectively.
Veen \etal~\cite{Veen:2015} presented PRioritizer,
a prototype pull-request prioritization tool,
which recognizes the top pull-requests that core members should focus on.
\subsection{Code review}
Code review is employed by many software projects to
examine the changes made by others to the source code,
find potential defects,
and ensure software quality before the changes are merged back~\cite{Baysal:2015,Mcintosh:***}.
Traditional code review,
which is also well known as code inspection proposed by Fagan~\cite{Fagan:1976},
has been performed since the 1970s~\cite{Ackerman:1989,Kollanus:2009,Bacchelli:2013}.
The inspection process consists of well-defined steps
which are executed one by one in group meetings~\cite{Fagan:1976}.
Much evidence has proven the value of code inspection
in software development~\cite{Frank:1984,Ackerman:1989,Russell:1991,Aurum:2002}.
However, its cumbersome and synchronous characteristics have hampered
its universal application in practice~\cite{Bacchelli:2013}.
On the other hand,
with the evolution of software development models,
this approach is also often misapplied and
results in bad outcomes, according to Rigby's exploration~\cite{Rigby:**}.
With the emergence and development of version control systems
and collaboration tools,
Modern Code Review (MCR)~\cite{Rigby:2011} is adopted by many software companies and teams.
Different from formal code inspections,
MCR is a lightweight mechanism that is less time-consuming and supported by various tools.
Rigby \etal~\cite{Rigby:2006,Rigby:2008,Rigby:2011} explored the MCR process
in open source communities.
They performed case studies on GCC, Linux, Mozilla and Apache
and found several code review patterns and
a broadcast-based code review style used by Apache.
Baum \etal~\cite{Baum:2016} and Bacchelli \etal~\cite{Bacchelli:2013} analyzed MCR practice
in commercial software development teams in order to
improve the use of code reviews in industry.
While the main motivation for code review was believed
to be finding defects to control software quality,
recent research~\cite{Bacchelli:2013} has revealed
additional expectations,
including knowledge transfer, increased team awareness,
and creation of alternative solutions to problems.
Moreover, the code review participation rate reported in the study
conducted by Ciolkowski \etal~\cite{Ciolkowski:2003}
shows that the figure has increased in recent years.
Other studies~\cite{Baysal:2015,Rigby:2011,Baysal:2013,Baum:2016,Mcintosh:2014} also
investigated factors that influence the outcomes of code review process.
In recent years, collaboration tools have evolved with social media~\cite{Storey2014The,Zhu2016Effectiveness}.
In particular, GitHub integrates the function of code review into the pull-based model
and makes it more transparent and social~\cite{Storey2014The}.
The evaluation of pull-requests in GitHub is a form of MCR~\cite{Beller:2014,Bacchelli:2013}.
The way to have a voice in pull-request evaluation is to
participate in the discussion and leave comments.
Tsay \etal~\cite{Tsay:2014a} analyzed highly discussed pull-requests
with extended discussions and explored how code contributions are evaluated through discussion in GitHub.
Gousios \etal~\cite{Gousios:2014b} investigated the challenges faced by integrators in GitHub
by conducting a survey that involved 749 integrators.
Two kinds of comments can be identified by their position in the discussion:
issue comments (\ie general comments) on the overall contribution as a whole
and inline comments on specific lines of code in the changes~\cite{Yu:2015,Thongtanunam:2016}.
Actually, there is a third kind of review comment in GitHub: commit comments,
which are made on the commits of pull-requests~\cite{zhang2017social}.
Among these three kinds of comments,
general comments and inline comments account for the vast majority,
and the number of commit comments is extremely small.
Consequently, we focus only on general comments and inline comments
and do not take commit comments into consideration.
\begin{figure}[ht]
% https://github.com/rails/rails/pull/12150
\centering
\includegraphics[width=8cm]{resources/Fig-2.pdf}
\caption{Example comments on GitHub}
\label{fig:comment}
\end{figure}
% \emph{Visualization:}
As can be seen from Figure~\ref{fig:comment},
all the review comments corresponding to a pull-request
are displayed and ordered primarily by creation time.
Issue comments are directly visible,
while inline comments are folded by default and are shown when the toggle button is clicked.
% \emph{Participant:}
Due to the transparent environment,
a large number of external developers,
who are concerned about the development of the corresponding project,
are allowed to participate in the evaluation of any pull-request
of a public repository.
All the participants can ask the submitter to explain the pull-request more clearly
or to improve the solution.
A prior study~\cite{Gousios:2014} also reveals that, in most projects,
more than half of the participants are external developers,
while the majority of the comments come from core team members.
The diversity of participants and their different concerns
in pull-request evaluation
produce various types of review comments
which cover a wide range of motivations
from solution detail to contribution appropriateness.
\subsection{Research Questions}
In this paper, we focus on analyzing the review comments in GitHub.
Although prior work~\cite{Bacchelli:2013,Gousios:2014b,Tsay:2014a}
{\color{red}
has identified several motivations and issues raised in code review,
we believe that the review topics involved in the pull-based development model
are not yet well understood;
in particular, the underlying taxonomy of review topics has not been identified.
}
Consequently, our first question is:
\textbf{RQ1.} What is the taxonomy for review comments on pull-requests?
{\color{red}
Furthermore, we would like to quantitatively analyze review topics
and obtain more solid knowledge.
Therefore, we need a large set of labeled review comments to support the quantitative analysis,
which leads to the second question:
}
\textbf{RQ2.} Is it possible to automatically classify review comments
according to the defined taxonomy?
{\color{red}
With a large set of labeled comments available,
we try to conduct a quantitative analysis and explore typical review patterns,
such as }
what most comments are talking about
and how contributors raise various kinds of issues in pull-requests.
Therefore, our last research question is:
\textbf{RQ3.} What are the typical review patterns in the pull-based model?

410
PRCCE/3_approach.tex Executable file

@@ -0,0 +1,410 @@
\section{Approach}
\label{sec:approach}
\subsection{Approach Overview}
% \hl{The goals of our work are to }
% build a taxonomy for review comments in the pull-based development model
% and automate the comment classification according to the defined taxonomy.
% % !!!!!!!!!!!!!!!!!改成motivation
% We aim to provide the foundation for further study on
% the optimization of socialized code review process.
The goal of our work is to
investigate the code review practice in GitHub.
We aim to
{\color{red}
qualitatively and quantitatively analyze review topics in the pull-based development model.
}
As illustrated in Figure \ref{fig:approach},
our research approach consists of the following steps.
\begin{figure}[ht]
\centering
\includegraphics[width=8cm]{resources/Fig-3.pdf}
\caption{The overview of our approach}
\label{fig:approach}
\end{figure}
\begin{enumerate}[1)]
\item Data collection:
In our prior work,
we collected the 4,896 projects with the largest numbers of pull-requests on GitHub.
This dataset is crawled through the official API offered by GitHub
and updated in the current study.
In addition, we also use the data released by
GHTorrent\footnote{\url{http://ghtorrent.org}, October 2017.}.
\item Taxonomy definition:
We constructed a fine-grained multilevel taxonomy for the review comments
in the pull-based development model.
\item Automatic classification:
We proposed the TSHC algorithm,
which automatically classifies review comments
using rule-based and ML techniques.
\item Quantitative analysis:
We did a preliminary quantitative analysis on a large set of
review comments that were labeled by the TSHC algorithm.
\end{enumerate}
\subsection{Dataset}
\begin{table*}[ht]
\scriptsize
\centering
\caption{Dataset of our experiments}
\begin{tabular}{r c c r c c c c c}
% \toprule
\hline
\rowcolor[HTML]{000000}
{\color[HTML]{FFFFFF}\textbf{Projects}} &
{\color[HTML]{FFFFFF}\textbf{Language}} &
{\color[HTML]{FFFFFF}\textbf{Application Area}} &
{\color[HTML]{FFFFFF}\textbf{Hosted\_at}} &
{\color[HTML]{FFFFFF}\textbf{\#Star}} &
{\color[HTML]{FFFFFF}\textbf{\#Fork}} &
{\color[HTML]{FFFFFF}\textbf{\#Contr}} &
{\color[HTML]{FFFFFF}\textbf{\#PR}} &
{\color[HTML]{FFFFFF}\textbf{\#Comnt}} \\
% \midrule
\hline
\textbf{Rails} & Ruby & Web Framework & May. 20 2009 & 33906 & 13789 & 3194 & 14648 & 75102 \\
\textbf{Elasticsearch } & Java & Search Server & Feb. 8 2010 & 20008 & 6871 & 753 &6315 & 38930 \\
\textbf{Angular.js} & JavaScript & Front-end Framework & Jan. 6 2010 & 54231 & 26930 & 1557 & 6376 & 33335 \\
\bottomrule
\multicolumn{9}{l}{\emph{Abbreviations}: Contr: Contributor,
PR: Pull-request, Comnt: Comment}
\end{tabular}
\label{tab:dataset}
\end{table*}
The experiments in this paper are conducted on
three representative open source projects,
namely Rails, Elasticsearch, and Angular.js,
all of which make heavy use of pull-requests in GitHub.
The dataset is composed of two sources:
{\color{red}the GHTorrent MySQL dump} released in June 2016
and our own crawled data from GitHub.
From GHTorrent, we can get a list of projects together with their basic information
such as programming language, hosting time, the number of forks, and the list of pull-requests.
{\color{red}
For our own dataset, we have crawled the text content of
pull-requests (\ie title, description) and review comments
according to the URLs provided by GHTorrent.
}
Finally, the two sources are linked by project ID and pull-request number.
Table \ref{tab:dataset} lists the key statistics of the dataset.
We studied original projects (\ie not forked from others)
written in Ruby, Java and JavaScript,
which are three of the most popular languages\footnote{\url{https://github.com/blog/2047-language-trends-on-github}, October 2017.} on GitHub.
The three selected projects were hosted on GitHub from an early time
and are applied in different areas:
Rails is used to build websites,
Elasticsearch acts as a search server,
and Angular.js is an outstanding front-end development framework.
This ensures the diversity of experimental projects,
which is needed to increase the generalizability of our work.
The numbers of stars, forks, and contributors
indicate the popularity of a project.
Starring is similar to a bookmarking system,
which informs users of the latest activities of the projects that they have starred.
In total, our dataset contains 27,339 pull-requests and 147,367 review comments.
\subsection{Taxonomy Definition}
\label{td}
Previous work has studied the challenges faced by pull-request reviewers
and the issues introduced by pull-request submitters~\cite{Gousios:2014b,Tsay:2014a}.
Inspired by their work, we decided to observe
the topics of code reviews comprehensively and in depth
rather than merely focusing on technical and nontechnical perspectives.
% !!!!!!!!!! card sort
We conducted a card sort~\cite{Bacchelli:2013}
to determine the taxonomy schema,
which was executed manually through an iterative process
of reading and analyzing review comments randomly collected from the three projects.
The following steps were executed to define the taxonomy.
\begin{enumerate}[1)]
\item First, we randomly selected 900 review comments (300 for each project).
\item Two participants independently conducted card sorts on 70\% of the comments.
They first labeled each selected comment with a descriptive message.
The labeled comments were then divided into different groups
according to their descriptive messages.
Through a rigorous analysis of the existing literature
and their own experience working with and analyzing
the pull-based model over the last two years,
they identified their respective draft taxonomies.
\item They then met and discussed their identified draft taxonomies.
When they agreed with each other, the initial taxonomy was constructed.
\item Another 10 participants were invited to help verify and improve the taxonomy
by examining another 10\% of the comments.
After some refinement of the category definitions and adjustment of the taxonomy hierarchy,
the final taxonomy was determined.
\item Finally, the two authors independently classified the remaining 20\% of the comments into the taxonomy.
We used one of the most popular reliability measurements, namely percent agreement, to calculate their inter-rater reliability.
We found that they agreed on 97.8\% (176/180) of the comment labels.
\end{enumerate}
In the above process, the two main participants (\ie the two authors of the paper)
have five years' and eight years' experience in software development, respectively.
Both of them are interested in academic research in empirical software engineering and social coding.
In particular, the second author has four years' research experience in the pull-based development model.
The other 10 participants also work in our research team,
including postgraduates, Ph.D. candidates, and project developers.
\subsection{Automatic Classification}
{\color{red}
After the taxonomy of review topics is defined,
we would like to conduct a quantitative study on a large set of labeled review comments,
which are expected to be classified by an automatic approach.
First, we need to manually label a set of comments to train the automatic classification model.
Then, we use the classification model to automatically label a larger dataset of comments.
}
\subsubsection{Manual labeling}
For each project, we randomly sampled 200 pull-requests
(of which the comment count is greater than 0)
per year from 2013 to 2015.
Overall, 1,800 distinct pull-requests and 5,645 review comments were sampled.
According to the defined taxonomy,
we manually classified the sampled review comments.
We built an online labeling platform (OLP), which
was deployed on the public network and offers a web-based interface.
OLP helps reduce the extra burden on the labelers
and ensures the quality of the manual labeling results.
\begin{figure}[ht]
\centering
\includegraphics[width=8cm]{resources/Fig-4.pdf}
\caption{The main interface of OLP}
\label{fig:olp}
\end{figure}
As shown in Figure~\ref{fig:olp},
the main interface of OLP contains three sections
which present a pull-request together with its review comments.
Section 1:
The title, description, and submitter's user name of a pull-request
are displayed in this section.
The hyperlink to the pull-request on GitHub is also shown,
making it convenient to jump to the original web page
for a more detailed view and inspection if necessary.
Section 2:
All the review comments of a pull-request are listed in this section
and ordered by creation time.
The user name,
comment type (inline comment or issue comment),
and creation time of a review comment appear at the top of the comment content.
Section 3:
Beside each comment,
there are candidate labels corresponding to our taxonomy,
which can be selected by clicking the check-box next to the label.
The use of check-boxes means that multiple labels can be assigned to a comment.
Other labels that are not in our list can also be reported
by writing free text in the text field named `other'.
\subsubsection{Two-stage hybrid classification}
Figure~\ref{fig:TSHC} illustrates the automatic classification model \TSHC (Two-Stage Hybrid Classification).
TSHC consists of two stages that utilize the comment text
and other information extracted from comments and pull-requests, respectively.
\begin{figure*}[ht]
\centering
\includegraphics[width=18cm]{resources/Fig-5.pdf}
\caption{Overview of \TSHC}
\label{fig:TSHC}
\end{figure*}
\underline{\textbf{Stage One:} }
The classification in this stage mainly utilizes the text part of each review comment
and produces a vector of probabilities (VP),
in which each item is the probability that
the review comment will be labeled with the corresponding category.
Preprocessing is necessary before formal comment classification~\cite{Antoniol:2008}.
Reviewers tend to reference source code, hyperlinks, or the statements of others
in a review comment to clearly express their opinion, prove their point, or reply to other people.
These behaviors promote the review process
but pose a great challenge to comment classification.
Words in these reference texts contribute minimally to the classification
and may even introduce interference.
Hence, we transform them into single-word indicators to
reduce vocabulary interference
and preserve the reference information.
Specifically, the source code, hyperlink, and statement of others are replaced with
\textit{``cmmcode''}, \textit{``cmmlink''}, and \textit{``cmmtalk''}, respectively.
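To make this step concrete, the following is a minimal Python sketch of such a preprocessing routine.
The regular expressions are illustrative assumptions (the exact patterns are not listed in this paper);
they match fenced and inline Markdown code, hyper-links, and quoted lines, respectively.
\begin{verbatim}
import re

def preprocess(comment):
    # Illustrative patterns; the exact rules of TSHC are not
    # specified here.
    text = re.sub(r"```.*?```", " cmmcode ", comment,
                  flags=re.DOTALL)
    text = re.sub(r"`[^`]+`", " cmmcode ", text)       # inline code
    text = re.sub(r"https?://\S+", " cmmlink ", text)  # hyper-links
    text = re.sub(r"^>.*$", " cmmtalk ", text,
                  flags=re.MULTILINE)                  # quotations
    return re.sub(r"\s+", " ", text).strip()

# preprocess("LGTM!\n> could you rebase?\nSee https://example.com")
# -> "LGTM! cmmtalk See cmmlink"
\end{verbatim}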
After preprocessing, a review comment is classified by a rule-based technique,
which uses inspection rules to match the comment text for each category.
Several phrases often appear in the comments of a specific category.
For instance, \textit{``lgtm''} (abbreviation of \textit{``looks good to me''})
is usually used by reviewers to express their satisfaction with a pull-request.
This type of phrase is discriminating and helpful
in recognizing the category of a review comment.
Therefore, we establish inspection rules for each category,
which are a set of regular expressions abstracted from discriminating phrases.
A category label is assigned to a review comment,
and the corresponding item in VP is set to 1
if one of its inspection rules matches the comment text.
In our method, each category has 7.5 rules on average;
therefore, the following shows only part of the inspection rules
and the corresponding matched comments.
\begin{itemize}
\item From Style Checking (L2-1)
\begin{itemize}
\item \footnotesize{\texttt{(blank$|$extra) (line$|$space)s?}}
\item \textit{``Please add a new \textbf{blank line} after the include''}
\end{itemize}
\item From Value Affirming (L2-4)
\begin{itemize}
% \item \small{\texttt{(looks$|$seem)s? (good$|$great$|$useful$|$awesome$|$fine)}}
\item \footnotesize{\texttt{(looks$|$seem)s? (good$|$great$|$useful)}}
\item \textit{``\textbf{Looks good} to me, the grammar is definitely better.''}
\end{itemize}
\item From Reviewer Assigning (L2-8)
\begin{itemize}
\item \footnotesize{\texttt{(cc:?$|$wdyt$|$defer to$|\backslash$br$\backslash$?) (@$\backslash$w+ *)+}}
\item \textit{``\textbf{/cc @fxn} can you take a look please?''}
\end{itemize}
\item From Politely Responding (L2-10)
\begin{itemize}
\item \footnotesize{\texttt{thanks?|thxs?|:($\backslash$w+)?heart:}}
% \item $thanks?|thxs?|:(\backslash w+)?heart:$
\item \textit{``@xxx looks good. \textbf{Thank} you for your contribution \textbf{:yellow\_heart:}''}
\end{itemize}
\end{itemize}
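As an illustration of how these rules fill the VP, the following Python sketch applies the excerpted rules above;
the dictionary holds only the rules listed in this section, not the full rule set used by \TSHC.
\begin{verbatim}
import re

CATEGORIES = ["L2-%d" % i for i in range(1, 12)]
RULES = {  # excerpt only; each category has ~7.5 rules on average
    "L2-1":  [r"(blank|extra) (line|space)s?"],
    "L2-4":  [r"(looks|seem)s? (good|great|useful)"],
    "L2-8":  [r"(cc:?|wdyt|defer to|\br\?) (@\w+ *)+"],
    "L2-10": [r"thanks?|thxs?|:(\w+)?heart:"],
}

def apply_rules(text, vp):
    # Set the VP item of a category to 1 when one of its
    # inspection rules matches the comment text.
    for cat, patterns in RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            vp[CATEGORIES.index(cat)] = 1.0
    return vp
\end{verbatim}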
The review comment is then processed by the ML-based technique.
ML-based classification is performed with scikit-learn\footnote{\url{http://scikit-learn.org/}, October 2017.},
particularly the support vector machine (SVM) algorithm.
The comment text is tokenized and stemmed to a root form~\cite{Porter:1997}.
We filter out punctuation from the word tokens and retain English stop words
because we assume that common words play an important role in short texts, such as review comments.
We adopt the TF-IDF (term frequency-inverse document frequency) model~\cite{Baeza:2004}
to extract a set of features from the comment text
and apply the ML algorithm to text classification.
A single review comment often addresses multiple topics.
Hence, one of the goals of \TSHC is to perform multi-label classification.
To this end, we construct a text classifier (TC) for each category
with a one-versus-all strategy.
For a review comment
that has been matched by inspection rule $R_{i}$
(supposing that $n$ categories $C_{1},C_{2}, \dots, C_{n}$ exist),
each TC ($TC_{1},TC_{2}, \dots, TC_{n}$), except for $TC_{i}$,
is applied to predict the probability that this review comment belongs to the corresponding category.
Finally, the VP determined by the inspection rules and text classifiers is passed on to the second stage.
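The following is a minimal sketch of the stage-one text classifiers, assuming scikit-learn;
the hyper-parameters are placeholders, and the stemming and stop-word handling described above are omitted for brevity.
\begin{verbatim}
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_text_classifiers(texts, label_sets, n_categories=11):
    # One-versus-all: one binary TF-IDF + SVM classifier
    # per category. label_sets[j] is the set of category
    # indices assigned to texts[j].
    classifiers = []
    for i in range(n_categories):
        y = [1 if i in s else 0 for s in label_sets]
        clf = make_pipeline(TfidfVectorizer(),
                            SVC(kernel="linear", probability=True))
        clf.fit(texts, y)
        classifiers.append(clf)
    return classifiers

def stage_one_vp(classifiers, text, vp):
    # Keep rule-matched items at 1; fill the rest with
    # predicted probabilities.
    for i, clf in enumerate(classifiers):
        if vp[i] != 1.0:
            vp[i] = clf.predict_proba([text])[0][1]
    return vp
\end{verbatim}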
\underline{\textbf{Stage Two:}}
The classification in this stage is performed on composed features.
Review comments are usually short texts.
Our statistics indicate that the minimum, maximum, and average numbers of words
contained in a review comment
are 1, 1527, and 32, respectively.
The text information contained in a review comment is too limited to determine its category on its own.
Therefore, in addition to the VP generated in stage one,
we also consider the following features related to review comments.
Comment\_length:
This feature refers to the total number of characters
contained in a comment text after preprocessing.
Long comments are likely to argue about pull-request appropriateness and code correctness.
Comment\_type:
This binary feature indicates whether a review comment is an inline comment or an issue comment.
An inline comment tends to talk about the solution detail,
whereas an issue comment is likely to talk about other ``high-level'' issues,
such as pull-request decision and project management.
Core\_team:
This binary feature refers to whether the comment author
is a core member of a project or an external contributor.
Core members are more likely to pay attention to
pull-request appropriateness and project management.
Link\_inclusion:
This binary feature identifies whether a comment includes hyperlinks.
Hyper-links are usually used to provide evidence
when someone insists on a point of view or
to offer guidelines when someone wants to help other people.
Ping\_inclusion:
This binary feature indicates whether a comment includes a ping
(in the form of ``@user-name'').
Code\_inclusion:
This binary feature denotes whether a comment includes code elements.
Comments related to the solution detail tend to contain code elements.
However, this assumption does not restrict whether the code is a replica of the committed code
or a new code snippet written by the reviewer.
Ref\_inclusion:
This binary feature indicates whether a comment includes a reference to the statement of others.
Such a reference indicates a reply to someone,
which probably reflects a further suggestion to or disagreement with that person.
Sim\_pr\_title:
This feature refers to the similarity between the text of the comment
and the text of the pull-request title
(measured by the number of common words divided by the number of union words).
Sim\_pr\_desc:
This feature denotes the similarity between the text of the comment
and the text of the pull request description
(measured similarly as how Sim\_pr\_title is computed).
Comments with high similarity to the title or description of a pull-request
are likely to discuss the solution detail or the value of the pull request.
Together with the VP passed from stage one,
these features are composed to form a new feature vector to be processed by prediction models.
Similar to stage one, stage two provides binary prediction models for each category.
In the prediction models, a new VP is generated
to represent how likely a comment is to fall into a specific category.
After iterating over the VP, a comment is labeled with class $C_{i}$
if the $i$-th vector item is greater than 0.5.
If all the items of the VP are less than 0.5,
the class label corresponding to the largest probability is assigned to the comment.
Finally, each comment processed by \TSHC is marked with at least one class label.
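To summarize stage two, the following sketch composes the feature vector and applies the 0.5 decision rule.
The attribute names are hypothetical, the word-overlap similarity is our reading of the description above,
and the per-category models are assumed to be binary classifiers analogous to those of stage one.
\begin{verbatim}
def word_jaccard(a, b):
    # Common words divided by union words
    # (Sim_pr_title, Sim_pr_desc).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def compose_features(vp, comment, pr):
    # comment/pr are hypothetical objects holding the
    # preprocessed data.
    return vp + [
        len(comment.text),               # Comment_length
        int(comment.is_inline),          # Comment_type
        int(comment.author_is_core),     # Core_team
        int("cmmlink" in comment.text),  # Link_inclusion
        int("@" in comment.text),        # Ping_inclusion (simplified)
        int("cmmcode" in comment.text),  # Code_inclusion
        int("cmmtalk" in comment.text),  # Ref_inclusion
        word_jaccard(comment.text, pr.title),        # Sim_pr_title
        word_jaccard(comment.text, pr.description),  # Sim_pr_desc
    ]

def decide_labels(models, features):
    # Label C_i when its probability exceeds 0.5;
    # otherwise fall back to the largest probability.
    vp = [m.predict_proba([features])[0][1] for m in models]
    labels = [i for i, p in enumerate(vp) if p > 0.5]
    return labels or [max(range(len(vp)), key=vp.__getitem__)]
\end{verbatim}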
\subsection{Quantitative Analysis}
To identify typical review patterns, we used our trained hybrid classification model
to classify a large set of review comments and then examined the review patterns
based on this automatically classified dataset.
In total, we classified 147,367 comments of 27,339 pull-requests.
Based on this dataset, we conducted a preliminary quantitative study.
We first explored the distribution of review comments over the categories
and then reported some of the findings derived from the distribution.

781
PRCCE/4_result.tex Executable file

@@ -0,0 +1,781 @@
\section{Results}
\label{sec:result}
\subsection{RQ1: What is the taxonomy for review comments on pull-requests?}
\begin{table*}[ht]
\scriptsize
\caption{Complete taxonomy}
\begin{tabular}{r r p{9.8cm}}
% \toprule
\hline
\rowcolor[HTML]{000000}
&&\\
\rowcolor[HTML]{000000}
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-1 Categories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-2 Subcategories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Description \& Example}}}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{Code Correctness}\\(L1-1)}}
&\cellcolor{lightgray} &\cellcolor{lightgray}Points out extra blank line, improper indention, inconsistent naming convention, etc. \\
&\cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Style Checking\\(L2-1)}} &\cellcolor{lightgray}e.g., \emph{``scissors: this blank line''}\\
&\multirow{3}*{\tabincell{r}{Defect Detecting\\(L2-2)}} & Figures out runtime program errors, evolvability defects, etc.\\
& &e.g., \emph{``the default should be `false`''} and \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Demands that the submitter provide test cases for the changed code, report test results, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Code Testing\\(L2-3)}} &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{PR Decision-making}\\(L1-2)}}
&\multirow{2}*{\tabincell{r}{Value Affirming\\(L2-4)}} &Is satisfied with the pull-request and agrees to merge it\\
& &e.g., \emph{``PR looks good to me. Can you \dots''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Refuses to merge the pull-request due to a duplicate proposal, an undesired feature, etc.\\
& \cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Solution Disagreeing\\(L2-5)}} & \cellcolor{lightgray} e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
&\multirow{3}*{\tabincell{r}{Further Questioning\\(L2-6)}} & Is confused about the purpose of the pull-request and asks for more details or use cases\\
& &e.g., \emph{``Can you provide a use case for this change?''}\\
%%%%%%
% \midrule
\hline
\multirow{6}*{\tabincell{r}{\textbf{Project Management}\\(L1-3)}}
&\cellcolor{lightgray} &\cellcolor{lightgray} States what type of changes a specific version is expected to merge, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Roadmap Managing\\(L2-7)}} &\cellcolor{lightgray}e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
&\multirow{2}*{\tabincell{r}{Reviewer Assigning\\(L2-8)}} &Pings other developer(s) to review this pull-request\\
& &e.g., \emph{``/cc @fxn can you take a look please?''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Needs to squash or rebase the commits, formulate the message, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Convention Checking\\(L2-9)}} &\cellcolor{lightgray}e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
% \midrule
\hline
\multirow{4}*{\tabincell{r}{\textbf{Social Interaction}\\(L1-4)}}
&\multirow{3}*{\tabincell{r}{Politely Responding\\(L2-10)}} & Thanks others for what they do, apologizes for mistakes, etc.\\
& &e.g.,\emph{``Thank you. This feature was already proposed and it was rejected. See \#xxx''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray} Agrees with others' opinion, compliments others' work, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Contribution Encouraging\\(L2-11)}} &\cellcolor{lightgray}e.g.,\emph{``:+1: nice one @cristianbica.''}\\
%%%%%%
% \midrule
\hline
\textbf{Others} &\multicolumn{2}{l}{Short sentence without clear or exact meaning like \textit{``@xxxx will do''} and \textit{``The same here :)''}} \\
\bottomrule
\multicolumn{3}{l}{\emph{Note: a word beginning and ending with a colon, like ``:scissors:'', is markdown grammar for an emoji in GitHub}}
\end{tabular}
\label{tab:taxonomy}
\end{table*}
Table \ref{tab:taxonomy} shows the complete taxonomy.
We identified four Level-1 categories,
namely Code Correctness (L1-1),
PR Decision-making (L1-2),
Project Management (L1-3),
and Social Interaction (L1-4),
each of which contains more specific Level-2 categories.
For each Level-2 category, we present its description
together with an example comment.
We show short comments or only the key part of long comments
to provide a brief display.
{\color{red}
Compared with previous studies~\cite{Bacchelli:2013,Tsay:2014a},
our hierarchical taxonomy presents a more systematic and elaborate schema
of review topics in the pull-based model.
Moreover, we succeeded in identifying a group of review comments relating to project management
(\eg Roadmap Managing (L2-7), Reviewer Assigning (L2-8))
and found proper positions for them in the final taxonomy.
}
Also, we would like to point out that it is a common phenomenon
that a single comment usually covers multiple topics.
The following review comments are provided as examples.
\textbf{Example\_1: }
\textit{
``Thanks for your contribution! This looks good to me
but maybe you could split your patch in two different commits? One for each issue.''
}
\textbf{Example\_2: }
\textit{
``Looks good to me :+1: /cc @steveklabnik @fxn''
}
In \textbf{Example\_1},
the reviewer thanks the contributor (L2-10)
and shows his satisfaction with this pull-request (L2-4),
followed by a request for commit splitting (L2-9).
In \textbf{Example\_2},
the reviewer expresses that he agrees with this change (L2-4),
thinks highly of this work
(L2-11, \textit{:+1:} is markdown grammar for the ``thumbs-up'' emoji),
and assigns other reviewers to ask for more advice (L2-8).
With regard to reviewer assignment, reviewers tend to delegate a code review
if they are unfamiliar with the change under review.
In addition, we collected 13 exceptional comments
which were labeled with the \textit{other} option.
These comments mainly fall into the following two groups:
\begin{itemize}
\item Platform-related (2): this kind of comment is related to the platform, \ie GitHub.
Sometimes, developers may not be familiar with the features of GitHub:
``I don't understand why Github display all my commits''
\item Simple reply (11): this kind of comment contains only a few words
and carries no exact significant meaning: ``@** no.'', ``yes'', and ``done''.
\end{itemize}
On the one hand, the number of these comments is relatively small;
on the other hand, these comments do not contribute much to the code review process.
Moreover, the taxonomy definition process in Section~\ref{td},
with its large-scale sampling, thorough discussion, and multiple rounds of validation,
has made our taxonomy relatively complete and robust.
Therefore, we exclude these comments from our analysis and do not adjust our taxonomy.
\noindent
\textbf{In Summary.}
From the qualitative study, we identified a two-level taxonomy for
review comments which consists of 4 categories in Level 1
and 11 sub-categories in Level 2.
% \begin{framed}
% \noindent
% \textbf{RQ1:} {}
% \textit{From the qualitative study, we identified a two-level taxonomy for
% review comments which consists of 4 categories in Level 1
% and 11 sub-categories in Level 2.}
% \end{framed}
\subsection{RQ2: Is it possible to automatically classify
review comments according to the defined taxonomy?}
\begin{table*}[ht]
\scriptsize
\centering
\caption{Classification performance on Level-2 subcategories}
\begin{tabular}{l |l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l}
\hline
\rowcolor[HTML]{000000}
&
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Rails}}} &
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Elasticsearch}}} &
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Angular.js}}}\\
% \multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Ave.}}}\\
\rowcolor[HTML]{000000}
&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}}&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}}\\
\rowcolor[HTML]{000000}
\multirow{-3}*{{\color[HTML]{FFFFFF}\textbf{Cat.}}}&
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. }&
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. }&
{\color[HTML]{FFFFFF}F-M }\\
\hline
\textbf{L2-1} &
0.75 & 0.57 & 0.66 &
0.88 & 0.67 & 0.78 &
0.75 & 0.46 & 0.61 &
0.85 & 0.58 & 0.72 &
0.54 & 0.26 & 0.40 &
0.76 & 0.78 & 0.77 \\ \hline
\textbf{L2-2} &
0.63 & 0.84 & 0.74 &
0.71 & 0.86 & 0.79 &
0.67 & 0.92 & 0.80 &
0.77 & 0.82 & 0.80 &
0.65 & 0.71 & 0.68 &
0.69 & 0.71 & 0.70 \\ \hline
\textbf{L2-3} &
0.59 & 0.46 & 0.53 &
0.74 & 0.78 & 0.76 &
0.66 & 0.55 & 0.61 &
0.60 & 0.52 & 0.56 &
0.64 & 0.63 & 0.64 &
0.75 & 0.77 & 0.76 \\ \hline
\textbf{L2-4} &
0.56 & 0.35 & 0.46 &
0.73 & 0.58 & 0.66 &
0.92 & 0.84 & 0.88 &
0.91 & 0.88 & 0.90 &
0.83 & 0.72 & 0.78 &
0.84 & 0.79 & 0.82 \\ \hline
\textbf{L2-5} &
0.50 & 0.26 & 0.38 &
0.63 & 0.43 & 0.53 &
0.65 & 0.36 & 0.51 &
0.80 & 0.67 & 0.74 &
0.63 & 0.48 & 0.56 &
0.61 & 0.51 & 0.56 \\ \hline
\textbf{L2-6} &
0.47 & 0.21 & 0.34 &
0.60 & 0.36 & 0.48 &
0.33 & 0.11 & 0.22 &
0.38 & 0.26 & 0.32 &
0.45 & 0.51 & 0.48 &
0.47 & 0.49 & 0.48 \\ \hline
\textbf{L2-7} &
0.79 & 0.66 & 0.73 &
0.79 & 0.72 & 0.76 &
0.75 & 0.39 & 0.57 &
0.75 & 0.68 & 0.72 &
0.49 & 0.31 & 0.40 &
0.83 & 0.64 & 0.74 \\ \hline
\textbf{L2-8} &
0.83 & 0.77 & 0.80 &
0.98 & 0.78 & 0.88 &
0.57 & 0.35 & 0.46 &
0.88 & 0.90 & 0.89 &
0.35 & 0.16 & 0.26 &
0.96 & 0.67 & 0.82 \\ \hline
\textbf{L2-9} &
0.83 & 0.76 & 0.80 &
0.90 & 0.81 & 0.86 &
0.94 & 0.57 & 0.76 &
0.96 & 0.76 & 0.86 &
0.85 & 0.76 & 0.81 &
0.83 & 0.82 & 0.83 \\ \hline
\textbf{L2-10} &
0.98 & 0.93 & 0.96 &
0.99 & 0.98 & 0.99 &
0.91 & 0.84 & 0.88 &
0.93 & 0.95 & 0.94 &
0.90 & 0.80 & 0.85 &
0.94 & 0.93 & 0.94 \\ \hline
\textbf{L2-11} &
0.80 & 0.66 & 0.73 &
0.89 & 0.87 & 0.88 &
0.87 & 0.67 & 0.77 &
0.92 & 0.91 & 0.92 &
0.93 & 0.62 & 0.78 &
0.98 & 0.92 & 0.95 \\ \hline
\textbf{AVG} &
0.71 & 0.67 & \textbf{0.69} &
0.80 & 0.76 & \textbf{0.78} &
0.79 & 0.75 & \textbf{0.77} &
0.83 & 0.81 & \textbf{0.82} &
0.71 & 0.64 & \textbf{0.67} &
0.76 & 0.73 & \textbf{0.75} \\
\bottomrule
\end{tabular}
\label{tab:L2}
\end{table*}
In the evaluation, we design a text-based classifier (TBC) as a comparison baseline
which only uses the text content of review comments and
does not apply any inspection rule or composed feature.
TBC uses the same preprocessing techniques and SVM models as used in \TSHC.
Classification performance is evaluated through 10-fold cross-validation,
namely splitting the review comments into 10 sets,
of which nine sets are used to train the classifiers
and the remaining set is used for the performance test.
The process is repeated 10 times.
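A minimal Python sketch of this protocol, assuming scikit-learn's \texttt{KFold} and caller-supplied (hypothetical) training and evaluation functions:
\begin{verbatim}
from sklearn.model_selection import KFold

def ten_fold_eval(comments, labels, train_fn, eval_fn):
    # Train on nine folds, test on the held-out fold;
    # repeat for all 10 folds.
    scores = []
    for tr, te in KFold(n_splits=10, shuffle=True).split(comments):
        model = train_fn([comments[i] for i in tr],
                         [labels[i] for i in tr])
        scores.append(eval_fn(model,
                              [comments[i] for i in te],
                              [labels[i] for i in te]))
    return scores
\end{verbatim}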
Moreover, the evaluation metrics used in the paper are precision, recall, and
F-measure, which are computed by the following formulas, respectively.
\begin{equation*}
Prec(L2\textrm{-}i) = \frac{N_{CC}(L2\textrm{-}i)}{N_{TSHC}(L2\textrm{-}i)}
\label{equ:pre}
\end{equation*}
\begin{equation*}
Rec(L2\textrm{-}i) = \frac{N_{CC}(L2\textrm{-}i)}{N_{total}(L2\textrm{-}i)}
\label{equ:rec}
\end{equation*}
\begin{equation*}
F\textrm{-}M(L2\textrm{-}i) =
\frac{2*Prec(L2\textrm{-}i)*Rec(L2\textrm{-}i)}{Prec(L2\textrm{-}i)+Rec(L2\textrm{-}i)}
\label{equ:fm}
\end{equation*}
In the formulas,
$N_{total}(L2\textrm{-}i)$ is the total number of comments of category \textit{L2-i} in the test dataset,
$N_{TSHC}(L2\textrm{-}i)$ is the number of comments that are classified as \textit{L2-i} by \TSHC,
and $N_{CC}(L2\textrm{-}i)$ is the number of comments that have been correctly classified as \textit{L2-i}.
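Equivalently, the three per-category metrics can be computed directly from the true and predicted label sets,
as in the following sketch (the per-comment label sets are placeholders):
\begin{verbatim}
def category_metrics(true_sets, pred_sets, cat):
    # true_sets[j] / pred_sets[j]: label sets of comment j.
    n_cc = sum(1 for t, p in zip(true_sets, pred_sets)
               if cat in t and cat in p)             # correct
    n_tshc = sum(1 for p in pred_sets if cat in p)   # predicted L2-i
    n_total = sum(1 for t in true_sets if cat in t)  # truly L2-i
    prec = n_cc / n_tshc if n_tshc else 0.0
    rec = n_cc / n_total if n_total else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f
\end{verbatim}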
Table~\ref{tab:L2} shows the precision, recall, and F-measure
provided by different approaches for Level-2 categories.
Our approach achieves the highest precision, recall, and F-measure scores
in all categories with only a few exceptions.
To provide an overall performance evaluation,
we use the weighted average value of the F-measure~\cite{Zhou:2014} over all categories,
weighted by the proportion of instances in each category.
The following equation describes the formula used to derive the average F-measure.
In the equation, the average F-measure is denoted as $F_{avg}$,
the F-measure of category \textit{L2-i} as $f_{i}$,
and the number of instances of category \textit{L2-i} as $n_{i}$.
\begin{equation*}
F_{avg} = \frac{\sum_{i=1}^{11}n_{i} * f_{i}}{\sum_{i=1}^{11}n_{i}}
\label{equ:wav}
\end{equation*}
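In code, this weighted average is a one-liner over the 11 categories, for example:
\begin{verbatim}
def weighted_avg_f(f_measures, counts):
    # F_avg = sum(n_i * f_i) / sum(n_i)
    return sum(n * f for n, f in zip(counts, f_measures)) / sum(counts)
\end{verbatim}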
\begin{table*}[ht]
\scriptsize
\centering
\caption{Classification performance of different methods}
\begin{tabular}{l |c c c| c c c| c c c}
\hline
\rowcolor[HTML]{000000}
&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Rails}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Elasticsearch}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Angular.js}}}\\
% \multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Ave.}}}\\
\rowcolor[HTML]{000000}
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Methods}}}&
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } &
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } &
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } \\
\hline
$F_{avg}$ &
0.69 & 0.74 & 0.71 &
0.77 & 0.81 & 0.78 &
0.67 & 0.73 & 0.69 \\
Improvement (\%)&
- & 7.24 & 2.90 &
- & 5.19 & 1.30 &
- & 8.96 & 3.00 \\
\bottomrule
\end{tabular}
\label{tab:increase}
\end{table*}
The table indicates that our approach consistently outperforms
the baseline across the three projects.
Compared with the baseline,
the improvement achieved by \TSHC on each project
in terms of the weighted average F-measure is
9.2\% (from 0.688 to 0.780) in Rails, 5.3\% (from 0.767 to 0.820) in Elasticsearch,
and 7.2\% (from 0.675 to 0.747) in Angular.js.
% TODO: state the highest value reached
These results indicate that our approach is highly applicable in practice.
Furthermore, we would like to explore
whether the rule-based technique or composed features contribute more
to the increased performance of \TSHC compared with the baseline TBC.
Therefore, we design another two comparative settings.
\begin{itemize}
\item TBC\_R (TBC + rule): in this setting, we combine TBC with the rule-based technique.
\item TBC\_CF (TBC + composed features): in this setting, we combine TBC with the composed features of stage two.
\end{itemize}
Table~\ref{tab:increase} presents the experimental results.
Generally speaking, TBC\_R is better than TBC\_CF,
which means the rule-based technique contributes more to the increased performance
than the composed features of stage two.
Compared with TBC,
TBC\_R achieves an improvement of 7.24\%, 5.19\%, and 8.96\% for Rails, Elasticsearch, and Angular.js, respectively,
while TBC\_CF achieves 2.90\%, 1.30\%, and 3.00\%.
We further studied the review comments mis-categorized by \TSHC.
An example is
\textit{``While karma does globally install with a bunch of plug-ins,
we do need the npm install because without that you don't get the karma-ng-scenario karma plug-in.''}.
\TSHC classifies it as belonging to Defect Detecting (L2-2),
but it is actually a Solution Disagreeing (L2-5) comment.
The reason for this incorrect prediction is twofold:
the lack of explicit discriminating terms
and the overly specific expression of the rejection.
The inspection rules of Solution Disagreeing (L2-5) fail to match
because of the lack of corresponding matching patterns.
The ML classifiers tend to categorize the comment into Defect Detecting (L2-2)
because the overly specific expression of opinion makes it look
more like a low-level comment about code correctness
than a high-level one about the pull-request decision.
We attempted to alleviate this problem by adding factors
(\eg Comment\_type and Code\_inclusion) in stage two of \TSHC,
which can help reveal whether a review comment is talking about
the pull-request as a whole or about the solution detail.
Although the additional information improves the classification performance to some extent,
it is not sufficient to differentiate the two types of comments.
We plan to address the issue by extending the manually labeled dataset
and introducing sentiment analysis.
\noindent
\textbf{In Summary.}
Compared with the baseline,
\TSHC achieves higher performance
in terms of the weighted average F-measure, namely
0.78 in Rails, 0.82 in Elasticsearch,
and 0.75 in Angular.js,
which indicates \TSHC is applicable
in practical automatic classification.
\subsection{RQ3: What are the typical review patterns among the reviewers' discussions?}
We now discuss some results of our preliminary quantitative analysis.
We used \TSHC to automatically classify all the review comments,
based on which we explored the quantitative characteristics of the review comments.
\subsubsection{Comments Distribution}
\begin{figure}[htbp]
\centering
\includegraphics[width=8cm]{resources/Fig-6.pdf}
\caption{The percentage of the Level-2 categories}
% \caption{The distribution of review comments on Level-2 categories}
\label{fig:stas_plot}
\end{figure}
Figure \ref{fig:stas_plot} shows the percentage of the Level-2 categories.
In general, the comment distributions of the three projects
exhibit similar patterns, in which most comments are about
Defect Detecting (L2-2),
Value Affirming (L2-4),
and Politely Responding (L2-10).
Notably, Defect Detecting (L2-2) occupies the first place
(Rails: 42.5\%, Elasticsearch: 50.6\%, Angular.js: 33.1\%).
This is consistent with previous studies~\cite{Gousios:2014b,Bacchelli:2013},
which stated that the main purpose of code review is to find defects.
Moreover, the development of open source projects relies on
the voluntary collaborative contributions~\cite{Shah:2006} of thousands of developers.
A harmonious and friendly community environment helps
promote contributors' enthusiasm,
which is critical to the continuous evolution of a project.
As a result, most reviewers never hesitate
to react positively to others' contributions (L2-10, L2-11)
and to express their satisfaction with high-quality pull-requests (L2-4).
We also explored the comment distribution on the manually labeled dataset
and found it similar to that on the automatically classified dataset.
The difference between the two datasets is that
the manually labeled dataset has more comments of L2-4 and fewer comments of L2-2
than the automatically labeled one.
However, the difference is slight and does not affect the overall shape of the distribution.
Consequently, we only report our findings on the automatically classified dataset,
whose larger size makes the findings more convincing.
\subsubsection{Cost-free comments are less frequent}
It is somewhat surprising that although inspecting code style (L2-1)
and testing code (L2-3) cost reviewers little effort
and do not require a deep understanding of the code changes,
comments of these two categories are relatively less frequent than others.
This is somewhat inconsistent with the study
conducted by Bacchelli \etal~\cite{Bacchelli:2013} on Microsoft teams,
in which they found that code style (called code improvement in their study)
is the most frequent outcome of code reviews.
The reason for this inconsistency, in our opinion,
is the difference in development environments.
In GitHub, contributors work in a transparent environment
and their activities are visible to everyone~\cite{Tsay:2014b,gousios2016work}.
Hence, in order to build and maintain good reputations,
contributors usually go through their code by themselves before submission
and avoid making apparent mistakes~\cite{gousios2016work}.
% \begin{table}[htbp]
% \setlength\extrarowheight{-1.5pt}
% \scriptsize
% \centering
% \caption{Examples of Error Detecting}
% \begin{tabular}{p{8cm}}
% %%%%%%%
% \bottomrule
% \rowcolor{gray}\textbf{Example 1} (inline comment)\\ \hline
% \rowcolor{lightgray}
% Changed codes: \\
% $-$ \color{red} \texttt{assert r.save, ``First save should be successful''}\\
% + \color[HTML]{008B00} \texttt{assert r.valid?, ``First validation should be}\\
% \quad \color[HTML]{008B00}successful''\\
% + \color[HTML]{008B00} \texttt{r.save} \\
% \rowcolor{lightgray}
% Review comments: \\
% \textit{Just this change is not necessary, it'll run validations twice,
% you canuse r.save in the assert call} \\
% %%%%%%%%
% %%%%%%%
% \bottomrule
% \rowcolor{gray}\textbf{Example 2} (inline comment) \\ \hline
% \rowcolor{lightgray}
% Changed codes: \\
% $-$ \color{red} \texttt{``\#{schema\_search\_path}-\#{sql}''}\\
% + \color[HTML]{008B00} \texttt{``\#{schema\_search\_path}-\#{sql}''.to\_sym}\\
% \rowcolor{lightgray}
% Review comments: \\
% \textit{This will introduce a memory leakage on system
% where theare a lot of possible SQLs.
% Remember symbols are not removed from memory
% from the garbage collector } \\
% %%%%%%%%
% %%%%%%%
% \bottomrule
% \rowcolor{gray}\textbf{Example 3} (general comment)\\ \hline
% \rowcolor{lightgray}
% Pull-request description: \\
% after\_remove callbacks only run if record was removed\\
% \rowcolor{lightgray}
% Review comments: \\
% \textit{This will change the existing behavior and
% people may be relying on it so I don't think it is safe
% to change it like this. If we want to change this
% we need a better upgrade path, like a global option
% to let people to switch the behavior} \\
% %%%%%%%%
% %%%%%%%
% \bottomrule
% \rowcolor{gray}\textbf{Example 4} (inline comment) \\ \hline
% \rowcolor{lightgray}
% Changed codes: \\
% + \color[HTML]{008B00} \texttt{class TableList}\\
% + \color[HTML]{008B00} \texttt{\quad delegate :klass, :to $\Rightarrow$ :@li, :allow\_nil $\Rightarrow$ true}\\
% + \color[HTML]{008B00} \texttt{\quad def initialize(li)}\\
% + \color[HTML]{008B00} \texttt{\quad \quad @li = li}\\
% + \color[HTML]{008B00} \texttt{\quad end}\\
% + \color[HTML]{008B00} \texttt{end}\\
% \rowcolor{lightgray}
% Review comments: \\
% \textit{This new class isn't needed -
% we can reuse an existing one} \\
% \bottomrule
% %%%%%%%%
% \end{tabular}
% \label{Example:ed}
% \end{table}
\subsubsection{Defects exist under green tests}
We were also curious about why a large percentage of comments were
focused on detecting defects (L2-2)
even though most of the pull-requests did not raise testing issues (L2-3).
After re-examining those pull-requests
that received comments of L2-2 but did not receive comments of L2-3,
we found two main factors behind this phenomenon.
One factor is contributors' unfamiliarity with the programming language and the APIs,
as shown in the following example comments.
\textbf{Example\_3: }
\textit{
``Just this change is not necessary, it'll run validations twice,
you can use r.save in the assert call.''
}
\textbf{Example\_4: }
\textit{
``This will introduce a memory leakage on system where the are a lot of possible SQLs.
Remember symbols are not removed from memory from the garbage collector.''
}
The other is some contributors' lack of development experience,
as shown in the following example comments.
\textbf{Example\_5: }
\textit{
``This will change the existing behavior and people may be relying on it
so I don't think it is safe to change it like this.
If we want to change this we need a better upgrade path,
like a global option to let people to switch the behavior.''
}
\textbf{Example\_6: }
\textit{
``This new class isn't needed - we can reuse an existing one.''
}
These factors tend to produce runnable but ``low-quality'' code,
which is less elegant or
even breaks compatibility and introduces potential risks.
This reality justifies the necessity of manual code review,
even though an increasing number of automatic tools are emerging.
\subsubsection{Conventions are sometimes overlooked}
\begin{figure*}[htbp]
\centering
\includegraphics[width=\textwidth]{resources/Fig-7.pdf}
\caption{The cumulative distribution of pull-requests.
(a) Rails (b) Elasticsearch (c) Angular.js}
\label{fig:prd}
\end{figure*}
Although most projects set specific development conventions
that do not require much effort to follow,
these conventions are sometimes overlooked by contributors, as can be seen from the following examples.
\textbf{Example\_7: }
``\textit{
Thanks! Could you please add `[ci skip]' to your commit message
to avoid Travis to trigger a build ?
}''
\textbf{Example\_8: }
``\textit{
@xxx PR looks good to me.
Can you squash the two commits into one.
Fix and test should go there in a commit.
Also please add a change log. Thanks.
}''
To better understand this problem,
we conducted a statistical analysis to explore whether this kind of comment (L2-9)
is distributed differently across contributor roles
(core members or external contributors).
We found that pull-requests submitted by external contributors
receive more comments of L2-9 than those submitted by core members,
accounting for 83.9\%, 60.1\%, and 80.7\% of the total number in
Rails, Elasticsearch, and Angular.js respectively.
Furthermore, we explored how an external developer's contribution experience
affects the possibility of his pull-requests receiving comments of L2-9.
Figure~\ref{fig:prd} shows what percentage of the pull-requests
that receive comments of L2-9
are created and accumulated over developers' contribution history.
It is clear that a considerable proportion of such pull-requests
(68.4\% in Rails, 75.1\% in Elasticsearch, and 91.9\% in Angular.js)
are generated in a developer's first five submissions.
These figures illustrate that
external contributors are more likely to submit pull-requests
that break project conventions in their early contributions.
Therefore, it is necessary to help external contributors
gain a good understanding of project conventions at a very early stage.
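The role and experience statistics above can be reproduced along the following lines;
this is only a sketch, and the file name and column names
(\eg author\_role, submission\_index) are hypothetical.
\begin{verbatim}
# Sketch of the L2-9 role/experience analysis;
# the CSV file and its columns are hypothetical.
import pandas as pd

comments = pd.read_csv("labeled_comments.csv")
l29 = comments[comments["category"] == "L2-9"]

# Share of L2-9 comments by contributor role
print(l29.groupby("author_role").size() / len(l29))

# Cumulative share of pull-requests with L2-9 comments
# over the submitter's contribution history
prs = l29.drop_duplicates("pr_id")
counts = prs.groupby("submission_index").size()
print((counts.cumsum() / counts.sum()).head(5))
\end{verbatim}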
In fact, most project management teams have provided contributors
with specific guidelines\footnote{
\eg \url{http://edgeguides.rubyonrails.org/contributing_to_ruby_on_rails.html
}, October 2017.}.
It seems, however, that not everyone is willing to
go through the tedious specification before contributing.
This may inspire GitHub to improve its collaborative mechanism
and offer more efficient development tools.
For example, it would be better to briefly display
a clear list of development conventions,
edited by the management team of a project,
before a developer creates a pull-request to that project.
An alternative solution for GitHub is to provide automatic reviewing tools
that can be configured with predefined convention rules
and triggered as soon as a new pull-request is submitted.
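Such a convention-checking tool could be as simple as a hook
that screens the metadata of a newly submitted pull-request;
the rules below are purely illustrative examples drawn from the comments quoted above,
not an existing GitHub feature.
\begin{verbatim}
# Sketch of a convention-checking hook; the rules are
# illustrative examples, not an existing GitHub feature.
def check_conventions(pr):
    warnings = []
    if len(pr["commits"]) > 1:
        warnings.append(
            "Please squash your commits into one.")
    if not pr["changelog_updated"]:
        warnings.append("Please add a changelog entry.")
    last_msg = pr["commits"][-1]["message"]
    if pr["docs_only"] and "[ci skip]" not in last_msg:
        warnings.append(
            "Add '[ci skip]' to the commit message.")
    return warnings
\end{verbatim}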
\noindent
\textbf{In Summary.}
Most comments discuss code correctness and social interaction.
External contributors are more likely to
break project conventions in their early contributions,
and their pull-requests might contain potential issues,
even though they have passed the tests.

56
PRCCE/6_threats.tex Executable file
View File

@ -0,0 +1,56 @@
\section{THREATS TO VALIDITY}
\label{sec:Threats}
In this section, we discuss threats to construct validity, internal validity,
and external validity, which may affect the results of our study.
\noindent\textbf{Construct validity:}
The first threat involves the taxonomy definition,
since some of the categories could be overlapping or missing.
To alleviate this problem, we, together with other experienced developers,
randomly selected review comments and
classified them according to our taxonomy definition.
The verification result showed that our taxonomy is complete
and does not miss any significant categories.
Secondly, manually labeling thousands of review comments is a
repetitive, time-consuming, and boring task.
To obtain a reliable labeled set, the first author of the paper was assigned
to this job and we tried our best to provide him with a pleasant work environment.
Moreover, if a long time had passed since the last round of labeling work,
the first author would revisit the taxonomy and go through some labeled comments
to ensure an accurate and consistent understanding of the task
before the next round of labeling work proceeded.
In addition, it is possible that incorrect classifications by \TSHC affect our findings,
even though our model achieved a high precision of 76\%-83\%.
However, our preliminary quantitative study is mainly
based on the comment distribution over the categories,
and misclassification is unlikely to alter the distribution significantly;
therefore, our findings are not easily affected by a certain amount of incorrect classification.
\noindent\textbf{Internal validity:}
There are many machine learning methods
that can be used to solve the classification problem,
and the choice of ML algorithm has a direct impact on the classification performance.
We compared several ML algorithms, including a linear regression classifier,
an AdaBoost classifier, a random forest classifier, and SVM, and
found that SVM performed better than the others in classifying review comments.
Furthermore, we also performed the necessary parameter optimization.
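The comparison can be organized as a cross-validated benchmark, as sketched below;
we read the ``linear regression classifier'' as scikit-learn's logistic regression
(the usual linear model for classification),
and the synthetic data stands in for the real TF-IDF features.
\begin{verbatim}
# Sketch of the classifier comparison with 10-fold
# cross-validation on the weighted F-measure.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Stand-in for the real TF-IDF features and labels
X, y = make_classification(n_samples=500, n_features=50)

candidates = {
    "linear (logistic)": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(),
    "random forest": RandomForestClassifier(),
    "SVM": LinearSVC(),
}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=10,
                             scoring="f1_weighted")
    print(name, round(scores.mean(), 3))
\end{verbatim}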
\noindent\textbf{External validity:}
Our findings are based on a dataset of three open source projects hosted on GitHub.
To increase the generalizability of our research,
we selected projects with different programming languages
and different application areas.
Nevertheless, our dataset is a small sample compared with
the total number of projects on GitHub.
Hence, it is uncertain whether the results can be generalized to
all the other projects hosted on GitHub or
to those hosted on other platforms.

40
PRCCE/8_conlusion.tex Executable file
View File

@ -0,0 +1,40 @@
\section{CONCLUSION \& FUTURE WORK}
\label{sec:Concl}
Code review is one of the most significant stages in pull-based development.
It ensures that only high-quality pull-requests are accepted,
based on the in-depth discussions among reviewers.
To comprehensively understand the review topics in pull-based model,
we first conducted a qualitative study on
three popular open-source software projects hosted on GitHub
and constructed a fine-grained two-level taxonomy covering
4 Level-1 categories (Code Correctness, PR Decision-making, Project Management, and Social Interaction)
and 11 Level-2 sub-categories (\eg defect detecting, reviewer assigning, contribution encouraging).
Second, we did a preliminary quantitative analysis on a large set of review comments
that were labeled by a two-stage hybrid classification algorithm
which is able to automatically classify review comments by combining rule-based and machine-learning techniques.
Through the quantitative study, we explored the typical review patterns.
We found that the three projects exhibit similar comment distributions over the categories.
Pull-requests submitted by inexperienced contributors
tend to contain potential issues even though they have passed the tests.
Furthermore, external contributors are more likely to break project conventions
in their early contributions.
%further work
Nevertheless, \TSHC performs poorly on a few Level-2 sub-categories,
and more work could be done in the future to improve it.
We plan to address the shortcomings of our approach
by extending the manually labeled dataset
and introducing sentiment analysis.
Moreover, we will try to mine more valuable information
(\eg comment co-occurrence, emotion shift, \etc)
from the experimental results in this paper
and assist core members
in better organizing the code review process,
for example by improving reviewer recommendation,
contributor assessment,
and pull-request prioritization.
% \section*{Acknowledgment}
% The research is supported by the National Natural Science Foundation of China (Grant No.61432020, 61303064, 61472430, 61502512) and National Grand R\&D Plan (Grant No. 2016YFB1000805).
% The authors would like to thank... more thanks here

462
PRCCE/9_ref.bib Executable file
View File

@ -0,0 +1,462 @@
@article{Yu:2015,
title={Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Wang, Tao},
journal={Information and Software Technology},
volume={74},
pages={204--218},
year={2016},
publisher={Elsevier}
}
@inproceedings{yu2014reviewer,
title={Reviewer recommender of pull-requests in GitHub},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Ling, Charles X},
booktitle={2014 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
pages={609--612},
year={2014},
organization={IEEE}
}
@inproceedings{yu2015wait,
title={Wait For It: Determinants of Pull Request Evaluation Latency on GitHub},
author={Yu, Yue and Wang, Huaimin and Filkov, Vladimir and Devanbu, Premkumar and Vasilescu, Bogdan},
booktitle={MSR},
year={2015}
}
@inproceedings{Barr:2012,
title={Cohesive and isolated development with branches},
author={Barr, Earl T and Bird, Christian and Rigby, Peter C and Hindle, Abram and German, Daniel M and Devanbu, Premkumar},
booktitle={International Conference on Fundamental Approaches To Software Engineering},
pages={316-331},
year={2012},
}
@inproceedings{Gousios:2014,
title={An exploratory study of the pull-based software development model},
author={Gousios, Georgios and Pinzger, Martin and Deursen, Arie Van},
booktitle={International Conference Software Engineering},
pages={345-355},
year={2014},
}
@inproceedings{Vasilescu:B,
title={Quality and productivity outcomes relating to continuous integration in GitHub},
author={Vasilescu, Bogdan and Yu, Yue and Wang, Huaimin and Devanbu, Premkumar and Filkov, Vladimir},
booktitle={Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE)},
pages={805-816},
year={2015},
}
@inproceedings{Tsay:2014a,
title={Let's talk about it: evaluating contributions through discussion in GitHub},
author={Tsay, Jason and Dabbish, Laura and Herbsleb, James},
booktitle={Proceedings of the ACM SIGSOFT International Symposium on Foundations of Software Engineering},
pages={144-154},
year={2014},
}
@inproceedings{Gousios:2014b,
title={Work practices and challenges in pull-based development: the integrator's perspective},
author={Gousios, Georgios and Zaidman, Andy and Storey, Margaret-Anne and Van Deursen, Arie},
booktitle={Proceedings of the 37th International Conference on Software Engineering},
pages={358--368},
year={2015},
organization={IEEE}
}
@inproceedings{gousios2016work,
title={Work practices and challenges in pull-based development: The contributor's perspective},
author={Gousios, Georgios and Storey, Margaret-Anne and Bacchelli, Alberto},
booktitle={Proceedings of the 38th International Conference on Software Engineering},
pages={285--296},
year={2016},
organization={ACM}
}
@inproceedings{Marlow:2013,
title={Impression formation in online peer production: activity traces and personal profiles in github},
author={Marlow, Jennifer and Dabbish, Laura and Herbsleb, Jim},
booktitle={Proceedings of the 2013 conference on Computer supported cooperative work},
pages={117--128},
year={2013},
organization={ACM}
}
@inproceedings{Veen:2015,
title={Automatically prioritizing pull requests},
author={Van Der Veen, Erik and Gousios, Georgios and Zaidman, Andy},
booktitle={Proceedings of the 12th Working Conference on Mining Software Repositories},
pages={357--361},
year={2015},
organization={IEEE Press}
}
@inproceedings{Thongtanunam:2015,
title={Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review},
author={Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Kula, Raula Gaikovina and Yoshida, Norihiro and Iida, Hajimu and Matsumoto, Kenichi},
booktitle={IEEE International Conference on Software Analysis, Evolution, and Reengineering},
pages={141-150},
year={2015},
}
@inproceedings{t2015cq,
title={Investigating code review practices in defective files: an empirical study of the Qt system},
author={Thongtanunam, Patanamon and McIntosh, Shane and Hassan, Ahmed E and Iida, Hajimu},
booktitle={Proceedings of the 12th Working Conference on Mining Software Repositories},
pages={168--179},
year={2015},
organization={IEEE Press}
}
@inproceedings{rahman2016correct,
title={CoRReCT: code reviewer recommendation in GitHub based on cross-project and technology experience},
author={Rahman, Mohammad Masudur and Roy, Chanchal K and Collins, Jason A},
booktitle={Proceedings of the 38th International Conference on Software Engineering Companion},
pages={222--231},
year={2016},
organization={ACM}
}
@article{jiang:2015,
title={CoreDevRec: Automatic Core Member Recommendation for Contribution Evaluation},
author={Jiang, Jing and He, Jia Huan and Chen, Xue Yuan},
journal={Journal of Computer Science and Technology},
volume={30},
number={5},
pages={998-1016},
year={2015},
}
@inproceedings{Tsay:2014b,
title={Influence of social and technical factors for evaluating contribution in GitHub},
author={Tsay, Jason and Dabbish, Laura and Herbsleb, James},
booktitle={ICSE},
pages={356-366},
year={2014},
}
@inproceedings{Bacchelli:2013,
title={Expectations, outcomes, and challenges of modern code review},
author={Bacchelli, A and Bird, C},
booktitle={International Conference on Software Engineering},
pages={712-721},
year={2013},
}
@article{Shah:2006,
title={Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development.},
author={Shah, Sonali K.},
journal={Management Science},
volume={52},
number={7},
pages={1000-1014},
year={2006},
}
@book{Porter:1997,
title={An algorithm for suffix stripping},
author={Porter, M. F},
publisher={Morgan Kaufmann Publishers Inc.},
year={1997},
}
@inproceedings{Yu:2014,
title={Who Should Review this Pull-Request: Reviewer Recommendation to Expedite Crowd Collaboration},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Ling, Charles X},
booktitle={Asia-Pacific Software Engineering Conference},
pages={335-342},
year={2014},
}
@article{Thongtanunam:2016,
title={Review participation in modern code review},
author={Thongtanunam, Patanamon and Mcintosh, Shane and Hassan, Ahmed E. and Iida, Hajimu},
journal={Empirical Software Engineering},
pages={1-50},
year={2016},
}
@inproceedings{Beller:2014,
title={Modern code reviews in open-source projects: which problems do they fix?},
author={Beller, Moritz and Bacchelli, Alberto and Zaidman, Andy and Juergens, Elmar},
booktitle={Working Conference on Mining Software Repositories},
pages={202-211},
year={2014},
}
@article{Kollanus:2009,
title={Survey of Software Inspection Research},
author={Kollanus, Sami and Koskinen, Jussi},
journal={Open Software Engineering Journal},
volume={3},
number={1},
year={2009},
}
@inproceedings{Baysal:2013,
title={The influence of non-technical factors on code review},
author={Baysal, O. and Kononenko, O. and Holmes, R. and Godfrey, M. W.},
booktitle={Reverse Engineering},
pages={122-131},
year={2013},
}
@article{Baysal:2015,
title={Investigating technical and non-technical factors influencing modern code review},
author={Baysal, Olga and Kononenko, Oleksii and Holmes, Reid},
journal={Empirical Software Engineering},
volume={21},
number={3},
pages={1-28},
year={2016},
}
%Mcintosh2016An
@article{Mcintosh:***,
title={An empirical study of the impact of modern code review practices on software quality},
author={Mcintosh, Shane and Kamei, Yasutaka and Adams, Bram},
journal={Empirical Software Engineering},
volume={21},
number={5},
pages={1-44},
year={2016},
}
@inproceedings{Mcintosh:2014,
title={The impact of code review coverage and code review participation on software quality: a case study of the qt, VTK, and ITK projects},
author={Mcintosh, Shane and Kamei, Yasutaka and Adams, Bram and Hassan, Ahmed E},
booktitle={Working Conference on Mining Software Repositories},
pages={192-201},
year={2014},
}
%%%%和Fagan:1999 一样
@incollection{Fagan:1976,
title={Design and code inspections to reduce errors in program development},
author={Fagan, Michael E},
booktitle={Pioneers and Their Contributions to Software Engineering},
pages={301--334},
year={2001},
publisher={Springer}
}
@book{Frank:1984,
title={Software inspections and the industrial production of software},
author={Ackerman, A. Frank and Fowler, Priscilla J and Ebenau, Robert G},
publisher={Elsevier North-Holland, Inc.},
pages={13-40},
year={1984},
}
@book{Baeza:2004,
title={Modern Information Retrieval},
author={Baeza-Yates, Ricardo A and Ribeiro-Neto, Berthier},
publisher={China Machine Press},
pages={2628},
year={2004},
}
@article{Ackerman:1989,
title={Software inspections: an effective verification process},
author={Ackerman, A. F and Buchwald, L. S and Lewski, F. H},
journal={IEEE Software},
volume={6},
number={3},
pages={31-36},
year={1989},
}
@article{Russell:1991,
title={Experience With Inspection in Ultralarge-Scale Development},
author={Russell, Glen W},
journal={IEEE Software},
volume={8},
number={1},
pages={25-31},
year={1991},
}
@article{Aurum:2002,
title={State-of-the-art: software inspections after 25 years},
author={Aurum, Aybuke and Petersson, Hakan and Wohlin, Claes},
journal={Sofware Testing Verification \& Reliability},
volume={12},
number={3},
pages={133-154},
year={2002},
}
%Rigby2012Contemporary
@article{Rigby:**,
title={Contemporary Peer Review in Action: Lessons from Open Source Development},
author={Rigby, P. C and Cleary, B and Painchaud, F and Storey, M},
journal={IEEE Software},
volume={29},
number={6},
pages={56-61},
year={2012},
}
@article{Rigby:2006,
title={A preliminary examination of code review processes in open source projects},
author={Rigby, Peter C. and German, Daniel M.},
journal={University of Victoria},
}
@inproceedings{Rigby:2008,
title={Open source software peer review practices: a case study of the apache server},
author={Rigby, Peter C and German, Daniel M and Storey, Margaret Anne},
booktitle={International Conference on Software Engineering},
pages={541-550},
year={2008},
}
@inproceedings{Rigby:2011,
title={Understanding broadcast based peer review on open source software projects},
author={Rigby, Peter C. and Storey, Margaret Anne},
booktitle={International Conference on Software Engineering},
pages={541-550},
year={2011},
}
@inproceedings{Riby:2013,
title={Convergent contemporary software peer review practices},
author={Rigby, Peter C and Bird, Christian},
booktitle={Joint Meeting on Foundations of Software Engineering},
pages={202-212},
year={2013},
}
@book{Rigby:2014,
title={A Mixed Methods Approach to Mining Code Review Data: Examples and a study of multi-commit reviews and pull requests},
author={Rigby, Peter C and Bacchelli, Alberto and Gousios, Georgios and Mukadam, Murtuza},
pages={231-255},
year={2014},
}
@misc{Gerrit,
title={Gerrit. \url{https://code.google.com/p/gerrit/}. Accessed 2014/01/20}
}
@misc{GitHub,
title={GitHub. \url{https://github.com/}. Accessed 2014/01/14}
}
@misc{Trustie,
title={Trustie. \url{https://www.trustie.net/}. Accessed 2014/01/14}
}
@inproceedings{Baum:2016,
title={Factors influencing code review processes in industry},
author={Baum, Tobias and Liskin, Olga and Niklas, Kai and Schneider, Kurt},
booktitle={ACM Sigsoft International Symposium},
pages={85-96},
year={2016},
}
@article{Ciolkowski:2003,
title={Software Reviews: The State of the Practice},
author={Ciolkowski, Marcus and Laitenberger, Oliver and Biffl, Stefan},
journal={Software IEEE},
volume={20},
number={20},
pages={46-51},
year={2003},
}
@article{Yu:2016,
title={Determinants of pull-based development in the context of continuous integration},
author={Yu, Yue and Yin, Gang and Wang, Tao and Yang, Cheng and Wang, Huaimin},
journal={Science China Information Sciences},
volume={59},
number={8},
pages={1-14},
year={2016},
}
@inproceedings{Lima,
title={Developers assignment for analyzing pull requests},
author={de Lima J{\'u}nior, Manoel Limeira and Soares, Daric{\'e}lio Moreira and Plastino, Alexandre and Murta, Leonardo},
booktitle={Proceedings of the 30th Annual ACM Symposium on Applied Computing},
pages={1567-1572},
year={2015},
}
@inproceedings{Antoniol:2008,
title={Is it a bug or an enhancement?: a text-based approach to classify change requests},
author={Antoniol, Giuliano and Ayari, Kamel and Di Penta, Massimiliano and Khomh, Foutse and Gu{\'e}h{\'e}neuc, Yann-Ga{\"e}l},
booktitle={Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research},
pages={23},
year={2008},
}
@article{Zhou:2014,
title={Combining text mining and data mining for bug report classification},
author={Zhou, Yu and Tong, Yanxiang and Gu, Ruihang and Gall, Harald},
journal={Journal of Software: Evolution and Process},
year={2016},
publisher={Wiley Online Library}
}
@inproceedings{Pingclasai:2013,
title={Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling},
author={Pingclasai, N. and Hata, H. and Matsumoto, K. I.},
booktitle={Asia-Pacific Software Engineering Conference},
pages={13 - 18},
year={2013},
}
@book{Herzig:2013,
title={It's not a Bug, it's a Feature: How Misclassification Impacts Bug Prediction},
author={Herzig, Kim and Just, Sascha and Zeller, Andreas},
pages={392-401},
year={2013},
}
@inproceedings{Ciurumelea:2017,
title={Analyzing Reviews and Code of Mobile Apps for Better Release Planning},
author={Ciurumelea, Adelina and Schaufelbuhl, Andreas and Panichella, Sebastiano and Gall, Harald},
booktitle={IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)},
year={2017},
}
@inproceedings{Yu:2015b,
title={Quality and productivity outcomes relating to continuous integration in GitHub},
author={Vasilescu, Bogdan and Yu, Yue and Wang, Huaimin and Devanbu, Premkumar and Filkov, Vladimir},
booktitle={Proceedings of the Joint Meeting on Foundations of Software Engineering (ESEC/FSE)},
pages={805-816},
year={2015},
}
@article{zhang2017social,
title={Social media in GitHub: the role of @-mention in assisting software development},
author={Zhang, Yang and Wang, Huaimin and Yin, Gang and Wang, Tao and Yu, Yue},
journal={Information Sciences},
volume={60},
number={032102},
pages={032102:1-032102:18},
year={2017}
}
@inproceedings{Storey2014The,
title={The (R) Evolution of social media in software engineering},
author={Storey, Margaret Anne and Singer, Leif and Cleary, Brendan and Filho, Fernando Figueira and Zagalsky, Alexey},
booktitle={Future of Software Engineering (FOSE)},
pages={100-116},
year={2014},
}
@inproceedings{Zhu2016Effectiveness,
title={Effectiveness of code contribution: from patch-based to pull-request-based tools},
author={Zhu, Jiaxin and Zhou, Minghui and Mockus, Audris},
booktitle={ACM Sigsoft International Symposium on Foundations of Software Engineering},
pages={871-882},
year={2016},
}

141
PRCCE/JCST.STY Executable file
View File

@ -0,0 +1,141 @@
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{Jcst}
[2009/10/08 for Journal of Computer Science & Technology]
%\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}
\RequirePackage{multicol}
\RequirePackage{graphicx}
\RequirePackage{CJK}
\RequirePackage{float}
% Force 'figure' and 'table' to use [H] placement
\let\figure@bak\figure
\renewcommand\figure[1][]{\figure@bak[H]}
\let\table@bak\table
\renewcommand\table[1][]{\table@bak[H]}
\let\balance\relax
\long\def\singlecolumn#1{\end{multicols}#1\begin{multicols}{2}}
%---------------------
\setlength{\headsep}{5truemm}
\setlength{\headheight}{4truemm}
\setlength{\textheight}{231truemm}
\setlength{\textwidth}{175truemm}
\setlength{\topmargin}{0pt}
\setlength{\oddsidemargin}{-0.5cm}
\setlength{\evensidemargin}{-0.5cm}
\setlength{\footskip}{8truemm}
\setlength{\columnsep}{7truemm}
\setlength{\columnseprule}{0pt}
\setlength{\parindent}{2em}
\renewcommand{\baselinestretch}{1.2}
\renewcommand{\arraystretch}{1.5}
\abovedisplayskip=10pt plus 2pt minus 2pt
\belowdisplayskip=10pt plus 2pt minus 2pt
\renewcommand{\footnoterule}{\kern 1mm \hrule width 10cm \kern 2mm}
%\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\renewcommand{\thefootnote}{}
%-------------------- Set Page Head -----------------------
%#1 for first page author's name and title
%#2 for volume
%#3 for issue
%#4 for end page
%#5 for month
%#6 for year
%#7 for right page author's name and title
\newcommand{\setpageinformation}[7]{%
\thispagestyle{empty}
\pagestyle{myheadings}\markboth
{\hss{\small\sl J. Comput. Sci. \& Technol., #5. #6, #2, #3}}
{{\small\sl {#7}}\rm\hss}
\vspace*{-13truemm}
{\par\noindent\parbox[t]{17.5cm}{\small {#1}.
\rm JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
#2 #3: \thepage--{#4} \ #5 #6}}
\def\@evenfoot{}
\def\@oddfoot{}
}%
%--------------- Title, keywords, Section, and key words etc., ------------
\renewcommand{\title}[1]{\par\noindent\medskip\unskip\\\Large\textbf{#1}\par\medskip}%
\newcommand{\keywords}[1]{\small\par\noindent\textbf{Keywords}\quad #1\par\bigskip}
\newcommand\email[1]{\par\medskip\noindent\normalsize\textrm{E-mail: ~ #1}}
\newcommand\received[1]{\par\medskip\noindent\normalsize\textrm{Received #1}\par\medskip}
\newcommand\revised[1]{\par\medskip\noindent\normalsize\textrm{Revised #1}\par\medskip}
\renewcommand\author[1]{\par\medskip\noindent\normalsize\unskip\\\textrm{#1}}
\newcommand\address[1]{\par\medskip\noindent\small\unskip\\\textit{#1}}
\renewenvironment{abstract}{\par\medskip\small
\noindent{\bfseries \abstractname\quad}}{\par\medskip}
%----------------------------------------------------------------------
\renewcommand\section{\@startsection{section}{1}{\z@}%
{3.5ex \@plus 1ex \@minus .2ex}%
{2.3ex \@plus.2ex}%
{\normalfont\normalsize\bfseries}}
\renewcommand\subsection{\@startsection{subsection}{2}{\z@}%
{3.25ex\@plus 1ex \@minus .2ex}%
{1.5ex \@plus .2ex}%
{\normalfont\normalsize\bfseries}}
\renewcommand\subsubsection{\@startsection{subsubsection}{2}{\z@}%
{3.25ex\@plus 1ex \@minus .2ex}%
{1.5ex \@plus .2ex}%
{\normalfont\normalsize\itshape}}
%\def\@seccntformat#1{\csname the#1\endcsname. }
%%
% Insert \small and remove colon after table/figure number
\long\def\@makecaption#1#2{%
\vskip\abovecaptionskip
\small%
\sbox\@tempboxa{#1 #2}%
\ifdim \wd\@tempboxa >\hsize
#1. #2\par
\else
\global \@minipagefalse
\hb@xt@\hsize{\hfil\box\@tempboxa\hfil}%
\fi
\vskip\belowcaptionskip}
%----------------------------------------------
\renewcommand\theequation{\arabic{equation}}
\renewcommand\thetable{\arabic{table}}
\renewcommand\thefigure{\arabic{figure}}
\renewcommand\refname{References}
\renewcommand\tablename{\small \textbf{Table}}
\renewcommand\figurename{\small \rm Fig.}
%\AtBeginDocument{\label{firstpage}}
%\AtEndDocument{\label{lastpage}}
\renewcommand \thefigure {\@arabic\c@figure}
\long\def\@makecaption#1#2{%
\vskip\abovecaptionskip
\sbox\@tempboxa{\textbf{#1.} #2}%
\ifdim \wd\@tempboxa >\hsize
#1. #2\par
\else
\global \@minipagefalse
\hb@xt@\hsize{\hfil\box\@tempboxa\hfil}%
\fi
\vskip\belowcaptionskip}
%---------------- Theorem, Lemma, etc., and proof ----------
\def\@begintheorem#1#2{\par{\bf #1~#2.~}\it\ignorespaces}
\def\@opargbegintheorem#1#2#3{\par{\bf #1~#2~(#3).~}\it\ignorespaces}
\newenvironment{proof}{\par\textit{Proof.~}\ignorespaces}
{\hfill~\fbox{\vbox to0.5mm{\vss\hbox to0.5mm{\hss}\vss}}\par}
\newtheorem{theorem}{\indent Theorem}
\newtheorem{lemma}{\indent Lemma}
\newtheorem{remark}{\indent Remark}
\newtheorem{definition}{\indent Definition}
%-----------------------------------------------
\newcommand{\biography}[1]{\par\noindent
\vbox to 36truemm{\vss
\hbox to 32truemm{\hss\includegraphics[height=32truemm,width=24truemm]
{#1}\hss}}\par\vspace*{-34truemm}\par
\hangindent=32truemm\hangafter=-10\noindent\indent}
%-----------------------------------------------
\endinput
%%
%% End of file `jcst.sty'.

1002
PRCCE/jcst.bst Executable file

File diff suppressed because it is too large

112
PRCCE/main.tex Executable file
View File
@ -0,0 +1,112 @@
\documentclass[12pt,twoside]{article}
\usepackage{Jcst}
%%% from old %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{framed}
\usepackage{amsmath}
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{setspace}
\usepackage{color,soul}
\usepackage{colortbl}
% \usepackage{enumitem}
\usepackage{enumerate}
\usepackage{threeparttable}
\usepackage{multirow,booktabs}
% \usepackage{xcolor}
\usepackage[table,xcdraw]{xcolor}
\usepackage{url}
\usepackage{array}
\usepackage{booktabs}
% \usepackage{subfigure}
\usepackage{caption}
\usepackage{dcolumn}
\usepackage{xspace}
\usepackage{balance}
\usepackage{bm}
\usepackage{cite}
\usepackage{amsmath}
\newcommand{\ie}{{\emph{i.e.}},\xspace}
\newcommand{\viz}{{\emph{viz.}},\xspace}
\newcommand{\eg}{{\emph{e.g.}},\xspace}
\newcommand{\etc}{\emph{etc}}
\newcommand{\etal}{{\emph{et al.}}}
\newcommand{\GH}{{\sc GitHub}\xspace}
\newcommand{\BB}{{\sc BitBucket}\xspace}
\newcommand{\TSHC}{TSHC\xspace}
\makeatletter
\g@addto@macro{\UrlBreaks}{\UrlOrds}
\makeatother
\makeatletter
\def\url@leostyle{%
\@ifundefined{selectfont}{\def\UrlFont{\same}}{\def\UrlFont{\scriptsize\bf\ttfamily}}}
\makeatother
\urlstyle{leo}
%%% from old %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\setcounter{page}{1}
\setpageinformation
%{Head of the first page} {Running head of odd pages}
{Li ZX, Yu Y, Yin G \textit{et al.} Analyzing Code Reviews in Pull-based Model}
{}{}{}{Mon.}{Year}
{Zhi-Xing Li {\it et al}.: Analyzing Code Reviews in Pull-based Model}
\begin{CJK}{GBK}{song}
\title{What are they talking about?\\
Analyzing Code Reviews in Pull-based Development Model}
\footnotetext{Special Section on Software Systems}
\footnotetext{This work was supported by the National Natural Science Foundation of China
under Grant Nos. 61432020, 61303064, 61472430 and 61502512
and National Grand R\&D Plan under Grant No. 2016YFB1000805.}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\author{Zhi-Xing Li,
Yue Yu~\footnote{Corresponding Author}, {\it Member, CCF, ACM},
Gang Yin, {\it Member, CCF, ACM},
Tao Wang, {\it Member, CCF, ACM},
and Huai-Min Wang, {\it Member, CCF, ACM}}
\address{College of Computer, National University of Defense Technology, Changsha 410073, China
}
\email{\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn}
\received{April 21, 2017}
\revised{June 30, 2017}%Revised date
\input{0_abstract}
\keywords{Pull-request; code review; review comments}
\begin{multicols}{2}
\normalsize
\renewcommand{\thefootnote}{\arabic{footnote}}
\setcounter{footnote}{0}
\input{1_introduction}
\input{2_background}
\input{3_approach}
\input{4_result}
\input{6_threats}
\input{8_conlusion}
\bibliographystyle{jcst}
\bibliography{9_ref}
\input{10_authors}
\end{multicols}
\end{CJK}
\end{document}
(Binary figure files not shown.)

BIN
Response.docx Normal file

Binary file not shown.
View File
@ -0,0 +1,36 @@
\biography{resources/authors/1-ZhixingLi}%No empty line here
{\bf Zhixing Li}
is a Master student in the College of Computer at National University of Defense Technology.
His work interests include open source software engineering, data mining, and knowledge discovering in open source software.
\biography{resources/authors/2-YueYu}%No empty line here
{\bf Yue Yu}
is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph. D. degree in Computer Science from National University of Defense Technology (NUDT) in 2016.
He has visited UC Davis supported by CSC scholarship.
His research findings have been published on MSR, FSE, IST, and ICSME.
His current research interests include software engineering, spanning from mining software repositories
and analyzing social coding networks.
\biography{resources/authors/3-GangYin}%No empty line here
{\bf Gang Yin}
is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph. D. degree in Computer Science from National University of Defense Technology (NUDT) in 2006.
He has worked in several grand research projects including National 973, 863 projects.
He has published more than 60 research papers in international conferences and journals.
His current research interests include distributed computing, information security, software engineering, and machine learning.
\biography{resources/authors/4-TaoWang}%No empty line here
{\bf Tao Wang}
is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph. D. degree in Computer Science from National University of Defense Technology (NUDT) in 2015.
His work interests include open source software engineering, machine learning, data mining, and knowledge discovering in open source software.
\biography{resources/authors/5-HuaiminWang}%No empty line here
{\bf Huaimin Wang}
is a professor in the College of Computer at National University of Defense Technology.
He received his Ph. D. degree in Computer Science from National University of Defense Technology (NUDT) in 1992.
He has been awarded the ``Chang Jiang Scholars Program'' professorship and the ``Distinguished Young Scholar'', among other honors.
He has published more than 100 research papers in peer-reviewed international conferences and journals.
His current research interests include middleware, software agent, and trustworthy computing.
View File
@ -0,0 +1,129 @@
\section{Introduction}
%Pull request
The pull-based development model is becoming increasingly popular in distributed collaboration
for open-source software (OSS) development~\cite{Barr:2012,Gousios:2014,Gousios:2014b,gousios2016work}.
On GitHub~\footnote{\url{https://github.com/}} alone,
the largest social-coding community, nearly half of collaborative projects
(already over 1 million~\cite{gousios2016work} in January 2016) have adopted this model.
In pull-based development,
any contributor can freely \emph{fork} (\ie clone) an interesting public project
and modify the forked repository locally (\eg fixing bugs and adding new features)
without asking for the access to the central repository.
When the changes are ready to merge back to the master branch,
the contributors submit a \emph{pull-request},
and then a rigorous code review process is performed
before the pull-request gets accepted.
%Review
Code review is a communication channel where integrators,
who are core members of a project, can express their concern
for the contribution~\cite{Tsay:2014a,Gousios:2014b,Marlow:2013,yu2015wait}.
If they doubt the quality of a submitted pull-request,
integrators make comments that
ask the contributors to improve the implementation.
With pull-requests becoming increasingly popular,
most large OSS projects allow for crowd sourcing of pull-request reviews
to a large number of external developers~\cite{Yu:2015,yu2014reviewer}
to reduce the workload of integrators.
These external developers are interested in the projects and
concerned about their development.
After receiving the review comments, the contributor usually responds positively and
updates the pull-request for another round of review.
Thereafter, the responsible integrator makes a decision to accept the pull-request
or reject it by taking all judgments and changes into consideration.
Previous studies~\cite{Tsay:2014b,Yu:2016,yu2015wait,t2015cq} have shown that
code review, as a well-established software quality practice,
is one of the most significant stages in software development.
It ensures that only high-quality code changes are accepted,
based on the in-depth discussion among reviewers.
With the evolution of collaboration tools and environment~\cite{Storey2014The,Zhu2016Effectiveness},
development activities in the communities like GitHub are more transparent and social.
{\color{red}Contribution evaluation in the social coding communities is more complex
than simple acceptance or rejection~\cite{Tsay:2014a}. }
Various participants are taking part in the discussion and
what they are talking about are not only limited to improvement of code contributions
but also project evolution and social interaction.
A comprehensive understanding of the topics of code review would be useful to
better organize the code review process and optimize review tasks
such as reviewer recommendation~\cite{Yu:2015,Thongtanunam:2015,jiang:2015,rahman2016correct}
and pull-request prioritization~\cite{Veen:2015}.
{\color{red}
Several similar studies
have been conducted on analyzing reviewers' motivation and issues raised in code reviews.
Bacchelli \etal~\cite{Bacchelli:2013} explored the motivations, outcomes, and challenges of code reviews
based on an investigation of developers and projects at Microsoft.
They found that although defect finding is still the main motivation for code reviews,
defect-related comments comprise only a small proportion.
Their study focused on commercial projects,
whose reviewers are mainly Microsoft employees.
However, projects using the pull-based model on GitHub are developed in a more transparent and open environment,
and code reviews are performed by community users.
Community reviewers may serve different organizations, use the project for various purposes,
and hold specific considerations in code review processes.
Therefore, different review topics may exist in the pull-based development model
compared with those in the commercial development model.
Tsay \etal~\cite{Tsay:2014a} explored issues raised around code contributions in GitHub.
They reported that reviewers discuss on both
the appropriateness of the contribution proposal and the correctness of the implemented solution.
However, they only analyzed the comments of highly discussed pull-requests, which have extended discussions.
According to the statistics in their dataset,
each pull-request gets 2.6 comments on average and only about 16\% of the pull-requests have extended discussions.
The small proportion of explored pull-requests may bias the experimental results
and constrain the generalization of their findings.
Therefore, we revisit the review topics in the pull-based development model
and conduct qualitative and quantitative analyses
based on a large-scale dataset of review comments.
}
In this paper, we first conducted a qualitative study on three popular
open-source software projects hosted on GitHub
and constructed a fine-grained taxonomy covering 11 categories
(\eg defect detecting, reviewer assigning, contribution encouraging)
for the review comments generated in the discussions.
{\color{red}
Second, we did preliminary quantitative analysis on a large set of review comments
that are labeled by \TSHC, a two-stage hybrid classification algorithm
which is able to automatically classify review comments
by combining rule-based and machine-learning (ML) techniques.
}
The main contributions of this study include the following:
\begin{itemize}
\item A fine-grained and multilevel taxonomy for review comments
in the pull-based development model is provided
in relation to technical, management, and social aspects.
\item A high-quality manually labeled dataset of review comments,
which contains more than 5,600 items, can be accessed via
a web page~\footnote{\url{https://www.trustie.net/projects/2455}}
and used in further studies.
\item A high-performance automatic classification model for
review comments of pull-requests is proposed.
The model leads to a significant improvement in terms of the weighted average F-measure,
\ie 9.2\% in Rails, 5.3\% in Elasticsearch, and 7.2\% in Angular.js,
compared with the text-based method.
\item Typical review patterns are explored which may provide implications
for reviewers and collaborative development platforms
to better organize the code review process.
\end{itemize}
The rest of this paper is organized as follows.
Section~\ref{sec:bg} provides the research background
and introduces the research questions.
In Section~\ref{sec:approach}, we elaborate the approach of our study.
Section~\ref{sec:result} lists the research result,
while Section~\ref{sec:Threats} discusses several threats to validity of our study.
Section~\ref{sec:Concl} concludes this paper and gives an outlook to future work.
% Several studies have been conducted to explore how the code review process influences
% pull-request acceptance~\cite{Tsay:2014b,Yu:2016} and latency~\cite{yu2015wait},
% and software release quality~\cite{t2015cq}.
% They found that code review, as a well-established software quality practice,
% is one of the most significant stages in pull-based development.
View File
@ -0,0 +1,227 @@
\section{Background and Related Work}
\label{sec:bg}
\subsection{Pull-based development model}
In GitHub, a growing number of developers contribute to the open source projects
by using the pull-request mechanism~\cite{Gousios:2014,Yu:2014}.
As illustrated in Figure~\ref{fig:github_wf},
a typical contribution process based on pull-based development model in GitHub
involves the following steps.
\emph{Fork:}
A contributor can find an interesting project
by following several well-known developers and watching their projects.
Before contributing, the contributor has to fork the original project.
\emph{Edit:}
After forking, the contributor can edit locally
without disturbing the main branch in original repository.
He is free to do whatever he wants based on the cloned repository,
such as implementing a new feature
or fixing bugs.
\emph{Pull Request:}
When his work is finished,
the contributor submits the changed codes from the forked repository
to its source by a pull-request.
In addition to the commits, the submitter needs to provide a title and a description
to elaborate on the objective of the pull-request.
\emph{Test:}
Several reviewers play the role of testers
to ensure that the pull-request does not break the current runnable state.
They check the submitted changes by manually running the patches locally
or in an automated manner with the help of continuous integration (CI) services.
\emph{Review:}
All developers in the community have the chance
to discuss the pull-request in the issue tracker,
based on the pull-request description, changed files, and test results.
After receiving the feedback from reviewers,
the contributor updates his pull-request by attaching new commits
for another round of review.
\emph{Decide: }
A responsible manager of the core team considers all the opinions of reviewers
and merges or rejects the pull-request.
\begin{figure}[tbp]
\centering
\includegraphics[width=8cm]{resources/Fig-1.pdf}
\caption{Pull-based work-flow on GitHub}
\label{fig:github_wf}
\end{figure}
Although research on pull-requests is in its early stage,
several relevant studies have been conducted
in terms of exploratory analysis, priority determination,
reviewer recommendation, \etc.
%Pull-request determination
Gousios \etal~\cite{Gousios:2014,Rigby:2014} conducted a statistical analysis
of millions of pull-requests from GitHub and analyzed the popularity of pull-requests,
the factors affecting the decision to merge or reject a pull-request,
and the time to merge a pull-request.
Their research result implies that
there is no difference between core members and outside contributors
in getting their pull-requests accepted and
only 13\% of the pull-requests are rejected for technical reasons.
Tsay \etal~\cite{Tsay:2014b} examined how social
and technical information are used to evaluate pull-requests.
They found that reviewers take into account both
the contributors' technical practices
and the social connections between contributors and reviewers.
Yu \etal~\cite{Vasilescu:B,Yu:2016} conducted a quantitative study,
and discovered that latency is a complex issue to explain adequately,
and CI is a dominant factor for the process,
which even changes the effects of some traditional predictors.
Lima \etal~\cite{Lima,Yu:2015,Thongtanunam:2015,jiang:2015,rahman2016correct} proposed approaches to automatically
recommend potential pull-request reviewers,
applying, for example, the Random Forest algorithm and the Vector Space Model.
Van der Veen \etal~\cite{Veen:2015} presented PRioritizer,
a prototype pull-request prioritization tool,
which recognizes the top pull-requests the core members should focus on.
\subsection{Code review}
Code review is employed by many software projects to
examine the changes made by others to the source code,
find potential defects,
and ensure software quality before the changes are merged back~\cite{Baysal:2015,Mcintosh:***}.
Traditional code review,
which is also well known as code inspection proposed by Fagan~\cite{Fagan:1976},
has been performed since the 1970s~\cite{Ackerman:1989,Kollanus:2009,Bacchelli:2013}.
The inspection process consists of well-defined steps
which are executed one by one in a group meeting~\cite{Fagan:1976}.
Much evidence has proved the value of code inspection
in software development~\cite{Frank:1984,Ackerman:1989,Russell:1991,Aurum:2002}.
However, its cumbersome and synchronous characteristics have hampered
its universal application in practice~\cite{Bacchelli:2013}.
On the other hand,
with the evolution of software development models,
this approach is also often misapplied and
results in bad outcomes, according to Rigby's exploration~\cite{Rigby:**}.
With the emergence and development of version control systems
and collaboration tools,
Modern Code Review (MCR)~\cite{Rigby:2011} has been adopted by many software companies and teams.
Different from formal code inspections,
MCR is a lightweight mechanism that is less time-consuming and supported by various tools.
Rigby \etal~\cite{Rigby:2006,Rigby:2008,Rigby:2011} explored the MCR process
in open source communities.
They performed case studies on GCC, Linux, Mozilla and Apache
and found several code review patterns and
a broadcast-based code review style used by Apache.
Baum \etal~\cite{Baum:2016} and Bacchelli \etal~\cite{Bacchelli:2013} analyzed MCR practice
in commercial software development teams in order to
improve the use of code reviews in industry.
While the main motivation for code review was believed
to be finding defects to control software quality,
recent research~\cite{Bacchelli:2013} has revealed
additional expectations,
including knowledge transfer, increased team awareness,
and creation of alternative solutions to problems.
Moreover, the code review participation rate reported in the study
conducted by Ciolkowski \etal~\cite{Ciolkowski:2003}
shows that the figure has increased over the years.
Other studies~\cite{Baysal:2015,Rigby:2011,Baysal:2013,Baum:2016,Mcintosh:2014} have also
investigated factors that influence the outcomes of the code review process.
In recent years, collaboration tools have evolved with social media~\cite{Storey2014The,Zhu2016Effectiveness}.
In particular, GitHub integrates code review into the pull-based model
and makes it more transparent and social~\cite{Storey2014The}.
The evaluation of pull-requests in GitHub is a form of MCR~\cite{Beller:2014,Bacchelli:2013}.
The way to have a voice in pull-request evaluation is to
participate in the discussion and leave comments.
Tsay \etal~\cite{Tsay:2014a} analyzed highly discussed pull-requests,
which have extended discussions, and explored how code contributions are evaluated through discussion in GitHub.
Gousios \etal~\cite{Gousios:2014b} investigated the challenges faced by integrators in GitHub
by conducting a survey involving 749 integrators.
Two kinds of comments can be identified by their position in the discussion:
issue comments (\ie general comments) on the overall contribution
and inline comments on specific lines of code in the changes~\cite{Yu:2015,Thongtanunam:2016}.
Actually, there is another kind of review comment in GitHub: commit comments,
which are attached to the commits of pull-requests~\cite{zhang2017social}.
Among these three kinds of comments,
general comments and inline comments account for the vast majority,
and the number of commit comments is extremely small.
Consequently, we only focus on general comments and inline comments
and do not take commit comments into consideration.
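For concreteness, these two comment streams map to two separate endpoints of
GitHub's REST API; the sketch below shows how both can be collected for a given
pull-request, with the owner, repository, and number as placeholders.
\begin{verbatim}
# Sketch: fetching the two kinds of review comments via
# the GitHub REST API; owner/repo/number are placeholders.
import requests

BASE = "https://api.github.com/repos/{o}/{r}"

def fetch_comments(owner, repo, number):
    base = BASE.format(o=owner, r=repo)
    # Issue (general) comments on the whole pull-request
    general = requests.get(
        f"{base}/issues/{number}/comments").json()
    # Review (inline) comments on specific changed lines
    inline = requests.get(
        f"{base}/pulls/{number}/comments").json()
    return general, inline
\end{verbatim}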
\begin{figure}[ht]
% https://github.com/rails/rails/pull/12150
\centering
\includegraphics[width=8cm]{resources/Fig-2.pdf}
\caption{Example comments on GitHub}
\label{fig:comment}
\end{figure}
% \emph{Visualization:}
As can be seen from Figure~\ref{fig:comment},
all the review comments corresponding to a pull-request
are displayed and ordered primarily by creation time.
Issue comments are directly visible,
while inline comments are folded by default and shown when the toggle button is clicked.
% \emph{Participant:}
Due to the transparent environment,
a large number of external developers,
who are concerned about the development of the corresponding project,
are allowed to participate in the evaluation of any pull-request
of a public repository.
Any participant can ask the submitter to clarify the purpose of the pull-request
or to improve the solution.
Prior study~\cite{Gousios:2014} also reveals that, in most projects,
more than half of the participants are external developers,
while the majority of the comments come from core team members.
The diversity of participants and their different concerns
in pull-request evaluation
produce various types of review comments
which cover a wide range of motivations
from solution detail to contribution appropriateness.
\subsection{Research Questions}
In this paper, we focus on analyzing the review comments in GitHub.
Although prior work~\cite{Bacchelli:2013,Gousios:2014b,Tsay:2014a}
{\color{red}
has identified several motivations for and issues raised in code review,
we believe that the review topics involved in the pull-based development model
are not yet well understood;
in particular, the underlying taxonomy of review topics has not been identified.
}
Consequently, our first research question is:
\textbf{RQ1.} What is the taxonomy for review comments on pull-requests?
{\color{red}
Furthermore, we would like to quantitatively analyze review topics
and obtain more solid knowledge.
Therefore, we need a large scale of labeled review comments to support the quantitative analysis
which leads to the second question:
}
\textbf{RQ2.} Is it possible to automatically classify review comments
according to the defined taxonomy?
{\color{red}
With a large set of labeled comments available,
we conduct a quantitative analysis and explore typical review patterns,
such as }
what most comments are about
and how contributors raise various kinds of issues in pull-requests.
Therefore, our last research question is:
\textbf{RQ3.} What are the typical review patterns in pull-based model?
\section{Approach}
\label{sec:approach}
\subsection{Approach Overview}
The goal of our work is to
investigate code review practice in GitHub.
We aim to
{\color{red}
qualitatively and quantitatively analyze review topics in the pull-based development model.
}
As illustrated in Figure \ref{fig:approach},
our research approach consists of the following steps.
\begin{figure}[ht]
\centering
\includegraphics[width=8cm]{resources/Fig-3.pdf}
\caption{The overview of our approach}
\label{fig:approach}
\end{figure}
\begin{enumerate}[1)]
\item \textit{Data collection}:
In our prior work,
we collected the 4,896 projects with the largest numbers of pull-requests on GitHub.
This dataset is crawled through the official API offered by GitHub
and updated in the current study.
In addition, we also use the data released by
GHTorrent\footnote{http://ghtorrent.org}.
\item \textit{Taxonomy definition}:
We constructed a fine-grained multilevel taxonomy for the review comments
in pull-based development model.
\item \textit{Automatic classification}:
We proposed the TSHC algorithm,
which automatically classifies review comments
using rule-based and ML techniques.
\item \textit{Quantitative analysis}:
We did a preliminary quantitative analysis on a large set of
review comments which were labeled by TSHC algorithm.
\end{enumerate}
\subsection{Dataset}
\begin{table*}[ht]
\scriptsize
\centering
\caption{Dataset of our experiments}
\begin{tabular}{r c c r c c c c c}
% \toprule
\hline
\rowcolor[HTML]{000000}
{\color[HTML]{FFFFFF}\textbf{Projects}} &
{\color[HTML]{FFFFFF}\textbf{Language}} &
{\color[HTML]{FFFFFF}\textbf{Application Area}} &
{\color[HTML]{FFFFFF}\textbf{Hosted\_at}} &
{\color[HTML]{FFFFFF}\textbf{\#Star}} &
{\color[HTML]{FFFFFF}\textbf{\#Fork}} &
{\color[HTML]{FFFFFF}\textbf{\#Contr}} &
{\color[HTML]{FFFFFF}\textbf{\#PR}} &
{\color[HTML]{FFFFFF}\textbf{\#Comnt}} \\
% \midrule
\hline
\textbf{Rails} & Ruby & Web Framework & May. 20 2009 & 33906 & 13789 & 3194 & 14648 & 75102 \\
\textbf{Elasticsearch } & Java & Search Server & Feb. 8 2010 & 20008 & 6871 & 753 &6315 & 38930 \\
\textbf{Angular.js} & JavaScript & Front-end Framework & Jan. 6 2010 & 54231 & 26930 & 1557 & 6376 & 33335 \\
\bottomrule
\multicolumn{9}{l}{\emph{Abbreviations}: Contr: Contributor,
PR: Pull-request, Comnt: Comment}
\end{tabular}
\label{tab:dataset}
\end{table*}
The experiments in this paper are conducted on
three representative open source projects,
namely Rails, Elasticsearch, and Angular.js,
all of which make heavy use of pull-requests on GitHub.
The dataset is composed of two sources:
{\color{red}GHTorrent MySQL dump} released in Jun. 2016
and our own crawled data from GitHub.
From GHTorrent, we can get a list of projects together with their basic information
such as programming language, hosting time, the number of forks, and the list of pull-requests.
{\color{red}
For our own dataset, we crawled the text content of
pull-requests (\ie title, description) and review comments
according to the URLs provided by GHTorrent.
}
Finally, the two sources are linked by project id and pull-request number.
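As a simple illustration, such linking could be performed as follows;
the file and column names here are hypothetical, not the actual GHTorrent schema.
\begin{verbatim}
import pandas as pd

# Hypothetical file and column names: GHTorrent records and our
# crawled text data are joined on project id and pull-request number.
ghtorrent = pd.read_csv("ghtorrent_pull_requests.csv")
crawled = pd.read_csv("crawled_pr_texts.csv")
dataset = ghtorrent.merge(crawled, on=["project_id", "pr_number"])
\end{verbatim}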
Table \ref{tab:dataset} lists the key statistics of the dataset.
We studied on original projects (\ie not forked from others)
written in Ruby, Java and JavaScript,
which are three of the most popular languages\footnote{https://github.com/blog/2047-language-trends-on-github} on GitHub.
The three selected projects were hosted on GitHub at an early time,
and are applied in different areas;
Rails is used to build websites,
Elasticsearch acts as a search server
and Angular.js is an outstanding front-end development framework.
This ensures the diversity of experimental projects,
which is needed to increase the generalizability of our work.
The number of stars, the number of forks, and the number of contributors
indicate the popularity of a project.
Starring is similar to the function of a bookmarking system
which informs users of the latest activities of projects that they have starred.
In total, our dataset contains 27,339 pull-requests and 147,367 review comments.
\subsection{Taxonomy Definition}
\label{td}
Previous work has studied the challenges faced by pull-request reviewers
and the issues introduced by pull-request submitters~\cite{Gousios:2014b,Tsay:2014a}.
Inspired by their work, we decide to comprehensively observe
the topics of code reviews in depth
rather than merely focusing on technical and nontechnical perspectives.
We conducted a card sort~\cite{Bacchelli:2013}
to determine the taxonomy schema,
which is executed manually through an iterative process
of reading and analyzing review comments randomly collected from the three projects.
The following steps were executed to define the taxonomy.
\begin{enumerate}[1)]
\item First, we randomly selected 900 review comments (300 for each project).
\item Two participants independently conducted card sorts on 70\% of the comments.
They firstly labeled each selected comment with a descriptive message.
The labeled comments are then divided into different groups
according to their descriptive messages.
Through a rigorous analysis of existing literature
and their own experience of working with and analyzing
the pull-based model over the last two years,
they identified their respective draft taxonomies.
\item They then met and discussed their identified draft taxonomies.
When they agreed with each other, the initial taxonomy was constructed.
\item Another 10 participants were invited to help verify and improve the taxonomy
by examining another 10\% of the comments.
After refining the category definitions and adjusting the taxonomy hierarchy,
we determined the final taxonomy.
\item Finally, the two authors independently classified the remaining 20\% of the comments into the taxonomy.
We used percent agreement, one of the most popular reliability measurements, to calculate inter-rater reliability.
We found that they agreed on 97.8\% (176/180) of the comment labels.
\end{enumerate}
In the above process, the two main participants (\ie the two authors of the paper)
have five years' and eight years' experience in software development respectively.
Both of them are interested in academic research in empirical software engineering and social coding.
Especially, the second author has four years' research experience in pull-based development model.
The other 10 participants also work in our research team,
including postgraduates, Ph.D. candidates, and project developers.
\subsection{Automatic Classification}
{\color{red}
After the taxonomy of review topics is defined,
we would like to conduct a quantitative study on a large set of labeled review comments,
which we expect to classify with an automatic approach.
First, we manually label a set of comments to train the classification model.
Then, we use the model to automatically label a larger set of comments.
}
\subsubsection{Manual labeling}
For each project, we randomly sampled 200 pull-requests
(each with at least one comment)
per year from 2013 to 2015.
Overall, 1,800 distinct pull-requests and 5,645 review comments were sampled.
According to the defined taxonomy,
we manually classified the sampled review comments.
We built an online labeling platform (OLP) which
was deployed on the public network and offered a web-based interface.
OLP helps reduce the extra burden on the labeling executors
and ensures the quality of the manual labeling results.
\begin{figure}[ht]
\centering
\includegraphics[width=8cm]{resources/Fig-4.pdf}
\caption{The main interface of OLP}
\label{fig:olp}
\end{figure}
As shown in Figure~\ref{fig:olp},
the main interface of OLP contains three sections,
which present a pull-request together with its review comments.
\emph{Section 1}:
The title, description, and submitter's user name of a pull-request
are displayed in this section.
The hyper-link of the pull-request on GitHub is also shown to
make it more convenient to jump to the original web page
for a more detailed view and inspection if necessary.
\emph{Section 2}:
All the review comments of a pull-request are listed in this section
and ordered by creation time.
The user name,
comment type (inline comment or issue comment),
and creation time of a review comment appear on the top of comment content.
\emph{Section 3}:
Beside a comment,
there are candidate labels corresponding to our taxonomy,
which can be selected by clicking the check-box next to the label.
The use of check-boxes means that multiple labels can be assigned to a comment.
Other labels that are not in our list can also be reported
by writing free text in the text field named `other'.
\subsubsection{Two-stage hybrid classification}
Figure~\ref{fig:TSHC} illustrates the automatic classification model \TSHC (Two-Stage Hybrid Classification).
\TSHC consists of two stages that utilize the comment text
and other information extracted from comments and pull-requests, respectively.
\begin{figure*}[ht]
\centering
\includegraphics[width=18cm]{resources/Fig-5.pdf}
\caption{Overview of \TSHC}
\label{fig:TSHC}
\end{figure*}
\underline{\textbf{Stage One:} }
The classification in this stage mainly utilizes the text part of each review comment
and produces a vector of probabilities (VP),
in which each item is the probability that
the review comment belongs to the corresponding category.
Preprocessing is necessary before formal comment classification~\cite{Antoniol:2008}.
Reviewers tend to reference the source code, hyper-link, or statement of others
in a review comment to clearly express their opinion, prove their point, or reply to other people.
These behaviors promote the review process
but cause a great challenge to comment classification.
Words in these reference texts contribute minimally to the classification
and even introduce interference.
Hence, we transform them into single-word indicators to
reduce vocabulary interference
and preserve the reference information.
Specifically, the source code, hyperlink, and statement of others are replaced with
\textit{`cmmcode'}, \textit{`cmmlink'} and \textit{`cmmtalk'} respectively.
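The transformation can be sketched as follows;
this is a minimal illustration assuming Markdown-style comment bodies,
not our exact preprocessing code.
\begin{verbatim}
import re

def preprocess(comment):
    # Replace fenced and inline code with the indicator 'cmmcode'
    comment = re.sub(r"```.*?```", " cmmcode ", comment, flags=re.DOTALL)
    comment = re.sub(r"`[^`]+`", " cmmcode ", comment)
    # Replace hyper-links with the indicator 'cmmlink'
    comment = re.sub(r"https?://\S+", " cmmlink ", comment)
    # Replace quoted statements of others ('>' quotes) with 'cmmtalk'
    comment = re.sub(r"^\s*>.*$", " cmmtalk ", comment, flags=re.MULTILINE)
    return comment
\end{verbatim}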
After preprocessing, a review comment is classified by a rule-based technique,
which uses inspection rules to match the comment text for each category.
Several phrases often appear in the comments of a specific category.
For instance, \textit{``lgtm''} (abbreviation of \textit{``looks good to me''})
is usually used by reviewers to express their satisfaction with a pull-request.
This type of phrases are discriminating and helpful
in recognizing the category of a review comment.
Therefore, we establish inspection rules for each category,
which are a set of regular expressions abstracted from discriminating phrases.
A category label is assigned to a review comment,
and the corresponding item in VP is set to 1
if one of its inspection rules matches the comment text.
In our method, each category has 7.5 inspection rules on average;
therefore, the following shows only a part of the inspection rules
and the corresponding matched comments
(a sketch of the matching procedure is given after the list).
\begin{itemize}
\item From \textit{Style Checking (L2-1)}
\begin{itemize}
\item \footnotesize{\texttt{(blank$|$extra) (line$|$space)s?}}
\item \textit{``Please add a new \textbf{blank line} after the include''}
\end{itemize}
\item From \textit{Value Affirming (L2-4)}
\begin{itemize}
% \item \small{\texttt{(looks$|$seem)s? (good$|$great$|$useful$|$awesome$|$fine)}}
\item \footnotesize{\texttt{(looks$|$seem)s? (good$|$great$|$useful)}}
\item \textit{``\textbf{Looks good} to me, the grammar is definitely better.''}
\end{itemize}
\item From \textit{Reviewer Assigning (L2-8)}
\begin{itemize}
\item \footnotesize{\texttt{(cc:?$|$wdyt$|$defer to$|\backslash$br$\backslash$?) (@$\backslash$w+ *)+}}
\item \textit{``\textbf{/cc @fxn} can you take a look please?''}
\end{itemize}
\item From \textit{Politely Responding (L2-10)}
\begin{itemize}
\item \footnotesize{\texttt{thanks?|thxs?|:($\backslash$w+)?heart:}}
% \item $thanks?|thxs?|:(\backslash w+)?heart:$
\item \textit{``@xxx looks good. \textbf{Thank} you for your contribution \textbf{:yellow\_heart:}''}
\end{itemize}
\end{itemize}
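The matching procedure itself can be sketched as follows;
the rule table shown is a hypothetical excerpt built from the patterns above,
not our complete rule set.
\begin{verbatim}
import re

# Excerpt of the rule table: each category maps to a list of
# regular expressions abstracted from discriminating phrases.
RULES = {
    "L2-1":  [r"(blank|extra) (line|space)s?"],
    "L2-4":  [r"(look|seem)s? (good|great|useful)"],
    "L2-10": [r"thanks?|thxs?|:(\w+)?heart:"],
}

def match_rules(text):
    """Return the categories whose inspection rules match
    the preprocessed comment text."""
    matched = set()
    for category, patterns in RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            matched.add(category)
    return matched
\end{verbatim}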
The review comment is then processed by the ML-based technique.
ML-based classification is performed with scikit-learn\footnote{http://scikit-learn.org/},
particularly the support vector machine (SVM) algorithm.
The comment text is tokenized and stemmed to a root form~\cite{Porter:1997}.
We filter out punctuation from word tokens and retain English stop words
because we assume that common words play an important role in short texts, such as review comments.
We adopt the TF-IDF (term frequency-inverse document frequency) model~\cite{Baeza:2004}
to extract a set of features from the comment text
and apply the ML algorithm to text classification.
A single review comment often addresses multiple topics.
Hence, one of the goals of \TSHC is to perform multi-label classification.
To this end, we construct text classifiers (TCs) for each category
with a one-versus-all strategy.
For a review comment
that has been matched by inspection rule $R_{i}$
(supposing that $n$ categories $C_{1},C_{2}, \dots, C_{n}$ exist),
each TC ($TC_{1},TC_{2}, \dots, TC_{n}$), except for $TC_{i}$,
is applied to predict the probability that the review comment belongs to the corresponding category.
Finally, the VP determined by the inspection rules and text classifiers is passed on to \textit{stage~two}.
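The ML-based part of \emph{stage one} can be sketched with scikit-learn as follows;
this minimal illustration assumes preprocessed comment texts and manually assigned
label sets, and omits tokenization, stemming, and the rule-based step.
\begin{verbatim}
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_text_classifiers(texts, label_sets, categories):
    """One binary TF-IDF + SVM classifier per category
    (one-versus-all strategy)."""
    classifiers = {}
    for c in categories:
        # Binary target: does this comment carry category c?
        y = [1 if c in labels else 0 for labels in label_sets]
        clf = make_pipeline(TfidfVectorizer(),
                            SVC(kernel="linear", probability=True))
        clf.fit(texts, y)
        classifiers[c] = clf
    return classifiers

def predict_vp(classifiers, categories, text):
    """Vector of probabilities (VP): one entry per category."""
    return [classifiers[c].predict_proba([text])[0][1]
            for c in categories]
\end{verbatim}
In the full pipeline, $TC_{i}$ is skipped for a comment
already matched by inspection rule $R_{i}$,
whose VP entry is directly set to 1.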
\underline{\textbf{Stage Two:}}
The classification in this stage is performed on composed features.
Review comments are usually short texts.
Our statistics indicate that the minimum, maximum, and average numbers of words
contained in a review comment
are 1, 1527, and 32, respectively.
The text of a review comment alone thus provides limited information for determining its category.
Therefore, in addition to the VP generated in \emph{stage one},
we also consider the following other features related to review comments.
\texttt{\small{Comment\_length:}}
% \textit{Comment\_length:}
This feature refers to the total number of characters
contained in a comment text after preprocessing.
Long comments are likely to argue about pull-request appropriateness and code correctness.
\texttt{\small{Comment\_type:}}
% \textit{Comment\_type:}
This binary feature indicates whether a review comment is inline comment or issue comment.
An inline comment tends to talk about the solution detail,
whereas an issue comment is likely to talk about other ``high-level'' issues,
such as pull-request decision and project management.
\texttt{\small{Core\_team:}}
% \textit{Core\_team:}
This binary feature refers to whether the comment author
is a core member of a project or an external contributor.
Core members are more likely to pay attention to
pull-request appropriateness and project management.
\texttt{\small{Link\_inclusion:}}
% \textit{Link\_inclusion:}
This binary feature identifies if a comment includes hyper-links.
Hyper-links are usually used to provide evidence
when someone insists on a point of view or
to offer guidelines when someone wants to help other people.
\texttt{\small{Ping\_inclusion:}}
% \textit{Ping\_inclusion:}
This binary feature refers to whether a comment includes a ping activity
(in the form of ``@user-name'').
\texttt{\small{Code\_inclusion:}}
% \textit{Code\_inclusion:}
This binary feature denotes whether a comment includes code elements.
Comments related to the solution detail tend to contain code elements.
However, this feature does not distinguish whether the code is a replica of the committed code
or a new code snippet written by the reviewer.
\texttt{\small{Ref\_inclusion:}}
% \textit{Ref\_inclusion:}
This binary feature indicates whether a comment includes a reference to the statement of others.
Such a reference indicates a reply to someone,
which probably reflects a further suggestion to or disagreement with that person.
\texttt{\small{Sim\_pr\_title:}}
% \textit{sim\_pr\_title:}
This feature refers to the similarity between the text of the comment
and the text of the pull-request title
(measured by the number of common words divided by the number of union words).
\texttt{\small{Sim\_pr\_desc:}}
% \textit{Sim\_pr\_desc:}
This feature denotes the similarity between the text of the comment
and the text of the pull request description
(measured similarly as how \texttt{\small{sim\_pr\_title}} is computed).
Comments with high similarity to the title or description of a pull-request
are likely to discuss the solution detail or the value of the pull request.
Together with the VP passed from \emph{stage one},
these features are composed to form a new feature vector to be processed by prediction models.
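To make the composition concrete, the following sketch assembles such a feature vector;
the dictionary field names are hypothetical,
and the similarity measure follows the word-overlap definition above.
\begin{verbatim}
def word_sim(a, b):
    """Number of common words divided by number of union words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def compose_features(vp, comment, pr):
    """Concatenate the stage-one VP with the comment/PR features."""
    text = comment["text"]  # preprocessed comment text
    return vp + [
        len(text),                         # Comment_length
        int(comment["is_inline"]),         # Comment_type
        int(comment["author_is_core"]),    # Core_team
        int("cmmlink" in text),            # Link_inclusion
        int("@" in text),                  # Ping_inclusion (crude check)
        int("cmmcode" in text),            # Code_inclusion
        int("cmmtalk" in text),            # Ref_inclusion
        word_sim(text, pr["title"]),       # Sim_pr_title
        word_sim(text, pr["description"])  # Sim_pr_desc
    ]
\end{verbatim}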
Similar to \emph{stage one}, \emph{stage two} provides binary prediction models for each category.
In the prediction models, a new VP is generated
to represent how likely a comment is to fall into each category.
After iterating over the VP, a comment is labeled with class $C_{i}$
if the $i_{th}$ vector item is greater than 0.5.
If all the items of the VP are less than 0.5,
the class label corresponding to the highest probability is assigned to the comment.
Finally, each comment processed by \TSHC is marked with at least one class label.
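This final assignment step can be sketched as follows,
mirroring the 0.5 threshold and the fallback to the most probable category described above.
\begin{verbatim}
def assign_labels(categories, vp):
    """Assign every category whose probability exceeds 0.5;
    otherwise fall back to the single most probable one."""
    labels = [c for c, p in zip(categories, vp) if p > 0.5]
    if not labels:
        labels = [categories[vp.index(max(vp))]]
    return labels
\end{verbatim}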
\subsection{Quantitative Analysis}
To identify typical review patterns, we used our trained hybrid classification model
to classify a large set of review comments and then examined the review patterns
based on this automatically classified dataset.
In total, we classified 147,367 comments of 27,339 pull-requests.
Based on this dataset, we conducted a preliminary quantitative study.
We first explored the distribution of review comments on each category
and then we reported some of the findings derived from the distribution.
\section{Results}
\label{sec:result}
\subsection{RQ1: What is the taxonomy for review comments on pull-requests?}
\begin{table*}[ht]
\scriptsize
\caption{Complete taxonomy}
\begin{tabular}{r r p{9.8cm}}
% \toprule
\hline
\rowcolor[HTML]{000000}
&&\\
\rowcolor[HTML]{000000}
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-1 Categories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-2 Subcategories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Description \& Example}}}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{Code Correctness}\\(L1-1)}}
&\cellcolor{lightgray} &\cellcolor{lightgray}Points out extra blank line, improper indention, inconsistent naming convention, etc. \\
&\cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Style Checking\\(L2-1)}} &\cellcolor{lightgray}e.g., \emph{``scissors: this blank line''}\\
&\multirow{3}*{\tabincell{r}{Defect Detecting\\(L2-2)}} & Points out runtime program errors, evolvability defects, etc.\\
& &e.g., \emph{``the default should be `false`''} and \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Asks the submitter to provide test cases for the changed code, report test results, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Code Testing\\(L2-3)}} &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{PR Decision-making}\\(L1-2)}}
&\multirow{2}*{\tabincell{r}{Value Affirming\\(L2-4)}} &Expresses satisfaction with the pull-request and agrees to merge it\\
& &e.g., \emph{``PR looks good to me. Can you \dots''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Declines to merge the pull-request due to a duplicate proposal, undesired feature, etc.\\
& \cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Solution Disagreeing\\(L2-5)}} & \cellcolor{lightgray} e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
&\multirow{3}*{\tabincell{r}{Further Questioning\\(L2-6)}} & Is confused about the purpose of the pull-request and asks for more details or use cases\\
& &e.g., \emph{``Can you provide a use case for this change?''}\\
%%%%%%
% \midrule
\hline
\multirow{6}*{\tabincell{r}{\textbf{Project Management}\\(L1-3)}}
&\cellcolor{lightgray} &\cellcolor{lightgray} States what type of changes a specific version is expected to merge, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Roadmap Managing\\(L2-7)}} &\cellcolor{lightgray}e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
&\multirow{2}*{\tabincell{r}{Reviewer Assigning\\(L2-8)}} &Pings other developer(s) to review the pull-request\\
& &e.g., \emph{``/cc @fxn can you take a look please?''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Asks to squash or rebase the commits, format the commit message, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Convention Checking\\(L2-9)}} &\cellcolor{lightgray}e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
% \midrule
\hline
\multirow{4}*{\tabincell{r}{\textbf{Social Interaction}\\(L1-4)}}
&\multirow{3}*{\tabincell{r}{Politely Responding\\(L2-10)}} & Thanks others for what they do, apologizes for mistakes, etc.\\
& &e.g.,\emph{``Thank you. This feature was already proposed and it was rejected. See \#xxx''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray} Agrees with others' opinion, compliments others' work, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Contribution Encouraging\\(L2-11)}} &\cellcolor{lightgray}e.g.,\emph{``:+1: nice one @cristianbica.''}\\
%%%%%%
% \midrule
\hline
\textbf{Others} &\multicolumn{2}{l}{Short sentence without clear or exact meaning like \textit{``@xxxx will do''} and \textit{``The same here :)''}} \\
\bottomrule
\multicolumn{3}{l}{\emph{Note: word beginning and ending with colon like ``:scissors:'' is a markdown grammar for emoji in GitHub}}
\end{tabular}
\label{tab:taxonomy}
\end{table*}
Table \ref{tab:taxonomy} shows the complete taxonomy.
We identified four Level-1 categories,
namely \textit{Code Correctness (L1-1)},
\textit{PR Decision-making (L1-2)},
\textit{Project Management (L1-3)},
and \textit{Social Interaction (L1-4)},
each of which contains more specific Level-2 categories.
For each Level-2 category, we present its description
together with an example comment.
For brevity, we show short comments or only the key part of long comments.
{\color{red}
Compared with previous studies~\cite{Bacchelli:2013,Tsay:2014a},
our hierarchical taxonomy presents a more systematic and elaborate schema
of review topics in pull-based model.
Moreover, we succeeded in identifying a group of review comments relating to project management
(\eg \textit{Roadmap Managing (L2-7), Reviewer Assigning (L2-8)})
and assigned them proper positions in the final taxonomy.
}
Also, we would like to point out that it is common
for a single comment to cover multiple topics.
The following review comments are provided as examples.
\emph{C1: ``Thanks for your contribution! This looks good to me
but maybe you could split your patch in two different commits? One for each issue.''}
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
In the first comment \textit{C1},
the reviewer thanks the contributor (\textit{L2-10})
and shows his satisfaction with this pull request (\textit{L2-4}),
followed by a request of commit splitting (\textit{L2-9}).
In \textit{C2},
the reviewer expresses that he agrees with this change (\textit{L2-4}),
thinks highly of this work
(\textit{L2-11}, \textit{:+1:} is markdown grammar for emoji ``thumb-up''),
and assigns other reviewers to ask for more advice (\textit{L2-8}).
With regard to reviewer assignment, reviewers tend to delegate a code review
if they are unfamiliar with the change under review.
In addition, we collected 13 exceptional comments
which were labeled with the \textit{other} option.
These comments mainly fall into the following two groups:
\begin{itemize}
\item \textit{Platform-related (2)}: these comments relate to the platform itself, i.e., GitHub.
Sometimes, developers are not familiar with the features of GitHub:
``I don't understand why Github display all my commits''
\item \textit{Simple reply (11)}: these comments contain only a few words
and carry no significant meaning: ``@** no.'', ``yes'' and ``done''.
\end{itemize}
On the one hand, the number of these comments is relatively small;
on the other hand, they contribute little to the code review process.
Moreover, the taxonomy definition process in Section~\ref{td},
with its large-scale sampling, thorough discussion, and multiple rounds of validation,
has made our taxonomy relatively complete and robust.
Therefore, we exclude them from our analysis and do not adjust our taxonomy.
\noindent
\textbf{In Summary.}
From the qualitative study, we identified a two-level taxonomy for
review comments which consists of 4 categories in Level 1
and 11 sub-categories in Level 2.
% \begin{framed}
% \noindent
% \textbf{RQ1:} {}
% \textit{From the qualitative study, we identified a two-level taxonomy for
% review comments which consists of 4 categories in Level 1
% and 11 sub-categories in Level 2.}
% \end{framed}
\subsection{RQ2: Is it possible to automatically classify
review comments according to the defined taxonomy?}
\begin{table*}[ht]
\scriptsize
\centering
\caption{Classification performance on Level-2 subcategories}
\begin{tabular}{l |l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l|l@{}l@{}l}
\hline
\rowcolor[HTML]{000000}
&
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Rails}}} &
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Elasticsearch}}} &
\multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Angular.js}}}\\
% \multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Ave.}}}\\
\rowcolor[HTML]{000000}
&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}}&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TBC}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{TSHC}}}\\
\rowcolor[HTML]{000000}
\multirow{-3}*{{\color[HTML]{FFFFFF}\textbf{Cat.}}}&
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. }&
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. } &
{\color[HTML]{FFFFFF}F-M } &
{\color[HTML]{FFFFFF}Prec. }&
{\color[HTML]{FFFFFF}Rec. }&
{\color[HTML]{FFFFFF}F-M }\\
\hline
\textbf{L2-1} &
0.75 & 0.57 & 0.66 &
0.88 & 0.67 & 0.78 &
0.75 & 0.46 & 0.61 &
0.85 & 0.58 & 0.72 &
0.54 & 0.26 & 0.40 &
0.76 & 0.78 & 0.77 \\ \hline
\textbf{L2-2} &
0.63 & 0.84 & 0.74 &
0.71 & 0.86 & 0.79 &
0.67 & 0.92 & 0.80 &
0.77 & 0.82 & 0.80 &
0.65 & 0.71 & 0.68 &
0.69 & 0.71 & 0.70 \\ \hline
\textbf{L2-3} &
0.59 & 0.46 & 0.53 &
0.74 & 0.78 & 0.76 &
0.66 & 0.55 & 0.61 &
0.60 & 0.52 & 0.56 &
0.64 & 0.63 & 0.64 &
0.75 & 0.77 & 0.76 \\ \hline
\textbf{L2-4} &
0.56 & 0.35 & 0.46 &
0.73 & 0.58 & 0.66 &
0.92 & 0.84 & 0.88 &
0.91 & 0.88 & 0.90 &
0.83 & 0.72 & 0.78 &
0.84 & 0.79 & 0.82 \\ \hline
\textbf{L2-5} &
0.50 & 0.26 & 0.38 &
0.63 & 0.43 & 0.53 &
0.65 & 0.36 & 0.51 &
0.80 & 0.67 & 0.74 &
0.63 & 0.48 & 0.56 &
0.61 & 0.51 & 0.56 \\ \hline
\textbf{L2-6} &
0.47 & 0.21 & 0.34 &
0.60 & 0.36 & 0.48 &
0.33 & 0.11 & 0.22 &
0.38 & 0.26 & 0.32 &
0.45 & 0.51 & 0.48 &
0.47 & 0.49 & 0.48 \\ \hline
\textbf{L2-7} &
0.79 & 0.66 & 0.73 &
0.79 & 0.72 & 0.76 &
0.75 & 0.39 & 0.57 &
0.75 & 0.68 & 0.72 &
0.49 & 0.31 & 0.40 &
0.83 & 0.64 & 0.74 \\ \hline
\textbf{L2-8} &
0.83 & 0.77 & 0.80 &
0.98 & 0.78 & 0.88 &
0.57 & 0.35 & 0.46 &
0.88 & 0.90 & 0.89 &
0.35 & 0.16 & 0.26 &
0.96 & 0.67 & 0.82 \\ \hline
\textbf{L2-9} &
0.83 & 0.76 & 0.80 &
0.90 & 0.81 & 0.86 &
0.94 & 0.57 & 0.76 &
0.96 & 0.76 & 0.86 &
0.85 & 0.76 & 0.81 &
0.83 & 0.82 & 0.83 \\ \hline
\textbf{L2-10} &
0.98 & 0.93 & 0.96 &
0.99 & 0.98 & 0.99 &
0.91 & 0.84 & 0.88 &
0.93 & 0.95 & 0.94 &
0.90 & 0.80 & 0.85 &
0.94 & 0.93 & 0.94 \\ \hline
\textbf{L2-11} &
0.80 & 0.66 & 0.73 &
0.89 & 0.87 & 0.88 &
0.87 & 0.67 & 0.77 &
0.92 & 0.91 & 0.92 &
0.93 & 0.62 & 0.78 &
0.98 & 0.92 & 0.95 \\ \hline
\textbf{AVG} &
0.71 & 0.67 & \textbf{0.69} &
0.80 & 0.76 & \textbf{0.78} &
0.79 & 0.75 & \textbf{0.77} &
0.83 & 0.81 & \textbf{0.82} &
0.71 & 0.64 & \textbf{0.67} &
0.76 & 0.73 & \textbf{0.75} \\
\bottomrule
\end{tabular}
\label{tab:L2}
\end{table*}
In the evaluation, we design a text-based classifier (TBC) as a comparison baseline,
which uses only the text content of review comments and
applies neither inspection rules nor composed features.
TBC uses the same preprocessing techniques and SVM models as used in \TSHC.
Classification performance is evaluated through 10-fold cross-validation,
namely splitting review comments into 10 sets,
of which nine sets are used to train the classifiers
and the remaining set is for the performance test.
The process is repeated 10 times.
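A minimal sketch of this evaluation protocol is given below;
it assumes a generic scikit-learn-style model factory
and illustrates only the folding scheme, not our exact evaluation code.
\begin{verbatim}
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(texts, labels, build_model, k=10):
    """Split the comments into k sets; train on k-1 of them and
    test on the remaining one, repeating the process k times."""
    texts, labels = np.array(texts), np.array(labels)
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(texts):
        model = build_model()
        model.fit(texts[train_idx], labels[train_idx])
        scores.append(model.score(texts[test_idx], labels[test_idx]))
    return float(np.mean(scores))
\end{verbatim}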
Moreover, the evaluation metrics used in the paper are precision, recall, and
F-measure, which are computed by the following formulas.
\begin{small}
\begin{equation}
Prec(L2\textrm{-}i) = \frac{N_{CC}(L2\textrm{-}i)}{N_{TSHC}(L2\textrm{-}i)}
\label{equ:pre}
\end{equation}
\begin{equation}
Rec(L2\textrm{-}i) = \frac{N_{CC}(L2\textrm{-}i)}{N_{total}(L2\textrm{-}i)}
\label{equ:rec}
\end{equation}
\begin{equation}
F\textrm{-}M(L2\textrm{-}i) =
\frac{2*Prec(L2\textrm{-}i)*Rec(L2\textrm{-}i)}{Prec(L2\textrm{-}i)+Rec(L2\textrm{-}i)}
\label{equ:fm}
\end{equation}
\end{small}
In the formulas,
$N_{total}(L2\textrm{-}i)$ is the total number of comments of category \textit{L2-i} in the test dataset,
$N_{TSHC}(L2\textrm{-}i)$ is the number of comments that are classified as \textit{L2-i} by \TSHC,
and $N_{CC}(L2\textrm{-}i)$ is the number of comments that have been correctly classified as \textit{L2-i}.
Table~\ref{tab:L2} shows the precision, recall, and F-measure
provided by different approaches for Level-2 categories.
Our approach achieves the highest precision, recall, and F-measure scores
in all categories with only a few exceptions.
To provide an overall performance evaluation,
we use the average F-measure~\cite{Zhou:2014} over all categories,
weighted by the proportion of instances in each category.
Equation~\ref{equ:wav} describes the formula to derive the average F-measure.
In the equation, the average F-measure is denoted as $F_{avg}$,
the F-measure of category \textit{L2-i} as $f_{i}$,
and the number of instances of category \textit{L2-i} as $n_{i}$.
\begin{equation}
F_{avg} = \frac{\sum_{i=1}^{11}n_{i} * f_{i}}{\sum_{i=1}^{11}n_{i}}
\label{equ:wav}
\end{equation}
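As a small illustration, the weighted average can be computed as follows,
where \texttt{f\_scores} and \texttt{counts} stand for the per-category
F-measures $f_{i}$ and instance counts $n_{i}$.
\begin{verbatim}
def weighted_f(f_scores, counts):
    """Weighted average F-measure: sum(n_i * f_i) / sum(n_i)."""
    return sum(f * n for f, n in zip(f_scores, counts)) / sum(counts)
\end{verbatim}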
\begin{table*}[ht]
\scriptsize
\centering
\caption{Classification performance of different methods}
\begin{tabular}{l |c c c| c c c| c c c}
\hline
\rowcolor[HTML]{000000}
&
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Rails}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Elasticsearch}}} &
\multicolumn{3}{c}{{\color[HTML]{FFFFFF}\textbf{Angular.js}}}\\
% \multicolumn{6}{c}{{\color[HTML]{FFFFFF}\textbf{Ave.}}}\\
\rowcolor[HTML]{000000}
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Methods}}}&
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } &
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } &
{\color[HTML]{FFFFFF}TBC }&
{\color[HTML]{FFFFFF}TBC\_R } &
{\color[HTML]{FFFFFF}TBC\_CF } \\
\hline
$F_{avg}$ &
0.69 & 0.74 & 0.71 &
0.77 & 0.81 & 0.78 &
0.67 & 0.73 & 0.69 \\
Improvement (\%)&
- & 7.24 & 2.90 &
- & 5.19 & 1.30 &
- & 8.96 & 3.00 \\
\bottomrule
\end{tabular}
\label{tab:increase}
\end{table*}
The table indicates that our approach consistently outperforms
the baseline across the three projects.
Compared with the baseline,
\TSHC improves the weighted average F-measure
by 9.2\% (from 0.688 to 0.780) in Rails, 5.3\% (from 0.767 to 0.820) in Elasticsearch,
and 7.2\% (from 0.675 to 0.747) in Angular.js.
These results indicate that our approach is highly applicable in practice.
Furthermore, we would like to explore
whether the rule-based technique or composed features contribute more
to the increased performance of \TSHC compared with the baseline TBC.
Therefore, we design another two comparative settings.
\begin{itemize}
\item TBC\_R (TBC + rule): in this setting, we combine TBC with rule-based technique.
\item TBC\_CF (TBC + composed features): in this setting, we combine TBC with composed features in stage 2.
\end{itemize}
Table~\ref{tab:increase} presents the experiment result.
Overall, TBC\_R outperforms TBC\_CF,
which means the rule-based technique contributes more to the increased performance
than the composed features in stage two.
Compared with TBC,
TBC\_R achieves an improvement of 7.24\%, 5.19\%, and 8.96\% for Rails, Elasticsearch, and Angular.js respectively,
while TBC\_CF achieves 2.90\%, 1.30\%, and 3.00\%.
We further study the review comments mis-categorized by \TSHC.
An example is that
\textit{``While karma does globally install with a bunch of plug-ins,
we do need the npm install because without that you don't get the karma-ng-scenario karma plug-in.''}.
\TSHC classifies it as belonging to \textit{Defect Detecting (L2-2)},
but it is actually a \textit{Solution Disagreeing (L2-5)} comment.
The reason for this incorrect prediction is twofold,
namely the lack of explicit discriminating terms
and the overly specific expression of rejection.
The inspection rules of \textit{Solution Disagreeing (L2-5)} fail to match
because of the lack of corresponding matching patterns.
ML classifiers tend to categorize it into \textit{Defect Detecting (L2-2)}
because the overly specific expression of opinion makes it
look more like a low-level comment about code correctness
than a high-level one about the pull-request decision.
We attempt to solve this problem by adding factors
(\eg \texttt{\small{Comment\_type}} and \texttt{\small{Code\_inclusion}}) in \textit{stage 2} of \TSHC,
which can help reveal whether a review comment is talking about
the pull-request as a whole or the solution detail.
Although the additional information improves the classification performance to some extent,
it is not sufficient to differentiate the two types of comments.
We plan to address the issue by extending the manually labeled dataset
and introducing a sentiment analysis.
\noindent
\textbf{In Summary.}
Compared with the baseline,
\TSHC achieves higher performance
in terms of the weighted average F-measure, namely
0.78 in Rails, 0.82 in Elasticsearch,
and 0.75 in Angular.js,
which indicates \TSHC is applicable
in practical automatic classification.
\subsection{RQ3: What are the typical review patterns in pull-based model?}
We now discuss some of our preliminary quantitative analysis.
We used \TSHC to automatically classify all the review comments,
based on which we explored the quantitative characteristics of review comments.
\subsubsection{Comments Distribution}
\begin{figure}[htbp]
\centering
\includegraphics[width=8cm]{resources/Fig-6.pdf}
\caption{The percentage of the Level-2 categories}
% \caption{The distribution of review comments on Level-2 categories}
\label{fig:stas_plot}
\end{figure}
Figure \ref{fig:stas_plot} shows the percentage of the Level-2 categories.
As a general view, the comment distributions over the three projects
exhibit similar patterns, in which most comments are about
\textit{Defect Detecting (L2-2)},
\textit{Value Affirming (L2-4)},
and \textit{Politely Responding (L2-10)}.
Notably, \textit{Defect Detecting (L2-2)} occupies the first place
(Rails: 42.5\%, Elasticsearch: 50.6\%, Angular.js: 33.1\%).
This is consistent with previous studies\cite{Gousios:2014b,Bacchelli:2013}
which stated that the main purpose of code review is to find defects.
Moreover, the development of open source projects relies on
the voluntary collaborative contributions~\cite{Shah:2006} of thousands of developers.
A harmonious and friendly community environment is beneficial to
promote contributors' enthusiasm
which is critical to the continuous evolution of a project.
As a result, a majority of reviewers never hesitate
to react positively to others' contributions (L2-10, L2-11)
and express their satisfaction with high-quality pull-requests (L2-4).
We also explored the comment distribution on the manually labeled dataset
and found it similar to that on the automatically classified dataset.
The difference between the two datasets is that
the manually labeled dataset has more L2-4 comments and fewer L2-2 comments
than the automatically labeled one.
However, the difference is slight and does not affect the overall shape of the distribution.
As a result, we report our findings only on the automatically classified dataset,
whose larger size makes our findings more convincing.
\subsubsection{Cost-free comments are less frequent}
It is surprising that although inspecting code style (L2-1)
and testing code (L2-3) cost reviewers little effort
and do not require a deep understanding of the code changes,
comments of these two categories are relatively less frequent than others.
This is somewhat inconsistent with the study
conducted by Bacchelli \etal~\cite{Bacchelli:2013} on Microsoft teams,
in which they found \textit{code style} (called \textit{code improvement} in their study)
is the most frequent outcome of code reviews.
The reason for this inconsistency, in our opinion,
is the difference in development environments.
In GitHub, contributors are in a transparent environment
and their activities are visible to everyone\cite{Tsay:2014b,gousios2016work}.
Hence, in order to build and maintain good reputations,
contributors usually go through their code by themselves before submission
and avoid making obvious mistakes~\cite{gousios2016work}.
\begin{table}[htbp]
\setlength\extrarowheight{-1.5pt}
\scriptsize
\centering
\caption{Examples of Defect Detecting}
\begin{tabular}{p{8cm}}
%%%%%%%
\bottomrule
\rowcolor{gray}\textbf{Example 1} (inline comment)\\ \hline
\rowcolor{lightgray}
Changed codes: \\
$-$ \color{red} \texttt{assert r.save, ``First save should be successful''}\\
+ \color[HTML]{008B00} \texttt{assert r.valid?, ``First validation should be}\\
\quad \color[HTML]{008B00}successful''\\
+ \color[HTML]{008B00} \texttt{r.save} \\
\rowcolor{lightgray}
Review comments: \\
\textit{Just this change is not necessary, it'll run validations twice,
you can use r.save in the assert call} \\
%%%%%%%%
%%%%%%%
\bottomrule
\rowcolor{gray}\textbf{Example 2} (inline comment) \\ \hline
\rowcolor{lightgray}
Changed codes: \\
$-$ \color{red} \texttt{``\#{schema\_search\_path}-\#{sql}''}\\
+ \color[HTML]{008B00} \texttt{``\#{schema\_search\_path}-\#{sql}''.to\_sym}\\
\rowcolor{lightgray}
Review comments: \\
\textit{This will introduce a memory leakage on system
where there are a lot of possible SQLs.
Remember symbols are not removed from memory
from the garbage collector } \\
%%%%%%%%
%%%%%%%
\bottomrule
\rowcolor{gray}\textbf{Example 3} (general comment)\\ \hline
\rowcolor{lightgray}
Pull-request description: \\
after\_remove callbacks only run if record was removed\\
\rowcolor{lightgray}
Review comments: \\
\textit{This will change the existing behavior and
people may be relying on it so I don't think it is safe
to change it like this. If we want to change this
we need a better upgrade path, like a global option
to let people to switch the behavior} \\
%%%%%%%%
%%%%%%%
\bottomrule
\rowcolor{gray}\textbf{Example 4} (inline comment) \\ \hline
\rowcolor{lightgray}
Changed codes: \\
+ \color[HTML]{008B00} \texttt{class TableList}\\
+ \color[HTML]{008B00} \texttt{\quad delegate :klass, :to $\Rightarrow$ :@li, :allow\_nil $\Rightarrow$ true}\\
+ \color[HTML]{008B00} \texttt{\quad def initialize(li)}\\
+ \color[HTML]{008B00} \texttt{\quad \quad @li = li}\\
+ \color[HTML]{008B00} \texttt{\quad end}\\
+ \color[HTML]{008B00} \texttt{end}\\
\rowcolor{lightgray}
Review comments: \\
\textit{This new class isn't needed -
we can reuse an existing one} \\
\bottomrule
%%%%%%%%
\end{tabular}
\label{Example:ed}
\end{table}
\subsubsection{Defects exist under green tests}
We were also curious about why a large percentage of comments were
focused on detecting defects (L2-2)
even though most of the pull-requests did not raise testing issues (L2-3).
After re-examining those pull-requests
which received comments of L2-2 but did not receive comments of L2-3,
we found two main factors resulting in this phenomenon.
One factor is contributors' unfamiliarity with
the programming language and the APIs (\eg Example 1 and Example 2 in Table~\ref{Example:ed}),
and the other is the lack of development experience of some contributors (\eg Example 3 and Example 4 in Table~\ref{Example:ed}).
These factors tend to produce runnable but ``low-quality'' code
which is less elegant or
even breaks compatibility and introduces potential risks.
This reality justifies the necessity of manual code review
even though an increasing number of automatic tools are emerging.
\subsubsection{Conventions are sometimes overlooked}
\begin{figure*}[htbp]
\centering
\includegraphics[width=\textwidth]{resources/Fig-7.pdf}
\caption{The cumulative distribution of pull-requests.
(a) Rails (b) Elasticsearch (c) Angular.js}
\label{fig:prd}
\end{figure*}
Although most projects set specific development conventions
which do not require much effort to follow,
they are sometimes overlooked by contributors, as can be seen from the following examples.
\textbf{Example\_1:}
``\textit{
Thanks! Could you please add `[ci skip]' to your commit message
to avoid Travis to trigger a build ?
}''
\textbf{Example\_2:}
``\textit{
@xxx PR looks good to me.
Can you squash the two commits into one.
Fix and test should go there in a commit.
Also please add a change log. Thanks.
}''
To better understand this problem,
we conducted a statistical analysis to explore whether this kind of comment (L2-9)
is distributed differently across contributor roles
(core members vs. external contributors).
We found that pull-requests submitted by external contributors
receive more comments of L2-9 than those submitted by core members,
accounting for 83.9\%, 60.1\%, and 80.7\% of the total number in
Rails, Elasticsearch, and Angular.js respectively.
Furthermore, we explored how an external developer's contribution experience
affects the likelihood of their pull-requests receiving comments of L2-9.
Figure~\ref{fig:prd} shows the cumulative percentage of pull-requests
that receive comments of L2-9
over developers' contribution histories.
It is clear that a considerable proportion of such pull-requests
(68.4\% in Rails, 75.1\% in Elasticsearch, and 91.9\% in Angular.js)
have been generated in the first five submissions.
The above figures illustrate that
external contributors are more likely to submit pull-requests
that break project conventions in their early contributions.
Therefore, it is necessary to help external contributors
gain a good understanding of project conventions at a very early stage.
In fact, most project management teams have provided contributors
with specific guidelines\footnote{
\eg http://edgeguides.rubyonrails.org/contributing\_to\_ruby\_on\_rails.html
}.
It seems, however, that not everyone is willing to
go through the tedious specification before contribution.
This may inspire GitHub to improve its collaborative mechanism
and offer more efficient development tools.
For example, it would be better to briefly display
a clear list of development conventions,
edited by the management team of a project,
before a developer creates a pull-request for that project.
Another alternative for GitHub is to provide automatic reviewing tools
that can be configured with predefined convention rules
and triggered when a new pull-request is submitted.
\noindent
\textbf{In Summary.}
Most comments discuss code correctness and social interaction.
External contributors are more likely to
break project conventions in their early contributions,
and their pull-requests might contain potential issues,
even though they have passed the tests.
\section{THREATS TO VALIDITY}
\label{sec:Threats}
In this section, we discuss threats to construct validity, internal validity
and external validity which may affect the results of our study.
\noindent\textbf{Construct validity:}
The first threat involves the taxonomy definition
since some of the categories could be overlapping or missing.
To alleviate this problem, we, together with other experienced developers,
randomly selected review comments and
classified them according to our taxonomy definition.
The verification result showed that our taxonomy is complete
and did not miss any significant categories.
Secondly, manually labeling thousands of review comments is a
repetitive, time-consuming, and tedious task.
To get a reliable labeled set, the first author of the paper was assigned
to do this job, and we tried our best to provide him with a pleasant working environment.
Moreover, if a long time had passed since the last round of labeling work,
the first author would revisit the taxonomy and go through some labeled comments
to ensure an accurate and consistent understanding
before proceeding with the next round of labeling.
In addition, it is possible that incorrect classifications by \TSHC may affect our findings,
even though our model achieved a high precision of 76\%--83\%.
However, our preliminary quantitative study is mainly
based on the distribution of comments over the categories,
and misclassification is not likely to alter the distribution significantly;
therefore, our findings are not easily affected by a certain degree of incorrect classification.
\noindent\textbf{Internal validity:}
There are many machine learning methods
that can be used to solve the classification problem
and the choice of ML algorithm has a direct impact on the classification performance.
We compared several ML algorithms, including a linear regression classifier,
an AdaBoost classifier, a random forest classifier, and SVM, and
found that SVM performed better than the others on classifying review comments.
Furthermore, we also did some necessary parameter optimization.
\noindent\textbf{External validity:}
Our findings are based on the dataset of three open source projects hosted on GitHub.
To increase the generalizability of our research,
we have selected projects with different programming languages
and different application areas.
Nevertheless, our dataset is a small sample compared to
the total number of projects on GitHub.
Hence, it is not certain whether the results can be generalized to
all the other projects hosted on GitHub or
to those hosted on other platforms.
\section{CONCLUSION\&FUTURE WORK}
\label{sec:Concl}
Code review is one of the most significant stages in pull-based development.
It ensures that only high-quality pull-requests are accepted,
based on the in-depth discussions among reviewers.
To comprehensively understand the review topics in pull-based model,
we first conducted a qualitative study on
three popular open-source software projects hosted on GitHub
and constructed a fine-grained two-level taxonomy covering
4 Level-1 categories (Code Correctness, PR Decision-making, Project Management, and Social Interaction)
and 11 Level-2 sub-categories (\eg defect detecting, reviewer assigning, contribution encouraging).
Second, we did preliminary quantitative analysis on a large set of review comments
that are labeled by a two-stage hybrid classification algorithm
which is able to automatically classify review comments by combining rule-based and machine-learning techniques.
Through the quantitative study, we explored the typical review patterns.
We found that the three projects represent similar comments distribution on each category.
Pull-requests submitted by inexperienced contributors
tend to contain potential issues even though they have passed the tests.
Furthermore, external contributors are more likely to break project conventions
in their early contributions.
%further work
Nevertheless, \TSHC performs poorly on a few Level-2 subcategories.
More work could be done in the future to improve it.
We plan to address the shortcomings of our approach
by extending the manually labeled dataset
and introducing the sentiment analysis.
Moreover, we will try to dig more valuable information
(\eg comment co-occurrence, emotion shift, \etc)
from the experiment result in the paper
and assist core members
to better organize the code review process,
such as improving reviewer recommendation,
contributor assessment,
and pull-request prioritization.
% \section*{Acknowledgment}
% The research is supported by the National Natural Science Foundation of China (Grant No.61432020, 61303064, 61472430, 61502512) and National Grand R\&D Plan (Grant No. 2016YFB1000805).
% The authors would like to thank... more thanks here
@article{Yu:2015,
title={Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Wang, Tao},
journal={Information and Software Technology},
volume={74},
pages={204--218},
year={2016},
publisher={Elsevier}
}
@inproceedings{yu2014reviewer,
title={Reviewer recommender of pull-requests in GitHub},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Ling, Charles X},
booktitle={2014 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
pages={609--612},
year={2014},
organization={IEEE}
}
@inproceedings{yu2015wait,
title={Wait For It: Determinants of Pull Request Evaluation Latency on GitHub},
author={Yu, Yue and Wang, Huaimin and Filkov, Vladimir and Devanbu, Premkumar and Vasilescu, Bogdan},
booktitle={MSR},
year={2015}
}
@inproceedings{Barr:2012,
title={Cohesive and isolated development with branches},
author={Barr, Earl T and Bird, Christian and Rigby, Peter C and Hindle, Abram and German, Daniel M and Devanbu, Premkumar},
booktitle={International Conference on Fundamental Approaches To Software Engineering},
pages={316-331},
year={2012},
}
@inproceedings{Gousios:2014,
title={An exploratory study of the pull-based software development model},
author={Gousios, Georgios and Pinzger, Martin and Deursen, Arie Van},
booktitle={International Conference Software Engineering},
pages={345-355},
year={2014},
}
@inproceedings{Vasilescu:B,
title={Quality and productivity outcomes relating to continuous integration in GitHub},
author={Vasilescu, Bogdan and Yu, Yue and Wang, Huaimin and Devanbu, Premkumar and Filkov, Vladimir},
booktitle={Joint Meeting},
pages={805-816},
year={2015},
}
@inproceedings{Tsay:2014a,
title={Let's talk about it: evaluating contributions through discussion in GitHub},
author={Tsay, Jason and Dabbish, Laura and Herbsleb, James},
booktitle={The ACM Sigsoft International Symposium},
pages={144-154},
year={2014},
}
@inproceedings{Gousios:2014b,
title={Work practices and challenges in pull-based development: the integrator's perspective},
author={Gousios, Georgios and Zaidman, Andy and Storey, Margaret-Anne and Van Deursen, Arie},
booktitle={Proceedings of the 37th International Conference on Software Engineering},
pages={358--368},
year={2015},
organization={IEEE}
}
@inproceedings{gousios2016work,
title={Work practices and challenges in pull-based development: The contributor's perspective},
author={Gousios, Georgios and Storey, Margaret-Anne and Bacchelli, Alberto},
booktitle={Proceedings of the 38th International Conference on Software Engineering},
pages={285--296},
year={2016},
organization={ACM}
}
@inproceedings{Marlow:2013,
title={Impression formation in online peer production: activity traces and personal profiles in github},
author={Marlow, Jennifer and Dabbish, Laura and Herbsleb, Jim},
booktitle={Proceedings of the 2013 conference on Computer supported cooperative work},
pages={117--128},
year={2013},
organization={ACM}
}
@inproceedings{Veen:2015,
title={Automatically prioritizing pull requests},
author={Van Der Veen, Erik and Gousios, Georgios and Zaidman, Andy},
booktitle={Proceedings of the 12th Working Conference on Mining Software Repositories},
pages={357--361},
year={2015},
organization={IEEE Press}
}
@inproceedings{Thongtanunam:2015,
title={Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review},
author={Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Kula, Raula Gaikovina and Yoshida, Norihiro and Iida, Hajimu and Matsumoto, Kenichi},
booktitle={IEEE International Conference on Software Analysis, Evolution, and Reengineering},
pages={141-150},
year={2015},
}
@inproceedings{t2015cq,
title={Investigating code review practices in defective files: an empirical study of the Qt system},
author={Thongtanunam, Patanamon and McIntosh, Shane and Hassan, Ahmed E and Iida, Hajimu},
booktitle={Proceedings of the 12th Working Conference on Mining Software Repositories},
pages={168--179},
year={2015},
organization={IEEE Press}
}
@inproceedings{rahman2016correct,
title={CoRReCT: code reviewer recommendation in GitHub based on cross-project and technology experience},
author={Rahman, Mohammad Masudur and Roy, Chanchal K and Collins, Jason A},
booktitle={Proceedings of the 38th International Conference on Software Engineering Companion},
pages={222--231},
year={2016},
organization={ACM}
}
@article{jiang:2015,
title={CoreDevRec: Automatic Core Member Recommendation for Contribution Evaluation},
author={Jiang, Jing and He, Jia Huan and Chen, Xue Yuan},
journal={Journal of Computer Science and Technology},
volume={30},
number={5},
pages={998-1016},
year={2015},
}
@inproceedings{Tsay:2014b,
title={Influence of social and technical factors for evaluating contribution in GitHub},
author={Tsay, Jason and Dabbish, Laura and Herbsleb, James},
booktitle={ICSE},
pages={356-366},
year={2014},
}
@inproceedings{Bacchelli:2013,
title={Expectations, outcomes, and challenges of modern code review},
author={Bacchelli, Alberto and Bird, Christian},
booktitle={Proceedings of the 2013 International Conference on Software Engineering},
pages={712-721},
year={2013},
}
@article{Shah:2006,
title={Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development},
author={Shah, Sonali K.},
journal={Management Science},
volume={52},
number={7},
pages={1000-1014},
year={2006},
}
@book{Porter:1997,
title={An algorithm for suffix stripping},
author={Porter, M. F},
publisher={Morgan Kaufmann Publishers Inc.},
year={1997},
}
@inproceedings{Yu:2014,
title={Who Should Review this Pull-Request: Reviewer Recommendation to Expedite Crowd Collaboration},
author={Yu, Yue and Wang, Huaimin and Yin, Gang and Ling, Charles X},
booktitle={Asia-Pacific Software Engineering Conference},
pages={335-342},
year={2014},
}
@article{Thongtanunam:2016,
title={Review participation in modern code review},
author={Thongtanunam, Patanamon and Mcintosh, Shane and Hassan, Ahmed E. and Iida, Hajimu},
journal={Empirical Software Engineering},
pages={1-50},
year={2016},
}
@inproceedings{Beller:2014,
title={Modern code reviews in open-source projects: which problems do they fix?},
author={Beller, Moritz and Bacchelli, Alberto and Zaidman, Andy and Juergens, Elmar},
booktitle={Working Conference on Mining Software Repositories},
pages={202-211},
year={2014},
}
@article{Kollanus:2009,
title={Survey of Software Inspection Research},
author={Kollanus, Sami and Koskinen, Jussi},
journal={Open Software Engineering Journal},
volume={3},
number={1},
year={2009},
}
@inproceedings{Baysal:2013,
title={The influence of non-technical factors on code review},
author={Baysal, O. and Kononenko, O. and Holmes, R. and Godfrey, M. W.},
booktitle={Proceedings of the 20th Working Conference on Reverse Engineering (WCRE)},
pages={122-131},
year={2013},
}
@article{Baysal:2015,
title={Investigating technical and non-technical factors influencing modern code review},
author={Baysal, Olga and Kononenko, Oleksii and Holmes, Reid},
journal={Empirical Software Engineering},
volume={21},
number={3},
pages={1-28},
year={2016},
}
%Mcintosh2016An
@article{Mcintosh:***,
title={An empirical study of the impact of modern code review practices on software quality},
author={Mcintosh, Shane and Kamei, Yasutaka and Adams, Bram},
journal={Empirical Software Engineering},
volume={21},
number={5},
pages={1-44},
year={2016},
}
@inproceedings{Mcintosh:2014,
title={The impact of code review coverage and code review participation on software quality: a case study of the qt, VTK, and ITK projects},
author={Mcintosh, Shane and Kamei, Yasutaka and Adams, Bram and Hassan, Ahmed E},
booktitle={Working Conference on Mining Software Repositories},
pages={192-201},
year={2014},
}
%%%% Same as Fagan:1999
@incollection{Fagan:1976,
title={Design and code inspections to reduce errors in program development},
author={Fagan, Michael E},
booktitle={Pioneers and Their Contributions to Software Engineering},
pages={301--334},
year={2001},
publisher={Springer}
}
@book{Frank:1984,
title={Software inspections and the industrial production of software},
author={Ackerman, A. Frank and Fowler, Priscilla J and Ebenau, Robert G},
publisher={Elsevier North-Holland, Inc.},
pages={13-40},
year={1984},
}
@book{Baeza:2004,
title={Modern Information Retrieval},
author={Baeza-Yates, Ricardo A and Ribeiro-Neto, Berthier},
publisher={China Machine Press},
year={2004},
}
@article{Ackerman:1989,
title={Software inspections: an effective verification process},
author={Ackerman, A. F and Buchwald, L. S and Lewski, F. H},
journal={IEEE Software},
volume={6},
number={3},
pages={31-36},
year={1989},
}
@article{Russell:1991,
title={Experience With Inspection in Ultralarge-Scale Development},
author={Russell, Glen W},
journal={IEEE Software},
volume={8},
number={1},
pages={25-31},
year={1991},
}
@article{Aurum:2002,
title={State-of-the-art: software inspections after 25 years},
author={Aurum, Aybuke and Petersson, Hakan and Wohlin, Claes},
journal={Software Testing, Verification \& Reliability},
volume={12},
number={3},
pages={133-154},
year={2002},
}
%Rigby2012Contemporary
@article{Rigby:**,
title={Contemporary Peer Review in Action: Lessons from Open Source Development},
author={Rigby, P. C and Cleary, B and Painchaud, F and Storey, M},
journal={IEEE Software},
volume={29},
number={6},
pages={56-61},
year={2012},
}
@techreport{Rigby:2006,
title={A preliminary examination of code review processes in open source projects},
author={Rigby, Peter C. and German, Daniel M.},
institution={University of Victoria},
year={2006},
}
@inproceedings{Rigby:2008,
title={Open source software peer review practices: a case study of the apache server},
author={Rigby, Peter C and German, Daniel M and Storey, Margaret Anne},
booktitle={International Conference on Software Engineering},
pages={541-550},
year={2008},
}
@inproceedings{Rigby:2011,
title={Understanding broadcast based peer review on open source software projects},
author={Rigby, Peter C. and Storey, Margaret Anne},
booktitle={International Conference on Software Engineering},
pages={541-550},
year={2011},
}
@inproceedings{Riby:2013,
title={Convergent contemporary software peer review practices},
author={Rigby, Peter C and Bird, Christian},
booktitle={Proceedings of the 9th Joint Meeting on Foundations of Software Engineering},
pages={202-212},
year={2013},
}
@book{Rigby:2014,
title={A Mixed Methods Approach to Mining Code Review Data: Examples and a study of multi-commit reviews and pull requests},
author={Rigby, Peter C and Bacchelli, Alberto and Gousios, Georgios and Mukadam, Murtuza},
pages={231-255},
year={2014},
}
@misc{Gerrit,
title={Gerrit},
howpublished={\url{https://code.google.com/p/gerrit/}},
note={Accessed 2014/01/20},
}
@misc{GitHub,
title={GitHub},
howpublished={\url{https://github.com/}},
note={Accessed 2014/01/14},
}
@misc{Trustie,
title={Trustie},
howpublished={\url{https://www.trustie.net/}},
note={Accessed 2014/01/14},
}
@inproceedings{Baum:2016,
title={Factors influencing code review processes in industry},
author={Baum, Tobias and Liskin, Olga and Niklas, Kai and Schneider, Kurt},
booktitle={Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering},
pages={85-96},
year={2016},
}
@article{Ciolkowski:2003,
title={Software Reviews: The State of the Practice},
author={Ciolkowski, Marcus and Laitenberger, Oliver and Biffl, Stefan},
journal={IEEE Software},
volume={20},
number={6},
pages={46-51},
year={2003},
}
@article{Yu:2016,
title={Determinants of pull-based development in the context of continuous integration},
author={Yu, Yue and Yin, Gang and Wang, Tao and Yang, Cheng and Wang, Huaimin},
journal={Science China Information Sciences},
volume={59},
number={8},
pages={1-14},
year={2016},
}
@inproceedings{Lima,
title={Developers assignment for analyzing pull requests},
author={de Lima J{\'u}nior, Manoel Limeira and Soares, Daric{\'e}lio Moreira and Plastino, Alexandre and Murta, Leonardo},
booktitle={Proceedings of the 30th Annual ACM Symposium on Applied Computing},
pages={1567-1572},
year={2015},
}
@inproceedings{Antoniol:2008,
title={Is it a bug or an enhancement?: a text-based approach to classify change requests},
author={Antoniol, Giuliano and Ayari, Kamel and Di Penta, Massimiliano and Khomh, Foutse and Gu{\'e}h{\'e}neuc, Yann-Ga{\"e}l},
booktitle={Proceedings of the 2008 Conference of the Centre for Advanced Studies on Collaborative Research},
pages={23},
year={2008},
}
@article{Zhou:2014,
title={Combining text mining and data mining for bug report classification},
author={Zhou, Yu and Tong, Yanxiang and Gu, Ruihang and Gall, Harald},
journal={Journal of Software: Evolution and Process},
year={2016},
publisher={Wiley Online Library}
}
@inproceedings{Pingclasai:2013,
title={Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling},
author={Pingclasai, Natthakul and Hata, Hideaki and Matsumoto, Ken-ichi},
booktitle={Asia-Pacific Software Engineering Conference},
pages={13-18},
year={2013},
}
@inproceedings{Herzig:2013,
title={It's not a bug, it's a feature: how misclassification impacts bug prediction},
author={Herzig, Kim and Just, Sascha and Zeller, Andreas},
booktitle={Proceedings of the 2013 International Conference on Software Engineering},
pages={392-401},
year={2013},
}
@inproceedings{Ciurumelea:2017,
title={Analyzing Reviews and Code of Mobile Apps for Better Release Planning},
author={Ciurumelea, Adelina and Schaufelbuhl, Andreas and Panichella, Sebastiano and Gall, Harald},
booktitle={IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
year={2017},
}
@inproceedings{Yu:2015b,
title={Quality and productivity outcomes relating to continuous integration in GitHub},
author={Vasilescu, Bogdan and Yu, Yue and Wang, Huaimin and Devanbu, Premkumar and Filkov, Vladimir},
booktitle={Proceedings of the 10th Joint Meeting on Foundations of Software Engineering},
pages={805-816},
year={2015},
}
@article{zhang2017social,
title={Social media in GitHub: the role of @-mention in assisting software development},
author={Zhang, Yang and Wang, Huaimin and Yin, Gang and Wang, Tao and Yu, Yue},
journal={Science China Information Sciences},
volume={60},
number={032102},
pages={032102:1-032102:18},
year={2017}
}
@inproceedings{Storey2014The,
title={The (R)Evolution of social media in software engineering},
author={Storey, Margaret Anne and Singer, Leif and Cleary, Brendan and Filho, Fernando Figueira and Zagalsky, Alexey},
booktitle={Proceedings of the Future of Software Engineering (FOSE)},
pages={100-116},
year={2014},
}
@inproceedings{Zhu2016Effectiveness,
title={Effectiveness of code contribution: from patch-based to pull-request-based tools},
author={Zhu, Jiaxin and Zhou, Minghui and Mockus, Audris},
booktitle={ACM Sigsoft International Symposium on Foundations of Software Engineering},
pages={871-882},
year={2016},
}

141 PRCCE/Jcst.sty
View File

@ -0,0 +1,141 @@
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{Jcst}
[2009/10/08 for Journal of Computer Science & Technology]
%\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}
\RequirePackage{multicol}
\RequirePackage{graphicx}
\RequirePackage{CJK}
\RequirePackage{float}
% Force 'figure' and 'table' to use [H] placement
\let\figure@bak\figure
\renewcommand\figure[1][]{\figure@bak[H]}
\let\table@bak\table
\renewcommand\table[1][]{\table@bak[H]}
\let\balance\relax
\long\def\singlecolumn#1{\end{multicols}#1\begin{multicols}{2}}
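% \singlecolumn breaks out of the two-column multicols environment, typesets
% its argument at full text width, and then restarts the columns. A minimal
% usage sketch (the graphic name below is illustrative only):
%   \singlecolumn{\includegraphics[width=\textwidth]{wide-figure}}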
%---------------------
\setlength{\headsep}{5truemm}
\setlength{\headheight}{4truemm}
\setlength{\textheight}{231truemm}
\setlength{\textwidth}{175truemm}
\setlength{\topmargin}{0pt}
\setlength{\oddsidemargin}{-0.5cm}
\setlength{\evensidemargin}{-0.5cm}
\setlength{\footskip}{8truemm}
\setlength{\columnsep}{7truemm}
\setlength{\columnseprule}{0pt}
\setlength{\parindent}{2em}
\renewcommand{\baselinestretch}{1.2}
\renewcommand{\arraystretch}{1.5}
\abovedisplayskip=10pt plus 2pt minus 2pt
\belowdisplayskip=10pt plus 2pt minus 2pt
\renewcommand{\footnoterule}{\kern 1mm \hrule width 10cm \kern 2mm}
%\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\renewcommand{\thefootnote}{}
%-------------------- Set Page Head -----------------------
%#1 for first page author's name and title
%#2 for volume
%#3 for issue
%#4 for end page
%#5 for month
%#6 for year
%#7 for right page author's name and title
\newcommand{\setpageinformation}[7]{%
\thispagestyle{empty}
\pagestyle{myheadings}\markboth
{\hss{\small\sl J. Comput. Sci. \& Technol., #5. #6, #2, #3}}
{{\small\sl {#7}}\rm\hss}
\vspace*{-13truemm}
{\par\noindent\parbox[t]{17.5cm}{\small {#1}.
\rm JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
#2 #3: \thepage--{#4} \ #5 #6}}
\def\@evenfoot{}
\def\@oddfoot{}
}%
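% A minimal usage sketch of \setpageinformation; the volume/issue/page/date
% values below are illustrative placeholders, not actual issue data:
%   \setpageinformation
%     {Li ZX, Yu Y, Yin G et al. Analyzing code reviews in pull-based model}
%     {Vol.32}{No.6}{1060}{Nov.}{2017}
%     {Zhi-Xing Li et al.: Analyzing Code Reviews in Pull-based Model}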
%--------------- Title, keywords, Section, and key words etc., ------------
\renewcommand{\title}[1]{\par\noindent\medskip\unskip\\\Large\textbf{#1}\par\medskip}%
\newcommand{\keywords}[1]{\small\par\noindent\textbf{Keywords}\quad #1\par\bigskip}
\newcommand\email[1]{\par\medskip\noindent\normalsize\textrm{E-mail: ~ #1}}
\newcommand\received[1]{\par\medskip\noindent\normalsize\textrm{Received #1}\par\medskip}
\newcommand\revised[1]{\par\medskip\noindent\normalsize\textrm{Revised #1}\par\medskip}
\renewcommand\author[1]{\par\medskip\noindent\normalsize\unskip\\\textrm{#1}}
\newcommand\address[1]{\par\medskip\noindent\small\unskip\\\textit{#1}}
\renewenvironment{abstract}{\par\medskip\small
\noindent{\bfseries \abstractname\quad}}{\par\medskip}
%----------------------------------------------------------------------
\renewcommand\section{\@startsection{section}{1}{\z@}%
{3.5ex \@plus 1ex \@minus .2ex}%
{2.3ex \@plus.2ex}%
{\normalfont\normalsize\bfseries}}
\renewcommand\subsection{\@startsection{subsection}{2}{\z@}%
{3.25ex\@plus 1ex \@minus .2ex}%
{1.5ex \@plus .2ex}%
{\normalfont\normalsize\bfseries}}
\renewcommand\subsubsection{\@startsection{subsubsection}{2}{\z@}%
{3.25ex\@plus 1ex \@minus .2ex}%
{1.5ex \@plus .2ex}%
{\normalfont\normalsize\itshape}}
%\def\@seccntformat#1{\csname the#1\endcsname. }
%%
%----------------------------------------------
\renewcommand\theequation{\arabic{equation}}
\renewcommand\thetable{\arabic{table}}
\renewcommand\thefigure{\arabic{figure}}
\renewcommand\refname{References}
\renewcommand\tablename{\small \textbf{Table}}
\renewcommand\figurename{\small \rm Fig.}
%\AtBeginDocument{\label{firstpage}}
%\AtEndDocument{\label{lastpage}}
% Remove colon after table/figure number in captions
\long\def\@makecaption#1#2{%
\vskip\abovecaptionskip
\sbox\@tempboxa{\textbf{#1.} #2}%
\ifdim \wd\@tempboxa >\hsize
#1. #2\par
\else
\global \@minipagefalse
\hb@xt@\hsize{\hfil\box\@tempboxa\hfil}%
\fi
\vskip\belowcaptionskip}
%---------------- Theorem, Lemma, etc., and proof ----------
\def\@begintheorem#1#2{\par{\bf #1~#2.~}\it\ignorespaces}
\def\@opargbegintheorem#1#2#3{\par{\bf #1~#2~(#3).~}\it\ignorespaces}
\newenvironment{proof}{\par\textit{Proof.~}\ignorespaces}
{\hfill~\fbox{\vbox to0.5mm{\vss\hbox to0.5mm{\hss}\vss}}\par}
\newtheorem{theorem}{\indent Theorem}
\newtheorem{lemma}{\indent Lemma}
\newtheorem{remark}{\indent Remark}
\newtheorem{definition}{\indent Definition}
%-----------------------------------------------
\newcommand{\biography}[1]{\par\noindent
\vbox to 36truemm{\vss
\hbox to 32truemm{\hss\includegraphics[height=32truemm,width=24truemm]
{#1}\hss}}\par\vspace*{-34truemm}\par
\hangindent=32truemm\hangafter=-10\noindent\indent}
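% A minimal usage sketch of \biography; the photo path and name below are
% placeholders. The photo is set in a 32mm-wide box and the text that follows
% wraps beside it via the hanging indent:
%   \biography{resources/authors/photo}%
%   {\bf Author Name} is an illustrative biography paragraph ...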
%-----------------------------------------------
\endinput
%%
%% End of file `jcst.sty'.

File diff suppressed because it is too large

View File

@ -0,0 +1,111 @@
\documentclass[12pt,twoside]{article}
\usepackage{Jcst}
%%% from old %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{framed}
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
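% \tabincell permits forced line breaks inside a single table cell by nesting
% a tabular; a minimal usage sketch (cell contents are placeholders):
%   \tabincell{c}{first line\\second line}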
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{setspace}
\usepackage{color,soul}
\usepackage{colortbl}
% \usepackage{enumitem}
\usepackage{enumerate}
\usepackage{threeparttable}
\usepackage{multirow,booktabs}
% \usepackage{xcolor}
\usepackage[table,xcdraw]{xcolor}
\usepackage{url}
\usepackage{array}
\usepackage{booktabs}
% \usepackage{subfigure}
\usepackage{caption}
\usepackage{dcolumn}
\usepackage{xspace}
\usepackage{balance}
\usepackage{bm}
\usepackage{cite}
\usepackage{amsmath}
\newcommand{\ie}{{\emph{i.e.}},\xspace}
\newcommand{\viz}{{\emph{viz.}},\xspace}
\newcommand{\eg}{{\emph{e.g.}},\xspace}
\newcommand{\etc}{\emph{etc}}
\newcommand{\etal}{{\emph{et al.}}}
\newcommand{\GH}{{\sc GitHub}\xspace}
\newcommand{\BB}{{\sc BitBucket}\xspace}
\newcommand{\TSHC}{TSHC\xspace}
\makeatletter
\g@addto@macro{\UrlBreaks}{\UrlOrds}
\makeatother
\makeatletter
\def\url@leostyle{%
\@ifundefined{selectfont}{\def\UrlFont{\same}}{\def\UrlFont{\scriptsize\bf\ttfamily}}}
\makeatother
\urlstyle{leo}
%%% from old %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\setcounter{page}{1}
\setpageinformation
%{Head of the first page} {Running head of odd pages}
{Li ZX, Yu Y, Yin G \textit{et al.} Analyzing Code Reviews in Pull-based Model}
{}{}{}{Mon.}{Year}
{Zhi-Xing Li {\it et al.}: Analyzing Code Reviews in Pull-based Model}
\begin{CJK}{GBK}{song}
\title{What are they talking about?\\
Analyzing Code Reviews in Pull-based Development Model}
\footnotetext{Special Section on Software Systems}
\footnotetext{This work was supported by the National Natural Science Foundation of China
under Grant Nos. 61432020, 61303064, 61472430 and 61502512
and National Grand R\&D Plan under Grant No. 2016YFB1000805.}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\author{Zhi-Xing Li,
Yue Yu~\footnote{Corresponding Author}, {\it Member, CCF, ACM},
Gang Yin, {\it Member, CCF, ACM},
Tao Wang, {\it Member, CCF, ACM},
and Huai-Min Wang, {\it Member, CCF, ACM}}
\address{College of Computer, National University of Defense Technology, Changsha 410073, China
}
\email{\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn}
\received{April 21, 2017}
\revised{June 30, 2017}%Revised date
\input{0_abstract}
\keywords{Pull-request; code review; review comments}
\begin{multicols}{2}
\normalsize
\renewcommand{\thefootnote}{\arabic{footnote}}
\setcounter{footnote}{0}
\input{1_introduction}
\input{2_background}
\input{3_approach}
\input{4_result}
\input{6_threats}
\input{8_conlusion}
\bibliographystyle{jcst}
\bibliography{9_ref}
\input{10_authors}
\end{multicols}
\end{CJK}
\end{document}


View File

@ -0,0 +1 @@
Analyzing Code Reviews in Pull-based Model


View File

@ -0,0 +1,2 @@
Zhixing Li is a Master's student in the College of Computer at National University of Defense Technology.
His research interests include open source software engineering, data mining, and knowledge discovery in open source software.


View File

@ -0,0 +1,6 @@
Yue Yu is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2016.
He has visited UC Davis supported by a CSC scholarship.
His research findings have been published at MSR, FSE, IST, ICSME, APSEC, and SEKE.
His current research interests include software engineering, spanning mining software repositories and social coding network analysis.


View File

@ -0,0 +1,5 @@
Gang Yin is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2006.
He has worked on several major research projects, including the National 973 and 863 Programs.
He has published more than 60 research papers in international conferences and journals.
His current research interests include distributed computing, information security, software engineering, and machine learning.


View File

@ -0,0 +1,3 @@
Tao Wang is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph.D. degree in Computer Science from National University of Defense Technology (NUDT) in 2015.
His research interests include open source software engineering, machine learning, data mining, and knowledge discovery in open source software.


View File

@ -0,0 +1 @@
Huaimin Wang received his Ph.D. in Computer Science from National University of Defense Technology (NUDT) in 1992. He is now a professor and chief engineer in the Department of Educational Affairs, NUDT. He has been awarded the “Chang Jiang Scholars Program” professorship and named a Distinguished Young Scholar, among other honors. He has published more than 100 research papers in peer-reviewed international conferences and journals. His current research interests include middleware, software agents, and trustworthy computing.
