\documentclass[10pt, conference, compsocconf]{IEEEtran}
% correct bad hyphenation here
%\usepackage{CJK}
\hyphenation{op-tical net-works semi-conduc-tor}
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
\usepackage{graphicx}
\usepackage{color,soul}
\usepackage{colortbl}
\usepackage{xcolor}
\usepackage{enumitem}
\usepackage{threeparttable}
\usepackage{multirow,booktabs}
\begin{document}
%\begin{CJK*}{GBK}{song}

\title{Understanding the Review Comments on Pull-Requests}


\author{
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
National University of Defense Technology\\
Changsha, China\\
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
}


\maketitle


\begin{abstract}
The pull-based development model, integrated into code hosting sites such as GitHub and Bitbucket, is widely used in distributed software development.
It allows any user to contribute to a public repository and lowers the barrier to contribution, which results in a high volume of incoming pull-requests. Before a contribution gets accepted or rejected, it is usually discussed among integrators, contributors and the community audience.
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
While GitHub presents all comments on the main pull-request overview ordered by creation time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
Hence, studying the classification of comments on pull-requests is significant work.


% PRs are often imperfect and may introduce issues; which factors cause which issues is not well understood
% should we say "comments" or "review comments"?


In this paper, we explore and summarize \hl{six types of comments} by manual identification and present a method for the automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects on GitHub, and the evaluation results show that our method works effectively.


%[should describe in more detail what the method is based on, and the experimental results]
%1. Pull-requests are popular
%2. Many contributions need evaluation, so there is much discussion
%3. Discussions have different focuses
%4. Classifying the discussions is highly valuable
%5. We identified several categories and proposed a classification method

\end{abstract}

\begin{IEEEkeywords}
Pull-request; code review; comment classification;
\end{IEEEkeywords}


\IEEEpeerreviewmaketitle


\section{Introduction}
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull-request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) then evaluate the changes and decide whether to accept or reject them. Currently, code hosting sites such as GitHub (github.com) and Bitbucket (bitbucket.com) support the pull-based model, and many open source projects hosted on these sites have adopted it \cite{Gousios:2014}. On GitHub in particular, the work environment is transparent and social media features are integrated into the pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on others' contributions.

When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt the appropriateness of the problem the pull-request intends to resolve, they usually ask the contributors to prove the value of the contribution by showing some use case, or simply refuse to accept it. When the solution has defects or they are not content with it, integrators request contributors to improve the implementation, sometimes with suggestions, or directly provide better alternatives. After receiving feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors reach a consensus after iterative discussions. Occasionally, the community audience (users who are neither integrators nor the pull-request submitter) express their opinions and influence the decision process through community, project or company support.

Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity with the changed files, understanding of the project's roadmap, social relationship with the contributor and many other factors influence their focus in the discussion. Some of them care about contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect consistency with the project's norms (like programming conventions).
A lot of valuable information is contained in the pull-request discussion. Previous studies have studied

[****some related work*****]

%%% Prior work: discussion length, comment social network, social media,
%% There is also comment classification [\cite{Bacchelli2013}; they conducted a manual classification survey of teams inside Microsoft, where the development environment is less transparent and not all projects are open source]
%% "Assessing MCR Discussion Usefulness using Semantic Similarity" discusses whether a comment is useful

Such a classification can support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request prioritization \cite{Veen:2015}.

%Pull-request
%Discussion
%Comment
%What we have done
%Our contribution
% a good understanding of comments
% a taxonomy
% a labeled dataset

%[\cite{Georgios2014}
%decision making process that integrators go through while evaluating contributions:
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they typically need to manage large amounts of contribution requests simultaneously]
%[\cite{Gousios2014}
%In most projects, more than half of the participants are community members. This is not true however for the number of comments; in most projects the majority of the comments come from core team members.
% could we say that reviewers are in an active position while contributors are in a passive one?]


%[Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research]

The rest of the paper is organized as follows: Section 2 introduces pull-request review in GitHub. Section 3 describes our comment taxonomy and classification method. Section 4 presents our experiments and results. Section 5 discusses threats to validity. Section 6 briefly reviews related work. Finally, Section 7 concludes the paper with future work.

\section{Discussion on Pull-request in GitHub}

\subsection{Pull-request in GitHub}
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism \cite{Gousios:2014,Yu:2014}.
As illustrated by Figure \ref{fig:github_wf}, the typical contribution process [6] in GitHub involves the following steps.

\begin{enumerate}
\item \emph{Fork:} First of all, a contributor can find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.

\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fix some bugs in his cloned repository.

\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Besides \hl{commits}, the submitter needs to provide a title and description to explain the objective of his pull-request, as shown in Figure \ref{fig:pr}.

\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. In the meantime, considering reviewers' suggestions, the contributor can update his pull-request by attaching new commits, and then reviewers discuss the pull-request again.

\item \emph{Test:} To ensure the pull-request does not break the current runnable state, some reviewers play the role of testers. They can check the submitted changes by manually running the patches locally or in an automated way with the help of continuous integration (CI) services.

\item \emph{Merge:} Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects the pull-request.

\end{enumerate}

\begin{figure}[ht]
\centering
\includegraphics[width=8cm]{github_wf.png}
\caption{Pull-request workflow on GitHub}
\label{fig:github_wf}
\end{figure}

\begin{figure}[ht]
%https://github.com/rails/rails/pull/27561
\centering
\includegraphics[width=8cm]{pr.png}
\caption{Example pull-request on GitHub}
\label{fig:pr}
\end{figure}


% FERRTM
% why are there comments?

\subsection{Review Comments on Pull-requests}
Code review is an important software engineering practice to assess code changes in open source projects []. Unlike the formal code inspection advocated by Fagan \cite{Fagan:1999}, which has a well-structured process, GitHub supports modern code review \cite{Bacchelli:2013}, a more lightweight approach requiring fewer formal steps \cite{Beller:2014,Kollanus:2009}.
The way to voice an opinion on a pull-request is to leave a comment, and the following introduces review comments in GitHub.

\begin{figure}[ht]
% https://github.com/rails/rails/pull/12150
\centering
\includegraphics[width=9cm]{comments.png}
\caption{Example comments on GitHub}
\label{fig:comment}
\end{figure}

\emph{Comment type:}
There are two ways for reviewers to publish comments on a pull-request: general comments about the overall contribution as a whole, and inline comments on specific lines of code in the changes (see Figure \ref{fig:comment}) \cite{Yu:2015,Thongtanunam:2016}.
% could add a bit more here

\emph{Visualization:}
All the updates corresponding to a pull-request are displayed in a visual representation. General comments appear on the \hl{main page} of the pull-request interface, while inline comments are folded by default and can be shown after the toggle button is clicked. All comments and other behaviors are aggregated into a \hl{timeline} ordered primarily by time of creation.

% comment participants
\emph{Participant:}
Any user on GitHub can be a reviewer and has the chance to participate in the discussion of any pull-request []. Usually, core team members comment to ask the submitter to present the pull-request more clearly or to improve the solution, pull-request creators comment to explain their work or reply to others' suggestions, and other community audience comment to express their demands and influence the decision process [].
A previous study \cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators, while the majority of the comments come from core team members.


\hl{ref more in} \cite{Tsay:2014a}

\section{Classification of Comment}
We first construct a fine-grained multi-level taxonomy for comments in pull-request reviews and then manually classify a set of *** comments, based on which we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning techniques.

\subsection{Dataset}
We conduct experiments on three open source projects hosted on GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming languages and application areas, which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments comes from two sources: the GHTorrent dump released in **** and our own data crawled from GitHub up to *****.
Table \ref{tab:dataset} lists the key statistics of the three projects in our dataset:

\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is an outstanding JavaScript framework for front-end development. They have been hosted on GitHub since an early time, which suggests that their development teams are more familiar with GitHub and that their activity data is more \hl{valuable}.

\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors mean more participation, which produces more collaboration data. Before contributing, developers usually ``star'' or fork an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the numbers of stars, forks and contributors indicate the hotness of a specific project. The three projects were selected for their relatively high hotness indicators.

\noindent\textbf{Application data}: There are two main types of development models on GitHub: the \emph{fork and pull model}, which is widely used in open source projects, and the \emph{shared repository model}, which is more prevalent for private projects; both of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model, created before \hl{2016-01-01}.

\begin{table*}[ht]
\centering
\caption{Dataset of our experiments}
\begin{tabular}{r l c c c c c c}
\toprule
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
\midrule
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
\textbf{Elasticsearch} & Java & 2010-02-08 & 20008 & 6871 & 753 & 6315 & 38930 \\
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
\bottomrule
\end{tabular}
\label{tab:dataset}
\end{table*}

Among the entire set, we manually labeled * pull-requests and ** comments, which were later used to automatically label the remaining data using our 3\textbf{C} algorithm.

\subsection{Taxonomy Definition}
% refer to \cite{Georgios2014} and methods from experimental economics
Previous work has studied the challenges faced by reviewers [ ] and the issues introduced by pull-request submitters [], which provide valuable hints for academic researchers and suggestions for industry designers. Inspired by their work, we decide to observe in more detail what reviewers talk about in pull-request reviews, rather than only distinguishing technical from non-technical perspectives.

We conducted a \hl{use case} to determine the taxonomy scheme, which was developed manually through an \hl{iterative process} of reading review comments randomly collected from three projects hosted on GitHub. As Figure *!!!!! shows, this process consists of the following steps:

\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first choose three projects ( ) hosted in GitHub \hl{according to project hotness (e.g. \#star, \#fork,\#contributor), and development maturity (e.g. project age, pull-request usage)}. Then, the second author select randomly review comments from the three projects and label each comment with a descriptive message.
|
||||
\item \emph{Execution phase}: Given the previously labeled comments, the first author divide them into different groups based on their descriptive messages. Unlike the aggregation process in card sort \cite{Bacchelli:2013}, we initially default all the comments to be in the same group and iteratively divide the group. Based on a rigorous analysis of the existing literature [] and our own experience with working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups: technical and non-technical which in turn are divided to more specific groups and finally each comment will be put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine grained categories in our taxonomy scheme.
|
||||
\item \emph{Analysis phase}: On the base of recognition result, the second author abstract taxonomy hierarchies and deduce general categories which forms a multi-level (2 level actually) taxonomy scheme which includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors together with other \hl{10} participants argument its correctness and convergence and agree on the final decision.
|
||||
\end{enumerate}


Table \ref{tab:taxonomy} presents the final taxonomy scheme; for each of the subcategories we include a description together with an example comment.

\begin{table*}[ht]
\begin{threeparttable}
\centering
\caption{Taxonomy Scheme}
\begin{tabular}{r l p{11cm}}
\toprule
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description (Example)} \\
\midrule
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out extra blank lines, improper indentation, inconsistent naming conventions, etc. \\
&\cellcolor{lightgray}(L2-1) &\cellcolor{lightgray}e.g., \emph{``:scissors: this blank line''}\\

&Code correction &Points out wrong function calls, program logic errors, etc.\\
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\

&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggests enhancing functionality, improving performance, etc.\\
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\

&Alternative implementation &Proposes an alternative solution better than the current one\\
&(L2-4) &none \\

&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demands that the submitter provide test cases for the changed code\\
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\

&Test feedback &Needs to test the change before making a decision, or reports a failing test\\
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\

%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\(L1-2)}}
&\cellcolor{lightgray}Pull-request acceptance &\cellcolor{lightgray}Is satisfied with the pull-request and agrees to merge it\\
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\

&Pull-request rejection &Rejects merging the pull-request because of a duplicate proposal, an undesired feature, a fake bug report, etc.\\
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''} or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\

&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Is confused about the purpose of the pull-request and demands a use case\\
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\

%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\

&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}Pings other developer(s) to review this pull-request\\
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\

&Development norm &Talks about an undescriptive description or commit message, the need to squash or rebase the commits contained in the pull-request, etc.\\
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks others for what they do, apologizes for one's own actions, etc.\\
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g., \emph{``thank you :yellow\_heart:''}\\

&Positive reaction &Agrees with others' opinions, compliments others' work, etc.\\
&(L2-14) &e.g., \emph{``:+1: nice one @cristianbica.''}\\

&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interactions\\
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g., \emph{``I can do it for you if you want.''} \\

\bottomrule
\end{tabular}
\begin{tablenotes}
\item[1] \emph{Note: a word beginning and ending with a colon, like ``:scissors:'', is Markdown syntax for an emoji on GitHub}
\end{tablenotes}
\label{tab:taxonomy}
\end{threeparttable}
\end{table*}

For brevity, we show short comments or only the key part of long ones as examples. However, we would like to point out that it is common for a single comment to address multiple topics, as in the following review comments:

\vspace{1em}
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}

\vspace{1em}
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
\vspace{1em}

In the first comment C1, the reviewer first thanks the contributor (L2-13), then shows his approval of this pull-request (L2-7), followed by a request for commit splitting (L2-12).
In C2, the reviewer expresses that he agrees with this change (L2-7), thinks highly of this work (L2-14; :+1: is Markdown syntax for the ``thumbs-up'' emoji) and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review if they are not familiar with the change under review.

\subsection{Manual Labeling}
For each project, we randomly sample 200 pull-requests per year from 2013 to 2015, whose comment count is greater than 0 and less than 20. Overall, 1,800 distinct pull-requests and 5,645 review comments are sampled.

According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and ensure the quality of the manual classification, we built an Online Label Platform (OLP) that offers a web-based interface to reduce the extra burden. OLP is deployed on the public network so that classification participants are free to do the labeling job anywhere and at any time, whenever they are in their own groove.


%[!!!! add a screenshot of the online labeling platform !!!!]

As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
\begin{enumerate}
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request are displayed. In addition, its hyperlink to GitHub is shown, making it convenient to jump to the original webpage for a more detailed view and inspection if needed.
\item \emph{Section 2}: All the review comments of a pull-request are listed in order of creation time in this section. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of a comment appear above its content.
\item \emph{Section 3}: Beside a comment, there are candidate labels corresponding to the 2-level categories, which can be selected by clicking the checkbox next to the label. The use of checkboxes means that multiple labels can be assigned to a comment. There is also an opportunity to report other labels that are not in our list.
\end{enumerate}
As shown, we only label each comment with 2-level categories; the 1-level categories are automatically derived according to the taxonomy hierarchy. Table \ref{tab:label_sta} shows the statistics of the manual classification result.

\begin{table}[ht]
\centering
\caption{Manual classification result}
\begin{tabular}{r l c c c}
\toprule
\textbf{1-level} &\textbf{2-level} &\textbf{Rails} &\textbf{Elasticsearch} &\textbf{Angular.js}\\
\midrule
\multirow{6}*{L1-1}
&L2-1 &0 &0 &0\\
&L2-2 &0 &0 &0\\
&L2-3 &0 &0 &0\\
&L2-4 &0 &0 &0\\
&L2-5 &0 &0 &0\\
&L2-6 &0 &0 &0\\
\midrule
\multirow{3}*{L1-2}
&L2-7 &0 &0 &0\\
&L2-8 &0 &0 &0\\
&L2-9 &0 &0 &0\\
\midrule
\multirow{3}*{L1-3}
&L2-10 &0 &0 &0\\
&L2-11 &0 &0 &0\\
&L2-12 &0 &0 &0\\
\midrule
\multirow{3}*{L1-4}
&L2-13 &0 &0 &0\\
&L2-14 &0 &0 &0\\
&L2-15 &0 &0 &0\\
\bottomrule
\end{tabular}
\label{tab:label_sta}
\end{table}


%%%% add some statistics about the table

\subsection{Automatic Classification}
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed using scikit-learn\footnote{http://scikit-learn.org/}, in particular the Support Vector Machine (SVM) algorithm. We already mentioned that a single review comment often addresses multiple topics; hence one of the goals of 3\textbf{C} is to perform multi-label classification. To this end, we construct rules and classifiers with a one-vs-all strategy for each category of the higher-level and lower-level taxonomy.

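Since a single comment can carry several labels, the one-vs-all setup amounts to deriving one binary target per category from the multi-label annotations. The following is a minimal sketch of this step, not the actual 3\textbf{C} implementation; the example label sets reuse the categories assigned to comments C1 and C2 above, and \texttt{MultiLabelBinarizer} from scikit-learn is assumed as a convenient stand-in:

\begin{verbatim}
from sklearn.preprocessing import MultiLabelBinarizer

# Each manually labeled comment carries one or more 2-level category labels.
annotations = [
    ["L2-13", "L2-7", "L2-12"],   # e.g. comment C1 from the running example
    ["L2-7", "L2-14", "L2-11"],   # e.g. comment C2 from the running example
]

binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(annotations)  # shape: (n_comments, n_categories)

# Y[:, i] is the binary target used to train the i-th one-vs-all classifier.
print(binarizer.classes_)  # ['L2-11' 'L2-12' 'L2-13' 'L2-14' 'L2-7']
\end{verbatim}
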
\begin{figure*}[ht]
\centering
\includegraphics[width=16cm]{3c.png}
\caption{Overview of 3C}
\label{fig:3c}
\end{figure*}

Figure \ref{fig:3c} illustrates the overview of 3\textbf{C}. 3\textbf{C} is a multi-stage process containing two stages that make use of the comment text and other features, respectively.

\noindent\underline{\textbf{Stage 1:}}
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one for each class (or category).
For example, let \emph{CM} be a comment; the classification of \emph{CM} in stage 1 goes like this (suppose there are $n$ classes $C_{1}, C_{2}, \dots, C_{n}$):
Firstly, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. Assuming $R_{i}$ is confirmed, the $i$-th item of the probability vector is set to 1, and the SVM text classifiers $TC_{1}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for the classes $C_{1}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are some key steps in this stage.

\noindent\textbf{Rule inspection:} Given a review comment to be classified, 3\textbf{C} first looks through its original text and applies each inspection rule iteratively.
As mentioned, each inspection rule corresponds to a specific class, so a class label is assigned to a comment if its inspection rule is confirmed.
An inspection rule is a set of matching patterns containing discriminating term(s) and regular expressions. The recognition effect of matching patterns is \hl{not balanced}, which means that high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recognize as many comments as possible at a relatively high precision and leave the \hl{remaining unrecognized comments} to the ML text classifiers. That is to say, the recognition result of an inspection rule is quite reliable.
%!!!!! take an example !!!!!

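To make the rule-then-classifier fallback concrete, the sketch below shows one hypothetical inspection rule for the reviewer assignment category (L2-11) and how stage 1 could combine it with a probability-calibrated SVM text classifier; the patterns and function names are illustrative assumptions, not the actual rule set of 3\textbf{C}:

\begin{verbatim}
import re

# Hypothetical high-precision patterns for "Reviewer assignment" (L2-11).
REVIEWER_ASSIGNMENT_PATTERNS = [
    re.compile(r"/cc\s+@\w+", re.IGNORECASE),   # "/cc @fxn can you take a look?"
    re.compile(r"@\w+\s+(can|could) you (take a look|review)", re.IGNORECASE),
]

def rule_confirmed(comment_text, patterns):
    """True if any matching pattern of the inspection rule fires."""
    return any(p.search(comment_text) for p in patterns)

def stage1_probability(comment_text, patterns, text_classifier, vectorizer):
    """Probability for one class: 1.0 if the rule fires, else the SVM estimate."""
    if rule_confirmed(comment_text, patterns):
        return 1.0
    features = vectorizer.transform([comment_text])
    # assumes an SVM trained with probability estimates, e.g. SVC(probability=True)
    return float(text_classifier.predict_proba(features)[0, 1])
\end{verbatim}
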
Before being processed by the ML text classifiers, the comment text is preprocessed, which consists of some specific transformations and feature extraction.

\noindent\textbf{Text transformations:}
Reviewers tend to reference source code, hyperlinks and others' remarks in review comments to clearly express their opinions, prove their points and reply to other people.
These behaviors promote the review process, but on the other hand they bring great challenges for comment classification.
Words in these \hl{text blocks} contribute little to the classification and may even introduce interference. Hence, we transform them into \hl{single-word indicators} to reduce vocabulary interference while preserving the reference information.
%!!!!! take an example !!!!!

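As an illustration of such transformations, the sketch below replaces referenced blocks in a GitHub comment body with single-word indicators; the regular expressions and indicator tokens are our own illustrative choices, not necessarily those used by 3\textbf{C}:

\begin{verbatim}
import re

def transform(comment_body):
    """Replace referenced text blocks with single-word indicator tokens."""
    text = comment_body
    text = re.sub(r"```.*?```", " CODEBLOCK ", text, flags=re.DOTALL)   # fenced code
    text = re.sub(r"`[^`]+`", " INLINECODE ", text)                     # inline code
    text = re.sub(r"https?://\S+", " HYPERLINK ", text)                 # hyperlinks
    text = re.sub(r"^>.*$", " QUOTEDREPLY ", text, flags=re.MULTILINE)  # quoted replies
    return text
\end{verbatim}
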
\noindent\textbf{Feature extraction:}
The transformed text is tokenized and stemmed to its root form \cite{Porter:1997}. \hl{We filter out punctuation from the word tokens but do not remove English stop words, because we think some common words are important in review comments. For example \dots.} To apply ML algorithms to comment classification, we need to extract a set of features from the comment text. We adopted the tf-idf model [], which can be conveniently computed with the tools provided by scikit-learn.

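A minimal sketch of this preprocessing, assuming NLTK's Porter stemmer and scikit-learn's TfidfVectorizer as concrete stand-ins for the steps described above:

\begin{verbatim}
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def tokenize_and_stem(text):
    # keep alphabetic word tokens (punctuation dropped), then stem to root form
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [stemmer.stem(t) for t in tokens]

# stop_words=None keeps English stop words, matching the choice described above
vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                             stop_words=None, lowercase=False)

# comments = [transform(c) for c in raw_comment_texts]
# X = vectorizer.fit_transform(comments)   # sparse tf-idf feature matrix
\end{verbatim}
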
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider other comment-level features, listed in Table \ref{tab:factors}.

\noindent\textbf{Comment\_length:} the total number of characters contained in the comment text after preprocessing. Long comments are likely to \hl{argue about pull-request appropriateness and code improvement.}

\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or an issue comment. Inline comments tend to talk about solution details while general comments talk more about other \hl{``high-level'' matters.}

\noindent\textbf{Link\_inclusion:} binary, indicating whether a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on some point of view, or to offer guidelines when they want to help other people.

\noindent\textbf{Ping\_inclusion:} binary, indicating whether a comment includes a ping activity (occurring in the form of ``@username'').

\noindent\textbf{Code\_inclusion:} binary, indicating whether a comment includes source code. Comments related to solution details tend to contain source code.

\noindent\textbf{Ref\_inclusion:} binary, indicating whether a comment includes a reference to others' remarks. Such a reference indicates a reply to someone, which probably reflects one's further suggestion or disagreement.

\noindent\textbf{Sim\_pr\_title:} the similarity between the comment text and the pull-request title (measured as the number of common words divided by the number of words in their union).

\noindent\textbf{Sim\_pr\_desc:} the similarity between the comment text and the pull-request description (measured in the same way as Sim\_pr\_title).
Comments with a high similarity to the title or description of the pull-request are likely to discuss solution details or the value that the change brings.

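Concretely, this word-overlap similarity is the Jaccard index over the two word sets; writing $W_{c}$ for the set of words in the comment and $W_{t}$ for the set of words in the pull-request title (or description), it reads
\begin{equation}
\mathrm{sim}(c, t) = \frac{|W_{c} \cap W_{t}|}{|W_{c} \cup W_{t}|},
\end{equation}
where the notation $W_{c}$ and $W_{t}$ is introduced here only for illustration.
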
Together with the probability vector passed from \emph{stage 1}, these features are composed to form a new feature vector to be processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. The prediction models generate a new probability vector representing \hl{how likely a comment falls into a specific category}. After iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th vector item is greater than 0.5. If all the items of the probability vector are less than 0.5, the class label corresponding to the largest one is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
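The final labeling decision can be summarized by the following sketch, a direct restatement of the rule above (NumPy is used only for convenience):

\begin{verbatim}
import numpy as np

def assign_labels(probabilities, threshold=0.5):
    """Keep every class above the threshold, or fall back to the single best class."""
    probs = np.asarray(probabilities)
    labels = np.flatnonzero(probs > threshold)
    if labels.size == 0:                      # nothing confident enough:
        labels = np.array([probs.argmax()])   # take the most likely class
    return labels.tolist()

# assign_labels([0.1, 0.7, 0.6, 0.2]) -> [1, 2]
# assign_labels([0.1, 0.3, 0.2, 0.2]) -> [1]
\end{verbatim}
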
\subsection{Evaluation of 3C}
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., splitting the review comments into 10 sets, of which 9 are used to train the classifiers and the remaining one is used for performance testing, and repeating this process 10 times.

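The sketch below shows this protocol for a single one-vs-all category, using scikit-learn's KFold and metric functions; the classifier is a linear-kernel SVC stand-in, and the rule stage and stage-2 features of the full 3\textbf{C} pipeline are omitted:

\begin{verbatim}
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

def cross_validate_category(X, y, n_splits=10, seed=0):
    """10-fold CV for one binary category; returns mean precision and recall."""
    precisions, recalls = [], []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        clf = SVC(kernel="linear", probability=True)
        clf.fit(X[train_idx], y[train_idx])
        pred = clf.predict(X[test_idx])
        precisions.append(precision_score(y[test_idx], pred, zero_division=0))
        recalls.append(recall_score(y[test_idx], pred, zero_division=0))
    return np.mean(precisions), np.mean(recalls)
\end{verbatim}
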
Table \ref{tab:auto_class} reports the precision and recall of the classification results \hl{on 1-level categories}.

% compare with rule-only and ML-only baselines: Prec., Recall, F-measure, and averages


%%%% use the category names in the table
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
%%%% use the category names in the table

\begin{table}[ht]
\centering
\caption{Classification performance of 3C}
\begin{tabular}{r c c c c c c c c}
\toprule
\multirow{2}*{\textbf{Class}}
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
\midrule
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
\midrule
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
\bottomrule
\end{tabular}
\label{tab:auto_class}
\end{table}

\section{EXPERIMENT \& RESULT}

\subsection{RESEARCH QUESTIONS}


\subsection{Statistics}
% distribution of each comment type across pull-requests


\subsection{Co-occurrence}
% co-occurrence of the different comment types
\subsection{Factors}

\begin{table*}[ht]
\centering
\caption{Descriptive statistics of the studied factors}
\begin{tabular}{r c c c c c c}
\toprule
\textbf{Statistic} &\textbf{Mean} &\textbf{St. Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
\midrule
\textbf{Project level metrics}\hspace{1em} \\
Project\_age & & & & & & \\
Team\_size & & & & & & \\
Pr\_growth & & & & & & \\
\midrule
\textbf{Pull-request level metrics}\hspace{1em} \\
Churn & & & & & & \\
N\_files & & & & & & \\
Src\_touched & & & & & & \\
Config\_touched & & & & & & \\
Test\_touched & & & & & & \\
\midrule
\textbf{Contributor level metrics}\hspace{1em} \\
Core\_team & & & & & & \\
Prev\_prs\_local & & & & & & \\
Prev\_prs\_global & & & & & & \\
\midrule
\textbf{Comment level metrics}\hspace{1em} \\
Comment\_length & & & & & & \\
Comment\_type & & & & & & \\
Sim\_pr\_title & & & & & & \\
Sim\_pr\_desc & & & & & & \\
Code\_inclusion & & & & & & \\
Ref\_inclusion & & & & & & \\
Link\_inclusion & & & & & & \\
Ping\_inclusion & & & & & & \\
\bottomrule
\end{tabular}
\label{tab:factors}
\end{table*}


% effect of each factor on negative comments
\noindent\underline{\textbf{Project level}}

\noindent\textbf{Project\_age:} the time from project creation on GitHub to the pull-request creation, in months, which is used as a proxy for maturity.

\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept new contributions, during the three months prior to pull-request creation.

\noindent\textbf{Pr\_growth:} the average number of pull-requests submitted per day, prior to the examined pull-request.

\noindent\underline{\textbf{Pull-request level}}

\noindent\textbf{Churn:} the total number of lines added and deleted by the pull-request. Bigger code changes may be more complex and require longer code evaluation.

\noindent\textbf{N\_files:} the total number of files changed in the pull-request, which is a signal of pull-request size.

\noindent\textbf{Src\_touched:} binary, measuring whether the pull-request touched at least one source code file (documentation files are treated as source code files).

\noindent\textbf{Config\_touched:} binary, measuring whether the pull-request touched at least one configuration file.

\noindent\textbf{Test\_touched:} binary, measuring whether the pull-request touched at least one test file.

\noindent\underline{\textbf{Contributor level}}

\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core contributor vs. outside contributor).

\noindent\textbf{Prev\_prs\_local:} the number of pull-requests submitted to the specific project by the developer, prior to the examined pull-request.

\noindent\textbf{Prev\_prs\_global:} the number of pull-requests submitted by the developer on GitHub, prior to the examined pull-request.

\section{THREATS TO VALIDITY}
% manual labeling:
% automatic classification:
\section{RELATED WORK}
\subsection{Code review}
% (refer to the work by the Thai researchers)
% traditional code review
% modern code review
\subsection{Pull-request}
% (refer to the senior lab member's work)
% some studies on pull-requests

\subsection{Classification of collaboratively generated data}
% (the 2008 paper and the one on mobile app reviews)
% other studies that classify issues

\section{Conclusion}
The conclusion goes here. This is more of the conclusion.

% conference papers do not normally have an appendix


% use section* for acknowledgement
\section*{Acknowledgment}
The research is supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61472430 and 61502512) and the National Grand R\&D Plan (Grant No. 2016YFB1000805).
% add one more grant number, the one Prof. Wang provided

The authors would like to thank...
more thanks here


\begin{thebibliography}{1}

\bibitem{Yu:2015}
Y. Yu, H. Wang, G. Yin, and T. Wang, ``Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?'' 2015.

\bibitem{Barr:2012}
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, and P. Devanbu, ``Cohesive and isolated development with branches,'' in FASE. Springer, 2012, pp. 316--331.

\bibitem{Gousios:2014}
G. Gousios, M. Pinzger, and A. van Deursen, ``An exploratory study of the pull-based software development model,'' in ICSE. ACM, 2014, pp. 345--355.

\bibitem{Vasilescu:B}
B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov, ``Quality and productivity outcomes relating to continuous integration in GitHub.''

\bibitem{Tsay:2014a}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Let's talk about it: Evaluating contributions through discussion in GitHub,'' in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014, pp. 144--154.

\bibitem{Gousios:2014b}
G. Gousios and A. Bacchelli, ``Work practices and challenges in pull-based development: The contributor's perspective,'' in Proceedings of the 38th International Conference on Software Engineering (ICSE), pp. 285--296. http://doi.org/10.1109/ICSE.2015.55

\bibitem{Marlow:2013}
J. Marlow, L. Dabbish, and J. Herbsleb, ``Impression formation in online peer production: Activity traces and personal profiles in GitHub,'' in Proceedings of the 2013 Conference on Computer Supported Cooperative Work, New York, NY, USA, 2013, pp. 117--128.

\bibitem{Veen:2015}
E. van der Veen, G. Gousios, and A. Zaidman, ``Automatically prioritizing pull requests,'' in Mining Software Repositories (MSR). IEEE, 2015, pp. 357--361.

\bibitem{Thongtanunam:2015}
P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, et al., ``Who should review my code? A file location-based code-reviewer recommendation approach for modern code review,'' 2015, pp. 141--150.

\bibitem{jiang:2015}
J. Jiang, J.-H. He, and X.-Y. Chen, ``CoreDevRec: Automatic core member recommendation for contribution evaluation,'' Journal of Computer Science and Technology, vol. 30, no. 5, pp. 998--1016, 2015.

\bibitem{Tsay:2014b}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Influence of social and technical factors for evaluating contribution in GitHub,'' in ICSE, 2014, pp. 356--366.

\bibitem{Bacchelli:2013}
A. Bacchelli and C. Bird, ``Expectations, outcomes, and challenges of modern code review,'' in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 712--721.

\bibitem{Shah:2006}
S. K. Shah, ``Motivation, governance, and the viability of hybrid forms in open source software development,'' Management Science, vol. 52, no. 7, pp. 1000--1014, 2006. http://doi.org/10.1287/mnsc.1060.0553

\bibitem{Porter:1997}
M. F. Porter, ``An algorithm for suffix stripping,'' in Readings in Information Retrieval, K. Sparck Jones and P. Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 313--316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705

\bibitem{Yu:2014}
Y. Yu, H. Wang, G. Yin, and C. X. Ling, ``Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration,'' in Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), vol. 1, 2014, pp. 335--342. http://doi.org/10.1109/APSEC.2014.57

\bibitem{Thongtanunam:2016}
P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, ``Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects,'' Empirical Software Engineering, 2016. http://doi.org/10.1007/s10664-016-9452-6

\bibitem{Beller:2014}
M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, ``Modern code reviews in open-source projects: Which problems do they fix?'' in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), 2014, pp. 202--211. http://doi.org/10.1145/2597073.2597082

\bibitem{Fagan:1999}
M. E. Fagan, ``Design and code inspections to reduce errors in program development,'' IBM Systems Journal, vol. 38, no. 2--3, pp. 258--287, Jun. 1999.

\bibitem{Kollanus:2009}
S. Kollanus and J. Koskinen, ``Survey of software inspection research,'' The Open Software Engineering Journal, vol. 3, no. 1, pp. 15--34, 2009.

\end{thebibliography}


% that's all folks
%\end{CJK*}
\end{document}
|
||||
|
||||
|
|
@ -0,0 +1,623 @@
|
|||
\documentclass[10pt, conference, compsocconf]{IEEEtran}
|
||||
% correct bad hyphenation here
|
||||
%\usepackage{CJK}
|
||||
\hyphenation{op-tical net-works semi-conduc-tor}
|
||||
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{color,soul}
|
||||
\usepackage{colortbl}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{enumitem}
|
||||
\usepackage{threeparttable}
|
||||
\usepackage{multirow,booktabs}
|
||||
\begin{document}
|
||||
%\begin{CJK*}{GBK}{song}
|
||||
|
||||
\title{Understanding the Review Comments on Pull-request}
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
|
||||
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
|
||||
National University of Defense Technology\\
|
||||
Changsha, China\\
|
||||
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
|
||||
}
|
||||
|
||||
\maketitle
|
||||
|
||||
|
||||
\begin{abstract}
|
||||
Pull-based model, integrated in code hosting sites such as GitHub and Bitbucket, has been widely used in distributed software development.
|
||||
It allows any user to contribute to a public repository and lows barrier to the contribution entry, which results in the high volume of incoming pull-requests. Before get accepted or rejected, a contribution is usually discussed among integrators, contributors and community audience.
|
||||
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
|
||||
While Github presents and orders all comments on the main pull-request overview by created time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
|
||||
Hence, it is a significant work to study the classification of comments on pull-request.
|
||||
|
||||
|
||||
% pr常常不是完美的,会引入issue,而哪些因素会引起 哪些issue没有被较好的理解
|
||||
% 该用comments 还是 review comments
|
||||
|
||||
|
||||
In this paper, we explore and summarize \hl{six type of comments} by manual identification and present a method for automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects in Github and the evaluation results show that our method can works effectively.
|
||||
|
||||
|
||||
%[应该再详细介绍基于什么方法,和实验结果]
|
||||
%1. Pull-request很流行
|
||||
%2. 贡献多,需评估,讨论多
|
||||
%3. 讨论的关注点不同
|
||||
%4. 对讨论进行分类的意义重大
|
||||
%5. 我们发现了几种类别,并提出了分类方法
|
||||
|
||||
|
||||
\end{abstract}
|
||||
|
||||
\begin{IEEEkeywords}
|
||||
Pull-request; code review; comment classification;
|
||||
\end{IEEEkeywords}
|
||||
|
||||
|
||||
\IEEEpeerreviewmaketitle
|
||||
|
||||
|
||||
|
||||
\section{Introduction}
|
||||
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for the access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) will evaluate the changes and decide whether to accept or reject it. Currently, code hosting sites such as Github (github.com) and Bitbucket (bitbucket.com) have supported pull-based model and many open source projects hosted on these sites have adopted this development model \cite{Gousios:2014}. Especially, on Github, the work environment is transparent and social media features are integrated in pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on other’s contributions.
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt about the appropriateness of the problem the pull-request tends to resolve, they usually ask the contributors to prove the value of this contribution by showing some use case or just refuse to accept it. While when the resolution has defects or they are not contend with the resolution, integrators request contributors to improve the implementation, sometimes with suggestion, or directly provide more excellent alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors will reach a consensus after iterative discussions. Occasionally, community audience (not an integrator or pull-request submitter) would express their opinions and influence the decision process through community support or project and company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity of changed files, understanding of the project’s roadmap, social relationship with the contribution and many other factors influence their focus on the discussion. Some of them care about the contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect project’s norm consistence (like programming convention).
|
||||
Lots of valuable information is contained in the pull-request discussion. Previous studies has studied on
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% 前人工作:discussion length, comment social network, social media,
|
||||
%% 此外还有comment 分类【\cite{Bacchelli2013},他们是对MS里的各个团队进行的一个人工分类调查,开发环境不够透明,也不都是开源项目】
|
||||
%% Assessing MCR Discussion Usefulness using Semantic Similarity 这篇论文讨论的是comment是不是userful
|
||||
|
||||
But the classification support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request Prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% 对comment的well understood
|
||||
% 一个分类体系
|
||||
% 一份标注集
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%是不是可以说 审阅者 处于主动地位,而贡献者处于被动地位]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of paper is organized like this: Section 2 reviews briefly related work and Section 3 explains related concepts. Section 4 describes in detail our method through a prototype design. Section 5 presents our empirical experiment and evaluation on our method. Section 6 explains some threats to validity of our method. Finally section 7 conclude this paper with future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{pull-request in Github}
|
||||
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism\cite{Gousios:2014,Yu:2014}.
|
||||
As illustrated by Figure \ref{fig:github_wf}, typical contribution process [6] in GitHub involves following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fixe some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Except \hl{commits}, the submmiter needs to provide a title and descrition to explain the objective of his pull-request, which is shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. At the meatime, considering reviewers' suggestions, the contributor would update his pull-request by attaching new commits, and then reviewers discuss that pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request doesn't break current runnable state, some reviews plays a role of testers. They can check the submitted changes by manully running the patches locally or through a automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%为啥有comment
|
||||
|
||||
|
||||
\subsection{Review Comment on pull-request}
|
||||
Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan\cite{Fagan:1999},which has a well-structured process, GitHub provides a modern code review\cite{Bacchelli:2013} that is a more lightweight approach requiring fewer formal steps\cite{Beller:2014,Kollanus:2009}.
|
||||
The way to give voice on a pull-request is to leave a comment and the following are some introductions to review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two ways for reviewers to publish comments on a pull-request: general comments on the overall contribution as a whole, and inline comments on specific lines of code in the changes (see Figure \ref{fig:comment}) \cite{Yu:2015,Thongtanunam:2016}.
|
||||
%could add a bit more here
|
||||
|
||||
\emph{Visualization:}
|
||||
All the updates corresponding to a pull-request are displayed in a visual representation. General comments appear directly in the \hl{main page} of the pull-request interface, while inline comments are folded by default and can be shown after the toggle button is clicked. All comments and other behaviors are aggregated into a \hl{timeline} ordered primarily by creation time.
|
||||
|
||||
% comment 参与者
|
||||
\emph{Participant:}
|
||||
Any user in GitHub can be a reviewer and has the chance to participate in the discussion of any pull-request[]. Usually, core team members comment to ask the submitter to explain the pull-request more clearly or to improve the solution, pull-request creators comment to explain their work or to reply to others' suggestions, and other community audience comment to express their demands and influence the decision process[].
|
||||
A study \cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators, while the majority of the comments come from core team members.
|
||||
|
||||
|
||||
\hl{ref more in} \cite{Tsay:2014a}
|
||||
|
||||
|
||||
\section{Classification of Comment}
|
||||
We first construct a fine-grained multi-level taxonomy for comments on pull-request reviews, then manually classify a set of *** comments, based on which we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning techniques.
|
||||
|
||||
\subsection{Dataset}
|
||||
We conduct experiments on three open source projects hosted on GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming languages and application areas, which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments comes from two sources: the GHTorrent dump released in **** and our own data crawled from GitHub up to *****.
|
||||
Table \ref{tab:dataset} lists the key statistics of the three projects in our dataset:
|
||||
|
||||
\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server, and Angular.js is an outstanding JS framework for front-end development. They were hosted on GitHub at an early time, which tends to indicate that their development teams are more familiar with GitHub and that their activity data is therefore more \hl{valuable}.
|
||||
|
||||
\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors mean more participation, which produces more collaboration data. Before contributing, developers usually ``star'' or fork an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the numbers of stars, forks and contributors indicate the hotness of a specific project, and the three projects were selected for their relatively high hotness indicators.
|
||||
|
||||
\noindent\textbf{Application data}: There are two main types of development models in GitHub: the \emph{fork and pull model}, which is widely used in open source projects, and the \emph{shared repository model}, which is more prevalent in private projects; both of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model and were created before \hl{2016-01-01}.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Dataset of our experiments}
|
||||
\begin{tabular}{r l c c c c c c}
|
||||
|
||||
\toprule
|
||||
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
|
||||
\midrule
|
||||
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
|
||||
\textbf{Elasticsearch } & Java & 2010-02-08 & 20008 & 6871 & 753 &6315 & 38930 \\
|
||||
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:dataset}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Among the entire set, we manually labeled * pull-requests and ** comments, which were later used to automatically label the remaining data using our 3\textbf{C} algorithm.
|
||||
|
||||
\subsection{Taxonomy Definition}
|
||||
%refer to \cite{Georgios2014} and to methods from experimental economics
|
||||
Previous work has studied the challenges faced by reviewers [ ] and the issues introduced by pull-request submitters [], which provides valuable hints for academic researchers and suggestions for industry designers. Inspired by their work, we decided to observe in more detail what reviewers talk about in pull-request reviews, rather than only distinguishing technical from non-technical perspectives.
|
||||
|
||||
We conducted a \hl{use case} to determine the taxonomy scheme, which was developed manually through an \hl{iterative process} of reading review comments randomly collected from three projects hosted on GitHub. As Figure *!!!!! shows, this process uses the following steps:
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first chose three projects ( ) hosted on GitHub \hl{according to project hotness (e.g. \#star, \#fork, \#contributor) and development maturity (e.g. project age, pull-request usage)}. Then, the second author randomly selected review comments from the three projects and labeled each comment with a descriptive message.
|
||||
\item \emph{Execution phase}: Given the previously labeled comments, the first author divided them into different groups based on their descriptive messages. Unlike the aggregation process in card sorting \cite{Bacchelli:2013}, we initially put all the comments in the same group and iteratively divided that group. Based on a rigorous analysis of the existing literature [] and our own experience of working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups, technical and non-technical, which in turn were divided into more specific groups, and finally each comment was put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine-grained categories in our taxonomy scheme.
|
||||
\item \emph{Analysis phase}: On the basis of the recognition result, the second author abstracted taxonomy hierarchies and deduced general categories, forming a multi-level (two-level, in fact) taxonomy scheme which includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors, together with \hl{10} other participants, discussed its correctness and convergence and agreed on the final decision.
|
||||
\end{enumerate}
|
||||
|
||||
|
||||
Table \ref{tab:taxonomy} shows the final taxonomy scheme; for each subcategory we include a description together with an example comment.
|
||||
|
||||
|
||||
\begin{table*}[ht]
|
||||
\begin{threeparttable}
|
||||
\centering
|
||||
\caption{Taxonomy Scheme}
|
||||
\begin{tabular}{r l p{11cm}}
|
||||
\toprule
|
||||
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description(Example)} \\
|
||||
\midrule
|
||||
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
|
||||
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out extra blank lines, improper indentation, inconsistent naming conventions, etc. \\
|
||||
&\cellcolor{lightgray}(L2-1) &\cellcolor{lightgray}e.g., \emph{``:scissors: this blank line''}\\
|
||||
|
||||
&Code correction &Points out wrong function calls, program logic errors, etc.\\
|
||||
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggestion to enhance functionality, improve performance, etc.\\
|
||||
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
|
||||
|
||||
&Alternative implementation &Proposes an alternative solution better than the current one\\
|
||||
&(L2-4) &none \\
|
||||
|
||||
&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demands that the submitter provide test cases for the changed code\\
|
||||
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
|
||||
|
||||
&Test feedback &Needs to test the change before making a decision, or reports failing tests\\
|
||||
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\(L1-2)}}
|
||||
&\cellcolor{lightgray}Pull-request acceptance &\cellcolor{lightgray}Satisfied with the pull-request and agrees to merge it\\
|
||||
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\
|
||||
|
||||
&Pull-request rejection &Refuses to merge the pull-request due to a duplicate proposal, an undesired feature, a false bug report, etc.\\
|
||||
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''} or \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
|
||||
|
||||
&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused about the purpose of the pull-request and demands a use case\\
|
||||
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
|
||||
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
|
||||
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
|
||||
|
||||
&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}Pings other reviewer(s) to review this pull-request\\
|
||||
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\
|
||||
|
||||
&Development norm &Talks about an undescriptive description or commit message, the need to squash or rebase the commits contained in the pull-request, etc.\\
|
||||
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
|
||||
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks for what other people do, apologize for their own actions, etc.\\
|
||||
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g.,\emph{``thank you :yellow\_heart:''}\\
|
||||
|
||||
&Positive reaction & Agree with others’ opinion, compliment others’ work, etc.\\
|
||||
&(L2-14) &e.g.,\emph{``:+1: nice one @cristianbica.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interaction\\
|
||||
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g.,\emph{``I can do it for you if you want.''} \\
|
||||
|
||||
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\begin{tablenotes}
|
||||
\item[1] \emph{Note: a word beginning and ending with a colon, like ``:scissors:'', is Markdown syntax for an emoji on GitHub}
|
||||
\end{tablenotes}
|
||||
\label{tab:taxonomy}
|
||||
\end{threeparttable}
|
||||
\end{table*}
|
||||
|
||||
For brevity of display, we use short comments, or the key part of long comments, as examples. However, we would like to point out that it is common for a single comment to address multiple topics, as in the following review comments:
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
|
||||
\vspace{1em}
|
||||
|
||||
In the first comment C1, the reviewer first thanks the contributor (L2-13), then shows his approval of this pull-request (L2-7), followed by a request to split the commits (L2-12).
|
||||
In C2, the reviewer expresses that he agrees with this change (L2-7), thinks highly of this work (L2-14; :+1: is Markdown syntax for the ``thumbs-up'' emoji), and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review if they are not familiar with the change under review.
|
||||
|
||||
\subsection{Manual Labeling}
|
||||
For each project, we randomly sample 200 pull-requests per year from 2013 to 2015, each with a comment count greater than 0 and less than 20. Overall, 1,800 distinct pull-requests and 5,645 review comments are sampled.
|
||||
|
||||
According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and to ensure the quality of the manual classification, we built an Online Label Platform (OLP) that offers a web-based interface to reduce the extra burden. OLP is deployed on the public network so that classification participants are free to do the labeling job anywhere and at any time that suits them.
|
||||
|
||||
%[!!!! add a screenshot of the online labeling platform here !!!!]
|
||||
|
||||
As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
|
||||
\begin{enumerate}
|
||||
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request are displayed. In addition, its hyperlink on GitHub is also shown to make it convenient to jump to the original webpage for a more detailed view and inspection if needed.
|
||||
\item \emph{Section 2}: All the review comments of a pull-request are listed in order of creation time in this section. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of a comment appear above its content.
|
||||
\item \emph{Section 3}: Beside each comment, there are candidate labels corresponding to the 2-level categories, which can be selected by clicking the checkbox next to the label. Using checkboxes means that multiple labels can be assigned to a comment. There is also an option to report other labels that are not in our list.
|
||||
\end{enumerate}
|
||||
As shown, we only label each comment with 2-level categories; the 1-level categories are derived automatically according to the taxonomy hierarchy. Table \ref{tab:label_sta} shows the statistics of the manual classification result.
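As a minimal illustration (not the OLP implementation itself), deriving the 1-level labels from the manually assigned 2-level labels amounts to a simple lookup over the hierarchy of Table \ref{tab:taxonomy}; the identifiers below follow that table.

\begin{verbatim}
# Minimal sketch (illustrative only): derive 1-level labels from
# 2-level labels according to the taxonomy hierarchy.
L2_TO_L1 = {
    **{f"L2-{i}": "L1-1" for i in range(1, 7)},    # solution correctness
    **{f"L2-{i}": "L1-2" for i in range(7, 10)},   # PR appropriateness
    **{f"L2-{i}": "L1-3" for i in range(10, 13)},  # project management
    **{f"L2-{i}": "L1-4" for i in range(13, 16)},  # social interaction
}

def derive_l1_labels(l2_labels):
    """Map a set of 2-level labels to the corresponding 1-level labels."""
    return sorted({L2_TO_L1[label] for label in l2_labels})

# derive_l1_labels({"L2-13", "L2-7", "L2-12"}) -> ["L1-2", "L1-3", "L1-4"]
\end{verbatim}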
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Manual classification result}
|
||||
\begin{tabular}{r l c c c}
|
||||
\toprule
|
||||
\textbf{{1-level}} &\textbf{{2-level}} &\textbf{{Rails}} &\textbf{{Elasticsearch}} &\textbf{Angular.js}\\
|
||||
\midrule
|
||||
\multirow{6}*{L1-1}
|
||||
&L2-1 &0 &0 &0\\
|
||||
&L2-2 &0 &0 &0\\
|
||||
&L2-3 &0 &0 &0\\
|
||||
&L2-4 &0 &0 &0\\
|
||||
&L2-5 &0 &0 &0\\
|
||||
&L2-6 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-2}
|
||||
&L2-7 &0 &0 &0\\
|
||||
&L2-8 &0 &0 &0\\
|
||||
&L2-9 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-3}
|
||||
&L2-10 &0 &0 &0\\
|
||||
&L2-11 &0 &0 &0\\
|
||||
&L2-12 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-4}
|
||||
&L2-13 &0 &0 &0\\
|
||||
&L2-14 &0 &0 &0\\
|
||||
&L2-15 &0 &0 &0\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:label_sta}
|
||||
\end{table}
|
||||
|
||||
%%%% add some statistics about this table
|
||||
|
||||
\subsection{Automatic Classification}
|
||||
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed using scikit-learn\footnote{http://scikit-learn.org/}, in particular the Support Vector Machine (SVM) algorithm. As already mentioned, a single review comment often addresses multiple topics; hence one of the goals of 3\textbf{C} is to perform multi-label classification. To this end, we construct rules and classifiers through a one-vs-all strategy for each category in the high-level and low-level taxonomy.
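As a minimal sketch of such a one-vs-all text classification setup (illustrative only; the actual 3\textbf{C} pipeline additionally applies the inspection rules and the second-stage features described below), the SVM classifiers could be assembled with scikit-learn as follows:

\begin{verbatim}
# Sketch of the one-vs-all SVM text classifiers (one binary SVM per
# category); names and parameters are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_text_classifier():
    return make_pipeline(
        TfidfVectorizer(),  # tf-idf features of the preprocessed text
        OneVsRestClassifier(SVC(kernel="linear", probability=True)),
    )

# clf = build_text_classifier()
# clf.fit(train_texts, train_label_matrix)   # binary indicator label matrix
# probs = clf.predict_proba(test_texts)      # one probability per category
\end{verbatim}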
|
||||
|
||||
\begin{figure*}[ht]
|
||||
\centering
|
||||
\includegraphics[width=16cm]{3c.png}
|
||||
\caption{Overview of 3C}
|
||||
\label{fig:3c}
|
||||
\end{figure*}
|
||||
|
||||
Figure \ref{fig:3c} illustrates the overview of 3\textbf{C}, a multi-stage process containing two stages that make use of the comment text and other features, respectively.
|
||||
|
||||
\noindent\underline{\textbf{Stage 1:} }
|
||||
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one per class (or category).
|
||||
For example, suppose \emph{CM} is a comment; the classification of \emph{CM} in stage 1 goes as follows (suppose there are $n$ classes $C_{1}, C_{2}, \dots, C_{n}$):
|
||||
First, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. Assuming $R_{i}$ is confirmed, the $i$-th item in the probability vector is set to 1, and the text classifiers (SVM) $TC_{1}, TC_{2}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for classes $C_{1}, C_{2}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are some key steps in this stage.
|
||||
|
||||
\noindent\textbf{Rule inspection:} Given a review comment to be classified, 3\textbf{C} first looks through its original text and applies each inspection rule iteratively.
|
||||
As mentioned, each inspection rule corresponds to a specific class, so a class label is assigned to a comment if its inspection rule is confirmed.
|
||||
An inspection rule is a set of matching patterns containing discriminating term(s) and regular expressions. The recognition performance of matching patterns is \hl{not balanced}, which means that high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recognize as many comments as possible at a relatively high precision and leave the \hl{remaining unrecognized comments} to the ML text classifiers. That is to say, the recognition result of an inspection rule is quite reliable.
|
||||
!!!!! take an example !!!!!
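As a purely hypothetical illustration (the real inspection rules are hand-crafted per category and are not reproduced here), a rule can be encoded as a small set of regular expressions over the comment text:

\begin{verbatim}
# Hypothetical inspection rules; each rule is a set of matching patterns
# (discriminating terms / regular expressions) for one category.
import re

INSPECTION_RULES = {
    # L2-11 (reviewer assignment): "/cc @user" style pings asking for review
    "L2-11": [re.compile(r"/cc\s+@\w+", re.IGNORECASE)],
    # L2-5 (demanding test case): explicit requests for (unit) tests
    "L2-5": [re.compile(r"\bneeds? a (unit )?test\b", re.IGNORECASE)],
}

def apply_rules(comment_text):
    """Return the categories whose rule patterns match the comment."""
    return {label
            for label, patterns in INSPECTION_RULES.items()
            if any(p.search(comment_text) for p in patterns)}
\end{verbatim}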
|
||||
|
||||
Before being processed by the ML text classifiers, the comment text is preprocessed, which consists of some specific transformations and feature extraction.
|
||||
|
||||
\noindent\textbf{Text transformations:}
|
||||
Reviewers tend to reference source code, hyperlinks and others' remarks in review comments to clearly express their opinions, prove their points and reply to other people.
|
||||
These behaviours do promote the review process. On the other hand, they bring a great challenge for comment classification.
|
||||
Words in these \hl{text blocks} contribute little to the classification and may even introduce interference. Hence, we transform them into \hl{single-word indicators} to reduce vocabulary interference while preserving the reference information.
|
||||
!!!!! take an example ??!!!!!
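A minimal sketch of such transformations (the indicator tokens below are illustrative, not necessarily the exact ones used) replaces code blocks, hyperlinks, quoted replies and @-mentions in the Markdown source with single-word indicators:

\begin{verbatim}
# Sketch: replace referenced text blocks with single-word indicators so
# that their vocabulary does not interfere with classification.
import re

def transform(comment_markdown):
    text = comment_markdown
    text = re.sub(r"```.*?```", " codeblock ", text, flags=re.DOTALL)
    text = re.sub(r"`[^`]+`", " codeinline ", text)
    text = re.sub(r"https?://\S+", " httplink ", text)
    text = re.sub(r"^>.*$", " quotedreply ", text, flags=re.MULTILINE)
    text = re.sub(r"@\w[\w-]*", " usermention ", text)
    return re.sub(r"\s+", " ", text).strip()
\end{verbatim}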
|
||||
|
||||
\noindent\textbf{Feature extraction:}
|
||||
Transformed text is tokenized and stemmed to its root form \cite{Porter:1997}. \hl{We filter out punctuation from the word tokens but do not remove English stop words, because we think some common words are important in review comments. For example \dots.} To apply ML algorithms to comment classification, we need to extract a set of features from the comment text. We adopted the tf-idf model[], which can be conveniently computed with the tools provided by scikit-learn.
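A minimal sketch of this preprocessing, assuming the Porter stemmer from NLTK (an assumption about tooling; any Porter implementation would do), is:

\begin{verbatim}
# Sketch: tokenize, stem to root form (Porter), drop punctuation tokens,
# keep English stop words, then build tf-idf vectors with scikit-learn.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

_stemmer = PorterStemmer()

def tokenize_and_stem(text):
    tokens = re.findall(r"[a-z_]+", text.lower())  # punctuation is dropped
    return [_stemmer.stem(t) for t in tokens]

vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                             stop_words=None,   # stop words are kept
                             lowercase=False)   # lowercased in the tokenizer
# X = vectorizer.fit_transform(transformed_comments)
\end{verbatim}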
|
||||
|
||||
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider other comment-level features, listed in Table \ref{tab:factors}.
|
||||
|
||||
\noindent\textbf{Comment\_length:} the total number of characters contained in the comment text after preprocessing. Long comments are likely to \hl{argue about pull-request appropriateness and code improvement}.
|
||||
|
||||
\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or a general (issue) comment. Inline comments tend to talk about solution details, while general comments talk more about other \hl{``high-level'' matters}.
|
||||
|
||||
\noindent\textbf{Link\_inclusion:} binary, indicating if a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on a point of view, or to offer guidelines when they want to help other people.
|
||||
|
||||
\noindent\textbf{Ping\_inclusion:} binary, indicating if a comment includes a ping activity (occurring in the form of ``@username'').
|
||||
|
||||
\noindent\textbf{Code\_inclusion:} binary, indicating if a comment includes source code. Comments related to solution details tend to contain source code.
|
||||
|
||||
\noindent\textbf{Ref\_inclusion:} binary, indicating if a comment includes a quotation of others' remarks. Quoting others' remarks indicates a reply to someone, which probably reflects one's further suggestion or disagreement.
|
||||
|
||||
\noindent\textbf{Sim\_pr\_title:} the similarity between the comment text and the pull-request title (measured as the number of common words divided by the number of words in their union).
|
||||
|
||||
\noindent\textbf{Sim\_pr\_desc:} the similarity between the comment text and the pull-request description (measured in the same way as Sim\_pr\_title).
|
||||
Comments with a high similarity to the title or description of the pull-request are likely to discuss solution details or the value the change brings.
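A minimal sketch of this word-overlap similarity (a Jaccard-style measure over word sets; the whitespace tokenizer is an assumption) is:

\begin{verbatim}
# Sketch: similarity = |common words| / |union of words|,
# used for both Sim_pr_title and Sim_pr_desc.
def word_overlap_similarity(comment_text, pr_text):
    a = set(comment_text.lower().split())
    b = set(pr_text.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# word_overlap_similarity("fix the flaky test", "Fix flaky CI test") -> 0.6
\end{verbatim}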
|
||||
|
||||
Together with the probability vector passed from \emph{stage 1}, these features are composed to form a new feature vector to be processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. The prediction models generate a new probability vector representing \hl{how likely a comment falls into a specific category}. Iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th vector item is greater than 0.5. If all the items of the probability vector are less than 0.5, the class label corresponding to the largest one is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
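A minimal sketch of this final label-assignment step (illustrative; the 0.5 threshold follows the description above) is:

\begin{verbatim}
# Sketch: assign every category whose probability exceeds 0.5; if none
# does, fall back to the single most probable category, so that each
# comment receives at least one label.
def assign_labels(prob_vector, categories, threshold=0.5):
    labels = [c for c, p in zip(categories, prob_vector) if p > threshold]
    if not labels:
        best = max(range(len(prob_vector)), key=lambda i: prob_vector[i])
        labels = [categories[best]]
    return labels

# assign_labels([0.2, 0.7, 0.6], ["L1-1", "L1-2", "L1-3"]) -> ["L1-2", "L1-3"]
# assign_labels([0.2, 0.4, 0.3], ["L1-1", "L1-2", "L1-3"]) -> ["L1-2"]
\end{verbatim}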
|
||||
\subsection{Evaluation of 3C}
|
||||
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., splitting the review comments into 10 sets, of which 9 sets are used to train the classifiers and the remaining set is used for performance testing, and repeating this process 10 times.
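A minimal sketch of this evaluation protocol (illustrative; micro-averaged precision and recall over a binary label matrix are one reasonable choice, not necessarily the exact setup used) is:

\begin{verbatim}
# Sketch: 10-fold cross validation with per-fold precision and recall,
# averaged over the folds. X is the feature matrix, Y the binary label
# matrix (numpy arrays); build_clf returns a fresh, untrained classifier.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import precision_score, recall_score

def cross_validate(build_clf, X, Y, n_splits=10):
    precisions, recalls = [], []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(X):
        clf = build_clf()
        clf.fit(X[train_idx], Y[train_idx])
        pred = clf.predict(X[test_idx])
        precisions.append(precision_score(Y[test_idx], pred, average="micro"))
        recalls.append(recall_score(Y[test_idx], pred, average="micro"))
    return np.mean(precisions), np.mean(recalls)
\end{verbatim}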
|
||||
|
||||
Table \ref{tab:auto_class} reports the precision and recall of the classification results \hl{on the 1-level categories}.
|
||||
|
||||
%compare with rule-only and ML-only variants: precision, recall, F-measure, and the averages
|
||||
|
||||
|
||||
|
||||
%%%% use the names of the 1-level categories in the table
|
||||
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
%%%% use the names of the 1-level categories in the table
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Classification performance of 3C}
|
||||
\begin{tabular}{r c c c c c c c c}
|
||||
\toprule
|
||||
\multirow{2}*{\textbf{Class}}
|
||||
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
|
||||
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
|
||||
\midrule
|
||||
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
\midrule
|
||||
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:auto_class}
|
||||
\end{table}
|
||||
|
||||
\section{Experiment and Results}
|
||||
|
||||
\subsection{Research Questions}
|
||||
|
||||
|
||||
\subsection{Statistics}
|
||||
%distribution of each comment type across pull-requests
|
||||
|
||||
|
||||
\subsection{Co-occurrence}
|
||||
%co-occurrence of the different comment types
|
||||
\subsection{Factors}
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Descriptive statistics of the factors used in our experiments}
|
||||
\begin{tabular}{r c c c c c c c}
|
||||
\toprule
|
||||
\textbf{Statistic} &\textbf{Mean} &\textbf{St.} &\textbf{Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
|
||||
\midrule
|
||||
\textbf{Project level metrics}\hspace{1em} \\
|
||||
Project\_age & & & & & & & \\
|
||||
Team\_size & & & & & & & \\
|
||||
Pr\_growth & & & & & & & \\
|
||||
\midrule
|
||||
\textbf{Pull-request level metrics}\hspace{1em} \\
|
||||
Churn & & & & & & & \\
|
||||
N\_files & & & & & & & \\
|
||||
Src\_touched & & & & & & & \\
|
||||
Config\_touched & & & & & & & \\
|
||||
Test\_touched & & & & & & & \\
|
||||
\midrule
|
||||
\textbf{Contributor level metrics}\hspace{1em} \\
|
||||
Core\_team & & & & & & & \\
|
||||
Prev\_prs\_local & & & & & & & \\
|
||||
Prev\_prs\_global & & & && & & \\
|
||||
\midrule
|
||||
\textbf{Comment level metrics}\hspace{1em} \\
|
||||
Comment\_length & & & & & & & \\
|
||||
Comment\_type & & & & & & & \\
|
||||
Sim\_pr\_title & & & & & & & \\
|
||||
Sim\_pr\_desc & & & && & & \\
|
||||
Code\_inclusion & & & && & & \\
|
||||
Ref\_inclusion & & & && & & \\
|
||||
Link\_inclusion & & & && & & \\
|
||||
Ping\_inclusion & & & && & & \\
|
||||
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:factors}
|
||||
\end{table*}
|
||||
|
||||
|
||||
%influence of each factor on negative comments
|
||||
\noindent\underline{\textbf{Project level}}
|
||||
|
||||
\noindent\textbf{Project\_age:} the time from project creation on GitHub to the pull-request creation in months, which is used as a proxy for maturity.
|
||||
|
||||
\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept the new contributions, during the three months prior to pull-request creation.
|
||||
|
||||
\noindent\textbf{Pr\_growth:} Average number of pull requests submitted per day, prior to the examined pull request.
|
||||
|
||||
\noindent\underline{\textbf{Pull-request level}}
|
||||
|
||||
\noindent\textbf{Churn:} total number of lines added and deleted by the pull-request. Bigger code changes may be more complex, and require longer code evaluation.
|
||||
|
||||
\noindent\textbf{N\_files:} total number of files changed in the pull-request, which is a signal of pull-request size.
|
||||
|
||||
\noindent\textbf{Src\_touched:} binary, measuring if the pull-request touched at least one source code file (documentation files are regarded as source code files).
|
||||
|
||||
\noindent\textbf{Config\_touched:} binary, measuring if the pull-request touched at least one configuration file.
|
||||
|
||||
\noindent\textbf{Test\_touched:} binary, measuring if the pull-request touched at least one test file.
|
||||
|
||||
\noindent\underline{\textbf{Contributor level}}
|
||||
|
||||
\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core contributor vs. outside contributor).
|
||||
|
||||
\noindent\textbf{Prev\_prs\_local:} Number of pull requests submitted to a specific project by a developer, prior to the examined pull request.
|
||||
|
||||
\noindent\textbf{Prev\_prs\_global:} Number of pull requests submitted by a developer in Github, prior to the examined pull request.
|
||||
|
||||
\section{Threats to Validity}
|
||||
%manual labeling:
|
||||
%automatic classification:
|
||||
\section{Related Work}
|
||||
\subsection{Code review}
|
||||
%(refer to the paper by the Thai group)
|
||||
%traditional
|
||||
%modern
|
||||
\subsection{Pull-request}
|
||||
%(refer to the senior labmate's work)
|
||||
%some studies on pull-requests
|
||||
|
||||
\subsection{Classification on collaboratively generated data}
|
||||
%(the 2008 paper and the one on mobile app reviews)
|
||||
%other studies on classifying issues
|
||||
|
||||
\section{Conclusion}
|
||||
The conclusion goes here. this is more of the conclusion
|
||||
|
||||
% conference papers do not normally have an appendix
|
||||
|
||||
|
||||
% use section* for acknowledgement
|
||||
\section*{Acknowledgment}
|
||||
The research is supported by the National Natural Science Foundation of China (Grant No.61432020, 61472430, 61502512) and National Grand R\&D Plan (Grant No. 2016YFB1000805).
|
||||
%add another grant number, the one provided by Prof. Wang
|
||||
|
||||
The authors would like to thank...
|
||||
more thanks here
|
||||
|
||||
|
||||
\begin{thebibliography}{1}
|
||||
|
||||
\bibitem{Yu:2015}
|
||||
Yu, Y., Wang, H., Yin, G., \& Wang, T. (2015). Reviewer Recommendation for Pull-Requests in GitHub: What Can We Learn from Code Review and Bug Assignment?
|
||||
|
||||
\bibitem{Barr:2012}
|
||||
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, P. Devanbu, Cohesive and isolated development with branches, in: FASE, Springer, 2012, pp. 316–331.
|
||||
|
||||
\bibitem{Gousios:2014}
|
||||
G. Gousios, M. Pinzger, A. v. Deursen, An exploratory study of the pull-based software development model, in: ICSE, ACM, 2014, pp. 345–355.
|
||||
|
||||
\bibitem{Vasilescu:B}
|
||||
Vasilescu, B., Yu, Y., Wang, H., Devanbu, P., \& Filkov, V. (n.d.). Quality and Productivity Outcomes Relating to Continuous Integration in GitHub.
|
||||
|
||||
\bibitem{Tsay:2014a}
|
||||
Tsay, J., Dabbish, L., \& Herbsleb, J. (2014). Let’s talk about it: evaluating contributions through discussion in GitHub. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 144–154.
|
||||
|
||||
\bibitem{Gousios:2014b}
|
||||
Gousios, G., \& Bacchelli, A. Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective. Proceedings of the 38th International Conference on Software Engineering - ICSE '16, (ICIS--R15001), 285–296. http://doi.org/10.1109/ICSE.2015.55.
|
||||
|
||||
\bibitem{Marlow:2013}
|
||||
Marlow, J., Dabbish, L. and Herbsleb, J. 2013. Impression formation in online peer production: activity traces and personal profiles in github. Proceedings of the 2013 conference on Computer supported cooperative work (New York, NY, USA, 2013), 117–128.
|
||||
|
||||
\bibitem{Veen:2015}
|
||||
Veen E V D, Gousios G, Zaidman A. Automatically Prioritizing Pull Requests[C]// Mining Software Repositories. IEEE, 2015:357-361.
|
||||
|
||||
\bibitem{Thongtanunam:2015}
|
||||
Thongtanunam P, Tantithamthavorn C, Kula R G, et al. Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review[J]. 2015:141-150.
|
||||
|
||||
\bibitem{jiang:2015}
|
||||
Jiang, J., He, J.-H., \& Chen, X.-Y. CoreDevRec: Automatic Core Member Recommendation for Contribution Evaluation[J]. Journal of Computer Science and Technology, 2015, 30(5):998-1016.
|
||||
|
||||
\bibitem{Tsay:2014b}
|
||||
Tsay J, Dabbish L, Herbsleb J. Influence of social and technical factors for evaluating contribution in GitHub[C]// ICSE. 2014:356-366.
|
||||
|
||||
\bibitem{Bacchelli:2013}
|
||||
A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering, pages 712–721. IEEE Press, 2013.
|
||||
|
||||
\bibitem{Shah:2006}
|
||||
Shah, S. K. (2006). Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. Management Science, 52(7), 1000–1014. http://doi.org/10.1287/mnsc.1060.0553
|
||||
|
||||
\bibitem{Porter:1997}
|
||||
M. F. Porter, ``Readings in information retrieval'' K. Sparck Jones and P.Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, ch. An Algorithm for Suffix Stripping, pp. 313–316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705
|
||||
|
||||
\bibitem{Yu:2014}
|
||||
Yu, Y., Wang, H., Yin, G., \& Ling, C. X. (2014). Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. Proceedings - Asia-Pacific Software Engineering Conference, APSEC, 1, 335–342. http://doi.org/10.1109/APSEC.2014.57
|
||||
|
||||
\bibitem{Thongtanunam:2016}
|
||||
Thongtanunam, P., Mcintosh, S., Hassan, A. E., \& Iida, H. (2016). Review participation in modern code review An empirical study of the android , Qt , and OpenStack projects. Empirical Software Engineering. http://doi.org/10.1007/s10664-016-9452-6
|
||||
|
||||
\bibitem{Beller:2014}
|
||||
Beller, M., Bacchelli, A., Zaidman, A., \& Juergens, E. (2014). Modern code reviews in open-source projects: which problems do they fix? Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 202–211. http://doi.org/10.1145/2597073.2597082
|
||||
|
||||
\bibitem{Fagan:1999}
|
||||
M. E. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM System Journal, 38(2-3):258–287, jun 1999.
|
||||
|
||||
\bibitem{Kollanus:2009}
|
||||
S. Kollanus and J. Koskinen. Survey of software inspection research. The Open Software Engineering Journal, 3(1):15–34, 2009.
|
||||
|
||||
\end{thebibliography}
|
||||
|
||||
|
||||
|
||||
|
||||
% that's all folks
|
||||
%\end{CJK*}
|
||||
\end{document}
|
||||
|
||||
|
|
@ -0,0 +1,623 @@
|
|||
\documentclass[10pt, conference, compsocconf]{IEEEtran}
|
||||
% correct bad hyphenation here
|
||||
%\usepackage{CJK}
|
||||
\hyphenation{op-tical net-works semi-conduc-tor}
|
||||
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{color,soul}
|
||||
\usepackage{colortbl}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{enumitem}
|
||||
\usepackage{threeparttable}
|
||||
\usepackage{multirow,booktabs}
|
||||
\begin{document}
|
||||
%\begin{CJK*}{GBK}{song}
|
||||
|
||||
\title{Understanding the Review Comments on Pull-request}
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
|
||||
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
|
||||
National University of Defense Technology\\
|
||||
Changsha, China\\
|
||||
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
|
||||
}
|
||||
|
||||
\maketitle
|
||||
|
||||
|
||||
\begin{abstract}
|
||||
Pull-based model, integrated in code hosting sites such as GitHub and Bitbucket, has been widely used in distributed software development.
|
||||
It allows any user to contribute to a public repository and lows barrier to the contribution entry, which results in the high volume of incoming pull-requests. Before get accepted or rejected, a contribution is usually discussed among integrators, contributors and community audience.
|
||||
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
|
||||
While Github presents and orders all comments on the main pull-request overview by created time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
|
||||
Hence, it is a significant work to study the classification of comments on pull-request.
|
||||
|
||||
|
||||
% pr常常不是完美的,会引入issue,而哪些因素会引起 哪些issue没有被较好的理解
|
||||
% 该用comments 还是 review comments
|
||||
|
||||
|
||||
In this paper, we explore and summarize \hl{six type of comments} by manual identification and present a method for automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects in Github and the evaluation results show that our method can works effectively.
|
||||
|
||||
|
||||
%[应该再详细介绍基于什么方法,和实验结果]
|
||||
%1. Pull-request很流行
|
||||
%2. 贡献多,需评估,讨论多
|
||||
%3. 讨论的关注点不同
|
||||
%4. 对讨论进行分类的意义重大
|
||||
%5. 我们发现了几种类别,并提出了分类方法
|
||||
|
||||
|
||||
\end{abstract}
|
||||
|
||||
\begin{IEEEkeywords}
|
||||
Pull-request; code review; comment classification;
|
||||
\end{IEEEkeywords}
|
||||
|
||||
|
||||
\IEEEpeerreviewmaketitle
|
||||
|
||||
|
||||
|
||||
\section{Introduction}
|
||||
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for the access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) will evaluate the changes and decide whether to accept or reject it. Currently, code hosting sites such as Github (github.com) and Bitbucket (bitbucket.com) have supported pull-based model and many open source projects hosted on these sites have adopted this development model \cite{Gousios:2014}. Especially, on Github, the work environment is transparent and social media features are integrated in pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on other’s contributions.
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt about the appropriateness of the problem the pull-request tends to resolve, they usually ask the contributors to prove the value of this contribution by showing some use case or just refuse to accept it. While when the resolution has defects or they are not contend with the resolution, integrators request contributors to improve the implementation, sometimes with suggestion, or directly provide more excellent alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors will reach a consensus after iterative discussions. Occasionally, community audience (not an integrator or pull-request submitter) would express their opinions and influence the decision process through community support or project and company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity of changed files, understanding of the project’s roadmap, social relationship with the contribution and many other factors influence their focus on the discussion. Some of them care about the contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect project’s norm consistence (like programming convention).
|
||||
Lots of valuable information is contained in the pull-request discussion. Previous studies has studied on
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% 前人工作:discussion length, comment social network, social media,
|
||||
%% 此外还有comment 分类【\cite{Bacchelli2013},他们是对MS里的各个团队进行的一个人工分类调查,开发环境不够透明,也不都是开源项目】
|
||||
%% Assessing MCR Discussion Usefulness using Semantic Similarity 这篇论文讨论的是comment是不是userful
|
||||
|
||||
But the classification support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request Prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% 对comment的well understood
|
||||
% 一个分类体系
|
||||
% 一份标注集
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%是不是可以说 审阅者 处于主动地位,而贡献者处于被动地位]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of paper is organized like this: Section 2 reviews briefly related work and Section 3 explains related concepts. Section 4 describes in detail our method through a prototype design. Section 5 presents our empirical experiment and evaluation on our method. Section 6 explains some threats to validity of our method. Finally section 7 conclude this paper with future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{pull-request in Github}
|
||||
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism\cite{Gousios:2014,Yu:2014}.
|
||||
As illustrated by Figure \ref{fig:github_wf}, typical contribution process [6] in GitHub involves following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fixe some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Except \hl{commits}, the submmiter needs to provide a title and descrition to explain the objective of his pull-request, which is shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. At the meatime, considering reviewers' suggestions, the contributor would update his pull-request by attaching new commits, and then reviewers discuss that pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request doesn't break current runnable state, some reviews plays a role of testers. They can check the submitted changes by manully running the patches locally or through a automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%为啥有comment
|
||||
|
||||
|
||||
\subsection{Review Comment on pull-request}
|
||||
Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan\cite{Fagan:1999},which has a well-structured process, GitHub provides a modern code review\cite{Bacchelli:2013} that is a more lightweight approach requiring fewer formal steps\cite{Beller:2014,Kollanus:2009}.
|
||||
The way to give voice on a pull-request is to leave a comment and the following are some introductions to review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two available ways for reviewers to pushlish comments on a pull-request: general comments as a whole about the overall contribution and inline comments for specific lines of code in the changes(see Figure \ref{fig:comment})\cite{Yu:2015,Thongtanunam:2016}.
|
||||
%可以再多加一点
|
||||
|
||||
\emph{Visualization:}
|
||||
All the updates corresponding to a pull-request will be displayed in a visual representation. For general comments, they just appears in the \hl{main page} of pull request interfaces, but for inline comments, they default to be folded and can be shown after the toggle button is clicked. All comments and other abehaviors are aggregated as a \hl{timline} ordered primarily by time of creation.
|
||||
|
||||
% comment 参与者
|
||||
\emph{Participant:}
|
||||
Any user in GitHub can be a reviewer and have a chance to particpate the discussion of any pull-rquest[]. Usually, core team members comment to ask the submmiter to bring out the pull-request more clearly or improve the solution, pull-request creators comment to explain their work or reply others's suggestion, and other commity audience comment to express their demand and influence the dicision process[].
|
||||
Study\cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators while the majority of the comments come from core teammembers.
|
||||
|
||||
|
||||
\hl{ref more in} \cite{Tsay:2014a}
|
||||
|
||||
|
||||
\section{Classification of Comment}
|
||||
We first construct a fine grained multi-level taxonomy for comments on pull-request reviews and then classify manually a set of *** comments based on which we build \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}) which automatically classify review comments using rule-based and machine learning techniques.
|
||||
|
||||
\subsection{Dataset}
|
||||
We conduct experiments on 3 open source projects hosted in GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming language and application areas which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments come from two sources: GHTorrent dump released in **** and our own crawled data from GitHub up to *****.
|
||||
Table \ref{tab:dataset} lists the key statistics of three projects in our dataset:
|
||||
|
||||
\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is outstanding JS framework for front-end development. They are hosted in GitHub at an early time which tends to reflect that their development teams are more familiar with GitHub so that their activity data is more \hl{valuable}.
|
||||
|
||||
\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors means more participation which produces more collaboration data. Before contribution, developers usually ``star'' or fork on an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the number of star, fork and contributor indicates the hotness of a specific project. And the three projects is selected for their relatively high hotness indicators.
|
||||
|
||||
\noindent\textbf{Application data}: There are two main types of development models in GitHub: \emph{fork and pull model} which is widely used in open source projects and \emph{shared repository model} that is more prevalent with private projects and all of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model before \hl{2016-01-01}.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Dataset of our experiments}
|
||||
\begin{tabular}{r l c c c c c c}
|
||||
|
||||
\toprule
|
||||
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
|
||||
\midrule
|
||||
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
|
||||
\textbf{Elasticsearch } & Java & 2010-02-08 & 20008 & 6871 & 753 &6315 & 38930 \\
|
||||
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
|
||||
\bottomrule
|
||||
. \end{tabular}
|
||||
\label{tab:dataset}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Among the entire set, we manually labeled of * pull-requests and ** comments which was later used to automatically label the left data using our \textbf{3}C algorithm.
|
||||
|
||||
\subsection{Taxonomy Definition}
|
||||
%参考\cite{Georgios2014 和实验经济学的方法
|
||||
Previous work has studied on changes faced by reviewer [ ] and issues introduced by pull-request submitter [], which provide valuable hits for academic researchers and suggestions for industry designers. Inspired by their work, we decide to observe in more detail what reviewers talk about in pull-request reviews rather than just in perspectives of technique and non-technique.
|
||||
|
||||
We conducted a \hl{use case} to determine taxonomy scheme which is developed manually and through an \hl{iterative process} of reading review comments random collected from three projects hosted in GitHub. As Figure *!!!!! shows, this process uses the following steps:
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first choose three projects ( ) hosted in GitHub \hl{according to project hotness (e.g. \#star, \#fork,\#contributor), and development maturity (e.g. project age, pull-request usage)}. Then, the second author select randomly review comments from the three projects and label each comment with a descriptive message.
|
||||
\item \emph{Execution phase}: Given the previously labeled comments, the first author divide them into different groups based on their descriptive messages. Unlike the aggregation process in card sort \cite{Bacchelli:2013}, we initially default all the comments to be in the same group and iteratively divide the group. Based on a rigorous analysis of the existing literature [] and our own experience with working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups: technical and non-technical which in turn are divided to more specific groups and finally each comment will be put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine grained categories in our taxonomy scheme.
|
||||
\item \emph{Analysis phase}: On the base of recognition result, the second author abstract taxonomy hierarchies and deduce general categories which forms a multi-level (2 level actually) taxonomy scheme which includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors together with other \hl{10} participants argument its correctness and convergence and agree on the final decision.
|
||||
\end{enumerate}
|
||||
|
||||
|
||||
Table \ref{tab:taxonomy} is the final taxonomy scheme, for each of these subcategories we include description together with an example comments.
|
||||
|
||||
|
||||
\begin{table*}[ht]
|
||||
\begin{threeparttable}
|
||||
\centering
|
||||
\caption{Taxonomy Scheme}
|
||||
\begin{tabular}{r l p{11cm}}
|
||||
\toprule
|
||||
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description (Example)} \\
\midrule
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out an extra blank line, improper indentation, inconsistent naming convention, etc. \\
&\cellcolor{lightgray}(L2-1) &\cellcolor{lightgray}e.g., \emph{``:scissors: this blank line''}\\

&Code correction &Points out a wrong function call, a program logic error, etc.\\
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\

&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggests enhancing functionality, improving performance, etc.\\
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\

&Alternative implementation &Proposes an alternative solution better than the current one\\
&(L2-4) &none \\

&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demands that the submitter provide test cases for the changed code\\
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\

&Test feedback &Needs to test the change before making a decision, or reports failing tests\\
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\

%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\(L1-2)}}
&\cellcolor{lightgray}Pull-request acceptance &\cellcolor{lightgray}Satisfied with the pull-request and agrees to merge it\\
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\

&Pull-request rejection &Refuses to merge the pull-request because of a duplicate proposal, an undesired feature, a fake bug report, etc.\\
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''} or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\

&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused about the purpose of the pull-request and demands a use case\\
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\

%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, staleness), etc.\\
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\

&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}Pings other developer(s) to review this pull-request\\
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\

&Development norm &Talks about an undescriptive description or commit message, asks to squash or rebase the commits contained in the pull-request, etc.\\
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks others for what they do, apologizes for one's own actions, etc.\\
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g., \emph{``thank you :yellow\_heart:''}\\

&Positive reaction &Agrees with others' opinions, compliments others' work, etc.\\
&(L2-14) &e.g., \emph{``:+1: nice one @cristianbica.''}\\

&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interactions\\
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g., \emph{``I can do it for you if you want.''} \\

\bottomrule
\end{tabular}
\begin{tablenotes}
\item[1] \emph{Note: a word beginning and ending with a colon, like ``:scissors:'', is the Markdown syntax for an emoji on GitHub}
\end{tablenotes}
\label{tab:taxonomy}
\end{threeparttable}
\end{table*}

For brevity, we show short comments in full and only the key part of longer examples. We would like to point out, however, that it is a common phenomenon that a single comment addresses multiple topics, as in the following review comments:


\vspace{1em}
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}

\vspace{1em}
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
\vspace{1em}

In the first comment C1, the reviewer first thanks the contributor (L2-13), then approves the pull-request (L2-7), and finally requests that the patch be split into two commits (L2-12).
In C2, the reviewer expresses agreement with the change (L2-7), thinks highly of the work (L2-14; :+1: is the Markdown syntax for the ``thumbs-up'' emoji), and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review when they are not familiar with the change under review.

\subsection{Manual Labeling}
For each project, we randomly sample 200 pull-requests per year from 2013 to 2015, each of which has at least one and fewer than 20 comments. Overall, 1,800 distinct pull-requests and 5,645 review comments are sampled.

According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient working environment and to ensure the quality of the manual classification, we build an Online Label Platform (OLP) that offers a web-based interface and reduces the extra burden on participants. OLP is deployed on the public network, so classification participants are free to do the labeling job anywhere and at any time that suits them.


%[!!!! TODO: add a screenshot of the web-based labeling platform !!!!]


As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
\begin{enumerate}
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request are displayed. In addition, its hyperlink on GitHub is also shown, so that participants can conveniently jump to the original webpage for a more detailed view and inspection if needed.
\item \emph{Section 2}: All the review comments of a pull-request are listed in order of creation time in this section. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of a comment appear on top of its content.
\item \emph{Section 3}: Beside each comment, there are candidate labels corresponding to the 2-level categories, which can be selected by clicking the checkbox next to the label. Using checkboxes means that multiple labels can be assigned to a comment. There is also an opportunity to report other labels that are not in our list.
\end{enumerate}
As shown, we only label each comment with 2-level categories; the 1-level categories are derived automatically according to the taxonomy hierarchy. Table \ref{tab:label_sta} shows the statistics of the manual classification result.

\begin{table}[ht]
\centering
\caption{Manual classification result}
\begin{tabular}{r l c c c}
\toprule
\textbf{{1-level}} &\textbf{{2-level}} &\textbf{{Rails}} &\textbf{{Elasticsearch}} &\textbf{Angular.js}\\
\midrule
\multirow{6}*{L1-1}
&L2-1 &0 &0 &0\\
&L2-2 &0 &0 &0\\
&L2-3 &0 &0 &0\\
&L2-4 &0 &0 &0\\
&L2-5 &0 &0 &0\\
&L2-6 &0 &0 &0\\
\midrule
\multirow{3}*{L1-2}
&L2-7 &0 &0 &0\\
&L2-8 &0 &0 &0\\
&L2-9 &0 &0 &0\\
\midrule
\multirow{3}*{L1-3}
&L2-10 &0 &0 &0\\
&L2-11 &0 &0 &0\\
&L2-12 &0 &0 &0\\
\midrule
\multirow{3}*{L1-4}
&L2-13 &0 &0 &0\\
&L2-14 &0 &0 &0\\
&L2-15 &0 &0 &0\\
\bottomrule
\end{tabular}
\label{tab:label_sta}
\end{table}


%%%% TODO: add some summary statistics about this table

\subsection{Automatic Classification}
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed using scikit-learn\footnote{http://scikit-learn.org/}, in particular the Support Vector Machine (SVM) algorithm. As we already mentioned, a single review comment often addresses multiple topics, hence one of the goals of 3C is to be able to perform multi-label classification. To this end, we construct rules and classifiers with a one-vs-all strategy for each category at both levels of the taxonomy.

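To make the one-vs-all setup concrete, the following simplified sketch (with placeholder comments and labels; not our actual data pipeline or tuned configuration) shows how one binary SVM per category can be trained with scikit-learn:

\begin{verbatim}
# Illustrative sketch: one binary SVM per category (one-vs-rest), multi-label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

comments = [                         # placeholder comments
    "Thanks for your contribution!",
    "seems like the tests fail on this one",
    "Can you squash the two commits into one?",
    "Closing in favor of another PR",
]
label_matrix = [                     # one column per category (placeholder labels)
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]

clf = make_pipeline(
    TfidfVectorizer(),               # tf-idf features over the comment text
    OneVsRestClassifier(SVC(kernel="linear")),
)
clf.fit(comments, label_matrix)
print(clf.predict(["tests fail on travis"]))
\end{verbatim}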

\begin{figure*}[ht]
\centering
\includegraphics[width=16cm]{3c.png}
\caption{Overview of 3C}
\label{fig:3c}
\end{figure*}


Figure \ref{fig:3c} illustrates the overview of 3C, a multi-stage process containing two stages that make use of the comment text and of other features, respectively.

\noindent\underline{\textbf{Stage 1:}}
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one per class (or category).
For example, let \emph{CM} be a comment; the classification of \emph{CM} in stage 1 proceeds as follows (suppose there are $n$ classes $C_{1}, C_{2}, \dots, C_{n}$):
First, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. Assuming $R_{i}$ is confirmed, the $i$-th item in the probability vector is set to 1, and the text classifiers (SVMs) $TC_{1}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for the classes $C_{1}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are some key steps in this stage.
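
The following sketch illustrates this flow in code (for illustration only; the rule patterns are hypothetical and the dummy classifier stands in for the trained per-class SVMs):

\begin{verbatim}
# Illustrative sketch of the stage-1 flow: rule inspection first, then the
# per-class text classifiers fill in the remaining probabilities.
import re

rules = {                                    # hypothetical inspection rules
    "L2-11": re.compile(r"/cc\s+@\w+", re.I),
    "L2-13": re.compile(r"\bthank(s| you)\b", re.I),
}

class DummyTextClassifier:
    """Stands in for a trained per-class SVM; returns a fixed probability."""
    def predict_proba(self, texts):
        return [0.3 for _ in texts]

categories = ["L2-7", "L2-11", "L2-12", "L2-13"]
classifiers = {c: DummyTextClassifier() for c in categories}

def stage1(comment):
    probs = {}
    for category in categories:
        rule = rules.get(category)
        if rule is not None and rule.search(comment):
            probs[category] = 1.0            # the rule fired: reliable label
        else:
            probs[category] = classifiers[category].predict_proba([comment])[0]
    return probs

print(stage1("Thanks! Looks good to me /cc @fxn"))
\end{verbatim}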
|
||||
\noindent\textbf{Rule inspection:} Given a review coment to be classified, 3C first looks through its original text to apply each inspection rule iteratively.
|
||||
As said, each inpection rule corresponds to a specific class, so a class label will be assigned to a comment if it's insepction rule has been confirmed.
|
||||
Inspectin rule is a set of matching patterns containing discriminating term(s) and regular expressions. These recognization effect of matching patterns is \hl{not balaced} which means high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recongnize more comments at a relatively high precision and leave the \hl{left unrecongnized comment} to ML text classifiers.That is saying recognization result by inspection rule is pretty reliable.
|
||||
!!!!! take an example !!!!!
|
||||
|
||||
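As an example of what such a rule could look like (a hypothetical pattern for the ``Reviewer assignment'' category (L2-11), shown for illustration and not the exact rule used in 3C), a comment is matched when it pings another user and asks for a review:

\begin{verbatim}
# Hypothetical inspection rule for "Reviewer assignment" (L2-11): discriminating
# terms ("review", "take a look") combined with a ping to another user.
import re

RULE_L2_11 = re.compile(r"@\w+.*\b(review|take a look)\b|/cc\s+@\w+", re.I | re.S)

print(bool(RULE_L2_11.search("/cc @fxn can you take a look please?")))   # True
print(bool(RULE_L2_11.search("let's extract this into a constant")))     # False
\end{verbatim}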

Before being processed by the ML text classifiers, the comment text is preprocessed, which consists of some specific transformations and feature extraction.

\noindent\textbf{Text transformations:}
Reviewers tend to reference source code, hyperlinks and others' comments in their review comments to clearly express their opinions, prove their points and reply to other people.
These behaviours do promote the review process, but on the other hand they bring a great challenge for comment classification.
The words in these referenced text blocks contribute little to the classification and may even introduce noise. Hence, we transform them into single-word indicator tokens to reduce vocabulary interference while preserving the reference information.
% TODO: give an example of a text transformation
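The following sketch shows what such a transformation could look like (the indicator tokens and patterns below are illustrative assumptions, not necessarily the exact ones used in 3C):

\begin{verbatim}
# Illustrative sketch: collapse referenced material into single-word indicator
# tokens so that its vocabulary does not interfere with classification.
import re

def transform(comment):
    comment = re.sub(r"```.*?```", " codeblock ", comment, flags=re.S)  # fenced code
    comment = re.sub(r"`[^`]+`", " codefrag ", comment)                 # inline code
    comment = re.sub(r"https?://\S+", " hyperlink ", comment)           # hyperlink
    comment = re.sub(r"(?m)^>.*$", " quotedreply ", comment)            # quoted talk
    return comment

print(transform("> earlier suggestion\nSee https://example.com and `render` here"))
\end{verbatim}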

\noindent\textbf{Feature extraction:}
The transformed text is tokenized and stemmed to its root form \cite{Porter:1997}. We filter out punctuation from the word tokens but do not remove English stop words, because we think some common words are important in review comments. For example \dots. To apply an ML algorithm to comment classification, we need to extract a set of features from the comment text. We adopt the tf-idf model[], which can be conveniently computed with the tools provided by scikit-learn.
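
A minimal sketch of this preprocessing (assuming the Porter stemmer from NLTK and the tf-idf implementation from scikit-learn; the tokenization pattern is a simplification for illustration):

\begin{verbatim}
# Illustrative sketch: punctuation-free tokenization, Porter stemming, and
# tf-idf weighting; English stop words are deliberately kept.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def tokenize(text):
    tokens = re.findall(r"[a-z']+", text.lower())   # drop punctuation and digits
    return [stemmer.stem(t) for t in tokens]

vectorizer = TfidfVectorizer(tokenizer=tokenize, stop_words=None)  # keep stop words
X = vectorizer.fit_transform([
    "this PR will need a unit test",
    "let's extract this into a constant",
])
print(X.shape)
\end{verbatim}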

\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider the other comment-level features listed in Table \ref{tab:factors}.

\noindent\textbf{Comment\_length:} the total number of characters contained in the comment text after preprocessing. Long comments are likely to argue about pull-request appropriateness and code improvement.


\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or an issue comment. Inline comments tend to talk about solution details, while general comments talk more about other ``high-level'' matters.


\noindent\textbf{Link\_inclusion:} binary, indicating whether a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on some point of view, or to offer guidelines when they want to help other people.


\noindent\textbf{Ping\_inclusion:} binary, indicating whether a comment includes a ping activity (occurring in the form of ``@username'').


\noindent\textbf{Code\_inclusion:} binary, indicating whether a comment includes source code. Comments related to solution details tend to contain source code.


\noindent\textbf{Ref\_inclusion:} binary, indicating whether a comment includes a reference to others' talk. Referencing others' talk indicates a reply to someone, which probably reflects one's further suggestion or disagreement.


\noindent\textbf{Sim\_pr\_title:} the similarity between the comment text and the pull-request title (measured by the number of common words divided by the number of words in their union).


\noindent\textbf{Sim\_pr\_desc:} the similarity between the comment text and the pull-request description (measured in the same way as Sim\_pr\_title).
Comments with a high similarity to the title or description of a pull-request are likely to discuss solution details or the value that the change brings.
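
To illustrate how these comment-level features could be derived from the raw Markdown text (the detection patterns and the word-overlap similarity below are simplifications for illustration, not necessarily the exact implementation):

\begin{verbatim}
# Illustrative sketch: binary inclusion features plus the word-overlap similarity.
import re

def word_set(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def similarity(comment, reference):
    a, b = word_set(comment), word_set(reference)
    return len(a & b) / len(a | b) if a | b else 0.0   # common words / union words

def comment_features(raw, pr_title, pr_description):
    return {
        "Link_inclusion": int(bool(re.search(r"https?://\S+", raw))),
        "Ping_inclusion": int(bool(re.search(r"@\w+", raw))),
        "Code_inclusion": int(bool(re.search(r"```|`[^`]+`", raw))),
        "Ref_inclusion":  int(bool(re.search(r"(?m)^>", raw))),
        "Sim_pr_title":   similarity(raw, pr_title),
        "Sim_pr_desc":    similarity(raw, pr_description),
    }

print(comment_features("> old text\nThanks @fxn, tests fail: https://example.com",
                       "Fix failing tests", "The tests fail on CI"))
\end{verbatim}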

Together with the probability vector passed from \emph{stage 1}, these features are composed into a new feature vector to be processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. From the prediction models, a new probability vector is generated that represents how likely a comment is to fall into each specific category. Iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th item is greater than 0.5. If all the items of the probability vector are less than 0.5, the class label corresponding to the largest one is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
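
A minimal sketch of this final labeling rule (for illustration only):

\begin{verbatim}
# Illustrative sketch: threshold the probability vector at 0.5, falling back to
# the most probable category so every comment gets at least one label.
def assign_labels(prob_vector, categories, threshold=0.5):
    labels = [c for c, p in zip(categories, prob_vector) if p > threshold]
    if not labels:
        best = max(range(len(prob_vector)), key=lambda i: prob_vector[i])
        labels = [categories[best]]
    return labels

print(assign_labels([0.2, 0.7, 0.1], ["L1-1", "L1-2", "L1-3"]))  # ['L1-2']
print(assign_labels([0.2, 0.3, 0.1], ["L1-1", "L1-2", "L1-3"]))  # ['L1-2']
\end{verbatim}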

\subsection{Evaluation of 3C}
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., splitting the review comments into 10 sets of which nine are used to train the classifiers and the remaining one is used for performance testing, and repeating this process 10 times.
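
For a single binary category, this evaluation protocol corresponds to the following sketch (dummy data; an illustration of the procedure, not the actual experiment script):

\begin{verbatim}
# Illustrative sketch of 10-fold cross validation for one binary category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = ["looks good to me"] * 10 + ["seems like the tests fail here"] * 10
labels = [0] * 10 + [1] * 10          # 1 = the category under evaluation

pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_validate(pipeline, comments, labels, cv=10,
                        scoring=("precision", "recall"))
print(scores["test_precision"].mean(), scores["test_recall"].mean())
\end{verbatim}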

Table \ref{tab:auto_class} reports the precision and recall of the classification results on the 1-level categories.


% TODO: compare with rule-only and ML-only baselines: precision, recall, F-measure, and their averages


%%%% use the names of the 1-level categories in the table
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
%%%% use the names of the 1-level categories in the table

\begin{table}[ht]
\centering
\caption{Classification performance of 3C}
\begin{tabular}{r c c c c c c c c}
\toprule
\multirow{2}*{\textbf{Class}}
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
\midrule
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
\midrule
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
\bottomrule
\end{tabular}
\label{tab:auto_class}
\end{table}


\section{Experiment and Results}


\subsection{Research Questions}


\subsection{Statistics}
% Distribution of each comment type across pull-requests


\subsection{Co-occurrence}
% Co-occurrence of the different comment types

\subsection{Factors}

\begin{table*}[ht]
\centering
\caption{Factors examined in our study}
\begin{tabular}{r c c c c c c}
\toprule
\textbf{Statistic} &\textbf{Mean} &\textbf{St. Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
\midrule
\textbf{Project level metrics}\hspace{1em} \\
Project\_age & & & & & & \\
Team\_size & & & & & & \\
Pr\_growth & & & & & & \\
\midrule
\textbf{Pull-request level metrics}\hspace{1em} \\
Churn & & & & & & \\
N\_files & & & & & & \\
Src\_touched & & & & & & \\
Config\_touched & & & & & & \\
Test\_touched & & & & & & \\
\midrule
\textbf{Contributor level metrics}\hspace{1em} \\
Core\_team & & & & & & \\
Prev\_prs\_local & & & & & & \\
Prev\_prs\_global & & & & & & \\
\midrule
\textbf{Comment level metrics}\hspace{1em} \\
Comment\_length & & & & & & \\
Comment\_type & & & & & & \\
Sim\_pr\_title & & & & & & \\
Sim\_pr\_desc & & & & & & \\
Code\_inclusion & & & & & & \\
Ref\_inclusion & & & & & & \\
Link\_inclusion & & & & & & \\
Ping\_inclusion & & & & & & \\
\bottomrule
\end{tabular}
\label{tab:factors}
\end{table*}


% Effect of each factor on negative comments
\noindent\underline{\textbf{Project level}}


\noindent\textbf{Project\_age:} the time, in months, from the project's creation on GitHub to the creation of the pull-request, used as a proxy for maturity.


\noindent\textbf{Team\_size:} the number of active integrators, i.e., those who decide whether to accept new contributions, during the three months prior to the creation of the pull-request.


\noindent\textbf{Pr\_growth:} the average number of pull-requests submitted per day, prior to the examined pull-request.

\noindent\underline{\textbf{Pull-request level}}


\noindent\textbf{Churn:} the total number of lines added and deleted by the pull-request. Bigger code changes may be more complex and require a longer code evaluation.


\noindent\textbf{N\_files:} the total number of files changed in the pull-request, which is a signal of pull-request size.


\noindent\textbf{Src\_touched:} binary, measuring whether the pull-request touched at least one source code file (documentation files are treated as source code files).


\noindent\textbf{Config\_touched:} binary, measuring whether the pull-request touched at least one configuration file.


\noindent\textbf{Test\_touched:} binary, measuring whether the pull-request touched at least one test file.

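As an illustration, these pull-request level metrics could be computed from the list of changed files as follows (the path patterns below are hypothetical and project-specific, not the exact heuristics used in our study):

\begin{verbatim}
# Illustrative sketch: pull-request level metrics from a list of changed files.
changed_files = [                      # placeholder data for one pull-request
    {"path": "lib/model.rb", "added": 12, "deleted": 3},
    {"path": "test/model_test.rb", "added": 20, "deleted": 0},
]

churn = sum(f["added"] + f["deleted"] for f in changed_files)
n_files = len(changed_files)
src_touched = int(any(f["path"].endswith((".rb", ".js", ".java", ".md"))
                      for f in changed_files))
config_touched = int(any(f["path"].endswith((".yml", ".json", ".xml"))
                         for f in changed_files))
test_touched = int(any("test" in f["path"] for f in changed_files))

print(churn, n_files, src_touched, config_touched, test_touched)
\end{verbatim}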

\noindent\underline{\textbf{Contributor level}}


\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core contributor) or an outside contributor.


\noindent\textbf{Prev\_prs\_local:} the number of pull-requests submitted to the specific project by the developer, prior to the examined pull-request.


\noindent\textbf{Prev\_prs\_global:} the number of pull-requests submitted by the developer on GitHub, prior to the examined pull-request.

\section{Threats to Validity}
% Manual labeling:
% Automatic classification:
\section{Related Work}
\subsection{Code review}
% (refer to the Thai researchers' paper)
% traditional code review
% modern code review
\subsection{Pull-request}
% (refer to the senior labmate's work)
% some studies on pull-requests


\subsection{Classification on collaboratively generated data}
% (the 2008 paper and the one on mobile app reviews)
% other studies classifying issues

\section{Conclusion}
The conclusion goes here. This is more of the conclusion.


% conference papers do not normally have an appendix


% use section* for acknowledgement
\section*{Acknowledgment}
The research is supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61472430, 61502512) and the National Grand R\&D Plan (Grant No. 2016YFB1000805).
% TODO: add another grant number, the one Prof. Wang provided


The authors would like to thank...
more thanks here


\begin{thebibliography}{1}

\bibitem{Yu:2015}
Y. Yu, H. Wang, G. Yin, and T. Wang, ``Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?'' 2015.

\bibitem{Barr:2012}
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, and P. Devanbu, ``Cohesive and isolated development with branches,'' in FASE. Springer, 2012, pp. 316--331.

\bibitem{Gousios:2014}
G. Gousios, M. Pinzger, and A. van Deursen, ``An exploratory study of the pull-based software development model,'' in ICSE. ACM, 2014, pp. 345--355.

\bibitem{Vasilescu:B}
B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov, ``Quality and productivity outcomes relating to continuous integration in GitHub,'' 2015.

\bibitem{Tsay:2014a}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Let's talk about it: Evaluating contributions through discussion in GitHub,'' in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014, pp. 144--154.

\bibitem{Gousios:2014b}
G. Gousios and A. Bacchelli, ``Work practices and challenges in pull-based development: The contributor's perspective,'' in Proceedings of the 38th International Conference on Software Engineering (ICSE '16), 2016, pp. 285--296.

\bibitem{Marlow:2013}
J. Marlow, L. Dabbish, and J. Herbsleb, ``Impression formation in online peer production: Activity traces and personal profiles in GitHub,'' in Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 2013, pp. 117--128.

\bibitem{Veen:2015}
E. van der Veen, G. Gousios, and A. Zaidman, ``Automatically prioritizing pull requests,'' in Mining Software Repositories (MSR). IEEE, 2015, pp. 357--361.

\bibitem{Thongtanunam:2015}
P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, et al., ``Who should review my code? A file location-based code-reviewer recommendation approach for modern code review,'' 2015, pp. 141--150.

\bibitem{jiang:2015}
J. Jiang, J.-H. He, and X.-Y. Chen, ``CoreDevRec: Automatic core member recommendation for contribution evaluation,'' Journal of Computer Science and Technology, vol. 30, no. 5, pp. 998--1016, 2015.

\bibitem{Tsay:2014b}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Influence of social and technical factors for evaluating contribution in GitHub,'' in ICSE, 2014, pp. 356--366.

\bibitem{Bacchelli:2013}
A. Bacchelli and C. Bird, ``Expectations, outcomes, and challenges of modern code review,'' in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 712--721.

\bibitem{Shah:2006}
S. K. Shah, ``Motivation, governance, and the viability of hybrid forms in open source software development,'' Management Science, vol. 52, no. 7, pp. 1000--1014, 2006. http://doi.org/10.1287/mnsc.1060.0553

\bibitem{Porter:1997}
M. F. Porter, ``An algorithm for suffix stripping,'' in Readings in Information Retrieval, K. Sparck Jones and P. Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 313--316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705

\bibitem{Yu:2014}
Y. Yu, H. Wang, G. Yin, and C. X. Ling, ``Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration,'' in Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), vol. 1, 2014, pp. 335--342. http://doi.org/10.1109/APSEC.2014.57

\bibitem{Thongtanunam:2016}
P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, ``Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects,'' Empirical Software Engineering, 2016. http://doi.org/10.1007/s10664-016-9452-6

\bibitem{Beller:2014}
M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, ``Modern code reviews in open-source projects: Which problems do they fix?'' in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), 2014, pp. 202--211. http://doi.org/10.1145/2597073.2597082

\bibitem{Fagan:1999}
M. E. Fagan, ``Design and code inspections to reduce errors in program development,'' IBM Systems Journal, vol. 38, no. 2--3, pp. 258--287, 1999.

\bibitem{Kollanus:2009}
S. Kollanus and J. Koskinen, ``Survey of software inspection research,'' The Open Software Engineering Journal, vol. 3, no. 1, pp. 15--34, 2009.

\end{thebibliography}


% that's all folks
%\end{CJK*}
\end{document}
|
||||
|
||||
|
|
@ -0,0 +1,623 @@
|
|||
\documentclass[10pt, conference, compsocconf]{IEEEtran}
|
||||
% correct bad hyphenation here
|
||||
%\usepackage{CJK}
|
||||
\hyphenation{op-tical net-works semi-conduc-tor}
|
||||
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{color,soul}
|
||||
\usepackage{colortbl}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{enumitem}
|
||||
\usepackage{threeparttable}
|
||||
\usepackage{multirow,booktabs}
|
||||
\begin{document}
|
||||
%\begin{CJK*}{GBK}{song}
|
||||
|
||||
\title{Understanding the Review Comments on Pull-request}
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
|
||||
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
|
||||
National University of Defense Technology\\
|
||||
Changsha, China\\
|
||||
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
|
||||
}
|
||||
|
||||
\maketitle
|
||||
|
||||
|
||||
\begin{abstract}
|
||||
Pull-based model, integrated in code hosting sites such as GitHub and Bitbucket, has been widely used in distributed software development.
|
||||
It allows any user to contribute to a public repository and lows barrier to the contribution entry, which results in the high volume of incoming pull-requests. Before get accepted or rejected, a contribution is usually discussed among integrators, contributors and community audience.
|
||||
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
|
||||
While Github presents and orders all comments on the main pull-request overview by created time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
|
||||
Hence, it is a significant work to study the classification of comments on pull-request.
|
||||
|
||||
|
||||
% pr常常不是完美的,会引入issue,而哪些因素会引起 哪些issue没有被较好的理解
|
||||
% 该用comments 还是 review comments
|
||||
|
||||
|
||||
In this paper, we explore and summarize \hl{six type of comments} by manual identification and present a method for automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects in Github and the evaluation results show that our method can works effectively.
|
||||
|
||||
|
||||
%[应该再详细介绍基于什么方法,和实验结果]
|
||||
%1. Pull-request很流行
|
||||
%2. 贡献多,需评估,讨论多
|
||||
%3. 讨论的关注点不同
|
||||
%4. 对讨论进行分类的意义重大
|
||||
%5. 我们发现了几种类别,并提出了分类方法
|
||||
|
||||
|
||||
\end{abstract}
|
||||
|
||||
\begin{IEEEkeywords}
|
||||
Pull-request; code review; comment classification;
|
||||
\end{IEEEkeywords}
|
||||
|
||||
|
||||
\IEEEpeerreviewmaketitle
|
||||
|
||||
|
||||
|
||||
\section{Introduction}
|
||||
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for the access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) will evaluate the changes and decide whether to accept or reject it. Currently, code hosting sites such as Github (github.com) and Bitbucket (bitbucket.com) have supported pull-based model and many open source projects hosted on these sites have adopted this development model \cite{Gousios:2014}. Especially, on Github, the work environment is transparent and social media features are integrated in pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on other’s contributions.
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt about the appropriateness of the problem the pull-request tends to resolve, they usually ask the contributors to prove the value of this contribution by showing some use case or just refuse to accept it. While when the resolution has defects or they are not contend with the resolution, integrators request contributors to improve the implementation, sometimes with suggestion, or directly provide more excellent alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors will reach a consensus after iterative discussions. Occasionally, community audience (not an integrator or pull-request submitter) would express their opinions and influence the decision process through community support or project and company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity of changed files, understanding of the project’s roadmap, social relationship with the contribution and many other factors influence their focus on the discussion. Some of them care about the contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect project’s norm consistence (like programming convention).
|
||||
Lots of valuable information is contained in the pull-request discussion. Previous studies has studied on
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% 前人工作:discussion length, comment social network, social media,
|
||||
%% 此外还有comment 分类【\cite{Bacchelli2013},他们是对MS里的各个团队进行的一个人工分类调查,开发环境不够透明,也不都是开源项目】
|
||||
%% Assessing MCR Discussion Usefulness using Semantic Similarity 这篇论文讨论的是comment是不是userful
|
||||
|
||||
But the classification support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request Prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% 对comment的well understood
|
||||
% 一个分类体系
|
||||
% 一份标注集
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%是不是可以说 审阅者 处于主动地位,而贡献者处于被动地位]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of paper is organized like this: Section 2 reviews briefly related work and Section 3 explains related concepts. Section 4 describes in detail our method through a prototype design. Section 5 presents our empirical experiment and evaluation on our method. Section 6 explains some threats to validity of our method. Finally section 7 conclude this paper with future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{pull-request in Github}
|
||||
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism\cite{Gousios:2014,Yu:2014}.
|
||||
As illustrated by Figure \ref{fig:github_wf}, typical contribution process [6] in GitHub involves following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fixe some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Except \hl{commits}, the submmiter needs to provide a title and descrition to explain the objective of his pull-request, which is shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. At the meatime, considering reviewers' suggestions, the contributor would update his pull-request by attaching new commits, and then reviewers discuss that pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request doesn't break current runnable state, some reviews plays a role of testers. They can check the submitted changes by manully running the patches locally or through a automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%为啥有comment
|
||||
|
||||
|
||||
\subsection{Review Comment on pull-request}
|
||||
Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan\cite{Fagan:1999},which has a well-structured process, GitHub provides a modern code review\cite{Bacchelli:2013} that is a more lightweight approach requiring fewer formal steps\cite{Beller:2014,Kollanus:2009}.
|
||||
The way to give voice on a pull-request is to leave a comment and the following are some introductions to review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two available ways for reviewers to pushlish comments on a pull-request: general comments as a whole about the overall contribution and inline comments for specific lines of code in the changes(see Figure \ref{fig:comment})\cite{Yu:2015,Thongtanunam:2016}.
|
||||
%可以再多加一点
|
||||
|
||||
\emph{Visualization:}
|
||||
All the updates corresponding to a pull-request will be displayed in a visual representation. For general comments, they just appears in the \hl{main page} of pull request interfaces, but for inline comments, they default to be folded and can be shown after the toggle button is clicked. All comments and other abehaviors are aggregated as a \hl{timline} ordered primarily by time of creation.
|
||||
|
||||
% comment 参与者
|
||||
\emph{Participant:}
|
||||
Any user in GitHub can be a reviewer and have a chance to particpate the discussion of any pull-rquest[]. Usually, core team members comment to ask the submmiter to bring out the pull-request more clearly or improve the solution, pull-request creators comment to explain their work or reply others's suggestion, and other commity audience comment to express their demand and influence the dicision process[].
|
||||
Study\cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators while the majority of the comments come from core teammembers.
|
||||
|
||||
|
||||
\hl{ref more in} \cite{Tsay:2014a}
|
||||
|
||||
|
||||
\section{Classification of Comment}
|
||||
We first construct a fine grained multi-level taxonomy for comments on pull-request reviews and then classify manually a set of *** comments based on which we build \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}) which automatically classify review comments using rule-based and machine learning techniques.
|
||||
|
||||
\subsection{Dataset}
|
||||
We conduct experiments on 3 open source projects hosted in GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming language and application areas which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments come from two sources: GHTorrent dump released in **** and our own crawled data from GitHub up to *****.
|
||||
Table \ref{tab:dataset} lists the key statistics of three projects in our dataset:
|
||||
|
||||
\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is outstanding JS framework for front-end development. They are hosted in GitHub at an early time which tends to reflect that their development teams are more familiar with GitHub so that their activity data is more \hl{valuable}.
|
||||
|
||||
\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors means more participation which produces more collaboration data. Before contribution, developers usually ``star'' or fork on an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the number of star, fork and contributor indicates the hotness of a specific project. And the three projects is selected for their relatively high hotness indicators.
|
||||
|
||||
\noindent\textbf{Application data}: There are two main types of development models in GitHub: \emph{fork and pull model} which is widely used in open source projects and \emph{shared repository model} that is more prevalent with private projects and all of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model before \hl{2016-01-01}.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Dataset of our experiments}
|
||||
\begin{tabular}{r l c c c c c c}
|
||||
|
||||
\toprule
|
||||
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
|
||||
\midrule
|
||||
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
|
||||
\textbf{Elasticsearch } & Java & 2010-02-08 & 20008 & 6871 & 753 &6315 & 38930 \\
|
||||
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
|
||||
\bottomrule
|
||||
. \end{tabular}
|
||||
\label{tab:dataset}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Among the entire set, we manually labeled of * pull-requests and ** comments which was later used to automatically label the left data using our \textbf{3}C algorithm.
|
||||
|
||||
\subsection{Taxonomy Definition}
|
||||
%参考\cite{Georgios2014 和实验经济学的方法
|
||||
Previous work has studied on changes faced by reviewer [ ] and issues introduced by pull-request submitter [], which provide valuable hits for academic researchers and suggestions for industry designers. Inspired by their work, we decide to observe in more detail what reviewers talk about in pull-request reviews rather than just in perspectives of technique and non-technique.
|
||||
|
||||
We conducted a \hl{use case} to determine taxonomy scheme which is developed manually and through an \hl{iterative process} of reading review comments random collected from three projects hosted in GitHub. As Figure *!!!!! shows, this process uses the following steps:
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first choose three projects ( ) hosted in GitHub \hl{according to project hotness (e.g. \#star, \#fork,\#contributor), and development maturity (e.g. project age, pull-request usage)}. Then, the second author select randomly review comments from the three projects and label each comment with a descriptive message.
|
||||
\item \emph{Execution phase}: Given the previously labeled comments, the first author divide them into different groups based on their descriptive messages. Unlike the aggregation process in card sort \cite{Bacchelli:2013}, we initially default all the comments to be in the same group and iteratively divide the group. Based on a rigorous analysis of the existing literature [] and our own experience with working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups: technical and non-technical which in turn are divided to more specific groups and finally each comment will be put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine grained categories in our taxonomy scheme.
|
||||
\item \emph{Analysis phase}: On the base of recognition result, the second author abstract taxonomy hierarchies and deduce general categories which forms a multi-level (2 level actually) taxonomy scheme which includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors together with other \hl{10} participants argument its correctness and convergence and agree on the final decision.
|
||||
\end{enumerate}
|
||||
|
||||
|
||||
Table \ref{tab:taxonomy} is the final taxonomy scheme, for each of these subcategories we include description together with an example comments.
|
||||
|
||||
|
||||
\begin{table*}[ht]
|
||||
\begin{threeparttable}
|
||||
\centering
|
||||
\caption{Taxonomy Scheme}
|
||||
\begin{tabular}{r l p{11cm}}
|
||||
\toprule
|
||||
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description(Example)} \\
|
||||
\midrule
|
||||
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
|
||||
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out extra blank line, improper intention, inconsistent naming convention, etc. \\
|
||||
&\cellcolor{lightgray}(L21) &\cellcolor{lightgray}e.g., \emph{``scissors: this blank line''}\\
|
||||
|
||||
&Code correction &Figures out wrong function call and program logic error,etc.\\
|
||||
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggestion to enhance functionality, improve performance, etc.\\
|
||||
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
|
||||
|
||||
&Alternative implementation &Propose a alternative solution better than the current one\\
|
||||
&(L2-4) &none \\
|
||||
|
||||
&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demand submitter to provide test case for changed codes\\
|
||||
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
|
||||
|
||||
&Test feedback &Need to test the change before make a decision or feedback failing test\\
|
||||
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\((L1-2))}}
|
||||
&\cellcolor{lightgray}Pull-request acception &\cellcolor{lightgray}Satisfied with the pull-request and agree to merge it\\
|
||||
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\
|
||||
|
||||
&Pull-request rejection &Reject to merge the pull-request for duplicate proposal, undesired feature, fake bug report, etc.\\
|
||||
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''}or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
|
||||
|
||||
&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused with the purpose of the pull-request and demand for use case\\
|
||||
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
|
||||
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
|
||||
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
|
||||
|
||||
&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}ping other one(s) to review this pull-request\\
|
||||
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\
|
||||
|
||||
&Development norm &Talks about undescriptive description or commit message, needs to squash or rebase the commit contained in the pull-request, etc.\\
|
||||
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
|
||||
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks for what other people do, apologize for their own actions, etc.\\
|
||||
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g.,\emph{``thank you :yellow\_heart:''}\\
|
||||
|
||||
&Positive reaction & Agree with others’ opinion, compliment others’ work, etc.\\
|
||||
&(L2-14) &e.g.,\emph{``:+1: nice one @cristianbica.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interaction\\
|
||||
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g.,\emph{``I can do it for you if you want.''} \\
|
||||
|
||||
|
||||
\bottomrule
|
||||
. \end{tabular}
|
||||
\begin{tablenotes}
|
||||
\item[1] \emph{Note: word beginning and ending with colon like ``:scissors:'' is a markdown grammar for emoji on GitHub}
|
||||
\end{tablenotes}
|
||||
\label{tab:taxonomy}
|
||||
\end{threeparttable}
|
||||
\end{table*}
|
||||
|
||||
For the purpose of display brevity, we take as an example the short comments or the key part of long examples. But we would like to point out it is a common phenomenon that a single comment addresses multiple topics. For example, in the following review comments:
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
|
||||
\vspace{1em}
|
||||
|
||||
In the first comment C1, the reviewer first thanks the contributor (L2-13), and shows his approve of this pull-request (L2-6) followed by a request of commit splitting (L2-12).
|
||||
In C2, the reviewer expresses that he agree with this change (L2-6) and thinks highly of this work (L2-14, :+1: is markdown grammar for emoji “thumb-up”) and finally he assigns other reviewers to ask for more advice (L2-11, ). Speaking of reviewer assignment, reviewers also delegate a code review if they are not familiar with the change under review.
|
||||
|
||||
\subsection{Mannual label}
|
||||
For each project, we randomly sample 200 pull-requests, whose comment count is greater than 0 and less than 20, per year from 2013 to 2015. Overall, 1,800 distinct pull-requests and 5645 review comments are sampled.
|
||||
|
||||
According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and ensure the manual classification result, we build an Online Label Platform (OLP) to offer a web-based interface to reduce the extra burden. OLP is deployed on public network so that classification participant is free to do the label job in anywhere at any time when they are in their own groove.
|
||||
|
||||
%[!!!! 加个网页标注系统的图片!!!!]
|
||||
|
||||
Like Figure ***shows, the main interface of OLP, which organizes a pull-request and its review comment together, contains three sections:
|
||||
\begin{enumerate}
|
||||
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request will be displayed. In addition, its hyperlink on GitHub is also shown to provide convenience to jump to the original webpage for a more detailed view and inspection if needed.
|
||||
\item \emph{Section 2}: All the review comments of a pull-request are listed order by created time in this section. The author, author’s role (pull-request submitter or reviewer), comment type (inline comment or general comment) and crated time of a comment appears on the top of its content.
|
||||
\item \emph{Section 3}: Besides a comment, there are candidate labels corresponding 2-level categories, which can be selected by clicking the checkbox next to the label. Checkbox means multiple labels can be assigned to a comment. There is also an opportunity to report other labels that are not in our list.
|
||||
\end{enumerate}
|
||||
As shown, we only label each comment with 2-level categories and the 1-level categories is automatically labeled according to the taxonomy hierarchy. Table \ref{tab:label_sta} is the statistics of manual classification result.
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Manual classification result}
|
||||
\begin{tabular}{r l c c c}
|
||||
\toprule
|
||||
\textbf{{1-level}} &\textbf{{2-level}} &\textbf{{Rails}} &\textbf{{Elasticsearch}} &\textbf{Angular.js}\\
|
||||
\midrule
|
||||
\multirow{6}*{L1-1}
|
||||
&L2-1 &0 &0 &0\\
|
||||
&L2-2 &0 &0 &0\\
|
||||
&L2-3 &0 &0 &0\\
|
||||
&L2-4 &0 &0 &0\\
|
||||
&L2-5 &0 &0 &0\\
|
||||
&L2-6 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-2}
|
||||
&L2-7 &0 &0 &0\\
|
||||
&L2-8 &0 &0 &0\\
|
||||
&L2-9 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-3}
|
||||
&L2-10 &0 &0 &0\\
|
||||
&L2-11 &0 &0 &0\\
|
||||
&L2-12 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-4}
|
||||
&L2-13 &0 &0 &0\\
|
||||
&L2-14 &0 &0 &0\\
|
||||
&L2-15 &0 &0 &0\\
|
||||
\bottomrule
|
||||
. \end{tabular}
|
||||
\label{tab:label_sta}
|
||||
\end{table}
|
||||
|
||||
%%%%对表加点统计
|
||||
|
||||
\subsection{Automitical Classification}
|
||||
Based on manual classification result, we build \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}) which automatically classify review comments using rule-based and machine learning (ML) techniques. The ML based classification of review comments is performed using the scikit-learn\footnote{http://scikit-learn.org/}, in particular using Support Vector Machine algorithm (SVM). We already mentioned single review comment often address multiple topics, hence one of the goals of 3C is being able to perform multi-label classification. To this end, we construct rules and classifiers through one-vs-all strategy for each category from the high and low taxonomy.
|
||||
|
||||
\begin{figure*}[ht]
|
||||
\centering
|
||||
\includegraphics[width=16cm]{3c.png}
|
||||
\caption{Oveview of 3C}
|
||||
\label{fig:3c}
|
||||
\end{figure*}
|
||||
|
||||
Figure \ref{fig:3c} illustrates the overview of 3C. 3C is a multi-stage process containing two stages that make use of comment text and other features respectively.
|
||||
|
||||
\noindent\underline{\textbf{Stage 1:} }
|
||||
The classification in this stage mainly makes use of comment text and outputs a vector of possibility for each class(or catorgery).
|
||||
For example, \emph{CM} is a comment, the classification of \emph{CM} in stag 1 goes like this (suppose there are n classes \emph{$C_{1}, R_{2}, \dots, C_{n}$}):
|
||||
Firstly, each inspection rule (\emph{$R_{1}, R_{2}, \dots, R_{n}$}) wil be applied to CM. Assume \emph{$R_{i}$} is confirmed, the \emph{$i_{th}$} item in possibility vector would be set to 1 and text classifiers(SVM) \emph{$TC_{1}, TC_{2}, TC_{i-1}, \dots,TC_{i+1}\dots, TC_{n}$} will be used to predict the posibilities for classes \emph{$C_{1}, C_{2}, C_{i-1}, \dots,C_{i+1}\dots, C_{n}$} respectively. Finally, the possibility vector is outputed. The following are some key steps in this stage.
|
||||
|
||||
\noindent\textbf{Rule inspection:} Given a review coment to be classified, 3C first looks through its original text to apply each inspection rule iteratively.
|
||||
As said, each inpection rule corresponds to a specific class, so a class label will be assigned to a comment if it's insepction rule has been confirmed.
|
||||
Inspectin rule is a set of matching patterns containing discriminating term(s) and regular expressions. These recognization effect of matching patterns is \hl{not balaced} which means high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recongnize more comments at a relatively high precision and leave the \hl{left unrecongnized comment} to ML text classifiers.That is saying recognization result by inspection rule is pretty reliable.
|
||||
!!!!! take an example !!!!!
|
||||
|
||||
Before processed by ML text classifiers, comment text should be preprocessed which consist of some specific transformations and feature extraction.
|
||||
|
||||
\noindent\textbf{Text transformations:}
|
||||
Reviewer tend to reference source code, hyperlink and others' talk in review comment to clearly express their opinion, prove their point and reply other people.
|
||||
These behaviours do have promoted the review process.But on the other hand, this brings great challenge for comment classification at the same time.
|
||||
Words in these \hl{text block} contribute a little to the classification but even introduce interference. Hence, we transforme them into \hl{single-word indicators} to reduce vocabulary interference and reserve reference information.
|
||||
!!!!! take an example ??!!!!!
|
||||
|
||||
\noindent\textbf{Feature extraction:}
|
||||
Transformed text would be tokenized and stemmed to root form\cite{Porter:1997}. \hl{We filter out puncations from word tokens and don't remove English stop words for we think some commen words are import in review comments. For example \dots.} To apply ML algorithm in comment classification, we need to extract a set of features from the comment text. We adpoted tf-idf model[] which can be conveniently computed with tools provdid by Scikit-learn.
|
||||
|
||||
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features.In addition to possibility vector generated in \emph{stage 1}, we also consider other comment level feartures listed in Table \ref{tab:factors}
|
||||
|
||||
\noindent\textbf{Comment\_length:} total number of character contained in comment text after preprocessing. Long comment are likely to \hl{argue about pull-reqeust appropriatness and code improvment.}
|
||||
|
||||
\noindent\textbf{Comment\_type:} binary, indicating whether a comment is inline comment or issue comment.Inline comment tends to talk about solution detail while general comment talks more about other \hl{``high-level'' matters.}
|
||||
|
||||
\noindent\textbf{Link\_inclusion:} binary, indicating if a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on some point of view or to offer guidlines when they wants to help other people.
|
||||
|
||||
\noindent\textbf{Ping\_inclusion:} binary, indicating if a comment includes ping activity (occurring by the form of ``@ username'' ).
|
||||
|
||||
\noindent\textbf{Code\_inclusion:} binary, indicating if a comment includes source code.Solution detail related comments tend to contain soure codes.
|
||||
|
||||
\noindent\textbf{Ref\_inclusion:} binary, indicating if a comment includes reference of others talk. Reference of others' talk indicates a reply to someone, which probably reflects ones' further suggestion or disagreement.
|
||||
|
||||
\noindent\textbf{sim\_pr\_title:} similarity between text of comment content and text of pull-request title (measured by number of common word divided by the number of union word).
|
||||
|
||||
\noindent\textbf{Sim\_pr\_desc:} similarity between text of comment content and text of pull-request description(measured like how sim\_pr\_title is computed).
|
||||
Comments with high similarity with the title or description of pull-request are likely to discuss solution detail or value of the change have brought.
|
||||
|
||||
Together with the possibility vector passed from \emph{stage 1}, these features are composed to form a new feature vector that is processed by the prediction models. As in \emph{stage 1}, there are \emph{n} binary prediction models, one for each category. The prediction models generate a new possibility vector representing \hl{how likely a comment falls into a specific category}. After iterating over the possibility vector, a comment is labeled with class $C_{i}$ if the $i_{th}$ vector item is greater than 0.5. If all items of the possibility vector are less than 0.5, the class label corresponding to the largest item is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
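
As a minimal illustration of this decision rule (the 0.5 threshold and the fallback to the most likely class; the actual 3C implementation details may differ):

\begin{verbatim}
import numpy as np

def assign_labels(possibility_vector, threshold=0.5):
    # Label the comment with every class whose possibility exceeds the threshold;
    # if none does, fall back to the single most likely class.
    v = np.asarray(possibility_vector)
    labels = [i for i, p in enumerate(v) if p > threshold]
    return labels if labels else [int(v.argmax())]

print(assign_labels([0.2, 0.7, 0.6, 0.1]))  # -> [1, 2]
print(assign_labels([0.2, 0.4, 0.3, 0.1]))  # -> [1]
\end{verbatim}
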
\subsection{Evaluation of 3C}
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., the review comments are split into 10 sets, of which 9 sets are used to train the classifiers and the remaining set is used for the performance test, and this process is repeated 10 times.
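
A minimal sketch of this evaluation setup, assuming Scikit-learn's \texttt{cross\_validate} with a linear SVM and using synthetic data as a stand-in for the real feature matrix of one category, could look like this:

\begin{verbatim}
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC

# Synthetic stand-in for the composed feature matrix and one category's binary labels
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
scores = cross_validate(LinearSVC(), X, y, cv=10, scoring=("precision", "recall"))
print(scores["test_precision"].mean(), scores["test_recall"].mean())
\end{verbatim}
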
Table \ref{tab:auto_class} reports the precision and recall of the classification results \hl{on 1-level categories}.

% Compare against rule-only and ML-only baselines: Prec., Recall, F-measure, and their averages

%%%% Use the top-level category names in the table
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
%%%% Use the top-level category names in the table

\begin{table}[ht]
\centering
\caption{Classification performance of 3C}
\begin{tabular}{r c c c c c c c c}
\toprule
\multirow{2}*{\textbf{Class}}
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
\midrule
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
\midrule
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
\bottomrule
\end{tabular}
\label{tab:auto_class}
\end{table}

\section{Experiment \& Results}

\subsection{Research Questions}

\subsection{Statistics}
% Distribution of each comment type across pull-requests

\subsection{Co-occurrence}
% Co-occurrence of the different comment types

\subsection{Factors}

\begin{table*}[ht]
\centering
\caption{Factors examined in our experiments}
\begin{tabular}{r c c c c c c}
\toprule
\textbf{Statistic} &\textbf{Mean} &\textbf{St. Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
\midrule
\textbf{Project level metrics}\hspace{1em} \\
Project\_age & & & & & & \\
Team\_size & & & & & & \\
Pr\_growth & & & & & & \\
\midrule
\textbf{Pull-request level metrics}\hspace{1em} \\
Churn & & & & & & \\
N\_files & & & & & & \\
Src\_touched & & & & & & \\
Config\_touched & & & & & & \\
Test\_touched & & & & & & \\
\midrule
\textbf{Contributor level metrics}\hspace{1em} \\
Core\_team & & & & & & \\
Prev\_prs\_local & & & & & & \\
Prev\_prs\_global & & & & & & \\
\midrule
\textbf{Comment level metrics}\hspace{1em} \\
Comment\_length & & & & & & \\
Comment\_type & & & & & & \\
Sim\_pr\_title & & & & & & \\
Sim\_pr\_desc & & & & & & \\
Code\_inclusion & & & & & & \\
Ref\_inclusion & & & & & & \\
Link\_inclusion & & & & & & \\
Ping\_inclusion & & & & & & \\
\bottomrule
\end{tabular}
\label{tab:factors}
\end{table*}

% Influence of each factor on negative comments
\noindent\underline{\textbf{Project level}}

\noindent\textbf{Project\_age:} the time, in months, from project creation on GitHub to the pull-request creation, which is used as a proxy for maturity.

\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept new contributions, during the three months prior to pull-request creation.

\noindent\textbf{Pr\_growth:} the average number of pull-requests submitted per day, prior to the examined pull-request.

\noindent\underline{\textbf{Pull-request level}}

\noindent\textbf{Churn:} the total number of lines added and deleted by the pull-request. Bigger code changes may be more complex and require longer code evaluation.

\noindent\textbf{N\_files:} the total number of files changed in the pull-request, which is a signal of pull-request size.

\noindent\textbf{Src\_touched:} binary, measuring whether the pull-request touched at least one source code file (documentation is treated as source code here).

\noindent\textbf{Config\_touched:} binary, measuring whether the pull-request touched at least one configuration file.

\noindent\textbf{Test\_touched:} binary, measuring whether the pull-request touched at least one test file.

\noindent\underline{\textbf{Contributor level}}

\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core\_contributor vs.\ outside\_contributor).

\noindent\textbf{Prev\_prs\_local:} the number of pull-requests submitted to the specific project by the developer, prior to the examined pull-request.

\noindent\textbf{Prev\_prs\_global:} the number of pull-requests submitted by the developer on GitHub, prior to the examined pull-request.

\section{Threats to Validity}
% Manual labeling:
% Automatic classification:

\section{Related Work}

\subsection{Code review}
% (refer to the paper by the Thai researchers)
% Traditional code review
% Modern code review

\subsection{Pull-request}
% (refer to the senior labmate's work)
% Some studies on pull-requests

\subsection{Classification on collaboratively generated data}
% (the 2008 paper and the one on mobile app reviews)
% Other studies on issue classification

\section{Conclusion}
The conclusion goes here. this is more of the conclusion

% conference papers do not normally have an appendix

% use section* for acknowledgement
\section*{Acknowledgment}
This research is supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61472430, and 61502512) and the National Grand R\&D Plan (Grant No. 2016YFB1000805).
% Add one more grant number, the one provided by Prof. Wang

The authors would like to thank...
more thanks here

\begin{thebibliography}{1}

\bibitem{Yu:2015}
Y. Yu, H. Wang, G. Yin, and T. Wang, ``Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?'' 2015.

\bibitem{Barr:2012}
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, and P. Devanbu, ``Cohesive and isolated development with branches,'' in FASE. Springer, 2012, pp. 316--331.

\bibitem{Gousios:2014}
G. Gousios, M. Pinzger, and A. van Deursen, ``An exploratory study of the pull-based software development model,'' in ICSE. ACM, 2014, pp. 345--355.

\bibitem{Vasilescu:B}
B. Vasilescu, Y. Yu, H. Wang, P. Devanbu, and V. Filkov, ``Quality and productivity outcomes relating to continuous integration in GitHub,'' in ESEC/FSE, 2015.

\bibitem{Tsay:2014a}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Let's talk about it: Evaluating contributions through discussion in GitHub,'' in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2014, pp. 144--154.

\bibitem{Gousios:2014b}
G. Gousios and A. Bacchelli, ``Work practices and challenges in pull-based development: The contributor's perspective,'' in Proceedings of the 38th International Conference on Software Engineering (ICSE), 2016, pp. 285--296.

\bibitem{Marlow:2013}
J. Marlow, L. Dabbish, and J. Herbsleb, ``Impression formation in online peer production: Activity traces and personal profiles in GitHub,'' in Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW), New York, NY, USA, 2013, pp. 117--128.

\bibitem{Veen:2015}
E. van der Veen, G. Gousios, and A. Zaidman, ``Automatically prioritizing pull requests,'' in Mining Software Repositories (MSR). IEEE, 2015, pp. 357--361.

\bibitem{Thongtanunam:2015}
P. Thongtanunam, C. Tantithamthavorn, R. G. Kula, et al., ``Who should review my code? A file location-based code-reviewer recommendation approach for modern code review,'' in SANER, 2015, pp. 141--150.

\bibitem{jiang:2015}
J. Jiang, J.-H. He, and X.-Y. Chen, ``CoreDevRec: Automatic core member recommendation for contribution evaluation,'' Journal of Computer Science and Technology, vol. 30, no. 5, pp. 998--1016, 2015.

\bibitem{Tsay:2014b}
J. Tsay, L. Dabbish, and J. Herbsleb, ``Influence of social and technical factors for evaluating contribution in GitHub,'' in ICSE, 2014, pp. 356--366.

\bibitem{Bacchelli:2013}
A. Bacchelli and C. Bird, ``Expectations, outcomes, and challenges of modern code review,'' in Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 2013, pp. 712--721.

\bibitem{Shah:2006}
S. K. Shah, ``Motivation, governance, and the viability of hybrid forms in open source software development,'' Management Science, vol. 52, no. 7, pp. 1000--1014, 2006. http://doi.org/10.1287/mnsc.1060.0553

\bibitem{Porter:1997}
M. F. Porter, ``An algorithm for suffix stripping,'' in Readings in Information Retrieval, K. Sparck Jones and P. Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 313--316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705

\bibitem{Yu:2014}
Y. Yu, H. Wang, G. Yin, and C. X. Ling, ``Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration,'' in Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), vol. 1, 2014, pp. 335--342. http://doi.org/10.1109/APSEC.2014.57

\bibitem{Thongtanunam:2016}
P. Thongtanunam, S. McIntosh, A. E. Hassan, and H. Iida, ``Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects,'' Empirical Software Engineering, 2016. http://doi.org/10.1007/s10664-016-9452-6

\bibitem{Beller:2014}
M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens, ``Modern code reviews in open-source projects: Which problems do they fix?'' in Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), 2014, pp. 202--211. http://doi.org/10.1145/2597073.2597082

\bibitem{Fagan:1999}
M. E. Fagan, ``Design and code inspections to reduce errors in program development,'' IBM Systems Journal, vol. 38, no. 2-3, pp. 258--287, Jun. 1999.

\bibitem{Kollanus:2009}
S. Kollanus and J. Koskinen, ``Survey of software inspection research,'' The Open Software Engineering Journal, vol. 3, no. 1, pp. 15--34, 2009.

\end{thebibliography}

% that's all folks
%\end{CJK*}
\end{document}
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt the appropriateness of the problem the pull-request intends to resolve, they usually ask the contributors to prove the value of the contribution by showing a use case, or they simply refuse to accept it. When the solution has defects or integrators are not content with it, they request contributors to improve the implementation, sometimes with suggestions, or directly propose better alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors reach a consensus after iterative discussions. Occasionally, community audience (neither integrators nor the pull-request submitter) express their opinions and influence the decision process through community, project or company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity with the changed files, understanding of the project's roadmap, social relationship with the contributor and many other factors influence their focus in the discussion. Some of them care about contribution appropriateness (whether the contribution is necessary), some pay attention to code quality (completeness of functionality or performance) and some inspect consistency with the project's norms (such as programming conventions).
|
||||
A lot of valuable information is contained in pull-request discussions. Previous studies have investigated
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% prior work: discussion length, comment social network, social media,
%% there is also comment classification [\cite{Bacchelli2013}: a manual classification survey across teams at Microsoft; the development environment is less transparent and the projects are not all open source]
%% the paper ``Assessing MCR Discussion Usefulness using Semantic Similarity'' discusses whether a comment is useful
|
||||
|
||||
Such a classification can support important tasks like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% a good understanding of comments
% a taxonomy
% a labeled dataset
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%could we say that reviewers are in an active position while contributors are in a passive one]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of the paper is organized as follows: Section 2 briefly reviews related work and Section 3 explains related concepts. Section 4 describes our method in detail through a prototype design. Section 5 presents our empirical experiment and the evaluation of our method. Section 6 discusses threats to validity. Finally, Section 7 concludes the paper and outlines future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{Pull-request in GitHub}

In GitHub, a growing number of developers contribute to open source using the pull-request mechanism \cite{Gousios:2014,Yu:2014}.
As illustrated in Figure \ref{fig:github_wf}, the typical contribution process [6] in GitHub involves the following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main branch}. He is free to do whatever he wants to implement a new feature or fix some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source repository via a pull-request. Apart from the \hl{commits}, the submitter needs to provide a title and a description to explain the objective of his pull-request, as shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. In the meantime, considering reviewers' suggestions, the contributor may update his pull-request by attaching new commits, and then reviewers discuss the pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request does not break the current runnable state, some reviewers play the role of testers. They can check the submitted changes by manually running the patches locally or in an automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%why comments exist
|
||||
|
||||
|
||||
\subsection{Review Comments on Pull-request}

Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan \cite{Fagan:1999}, which has a well-structured process, GitHub provides modern code review \cite{Bacchelli:2013}, a more lightweight approach requiring fewer formal steps \cite{Beller:2014,Kollanus:2009}.
The way to voice an opinion on a pull-request is to leave a comment; the following paragraphs introduce review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two available ways for reviewers to publish comments on a pull-request: general comments about the overall contribution as a whole, and inline comments attached to specific lines of code in the changes (see Figure \ref{fig:comment}) \cite{Yu:2015,Thongtanunam:2016}.
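
For illustration, the two kinds of comments can be collected separately through GitHub's public REST API (issue comments for the general ones, pull-request review comments for the inline ones). The following Python sketch shows the idea; the repository, pull-request number and access token are placeholder values, not the paper's actual crawling setup.

\begin{verbatim}
# Minimal sketch: fetching the two comment types of one pull-request
# from the public GitHub REST API. Repo, PR number and token are
# placeholders used only for illustration.
import requests

OWNER, REPO, PR_NUMBER = "rails", "rails", 12150   # example values
HEADERS = {"Authorization": "token <YOUR_TOKEN>"}  # assumed personal token

# General (issue) comments: attached to the pull-request as a whole.
general = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}"
    f"/issues/{PR_NUMBER}/comments", headers=HEADERS).json()

# Inline (review) comments: attached to specific lines of the diff.
inline = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}"
    f"/pulls/{PR_NUMBER}/comments", headers=HEADERS).json()

print(len(general), "general comments,", len(inline), "inline comments")
\end{verbatim}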
|
||||
%could add a bit more here
|
||||
|
||||
\emph{Visualization:}
|
||||
All the updates corresponding to a pull-request are displayed in a visual representation. General comments appear directly on the \hl{main page} of the pull-request interface, while inline comments are folded by default and can be shown by clicking the toggle button. All comments and other behaviors are aggregated into a \hl{timeline} ordered primarily by creation time.
|
||||
|
||||
% comment participants
|
||||
\emph{Participant:}
|
||||
Any user on GitHub can be a reviewer and has the chance to participate in the discussion of any pull-request[]. Usually, core team members comment to ask the submitter to clarify the pull-request or to improve the solution, pull-request creators comment to explain their work or to reply to others' suggestions, and other community audience comment to express their demands and influence the decision process[].
A study \cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators while the majority of the comments come from core team members.
|
||||
|
||||
|
||||
\hl{ref more in} \cite{Tsay:2014a}
|
||||
|
||||
|
||||
\section{Classification of Comment}
|
||||
We first construct a fine-grained multi-level taxonomy for comments in pull-request reviews and then manually classify a set of *** comments, based on which we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning techniques.
|
||||
|
||||
\subsection{Dataset}
|
||||
We conduct experiments on three open source projects hosted on GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming languages and application areas, which helps increase the generalizability of our work to some extent. The dataset used in our experiments comes from two sources: the GHTorrent dump released in **** and our own data crawled from GitHub up to *****.
Table \ref{tab:dataset} lists the key statistics of the three projects in our dataset:
|
||||
|
||||
\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is an outstanding JavaScript framework for front-end development. They were hosted on GitHub at an early time, which suggests that their development teams are more familiar with GitHub and that their activity data is more \hl{valuable}.
|
||||
|
||||
\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors means more participation, which produces more collaboration data. Before contributing, developers usually ``star'' or fork an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the numbers of stars, forks and contributors indicate the hotness of a specific project, and the three projects were selected for their relatively high hotness indicators.
|
||||
|
||||
\noindent\textbf{Application data}: There are two main development models in GitHub: the \emph{fork and pull model}, which is widely used in open source projects, and the \emph{shared repository model}, which is more prevalent in private projects; both of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model and were created before \hl{2016-01-01}.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Dataset of our experiments}
|
||||
\begin{tabular}{r l c c c c c c}
|
||||
|
||||
\toprule
|
||||
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
|
||||
\midrule
|
||||
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
|
||||
\textbf{Elasticsearch } & Java & 2010-02-08 & 20008 & 6871 & 753 &6315 & 38930 \\
|
||||
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:dataset}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Among the entire set, we manually labeled * pull-requests and ** comments, which were later used to automatically label the remaining data using our 3\textbf{C} algorithm.
|
||||
|
||||
\subsection{Taxonomy Definition}
|
||||
%refer to \cite{Georgios2014} and to methods from experimental economics
|
||||
Previous work has studied the challenges faced by reviewers [ ] and the issues introduced by pull-request submitters [], which provide valuable hints for academic researchers and suggestions for industry designers. Inspired by their work, we decide to observe in more detail what reviewers talk about in pull-request reviews, rather than only distinguishing technical from non-technical perspectives.

We conducted a \hl{use case} study to determine the taxonomy scheme, which was developed manually through an \hl{iterative process} of reading review comments randomly collected from three projects hosted on GitHub. As Figure *!!!!! shows, this process consists of the following steps:
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first chose three projects ( ) hosted on GitHub \hl{according to project hotness (e.g., \#star, \#fork, \#contributor) and development maturity (e.g., project age, pull-request usage)}. Then, the second author randomly selected review comments from the three projects and labeled each comment with a descriptive message.
\item \emph{Execution phase}: Given the previously labeled comments, the first author divided them into different groups based on their descriptive messages. Unlike the aggregation process in card sorting \cite{Bacchelli:2013}, we initially placed all the comments in the same group and iteratively divided that group. Based on a rigorous analysis of the existing literature [] and our own experience of working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups, technical and non-technical, which in turn were divided into more specific groups, and finally each comment was put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine-grained categories in our taxonomy scheme.
\item \emph{Analysis phase}: On the basis of the recognition result, the second author abstracted taxonomy hierarchies and deduced general categories, which forms a multi-level (two-level, in fact) taxonomy scheme including \hl{4 categories} at the first level and \hl{15 subcategories} at the second. With this scheme, the first and second authors, together with \hl{10} other participants, debated its correctness and convergence and agreed on the final decision.
|
||||
\end{enumerate}
|
||||
|
||||
|
||||
Table \ref{tab:taxonomy} presents the final taxonomy scheme; for each subcategory we include a description together with an example comment.
|
||||
|
||||
|
||||
\begin{table*}[ht]
|
||||
\begin{threeparttable}
|
||||
\centering
|
||||
\caption{Taxonomy Scheme}
|
||||
\begin{tabular}{r l p{11cm}}
|
||||
\toprule
|
||||
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description(Example)} \\
|
||||
\midrule
|
||||
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
|
||||
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out extra blank line, improper intention, inconsistent naming convention, etc. \\
|
||||
&\cellcolor{lightgray}(L2-1) &\cellcolor{lightgray}e.g., \emph{``:scissors: this blank line''}\\
|
||||
|
||||
&Code correction &Figures out wrong function call and program logic error,etc.\\
|
||||
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggestion to enhance functionality, improve performance, etc.\\
|
||||
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
|
||||
|
||||
&Alternative implementation &Proposes an alternative solution better than the current one\\
|
||||
&(L2-4) &none \\
|
||||
|
||||
&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demand submitter to provide test case for changed codes\\
|
||||
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
|
||||
|
||||
&Test feedback &Need to test the change before make a decision or feedback failing test\\
|
||||
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\(L1-2)}}
|
||||
&\cellcolor{lightgray}Pull-request acceptance &\cellcolor{lightgray}Satisfied with the pull-request and agrees to merge it\\
|
||||
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\
|
||||
|
||||
&Pull-request rejection &Reject to merge the pull-request for duplicate proposal, undesired feature, fake bug report, etc.\\
|
||||
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''}or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
|
||||
|
||||
&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused with the purpose of the pull-request and demand for use case\\
|
||||
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
|
||||
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
|
||||
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
|
||||
|
||||
&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}ping other one(s) to review this pull-request\\
|
||||
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\
|
||||
|
||||
&Development norm &Talks about undescriptive description or commit message, needs to squash or rebase the commit contained in the pull-request, etc.\\
|
||||
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
|
||||
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks for what other people do, apologize for their own actions, etc.\\
|
||||
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g.,\emph{``thank you :yellow\_heart:''}\\
|
||||
|
||||
&Positive reaction & Agree with others’ opinion, compliment others’ work, etc.\\
|
||||
&(L2-14) &e.g.,\emph{``:+1: nice one @cristianbica.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interaction\\
|
||||
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g.,\emph{``I can do it for you if you want.''} \\
|
||||
|
||||
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\begin{tablenotes}
|
||||
\item[1] \emph{Note: words beginning and ending with a colon, such as ``:scissors:'', are Markdown syntax for emoji on GitHub}
|
||||
\end{tablenotes}
|
||||
\label{tab:taxonomy}
|
||||
\end{threeparttable}
|
||||
\end{table*}
|
||||
|
||||
For brevity of presentation, we show only short comments or the key part of long examples. However, we would like to point out that it is a common phenomenon for a single comment to address multiple topics. For example, consider the following review comments:
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
|
||||
\vspace{1em}
|
||||
|
||||
In the first comment C1, the reviewer first thanks the contributor (L2-13), then shows his approval of the pull-request (L2-7), followed by a request to split the commits (L2-12).
In C2, the reviewer expresses that he agrees with the change (L2-7), thinks highly of the work (L2-14; :+1: is the Markdown emoji for ``thumbs up''), and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review if they are not familiar with the change under review.
|
||||
|
||||
\subsection{Manual Labeling}
|
||||
For each project, we randomly sample 200 pull-requests per year from 2013 to 2015, each with a comment count greater than 0 and less than 20. Overall, 1,800 distinct pull-requests and 5,645 review comments are sampled.
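
A minimal sketch of this sampling procedure, assuming the pull-requests of one project are available as a pandas DataFrame with hypothetical columns \texttt{id}, \texttt{created\_at} (datetime) and \texttt{n\_comments}:

\begin{verbatim}
# Sketch of the per-project sampling; column names are assumptions.
import pandas as pd

def sample_pull_requests(prs: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    # keep pull-requests with more than 0 and fewer than 20 comments
    eligible = prs[(prs.n_comments > 0) & (prs.n_comments < 20)]
    samples = []
    for year in (2013, 2014, 2015):
        pool = eligible[eligible.created_at.dt.year == year]
        samples.append(pool.sample(n=200, random_state=seed))
    return pd.concat(samples)
\end{verbatim}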
|
||||
|
||||
According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and to ensure the quality of the manual classification, we build an Online Label Platform (OLP) that offers a web-based interface to reduce the extra burden. OLP is deployed on the public network so that classification participants are free to do the labeling job anywhere and at any time, whenever they are in their own groove.
|
||||
|
||||
%[!!!! add a screenshot of the online labeling system !!!!]
|
||||
|
||||
As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
|
||||
\begin{enumerate}
|
||||
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request will be displayed. In addition, its hyperlink on GitHub is also shown to provide convenience to jump to the original webpage for a more detailed view and inspection if needed.
|
||||
\item \emph{Section 2}: All the review comments of a pull-request are listed in this section, ordered by creation time. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of a comment appear above its content.
\item \emph{Section 3}: Beside each comment, there are candidate labels corresponding to the 2-level subcategories, which can be selected by clicking the checkbox next to the label. Checkboxes mean that multiple labels can be assigned to a comment. There is also an option to report other labels that are not in our list.
|
||||
\end{enumerate}
|
||||
As shown, we only label each comment with 2-level subcategories; the 1-level categories are derived automatically according to the taxonomy hierarchy. Table \ref{tab:label_sta} shows the statistics of the manual classification result.
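
Because the taxonomy is strictly hierarchical, this derivation is mechanical; a minimal sketch, with the mapping following Table \ref{tab:taxonomy}:

\begin{verbatim}
# Mapping from 2-level subcategories to 1-level categories
# (follows the hierarchy of the taxonomy table).
L2_TO_L1 = {**{f"L2-{i}": "L1-1" for i in range(1, 7)},
            **{f"L2-{i}": "L1-2" for i in range(7, 10)},
            **{f"L2-{i}": "L1-3" for i in range(10, 13)},
            **{f"L2-{i}": "L1-4" for i in range(13, 16)}}

def derive_l1_labels(l2_labels):
    """Return the 1-level labels implied by a comment's 2-level labels."""
    return {L2_TO_L1[label] for label in l2_labels}

# derive_l1_labels({"L2-13", "L2-7"}) -> {"L1-4", "L1-2"}
\end{verbatim}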
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Manual classification result}
|
||||
\begin{tabular}{r l c c c}
|
||||
\toprule
|
||||
\textbf{{1-level}} &\textbf{{2-level}} &\textbf{{Rails}} &\textbf{{Elasticsearch}} &\textbf{Angular.js}\\
|
||||
\midrule
|
||||
\multirow{6}*{L1-1}
|
||||
&L2-1 &0 &0 &0\\
|
||||
&L2-2 &0 &0 &0\\
|
||||
&L2-3 &0 &0 &0\\
|
||||
&L2-4 &0 &0 &0\\
|
||||
&L2-5 &0 &0 &0\\
|
||||
&L2-6 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-2}
|
||||
&L2-7 &0 &0 &0\\
|
||||
&L2-8 &0 &0 &0\\
|
||||
&L2-9 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-3}
|
||||
&L2-10 &0 &0 &0\\
|
||||
&L2-11 &0 &0 &0\\
|
||||
&L2-12 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-4}
|
||||
&L2-13 &0 &0 &0\\
|
||||
&L2-14 &0 &0 &0\\
|
||||
&L2-15 &0 &0 &0\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:label_sta}
|
||||
\end{table}
|
||||
|
||||
%%%% add some statistics about this table
|
||||
|
||||
\subsection{Automatic Classification}
|
||||
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed using scikit-learn\footnote{http://scikit-learn.org/}, in particular the Support Vector Machine (SVM) algorithm. We already mentioned that a single review comment often addresses multiple topics, hence one of the goals of 3\textbf{C} is to perform multi-label classification. To this end, we construct rules and classifiers with a one-vs-all strategy for each category of the high- and low-level taxonomy.
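
A minimal sketch of the one-vs-all ML component, assuming one binary SVM over tf-idf features per category; the kernel and parameters shown here are illustrative assumptions rather than the tuned configuration:

\begin{verbatim}
# One-vs-all setup: one binary text classifier per category.
# Feature extraction is simplified; parameters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

CATEGORIES = [f"L2-{i}" for i in range(1, 16)]

def train_one_vs_all(comments, labels):
    """labels[i] is the set of category names assigned to comments[i]."""
    classifiers = {}
    for cat in CATEGORIES:
        y = [1 if cat in ls else 0 for ls in labels]   # binary target
        clf = make_pipeline(TfidfVectorizer(),
                            SVC(kernel="linear", probability=True))
        clf.fit(comments, y)
        classifiers[cat] = clf
    return classifiers
# classifiers[cat].predict_proba([text])[0][1] gives the probability
# that `text` belongs to category `cat`.
\end{verbatim}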
|
||||
|
||||
\begin{figure*}[ht]
|
||||
\centering
|
||||
\includegraphics[width=16cm]{3c.png}
|
||||
\caption{Overview of 3C}
|
||||
\label{fig:3c}
|
||||
\end{figure*}
|
||||
|
||||
Figure \ref{fig:3c} illustrates the overview of 3\textbf{C}, a multi-stage process containing two stages that make use of the comment text and other features, respectively.
|
||||
|
||||
\noindent\underline{\textbf{Stage 1:} }
|
||||
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one for each class (or category).
For example, let \emph{CM} be a comment; the classification of \emph{CM} in stage 1 proceeds as follows (suppose there are $n$ classes $C_{1}, C_{2}, \dots, C_{n}$):
first, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. Assuming $R_{i}$ is confirmed, the $i$-th item of the probability vector is set to 1, and the text classifiers (SVMs) $TC_{1}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for classes $C_{1}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are the key steps in this stage.
|
||||
|
||||
\noindent\textbf{Rule inspection:} Given a review comment to be classified, 3\textbf{C} first looks through its original text and applies each inspection rule iteratively.
As said, each inspection rule corresponds to a specific class, so a class label is assigned to a comment once its inspection rule is confirmed.
An inspection rule is a set of matching patterns containing discriminating term(s) and regular expressions. The recognition effect of matching patterns is \hl{not balanced}, which means that high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recognize as many comments as possible at a relatively high precision and leave the \hl{remaining unrecognized comments} to the ML text classifiers. That is to say, the recognition result of an inspection rule is quite reliable.
|
||||
!!!!! take an example !!!!!
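
As a hypothetical illustration (the actual rule set is not reproduced here), an inspection rule for the reviewer-assignment category (L2-11) could combine a ping pattern with review-related terms:

\begin{verbatim}
# Hypothetical inspection rule for "Reviewer assignment" (L2-11):
# fires only when the comment pings someone AND asks for a review,
# favouring precision over recall.
import re

PING = re.compile(r"(^|\s)(/cc\s+)?@\w+", re.IGNORECASE)
ASK_REVIEW = re.compile(r"\b(review|take a look|ptal)\b", re.IGNORECASE)

def rule_reviewer_assignment(comment_text: str) -> bool:
    return bool(PING.search(comment_text)) and \
           bool(ASK_REVIEW.search(comment_text))

# rule_reviewer_assignment("/cc @fxn can you take a look please?") -> True
\end{verbatim}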
|
||||
|
||||
Before being processed by the ML text classifiers, the comment text is preprocessed, which consists of some specific transformations and feature extraction.
|
||||
|
||||
\noindent\textbf{Text transformations:}
Reviewers tend to reference source code, hyperlinks and others' talk in review comments to clearly express their opinions, prove their points and reply to other people.
These behaviors promote the review process, but on the other hand they bring a great challenge for comment classification.
Words in these \hl{text blocks} contribute little to the classification and may even introduce interference. Hence, we transform them into \hl{single-word indicators} to reduce vocabulary interference while preserving the reference information.
|
||||
!!!!! take an example ??!!!!!
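
A sketch of such transformations; the indicator tokens (\texttt{codeblock}, \texttt{linktag}, \texttt{reftag}) are chosen here only for illustration:

\begin{verbatim}
# Replace code blocks, inline code, hyperlinks and quoted replies with
# single-word indicators; the indicator names are illustrative only.
import re

def transform(text: str) -> str:
    text = re.sub(r"```.*?```", " codeblock ", text, flags=re.DOTALL)
    text = re.sub(r"`[^`]+`", " inlinecode ", text)
    text = re.sub(r"https?://\S+", " linktag ", text)
    # GitHub renders reply quotations as lines starting with '>'
    text = re.sub(r"(?m)^>.*$", " reftag ", text)
    return re.sub(r"\s+", " ", text).strip()

# transform("see https://x.io\n> old comment\n```ruby\nputs 1\n```")
#   -> "see linktag reftag codeblock"
\end{verbatim}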
|
||||
|
||||
\noindent\textbf{Feature extraction:}
The transformed text is tokenized and stemmed to root forms \cite{Porter:1997}. \hl{We filter out punctuation from the word tokens and do not remove English stop words, because we think some common words are important in review comments. For example \dots.} To apply ML algorithms to comment classification, we need to extract a set of features from the comment text. We adopted the tf-idf model[], which can be conveniently computed with tools provided by scikit-learn.
|
||||
|
||||
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider other comment-level features listed in Table \ref{tab:factors}.
|
||||
|
||||
\noindent\textbf{Comment\_length:} total number of characters contained in the comment text after preprocessing. Long comments are likely to \hl{argue about pull-request appropriateness and code improvement.}

\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or a general (issue) comment. Inline comments tend to talk about solution details while general comments talk more about other \hl{``high-level'' matters.}

\noindent\textbf{Link\_inclusion:} binary, indicating if a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on a point of view, or to offer guidelines when they want to help other people.

\noindent\textbf{Ping\_inclusion:} binary, indicating if a comment includes a ping activity (occurring in the form of ``@username'').

\noindent\textbf{Code\_inclusion:} binary, indicating if a comment includes source code. Solution-detail related comments tend to contain source code.

\noindent\textbf{Ref\_inclusion:} binary, indicating if a comment includes a reference to others' talk. Referencing others' talk indicates a reply to someone, which probably reflects one's further suggestions or disagreement.
|
||||
|
||||
\noindent\textbf{sim\_pr\_title:} similarity between the comment text and the pull-request title (measured as the number of common words divided by the number of words in their union).

\noindent\textbf{Sim\_pr\_desc:} similarity between the comment text and the pull-request description (measured in the same way as sim\_pr\_title).
Comments with a high similarity to the title or description of a pull-request are likely to discuss solution details or the value that the change brings.
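
In other words, the similarity is the word-overlap ratio $|W_c \cap W_t| / |W_c \cup W_t|$ between the word set $W_c$ of the comment and the word set $W_t$ of the title (or description). A simplified sketch of these comment-level features follows; the detection heuristics are assumptions rather than the exact implementation:

\begin{verbatim}
# Simplified computation of the comment-level features; the regexes are
# illustrative heuristics, not the exact implementation.
import re

def word_set(text):
    return set(re.findall(r"[a-zA-Z]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0

def comment_features(text, is_inline, pr_title, pr_desc):
    return {
        "comment_length": len(text),
        "comment_type":   int(is_inline),
        "link_inclusion": int(bool(re.search(r"https?://", text))),
        "ping_inclusion": int(bool(re.search(r"@\w+", text))),
        "code_inclusion": int(bool(re.search(r"```|`[^`]+`", text))),
        "ref_inclusion":  int(bool(re.search(r"(?m)^>", text))),
        "sim_pr_title":   jaccard(text, pr_title),
        "sim_pr_desc":    jaccard(text, pr_desc),
    }
\end{verbatim}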
|
||||
|
||||
Together with the probability vector passed from \emph{stage 1}, these features are composed into a new feature vector to be processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. The prediction models generate a new probability vector representing \hl{how likely a comment falls into a specific category}. Iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th item is greater than 0.5. If all items of the probability vector are less than 0.5, the class label corresponding to the largest one is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
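
A minimal sketch of this final label-assignment step:

\begin{verbatim}
# Assign every category whose probability exceeds 0.5; if none does,
# fall back to the single most likely category.
def assign_labels(prob_vector, categories, threshold=0.5):
    labels = [c for c, p in zip(categories, prob_vector) if p > threshold]
    if not labels:
        best = max(range(len(prob_vector)), key=lambda i: prob_vector[i])
        labels = [categories[best]]
    return labels

# assign_labels([0.1, 0.7, 0.6], ["L2-7", "L2-13", "L2-14"])
#   -> ["L2-13", "L2-14"]
\end{verbatim}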
|
||||
\subsection{Evaluation of 3C}
|
||||
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., splitting the review comments into 10 sets, of which 9 are used to train the classifiers and the remaining one is used for performance testing, and repeating this process 10 times.
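
A sketch of this evaluation protocol for a single category's binary classifier, where \texttt{build\_classifier} stands for the combined rule-and-ML pipeline described above:

\begin{verbatim}
# 10-fold cross validation of one category's binary classifier;
# build_classifier() is assumed to return a fresh, unfitted pipeline.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import precision_score, recall_score

def cross_validate(comments, y, build_classifier, folds=10):
    comments, y = np.array(comments, dtype=object), np.array(y)
    precisions, recalls = [], []
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(comments):
        clf = build_classifier()
        clf.fit(comments[train_idx], y[train_idx])
        pred = clf.predict(comments[test_idx])
        precisions.append(precision_score(y[test_idx], pred))
        recalls.append(recall_score(y[test_idx], pred))
    return np.mean(precisions), np.mean(recalls)
\end{verbatim}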
|
||||
|
||||
Table \ref{tab:auto_class} reports the precision and recall of the classification results \hl{on 1-level categories}.
|
||||
|
||||
%compare against rule-only and ML-only baselines: precision, recall, F-measure, and their averages
|
||||
|
||||
|
||||
|
||||
%%%% use the top-level category names in the table
|
||||
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
%%%% use the top-level category names in the table
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Classification performance of 3C}
|
||||
\begin{tabular}{r c c c c c c c c}
|
||||
\toprule
|
||||
\multirow{2}*{\textbf{Class}}
|
||||
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
|
||||
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
|
||||
\midrule
|
||||
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
\midrule
|
||||
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:auto_class}
|
||||
\end{table}
|
||||
|
||||
\section{EXPERIMENT \& RESULT}
|
||||
|
||||
\subsection{RESEARCH QUESTIONS}
|
||||
|
||||
|
||||
\subsection{Statistics}
%distribution of each type of comment across pull-requests

\subsection{Co-occurrence}
%co-occurrence of the different comment types

\subsection{Factors}
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Descriptive statistics of the examined factors}
|
||||
\begin{tabular}{r c c c c c c c}
|
||||
\toprule
|
||||
\textbf{Statistic} &\textbf{Mean} &\textbf{St.} &\textbf{Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
|
||||
\midrule
|
||||
\textbf{Project level metrics}\hspace{1em} \\
|
||||
Project\_age & & & & & & & \\
|
||||
Team\_size & & & & & & & \\
|
||||
Pr\_growth & & & & & & & \\
|
||||
\midrule
|
||||
\textbf{Pull-request level metrics}\hspace{1em} \\
|
||||
Churn & & & & & & & \\
|
||||
N\_files & & & & & & & \\
|
||||
Src\_touched & & & & & & & \\
|
||||
Config\_touched & & & & & & & \\
|
||||
test\_touched & & & && & & \\
|
||||
\midrule
|
||||
\textbf{Contributor level metrics}\hspace{1em} \\
|
||||
Core\_team & & & & & & & \\
|
||||
Prev\_prs\_local & & & & & & & \\
|
||||
Prev\_prs\_global & & & && & & \\
|
||||
\midrule
|
||||
\textbf{Comment level metrics}\hspace{1em} \\
|
||||
Comment\_length & & & & & & & \\
Comment\_type & & & & & & & \\
|
||||
sim\_pr\_title & & & && & & \\
|
||||
Sim\_pr\_desc & & & && & & \\
|
||||
Code\_inclusion & & & && & & \\
|
||||
Ref\_inclusion & & & && & & \\
|
||||
Link\_inclusion & & & && & & \\
|
||||
Ping\_inclusion & & & && & & \\
|
||||
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:factors}
|
||||
\end{table*}
|
||||
|
||||
|
||||
%the influence of each factor on negative comments
|
||||
\noindent\underline{\textbf{Project level}}
|
||||
|
||||
\noindent\textbf{Project\_age:} the time from project creation on GitHub to the pull-request creation in months, which is used as a proxy for maturity.
|
||||
|
||||
\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept the new contributions, during the three months prior to pull-request creation.
|
||||
|
||||
\noindent\textbf{Pr\_growth:} Average number of pull requests submitted per day, prior to the examined pull request.
|
||||
|
||||
\noindent\underline{\textbf{Pull-request level}}
|
||||
|
||||
\noindent\textbf{Churn:} total number of lines added and deleted by the pull-request. Bigger code changes may be more complex, and require longer code evaluation.
|
||||
|
||||
\noindent\textbf{N\_files:} total number of files changed in the pull-request, which is a signal of pull-request size.
|
||||
|
||||
\noindent\textbf{Src\_touched:} binary, measuring if the pull-request touched at least one source code file (documentation files are treated as source code files).
|
||||
|
||||
\noindent\textbf{Config\_touched:} binary, measuring if the pull-request touched at least one configure file.
|
||||
|
||||
\noindent\textbf{Test\_touched:} binary, measuring if the pull-request touched at least one test file.
|
||||
|
||||
\noindent\underline{\textbf{Contributor level}}
|
||||
|
||||
\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core contributor) or an outside contributor.
|
||||
|
||||
\noindent\textbf{Prev\_prs\_local:} Number of pull requests submitted to a specific project by a developer, prior to the examined pull request.
|
||||
|
||||
\noindent\textbf{Prev\_prs\_global:} Number of pull requests submitted by a developer in Github, prior to the examined pull request.
|
||||
|
||||
\section{THREATS TO VALIDITY}
|
||||
%manual labeling:
%automatic classification:
|
||||
\section{RELATED WORK}
|
||||
\subsection{Code review}
|
||||
%(refer to the one by the Thai researchers)
%traditional code review
%modern code review
|
||||
\subsection{Pull-request}
|
||||
%(refer to the senior labmate's work)
%some studies on pull-requests
|
||||
|
||||
\subsection{Classification on collaboratively generated data}
|
||||
%(the 2008 one and the mobile app review paper)
%other studies classifying issues
|
||||
|
||||
\section{Conclusion}
|
||||
The conclusion goes here. this is more of the conclusion
|
||||
|
||||
% conference papers do not normally have an appendix
|
||||
|
||||
|
||||
% use section* for acknowledgement
|
||||
\section*{Acknowledgment}
|
||||
The research is supported by the National Natural Science Foundation of China (Grant No.61432020, 61472430, 61502512) and National Grand R\&D Plan (Grant No. 2016YFB1000805).
|
||||
%add another grant number, the one provided by Prof. Wang
|
||||
|
||||
The authors would like to thank...
|
||||
more thanks here
|
||||
|
||||
|
||||
\begin{thebibliography}{1}
|
||||
|
||||
\bibitem{Yu:2015}
|
||||
Yu, Y., Wang, H., Yin, G., \& Wang, T. (2015). Reviewer Recommendation for Pull-Requests in GitHub : What Can We Learn from Code Review and Bug Assignment ?
|
||||
|
||||
\bibitem{Barr:2012}
|
||||
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, P. Devanbu, Cohesive and isolated development with branches, in: FASE, Springer, 2012, pp. 316–331.
|
||||
|
||||
\bibitem{Gousios:2014}
|
||||
G. Gousios, M. Pinzger, A. v. Deursen, An exploratory study of the pull-based software development model, in: ICSE, ACM, 2014, pp. 345–355.
|
||||
|
||||
\bibitem{Vasilescu:B}
|
||||
Vasilescu, B., Yu, Y., Wang, H., Devanbu, P., \& Filkov, V. (n.d.). Quality and Productivity Outcomes Relating to Continuous Integration in GitHub.
|
||||
|
||||
\bibitem{Tsay:2014a}
|
||||
Tsay, J., Dabbish, L., \& Herbsleb, J. (2014). Let’s talk about it: evaluating contributions through discussion in GitHub. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 144–154.
|
||||
|
||||
\bibitem{Gousios:2014b}
|
||||
Gousios, G., \& Bacchelli, A. (2014). Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective. Proceedings of the 38th International Conference on Software Engineering - ICSE '16, (ICIS--R15001), 285–296. http://doi.org/10.1109/ICSE.2015.55.
|
||||
|
||||
\bibitem{Marlow:2013}
|
||||
Marlow, J., Dabbish, L. and Herbsleb, J. 2013. Impression formation in online peer production: activity traces and personal profiles in github. Proceedings of the 2013 conference on Computer supported cooperative work (New York, NY, USA, 2013), 117–128.
|
||||
|
||||
\bibitem{Veen:2015}
|
||||
Veen E V D, Gousios G, Zaidman A. Automatically Prioritizing Pull Requests[C]// Mining Software Repositories. IEEE, 2015:357-361.
|
||||
|
||||
\bibitem{Thongtanunam:2015}
|
||||
Thongtanunam P, Tantithamthavorn C, Kula R G, et al. Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review[C]// SANER. IEEE, 2015: 141-150.
|
||||
|
||||
\bibitem{jiang:2015}
|
||||
Jiang J, He J H, Chen X Y. CoreDevRec: Automatic Core Member Recommendation for Contribution Evaluation[J]. Journal of Computer Science and Technology, 2015, 30(5): 998-1016.
|
||||
|
||||
\bibitem{Tsay:2014b}
|
||||
Tsay J, Dabbish L, Herbsleb J. Influence of social and technical factors for evaluating contribution in GitHub[C]// ICSE. 2014:356-366.
|
||||
|
||||
\bibitem{Bacchelli:2013}
|
||||
A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering, pages 712–721. IEEE Press, 2013.
|
||||
|
||||
\bibitem{Shah:2006}
|
||||
Shah, S. K. (2006). Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. Management Science, 52(7), 1000–1014. http://doi.org/10.1287/mnsc.1060.0553
|
||||
|
||||
\bibitem{Porter:1997}
|
||||
M. F. Porter, ``Readings in information retrieval'' K. Sparck Jones and P.Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, ch. An Algorithm for Suffix Stripping, pp. 313–316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705
|
||||
|
||||
\bibitem{Yu:2014}
|
||||
Yu, Y., Wang, H., Yin, G., \& Ling, C. X. (2014). Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. Proceedings - Asia-Pacific Software Engineering Conference, APSEC, 1, 335–342. http://doi.org/10.1109/APSEC.2014.57
|
||||
|
||||
\bibitem{Thongtanunam:2016}
|
||||
Thongtanunam, P., Mcintosh, S., Hassan, A. E., \& Iida, H. (2016). Review participation in modern code review An empirical study of the android , Qt , and OpenStack projects. Empirical Software Engineering. http://doi.org/10.1007/s10664-016-9452-6
|
||||
|
||||
\bibitem{Beller:2014}
|
||||
Beller, M., Bacchelli, A., Zaidman, A., \& Juergens, E. (2014). Modern code reviews in open-source projects: which problems do they fix? Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 202–211. http://doi.org/10.1145/2597073.2597082
|
||||
|
||||
\bibitem{Fagan:1999}
|
||||
M. E. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM System Journal, 38(2-3):258–287, jun 1999.
|
||||
|
||||
\bibitem{Kollanus:2009}
|
||||
S. Kollanus and J. Koskinen. Survey of software inspection research. The Open Software Engineering Journal, 3(1):15–34, 2009.
|
||||
|
||||
\end{thebibliography}
|
||||
|
||||
|
||||
|
||||
|
||||
% that's all folks
|
||||
%\end{CJK*}
|
||||
\end{document}
|
||||
|
||||
|
|
@ -0,0 +1,623 @@
|
|||
\documentclass[10pt, conference, compsocconf]{IEEEtran}
|
||||
% correct bad hyphenation here
|
||||
%\usepackage{CJK}
|
||||
\hyphenation{op-tical net-works semi-conduc-tor}
|
||||
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{color,soul}
|
||||
\usepackage{colortbl}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{enumitem}
|
||||
\usepackage{threeparttable}
|
||||
\usepackage{multirow,booktabs}
|
||||
\begin{document}
|
||||
%\begin{CJK*}{GBK}{song}
|
||||
|
||||
\title{Understanding the Review Comments on Pull-request}
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
|
||||
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
|
||||
National University of Defense Technology\\
|
||||
Changsha, China\\
|
||||
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
|
||||
}
|
||||
|
||||
\maketitle
|
||||
|
||||
|
||||
\begin{abstract}
|
||||
Pull-based model, integrated in code hosting sites such as GitHub and Bitbucket, has been widely used in distributed software development.
|
||||
It allows any user to contribute to a public repository and lows barrier to the contribution entry, which results in the high volume of incoming pull-requests. Before get accepted or rejected, a contribution is usually discussed among integrators, contributors and community audience.
|
||||
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
|
||||
While Github presents and orders all comments on the main pull-request overview by created time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
|
||||
Hence, it is a significant work to study the classification of comments on pull-request.
|
||||
|
||||
|
||||
% pr常常不是完美的,会引入issue,而哪些因素会引起 哪些issue没有被较好的理解
|
||||
% 该用comments 还是 review comments
|
||||
|
||||
|
||||
In this paper, we explore and summarize \hl{six type of comments} by manual identification and present a method for automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects in Github and the evaluation results show that our method can works effectively.
|
||||
|
||||
|
||||
%[应该再详细介绍基于什么方法,和实验结果]
|
||||
%1. Pull-request很流行
|
||||
%2. 贡献多,需评估,讨论多
|
||||
%3. 讨论的关注点不同
|
||||
%4. 对讨论进行分类的意义重大
|
||||
%5. 我们发现了几种类别,并提出了分类方法
|
||||
|
||||
|
||||
\end{abstract}
|
||||
|
||||
\begin{IEEEkeywords}
|
||||
Pull-request; code review; comment classification;
|
||||
\end{IEEEkeywords}
|
||||
|
||||
|
||||
\IEEEpeerreviewmaketitle
|
||||
|
||||
|
||||
|
||||
\section{Introduction}
|
||||
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for the access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) will evaluate the changes and decide whether to accept or reject it. Currently, code hosting sites such as Github (github.com) and Bitbucket (bitbucket.com) have supported pull-based model and many open source projects hosted on these sites have adopted this development model \cite{Gousios:2014}. Especially, on Github, the work environment is transparent and social media features are integrated in pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on other’s contributions.
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt about the appropriateness of the problem the pull-request tends to resolve, they usually ask the contributors to prove the value of this contribution by showing some use case or just refuse to accept it. While when the resolution has defects or they are not contend with the resolution, integrators request contributors to improve the implementation, sometimes with suggestion, or directly provide more excellent alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors will reach a consensus after iterative discussions. Occasionally, community audience (not an integrator or pull-request submitter) would express their opinions and influence the decision process through community support or project and company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity of changed files, understanding of the project’s roadmap, social relationship with the contribution and many other factors influence their focus on the discussion. Some of them care about the contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect project’s norm consistence (like programming convention).
|
||||
Lots of valuable information is contained in the pull-request discussion. Previous studies has studied on
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% 前人工作:discussion length, comment social network, social media,
|
||||
%% 此外还有comment 分类【\cite{Bacchelli2013},他们是对MS里的各个团队进行的一个人工分类调查,开发环境不够透明,也不都是开源项目】
|
||||
%% Assessing MCR Discussion Usefulness using Semantic Similarity 这篇论文讨论的是comment是不是userful
|
||||
|
||||
But the classification support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request Prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% 对comment的well understood
|
||||
% 一个分类体系
|
||||
% 一份标注集
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%是不是可以说 审阅者 处于主动地位,而贡献者处于被动地位]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of paper is organized like this: Section 2 reviews briefly related work and Section 3 explains related concepts. Section 4 describes in detail our method through a prototype design. Section 5 presents our empirical experiment and evaluation on our method. Section 6 explains some threats to validity of our method. Finally section 7 conclude this paper with future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{pull-request in Github}
|
||||
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism\cite{Gousios:2014,Yu:2014}.
|
||||
As illustrated by Figure \ref{fig:github_wf}, typical contribution process [6] in GitHub involves following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fixe some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Except \hl{commits}, the submmiter needs to provide a title and descrition to explain the objective of his pull-request, which is shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. At the meatime, considering reviewers' suggestions, the contributor would update his pull-request by attaching new commits, and then reviewers discuss that pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request doesn't break current runnable state, some reviews plays a role of testers. They can check the submitted changes by manully running the patches locally or through a automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%为啥有comment
|
||||
|
||||
|
||||
\subsection{Review Comment on pull-request}
|
||||
Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan\cite{Fagan:1999},which has a well-structured process, GitHub provides a modern code review\cite{Bacchelli:2013} that is a more lightweight approach requiring fewer formal steps\cite{Beller:2014,Kollanus:2009}.
|
||||
The way to give voice on a pull-request is to leave a comment and the following are some introductions to review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two available ways for reviewers to pushlish comments on a pull-request: general comments as a whole about the overall contribution and inline comments for specific lines of code in the changes(see Figure \ref{fig:comment})\cite{Yu:2015,Thongtanunam:2016}.
|
||||
%可以再多加一点
|
||||
|
||||
\emph{Visualization:}
All the updates corresponding to a pull-request are displayed in a visual representation. General comments simply appear on the \hl{main page} of the pull-request interface, whereas inline comments are folded by default and can be shown after the toggle button is clicked. All comments and other behaviors are aggregated into a \hl{timeline} ordered primarily by time of creation.

% comment participants
\emph{Participant:}
Any user on GitHub can be a reviewer and has the chance to participate in the discussion of any pull-request []. Usually, core team members comment to ask the submitter to explain the pull-request more clearly or to improve the solution, pull-request creators comment to explain their work or reply to others' suggestions, and other community audience comment to express their demands and influence the decision process [].
A study \cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators while the majority of the comments come from core team members.

\hl{ref more in} \cite{Tsay:2014a}

\section{Classification of Comment}
We first construct a fine-grained multi-level taxonomy for comments in pull-request reviews and then manually classify a set of *** comments, based on which we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning techniques.

\subsection{Dataset}
We conduct experiments on 3 open source projects hosted on GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming languages and application areas, which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments comes from two sources: the GHTorrent dump released in **** and our own data crawled from GitHub up to *****.
Table \ref{tab:dataset} lists the key statistics of the three projects in our dataset:

\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is an outstanding JavaScript framework for front-end development. They were hosted on GitHub at an early time, which tends to indicate that their development teams are more familiar with GitHub and that their activity data is therefore more \hl{valuable}.

\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors mean more participation, which produces more collaboration data. Before contributing, developers usually ``star'' or fork an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the numbers of stars, forks and contributors indicate the hotness of a specific project, and the three projects were selected for their relatively high hotness indicators.

\noindent\textbf{Application data}: There are two main types of development models on GitHub: the \emph{fork and pull model}, which is widely used in open source projects, and the \emph{shared repository model}, which is more prevalent in private projects; both of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model and were created before \hl{2016-01-01}.

\begin{table*}[ht]
\centering
\caption{Dataset of our experiments}
\begin{tabular}{r l c c c c c c}
\toprule
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
\midrule
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
\textbf{Elasticsearch} & Java & 2010-02-08 & 20008 & 6871 & 753 & 6315 & 38930 \\
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
\bottomrule
\end{tabular}
\label{tab:dataset}
\end{table*}

Among the entire set, we manually labeled * pull-requests and ** comments, which were later used to automatically label the remaining data with our 3\textbf{C} algorithm.

\subsection{Taxonomy Definition}
% refer to \cite{Georgios2014} and the methods of experimental economics
Previous work has studied the challenges faced by reviewers [ ] and the issues introduced by pull-request submitters [], which provides valuable hints for academic researchers and suggestions for industry designers. Inspired by their work, we decided to observe in more detail what reviewers talk about in pull-request reviews, rather than only from the technical and non-technical perspectives.

We conducted a \hl{use case} to determine the taxonomy scheme, which was developed manually through an \hl{iterative process} of reading review comments randomly collected from three projects hosted on GitHub. As Figure *!!!!! shows, this process uses the following steps:

\begin{enumerate}
\item \emph{Preparation phase}: The second author first chose three projects ( ) hosted on GitHub \hl{according to project hotness (e.g. \#star, \#fork, \#contributor) and development maturity (e.g. project age, pull-request usage)}. Then, the second author randomly selected review comments from the three projects and labeled each comment with a descriptive message.
\item \emph{Execution phase}: Given the previously labeled comments, the first author divided them into different groups based on their descriptive messages. Unlike the aggregation process in card sorting \cite{Bacchelli:2013}, we initially placed all the comments in the same group and iteratively divided that group. Based on a rigorous analysis of the existing literature [] and our own experience with working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups, technical and non-technical, which in turn were divided into more specific groups, and finally each comment was put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine-grained categories in our taxonomy scheme.
\item \emph{Analysis phase}: On the basis of the recognition result, the second author abstracted the taxonomy hierarchy and deduced general categories, which forms a multi-level (2-level, actually) taxonomy scheme that includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors, together with \hl{10} other participants, discussed its correctness and convergence and agreed on the final decision.
\end{enumerate}

Table \ref{tab:taxonomy} shows the final taxonomy scheme; for each subcategory, we include a description together with an example comment.

\begin{table*}[ht]
\begin{threeparttable}
\centering
\caption{Taxonomy Scheme}
\begin{tabular}{r l p{11cm}}
\toprule
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description (Example)} \\
\midrule
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out an extra blank line, improper indentation, inconsistent naming convention, etc. \\
&\cellcolor{lightgray}(L2-1) &\cellcolor{lightgray}e.g., \emph{``scissors: this blank line''}\\

&Code correction &Points out a wrong function call, a program logic error, etc.\\
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\

&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggests enhancing functionality, improving performance, etc.\\
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\

&Alternative implementation &Proposes an alternative solution better than the current one\\
&(L2-4) &none \\

&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demands that the submitter provide test cases for the changed code\\
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\

&Test feedback &Needs to test the change before making a decision, or reports a failing test\\
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\
%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\(L1-2)}}
&\cellcolor{lightgray}Pull-request acceptance &\cellcolor{lightgray}Satisfied with the pull-request and agrees to merge it\\
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\

&Pull-request rejection &Refuses to merge the pull-request because of a duplicate proposal, an undesired feature, a fake bug report, etc.\\
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''} or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\

&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused about the purpose of the pull-request and demands a use case\\
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\
%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\

&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}Pings other reviewer(s) to review this pull-request\\
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\

&Development norm &Talks about an undescriptive description or commit message, asks to squash or rebase the commits contained in the pull-request, etc.\\
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
\midrule
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks others for what they do, apologizes for one's own actions, etc.\\
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g., \emph{``thank you :yellow\_heart:''}\\

&Positive reaction &Agrees with others' opinions, compliments others' work, etc.\\
&(L2-14) &e.g., \emph{``:+1: nice one @cristianbica.''}\\

&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interaction\\
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g., \emph{``I can do it for you if you want.''} \\
\bottomrule
\end{tabular}
\begin{tablenotes}
\item[1] \emph{Note: a word beginning and ending with a colon, like ``:scissors:'', is Markdown syntax for an emoji on GitHub}
\end{tablenotes}
\label{tab:taxonomy}
\end{threeparttable}
\end{table*}

For brevity, we show only short comments or the key part of longer examples. However, we would like to point out that it is common for a single comment to address multiple topics, as in the following review comments:

\vspace{1em}
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}

\vspace{1em}
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
\vspace{1em}

In the first comment C1, the reviewer first thanks the contributor (L2-13), then shows his approval of this pull-request (L2-7), followed by a request to split the commit (L2-12).
In C2, the reviewer expresses that he agrees with this change (L2-7), thinks highly of this work (L2-14; :+1: is the Markdown syntax for the ``thumbs-up'' emoji) and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review when they are not familiar with the change under review.

\subsection{Manual Labeling}
For each project, we randomly sampled 200 pull-requests, whose comment count is greater than 0 and less than 20, per year from 2013 to 2015. Overall, 1,800 distinct pull-requests and 5,645 review comments were sampled.

According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and to ensure the quality of the manual classification, we built an Online Label Platform (OLP) that offers a web-based interface to reduce the extra burden. OLP is deployed on the public network so that classification participants are free to do the labeling job anywhere and at any time.

%[!!!! add a screenshot of the online labeling platform !!!!]

As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
\begin{enumerate}
\item \emph{Section 1}: In this section, the title, description and submitter of a pull-request are displayed. In addition, its hyperlink on GitHub is also shown so that participants can conveniently jump to the original webpage for a more detailed view and inspection if needed.
\item \emph{Section 2}: All the review comments of a pull-request are listed in order of creation time in this section. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of a comment appear on top of its content.
\item \emph{Section 3}: Beside each comment, there are candidate labels corresponding to the 2-level categories, which can be selected by clicking the checkbox next to the label. The use of checkboxes means that multiple labels can be assigned to a comment. There is also an opportunity to report other labels that are not in our list.
\end{enumerate}
As shown, we only label each comment with 2-level categories, and the 1-level categories are automatically derived according to the taxonomy hierarchy. Table \ref{tab:label_sta} shows the statistics of the manual classification result.

\begin{table}[ht]
\centering
\caption{Manual classification result}
\begin{tabular}{r l c c c}
\toprule
\textbf{1-level} &\textbf{2-level} &\textbf{Rails} &\textbf{Elasticsearch} &\textbf{Angular.js}\\
\midrule
\multirow{6}*{L1-1}
&L2-1 &0 &0 &0\\
&L2-2 &0 &0 &0\\
&L2-3 &0 &0 &0\\
&L2-4 &0 &0 &0\\
&L2-5 &0 &0 &0\\
&L2-6 &0 &0 &0\\
\midrule
\multirow{3}*{L1-2}
&L2-7 &0 &0 &0\\
&L2-8 &0 &0 &0\\
&L2-9 &0 &0 &0\\
\midrule
\multirow{3}*{L1-3}
&L2-10 &0 &0 &0\\
&L2-11 &0 &0 &0\\
&L2-12 &0 &0 &0\\
\midrule
\multirow{3}*{L1-4}
&L2-13 &0 &0 &0\\
&L2-14 &0 &0 &0\\
&L2-15 &0 &0 &0\\
\bottomrule
\end{tabular}
\label{tab:label_sta}
\end{table}

%%%% add some statistics about the table

\subsection{Automatic Classification}
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed using scikit-learn\footnote{http://scikit-learn.org/}, in particular the Support Vector Machine (SVM) algorithm. As already mentioned, a single review comment often addresses multiple topics; hence, one of the goals of 3\textbf{C} is to perform multi-label classification. To this end, we construct rules and classifiers with a one-vs-all strategy for each category from the high-level and low-level taxonomy.

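As an illustration of this one-vs-all setup, the following minimal sketch trains one binary SVM per category with scikit-learn. It is a simplified sketch rather than the exact implementation of 3C; the category identifiers, the tf-idf settings and the calibration wrapper are assumptions made for the example.

\begin{verbatim}
# Sketch of the one-vs-all training step described above (assumptions noted).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

CATEGORIES = [f"L2-{i}" for i in range(1, 16)]   # the 15 subcategories

def train_one_vs_all(comments, label_sets):
    """Train one binary SVM per category (one-vs-all).

    comments   -- list of comment texts (manually labeled data)
    label_sets -- for each comment, the set of category ids it carries
    """
    vectorizer = TfidfVectorizer()               # tf-idf features, settings assumed
    X = vectorizer.fit_transform(comments)
    classifiers = {}
    for cat in CATEGORIES:
        y = [1 if cat in labels else 0 for labels in label_sets]
        # CalibratedClassifierCV wraps LinearSVC so probabilities are available
        clf = CalibratedClassifierCV(LinearSVC())
        clf.fit(X, y)
        classifiers[cat] = clf
    return vectorizer, classifiers
\end{verbatim}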
\begin{figure*}[ht]
\centering
\includegraphics[width=16cm]{3c.png}
\caption{Overview of 3C}
\label{fig:3c}
\end{figure*}

Figure \ref{fig:3c} illustrates the overview of 3C, a multi-stage process containing two stages that make use of the comment text and other features, respectively.

\noindent\underline{\textbf{Stage 1:}}
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one for each class (or category).
For example, let \emph{CM} be a comment; the classification of \emph{CM} in stage 1 proceeds as follows (suppose there are $n$ classes $C_{1}, C_{2}, \dots, C_{n}$):
First, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. Assume $R_{i}$ is confirmed; then the $i$-th item in the probability vector is set to 1, and the text classifiers (SVM) $TC_{1}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for classes $C_{1}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are some key steps in this stage.

\noindent\textbf{Rule inspection:} Given a review comment to be classified, 3C first looks through its original text and applies each inspection rule iteratively.
As said, each inspection rule corresponds to a specific class, so a class label is assigned to a comment if its inspection rule is confirmed.
An inspection rule is a set of matching patterns containing discriminating term(s) and regular expressions. The recognition effect of matching patterns is \hl{not balanced}, which means that high recall results in low precision and high precision results in low recall. For this reason, our inspection rules are constructed \hl{following this setting}: they try to recognize as many comments as possible at a relatively high precision and leave the \hl{remaining unrecognized comments} to the ML text classifiers. That is to say, the recognition result of an inspection rule is quite reliable.
!!!!! take an example !!!!!

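As a purely illustrative sketch (the concrete example from our data is still to be added above), the following shows how inspection rules and the text classifiers could be combined in stage 1. The regular expressions and the rule-to-category mapping below are hypothetical and are not the actual patterns used by 3C.

\begin{verbatim}
# Hypothetical inspection rules: a few categories get a high-precision pattern.
import re

RULES = {
    "L2-11": re.compile(r"/cc\s+@\w+|can you (take a look|review)", re.I),
    "L2-13": re.compile(r"\bthank(s| you)\b|\bsorry\b", re.I),
    "L2-12": re.compile(r"\bsquash\b|\brebase\b", re.I),
}

def stage1_probabilities(comment_text, vectorizer, classifiers, categories):
    """Return the stage-1 probability vector for one comment."""
    probs = {}
    X = vectorizer.transform([comment_text])
    for cat in categories:
        rule = RULES.get(cat)
        if rule is not None and rule.search(comment_text):
            probs[cat] = 1.0                 # rule confirmed: fixed to 1
        else:
            # probability of the positive class from the per-category SVM
            probs[cat] = classifiers[cat].predict_proba(X)[0, 1]
    return [probs[cat] for cat in categories]
\end{verbatim}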
Before being processed by the ML text classifiers, the comment text is preprocessed, which consists of some specific transformations and feature extraction.

\noindent\textbf{Text transformations:}
Reviewers tend to reference source code, hyperlinks and others' talk in review comments to clearly express their opinions, support their points and reply to other people.
These behaviours do promote the review process, but at the same time they bring great challenges for comment classification.
Words in these \hl{text blocks} contribute little to the classification and may even introduce interference. Hence, we transform them into \hl{single-word indicators} to reduce vocabulary interference while preserving the reference information.
!!!!! take an example ??!!!!!

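The following sketch illustrates such transformations for GitHub-flavored Markdown; the single-word indicators (CODEBLOCK, HTTPLINK, QUOTEREF) are names chosen only for this example.

\begin{verbatim}
# Hypothetical text transformations: replace referenced material with
# single-word indicators so that its vocabulary does not interfere.
import re

def transform(comment_text):
    text = re.sub(r"```.*?```", " CODEBLOCK ", comment_text, flags=re.S)  # fenced code
    text = re.sub(r"`[^`]+`", " CODEBLOCK ", text)                        # inline code
    text = re.sub(r"https?://\S+", " HTTPLINK ", text)                    # hyperlinks
    text = re.sub(r"^>.*$", " QUOTEREF ", text, flags=re.M)               # quoted talk
    return text

# Example:
# transform("> looks wrong\nSee https://example.com and `first` method")
# -> ' QUOTEREF \nSee  HTTPLINK  and  CODEBLOCK  method'
\end{verbatim}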
\noindent\textbf{Feature extraction:}
The transformed text is tokenized and stemmed to its root form \cite{Porter:1997}. \hl{We filter out punctuation from the word tokens and do not remove English stop words, because we think some common words are important in review comments. For example \dots.} To apply an ML algorithm to comment classification, we need to extract a set of features from the comment text. We adopted the tf-idf model [], which can be conveniently computed with the tools provided by scikit-learn.

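A minimal sketch of this feature extraction step is shown below; the use of NLTK's Porter stemmer is an assumption made for the example, and any Porter implementation would serve the same purpose.

\begin{verbatim}
# Sketch of the feature extraction step: Porter stemming plus tf-idf.
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def tokenize_and_stem(text):
    tokens = re.findall(r"[A-Za-z']+", text)     # drop punctuation, keep words
    return [stemmer.stem(tok.lower()) for tok in tokens]

# stop_words=None keeps English stop words, as described above.
vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                             token_pattern=None,
                             stop_words=None)
# X = vectorizer.fit_transform(list_of_transformed_comments)
\end{verbatim}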
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider other comment-level features, listed in Table \ref{tab:factors}.

\noindent\textbf{Comment\_length:} total number of characters contained in the comment text after preprocessing. Long comments are likely to \hl{argue about pull-request appropriateness and code improvement}.

\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or an issue comment. Inline comments tend to talk about solution details while general comments talk more about other \hl{``high-level'' matters}.

\noindent\textbf{Link\_inclusion:} binary, indicating whether a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on some point of view, or to offer guidelines when they want to help other people.

\noindent\textbf{Ping\_inclusion:} binary, indicating whether a comment includes a ping activity (occurring in the form of ``@username'').

\noindent\textbf{Code\_inclusion:} binary, indicating whether a comment includes source code. Comments about solution details tend to contain source code.

\noindent\textbf{Ref\_inclusion:} binary, indicating whether a comment includes a reference to others' talk. Referencing others' talk indicates a reply to someone, which probably reflects one's further suggestion or disagreement.

\noindent\textbf{Sim\_pr\_title:} similarity between the comment text and the pull-request title (measured by the number of common words divided by the number of words in their union).

\noindent\textbf{Sim\_pr\_desc:} similarity between the comment text and the pull-request description (measured in the same way as Sim\_pr\_title).
Comments with high similarity to the title or description of the pull-request are likely to discuss the solution details or the value the change brings.

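The sketch below illustrates how these comment-level features, in particular the word-overlap similarity defined above, could be computed; the tokenization and the indicator token reused from the transformation step are assumptions of the example.

\begin{verbatim}
# Sketch of the comment-level features defined above. The similarity
# follows the definition in the text: |common words| / |words in union|.
import re

def word_set(text):
    return set(re.findall(r"[A-Za-z']+", text.lower()))

def similarity(comment_text, other_text):
    a, b = word_set(comment_text), word_set(other_text)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def comment_features(comment, pr_title, pr_desc):
    return {
        "comment_length": len(comment),
        "link_inclusion": int("HTTPLINK" in comment or "http" in comment),
        "ping_inclusion": int(bool(re.search(r"@\w+", comment))),
        "sim_pr_title": similarity(comment, pr_title),
        "sim_pr_desc": similarity(comment, pr_desc),
    }
\end{verbatim}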
Together with the probability vector passed from \emph{stage 1}, these features are composed into a new feature vector to be processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. The prediction models generate a new probability vector that represents \hl{how likely a comment falls into a specific category}. Iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th vector item is greater than 0.5. If all the items of the probability vector are less than 0.5, the class label corresponding to the largest one is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.

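This label-assignment rule can be summarized by the following sketch, which assumes the per-category probabilities produced in stage 2:

\begin{verbatim}
# Take every class whose probability exceeds 0.5; otherwise fall back to
# the single most probable class so each comment gets at least one label.
def assign_labels(prob_vector, categories, threshold=0.5):
    labels = [cat for cat, p in zip(categories, prob_vector) if p > threshold]
    if not labels:
        best = max(range(len(prob_vector)), key=lambda i: prob_vector[i])
        labels = [categories[best]]
    return labels

# assign_labels([0.1, 0.8, 0.6], ["L2-7", "L2-11", "L2-13"])
# -> ["L2-11", "L2-13"]
\end{verbatim}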
\subsection{Evaluation of 3C}
The performance of 3\textbf{C} is evaluated using 10-fold cross validation, i.e., splitting the review comments into 10 sets, of which 9 sets are used to train the classifiers and the remaining set is used for the performance test, and repeating this process 10 times.

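For reference, a per-category 10-fold evaluation could be set up as in the sketch below; the pipeline shown (tf-idf plus a calibrated linear SVM) is an assumption and not necessarily the exact configuration behind the reported numbers.

\begin{verbatim}
# Sketch of a per-category 10-fold evaluation with scikit-learn.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

def evaluate_category(comments, binary_labels):
    """10-fold precision/recall for one category (one-vs-all)."""
    model = make_pipeline(TfidfVectorizer(),
                          CalibratedClassifierCV(LinearSVC()))
    predicted = cross_val_predict(model, comments, binary_labels, cv=10)
    return (precision_score(binary_labels, predicted),
            recall_score(binary_labels, predicted))
\end{verbatim}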
Table \ref{tab:auto_class} reports the precision and recall of the classification result \hl{on 1-level categories}.

% compare with rule-only and ML-only baselines: Prec., Recall, F, and the averages
%%%% use the names of the 1-level categories in the table
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
%%%% use the names of the 1-level categories in the table

\begin{table}[ht]
\centering
\caption{Classification performance of 3C}
\begin{tabular}{r c c c c c c c c}
\toprule
\multirow{2}*{\textbf{Class}}
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
\midrule
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
\midrule
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
\bottomrule
\end{tabular}
\label{tab:auto_class}
\end{table}

\section{EXPERIMENT \& RESULT}

\subsection{Research Questions}

\subsection{Statistics}
% distribution of each type of comment across pull-requests

\subsection{Co-occurrence}
% co-occurrence of the different comment types

\subsection{Factors}

\begin{table*}[ht]
\centering
\caption{Factors considered in our experiments}
\begin{tabular}{r c c c c c c}
\toprule
\textbf{Statistic} &\textbf{Mean} &\textbf{St. Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
\midrule
\textbf{Project level metrics}\hspace{1em} \\
Project\_age & & & & & & \\
Team\_size & & & & & & \\
Pr\_growth & & & & & & \\
\midrule
\textbf{Pull-request level metrics}\hspace{1em} \\
Churn & & & & & & \\
N\_files & & & & & & \\
Src\_touched & & & & & & \\
Config\_touched & & & & & & \\
Test\_touched & & & & & & \\
\midrule
\textbf{Contributor level metrics}\hspace{1em} \\
Core\_team & & & & & & \\
Prev\_prs\_local & & & & & & \\
Prev\_prs\_global & & & & & & \\
\midrule
\textbf{Comment level metrics}\hspace{1em} \\
Comment\_length & & & & & & \\
Comment\_type & & & & & & \\
Sim\_pr\_title & & & & & & \\
Sim\_pr\_desc & & & & & & \\
Code\_inclusion & & & & & & \\
Ref\_inclusion & & & & & & \\
Link\_inclusion & & & & & & \\
Ping\_inclusion & & & & & & \\
\bottomrule
\end{tabular}
\label{tab:factors}
\end{table*}

% influence of each factor on negative comments
\noindent\underline{\textbf{Project level}}

\noindent\textbf{Project\_age:} the time from the project's creation on GitHub to the pull-request creation, in months, which is used as a proxy for maturity.

\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept new contributions, during the three months prior to pull-request creation.

\noindent\textbf{Pr\_growth:} the average number of pull-requests submitted per day, prior to the examined pull-request.

\noindent\underline{\textbf{Pull-request level}}

\noindent\textbf{Churn:} the total number of lines added and deleted by the pull-request. Bigger code changes may be more complex and require longer code evaluation.

\noindent\textbf{N\_files:} the total number of files changed in the pull-request, which is a signal of pull-request size.

\noindent\textbf{Src\_touched:} binary, measuring whether the pull-request touched at least one source code file (documentation files are treated as source code files).

\noindent\textbf{Config\_touched:} binary, measuring whether the pull-request touched at least one configuration file.

\noindent\textbf{Test\_touched:} binary, measuring whether the pull-request touched at least one test file.

\noindent\underline{\textbf{Contributor level}}

\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team.

\noindent\textbf{Prev\_prs\_local:} the number of pull-requests submitted to the specific project by the developer, prior to the examined pull-request.

\noindent\textbf{Prev\_prs\_global:} the number of pull-requests submitted by the developer on GitHub, prior to the examined pull-request.

\section{THREATS TO VALIDITY}
% manual labeling:
% automatic classification:
\section{RELATED WORK}
\subsection{Code review}
%(refer to the work by the Thai group)
% traditional code review
% modern code review
\subsection{Pull-request}
%(refer to the senior labmate's work)
% some studies on pull-requests

\subsection{Classification on collaboratively generated data}
%(the 2008 paper and the one on mobile app reviews)
% other studies on classifying issues

\section{Conclusion}
The conclusion goes here. This is more of the conclusion.

% conference papers do not normally have an appendix

% use section* for acknowledgement
\section*{Acknowledgment}
The research is supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61472430, 61502512) and the National Grand R\&D Plan (Grant No. 2016YFB1000805).
% add another grant number, the one provided by Prof. Wang

The authors would like to thank...
more thanks here

\begin{thebibliography}{19}

\bibitem{Yu:2015}
Yu, Y., Wang, H., Yin, G., \& Wang, T. (2015). Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment?

\bibitem{Barr:2012}
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, and P. Devanbu. Cohesive and isolated development with branches. In FASE, Springer, 2012, pp. 316--331.

\bibitem{Gousios:2014}
G. Gousios, M. Pinzger, and A. van Deursen. An exploratory study of the pull-based software development model. In ICSE, ACM, 2014, pp. 345--355.

\bibitem{Vasilescu:B}
Vasilescu, B., Yu, Y., Wang, H., Devanbu, P., \& Filkov, V. (n.d.). Quality and productivity outcomes relating to continuous integration in GitHub.

\bibitem{Tsay:2014a}
Tsay, J., Dabbish, L., \& Herbsleb, J. (2014). Let's talk about it: Evaluating contributions through discussion in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 144--154.

\bibitem{Gousios:2014b}
Gousios, G., \& Bacchelli, A. (2014). Work practices and challenges in pull-based development: The contributor's perspective. Proceedings of the 38th International Conference on Software Engineering (ICSE '16), (ICIS--R15001), 285--296. http://doi.org/10.1109/ICSE.2015.55

\bibitem{Marlow:2013}
Marlow, J., Dabbish, L., \& Herbsleb, J. (2013). Impression formation in online peer production: Activity traces and personal profiles in GitHub. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 117--128.

\bibitem{Veen:2015}
van der Veen, E., Gousios, G., \& Zaidman, A. (2015). Automatically prioritizing pull requests. In Mining Software Repositories (MSR), IEEE, 357--361.

\bibitem{Thongtanunam:2015}
Thongtanunam, P., Tantithamthavorn, C., Kula, R. G., et al. (2015). Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. 141--150.

\bibitem{jiang:2015}
Jiang, J., He, J.-H., \& Chen, X.-Y. (2015). CoreDevRec: Automatic core member recommendation for contribution evaluation. Journal of Computer Science and Technology, 30(5), 998--1016.

\bibitem{Tsay:2014b}
Tsay, J., Dabbish, L., \& Herbsleb, J. (2014). Influence of social and technical factors for evaluating contribution in GitHub. In ICSE, 356--366.

\bibitem{Bacchelli:2013}
A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering, pages 712--721. IEEE Press, 2013.

\bibitem{Shah:2006}
Shah, S. K. (2006). Motivation, governance, and the viability of hybrid forms in open source software development. Management Science, 52(7), 1000--1014. http://doi.org/10.1287/mnsc.1060.0553

\bibitem{Porter:1997}
M. F. Porter. An algorithm for suffix stripping. In ``Readings in Information Retrieval,'' K. Sparck Jones and P. Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 313--316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705

\bibitem{Yu:2014}
Yu, Y., Wang, H., Yin, G., \& Ling, C. X. (2014). Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. In Proceedings of the Asia-Pacific Software Engineering Conference (APSEC), 1, 335--342. http://doi.org/10.1109/APSEC.2014.57

\bibitem{Thongtanunam:2016}
Thongtanunam, P., McIntosh, S., Hassan, A. E., \& Iida, H. (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering. http://doi.org/10.1007/s10664-016-9452-6

\bibitem{Beller:2014}
Beller, M., Bacchelli, A., Zaidman, A., \& Juergens, E. (2014). Modern code reviews in open-source projects: Which problems do they fix? In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), 202--211. http://doi.org/10.1145/2597073.2597082

\bibitem{Fagan:1999}
M. E. Fagan. Design and code inspections to reduce errors in program development. IBM Systems Journal, 38(2-3):258--287, 1999.

\bibitem{Kollanus:2009}
S. Kollanus and J. Koskinen. Survey of software inspection research. The Open Software Engineering Journal, 3(1):15--34, 2009.

\end{thebibliography}

% that's all folks
%\end{CJK*}
\end{document}
|
||||
|
||||
|
|
@ -0,0 +1,623 @@
|
|||
\documentclass[10pt, conference, compsocconf]{IEEEtran}
|
||||
% correct bad hyphenation here
|
||||
%\usepackage{CJK}
|
||||
\hyphenation{op-tical net-works semi-conduc-tor}
|
||||
\newcommand{\tabincell}[2]{\begin{tabular}{@{}#1@{}}#2\end{tabular}}
|
||||
\usepackage{graphicx}
|
||||
\usepackage{color,soul}
|
||||
\usepackage{colortbl}
|
||||
\usepackage{xcolor}
|
||||
\usepackage{enumitem}
|
||||
\usepackage{threeparttable}
|
||||
\usepackage{multirow,booktabs}
|
||||
\begin{document}
|
||||
%\begin{CJK*}{GBK}{song}
|
||||
|
||||
\title{Understanding the Review Comments on Pull-request}
|
||||
|
||||
\author{
|
||||
\IEEEauthorblockN{Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Huaimin Wang}
|
||||
\IEEEauthorblockA{National Laboratory for Parallel and Distributed Processing, College of Computer\\
|
||||
National University of Defense Technology\\
|
||||
Changsha, China\\
|
||||
\{lizhixing15, yuyue, yingang, taowang2005, hmwang\}@nudt.edu.cn }
|
||||
}
|
||||
|
||||
\maketitle
|
||||
|
||||
|
||||
\begin{abstract}
|
||||
Pull-based model, integrated in code hosting sites such as GitHub and Bitbucket, has been widely used in distributed software development.
|
||||
It allows any user to contribute to a public repository and lows barrier to the contribution entry, which results in the high volume of incoming pull-requests. Before get accepted or rejected, a contribution is usually discussed among integrators, contributors and community audience.
|
||||
Discussion participants comment on a contribution from different perspectives, such as contribution appropriateness, code quality and programming convention.
|
||||
While Github presents and orders all comments on the main pull-request overview by created time, automatically recognizing and classifying what a comment is talking about can benefit contribution evaluation, reviewer recommendation, pull-request prioritization and other pull-request related tasks.
|
||||
Hence, it is a significant work to study the classification of comments on pull-request.
|
||||
|
||||
|
||||
% pr常常不是完美的,会引入issue,而哪些因素会引起 哪些issue没有被较好的理解
|
||||
% 该用comments 还是 review comments
|
||||
|
||||
|
||||
In this paper, we explore and summarize \hl{six type of comments} by manual identification and present a method for automatic classification of comments. We used our approach to analyze pull-request comments from \hl{five} projects in Github and the evaluation results show that our method can works effectively.
|
||||
|
||||
|
||||
%[应该再详细介绍基于什么方法,和实验结果]
|
||||
%1. Pull-request很流行
|
||||
%2. 贡献多,需评估,讨论多
|
||||
%3. 讨论的关注点不同
|
||||
%4. 对讨论进行分类的意义重大
|
||||
%5. 我们发现了几种类别,并提出了分类方法
|
||||
|
||||
|
||||
\end{abstract}
|
||||
|
||||
\begin{IEEEkeywords}
|
||||
Pull-request; code review; comment classification;
|
||||
\end{IEEEkeywords}
|
||||
|
||||
|
||||
\IEEEpeerreviewmaketitle
|
||||
|
||||
|
||||
|
||||
\section{Introduction}
|
||||
The pull-based development model is becoming increasingly popular in distributed collaboration for open source project development \cite{Yu:2015,Barr:2012,Gousios:2014}. Outside contributors are free to clone (or fork) any public repository and modify the cloned project locally without asking for the access to the central repository. When they are ready to merge the modification into the main branch, they can request the core team to pull their changes by submitting a pull request \cite{Gousios:2014,Vasilescu:B}. Integrators (members of the core team) will evaluate the changes and decide whether to accept or reject it. Currently, code hosting sites such as Github (github.com) and Bitbucket (bitbucket.com) have supported pull-based model and many open source projects hosted on these sites have adopted this development model \cite{Gousios:2014}. Especially, on Github, the work environment is transparent and social media features are integrated in pull-request mechanism \cite{Yu:2015,Tsay:2014a}, so users can watch the development activities of projects of interest and comment on other’s contributions.
|
||||
|
||||
When reviewing a pull-request, integrators try to understand the changes and evaluate the contribution quality \cite{Tsay:2014a,Gousios:2014b,Marlow:2013}. When they doubt about the appropriateness of the problem the pull-request tends to resolve, they usually ask the contributors to prove the value of this contribution by showing some use case or just refuse to accept it. While when the resolution has defects or they are not contend with the resolution, integrators request contributors to improve the implementation, sometimes with suggestion, or directly provide more excellent alternatives. After receiving the feedback from integrators, contributors tend to respond positively. Finally, integrators and contributors will reach a consensus after iterative discussions. Occasionally, community audience (not an integrator or pull-request submitter) would express their opinions and influence the decision process through community support or project and company support.
|
||||
|
||||
Various stakeholders involved in the pull-request discussion comment on the contribution from different perspectives. Familiarity of changed files, understanding of the project’s roadmap, social relationship with the contribution and many other factors influence their focus on the discussion. Some of them care about the contribution appropriateness (whether this contribution is necessary), some pay attention to code quality (perfection of functionality or performance) and some inspect project’s norm consistence (like programming convention).
|
||||
Lots of valuable information is contained in the pull-request discussion. Previous studies has studied on
|
||||
|
||||
[****some related work*****]
|
||||
|
||||
%%% 前人工作:discussion length, comment social network, social media,
|
||||
%% 此外还有comment 分类【\cite{Bacchelli2013},他们是对MS里的各个团队进行的一个人工分类调查,开发环境不够透明,也不都是开源项目】
|
||||
%% Assessing MCR Discussion Usefulness using Semantic Similarity 这篇论文讨论的是comment是不是userful
|
||||
|
||||
But the classification support significant work like reviewer recommendation \cite{Yu:2015,Thongtanunam:2015,jiang:2015}, contribution evaluation \cite{Tsay:2014a,Tsay:2014b} and pull-request Prioritization \cite{Veen:2015}.
|
||||
|
||||
%Pull –request
|
||||
%Discussion
|
||||
%Comment
|
||||
%What we have done
|
||||
%Our contribution
|
||||
% 对comment的well understood
|
||||
% 一个分类体系
|
||||
% 一份标注集
|
||||
|
||||
%【[\cite{Georgios2014}
|
||||
%decision making process that integrators go through while evaluating contributions
|
||||
%quality and prioritization. The quality phenomenon manifests itself by the explicit request of integrators that pull requests undergo code %review, their concern for quality at the source code level and the presence of tests. Prioritization is also a concern for integrators as they %typically need to manage large amounts of contribution requests simultaneously]
|
||||
%[\cite{Gousios2014}
|
||||
%In most projects, more than half of the partici- pants are community members. This is not true however for the number of comments; in most %projects the majority of the com- ments come from core team members.
|
||||
%是不是可以说 审阅者 处于主动地位,而贡献者处于被动地位]
|
||||
|
||||
|
||||
%【Based on our findings, we provide recommendations for practitioners and implications for researchers as well as outline future avenues for research】
|
||||
|
||||
The rest of paper is organized like this: Section 2 reviews briefly related work and Section 3 explains related concepts. Section 4 describes in detail our method through a prototype design. Section 5 presents our empirical experiment and evaluation on our method. Section 6 explains some threats to validity of our method. Finally section 7 conclude this paper with future work.
|
||||
|
||||
\section{Discussion on Pull-request in GitHub}
|
||||
|
||||
\subsection{pull-request in Github}
|
||||
In GitHub, a growing number of developers contribute to open source using the pull-request mechanism\cite{Gousios:2014,Yu:2014}.
|
||||
As illustrated by Figure \ref{fig:github_wf}, typical contribution process [6] in GitHub involves following steps.
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Fork:} First of all, a contributor could find an interesting project by following some well-known developers and watching their projects. Before contributing, he has to fork the original project.
|
||||
|
||||
\item \emph{Edit:} After forking, the contributor can edit locally without disturbing the \hl{main stream branch}. He is free to do whatever he wants to implement a new feature or fixe some bugs based on his cloned repository.
|
||||
|
||||
\item \emph{Request:} When his work is finished, the contributor submits the patches from the forked repository to its source by a pull-request. Except \hl{commits}, the submmiter needs to provide a title and descrition to explain the objective of his pull-request, which is shown in Figure \ref{fig:pr}.
|
||||
|
||||
\item \emph{Review:} Then, all developers in the community have the chance to review that pull-request in the issue tracker. They can freely discuss whether the project needs that feature, whether the code style meets the standard or how to further improve the code quality. At the meatime, considering reviewers' suggestions, the contributor would update his pull-request by attaching new commits, and then reviewers discuss that pull-request again.
|
||||
|
||||
\item \emph{Test:} To ensure the pull-request doesn't break current runnable state, some reviews plays a role of testers. They can check the submitted changes by manully running the patches locally or through a automated way with the help of continuous integration (CI) services.
|
||||
|
||||
\item \emph{Merge: } Finally, a responsible manager of the core team takes all the opinions of reviewers into consideration, and then merges or rejects that pull-request.
|
||||
|
||||
\end{enumerate}
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
\includegraphics[width=8cm]{github_wf.png}
|
||||
\caption{Pull-request workflow on GitHub}
|
||||
\label{fig:github_wf}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[ht]
|
||||
%https://github.com/rails/rails/pull/27561
|
||||
\centering
|
||||
\includegraphics[width=8cm]{pr.png}
|
||||
\caption{Example pull-request on GitHub}
|
||||
\label{fig:pr}
|
||||
\end{figure}
|
||||
|
||||
|
||||
% FERRTM
|
||||
%为啥有comment
|
||||
|
||||
|
||||
\subsection{Review Comment on pull-request}
|
||||
Code review is an important software engineering practice to assess code changes in open source projects[]. Unlike the formal code inspection advocated by Fagan\cite{Fagan:1999},which has a well-structured process, GitHub provides a modern code review\cite{Bacchelli:2013} that is a more lightweight approach requiring fewer formal steps\cite{Beller:2014,Kollanus:2009}.
|
||||
The way to give voice on a pull-request is to leave a comment and the following are some introductions to review comments in GitHub.
|
||||
|
||||
\begin{figure}[ht]
|
||||
% https://github.com/rails/rails/pull/12150
|
||||
\centering
|
||||
\includegraphics[width=9cm]{comments.png}
|
||||
\caption{Example comments on GitHub}
|
||||
\label{fig:comment}
|
||||
\end{figure}
|
||||
|
||||
\emph{Comment type:}
|
||||
There are two available ways for reviewers to pushlish comments on a pull-request: general comments as a whole about the overall contribution and inline comments for specific lines of code in the changes(see Figure \ref{fig:comment})\cite{Yu:2015,Thongtanunam:2016}.
|
||||
%可以再多加一点
|
||||
|
||||
\emph{Visualization:}
|
||||
All the updates corresponding to a pull-request will be displayed in a visual representation. For general comments, they just appears in the \hl{main page} of pull request interfaces, but for inline comments, they default to be folded and can be shown after the toggle button is clicked. All comments and other abehaviors are aggregated as a \hl{timline} ordered primarily by time of creation.
|
||||
|
||||
% comment 参与者
|
||||
\emph{Participant:}
|
||||
Any user in GitHub can be a reviewer and have a chance to particpate the discussion of any pull-rquest[]. Usually, core team members comment to ask the submmiter to bring out the pull-request more clearly or improve the solution, pull-request creators comment to explain their work or reply others's suggestion, and other commity audience comment to express their demand and influence the dicision process[].
|
||||
Study\cite{Gousios:2014} also reveals that, in most projects, more than half of the participants are pull-request creators while the majority of the comments come from core teammembers.
|
||||
|
||||
|
||||
\hl{ref more in} \cite{Tsay:2014a}
|
||||
|
||||
|
||||
\section{Classification of Comment}
|
||||
We first construct a fine grained multi-level taxonomy for comments on pull-request reviews and then classify manually a set of *** comments based on which we build \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}) which automatically classify review comments using rule-based and machine learning techniques.
|
||||
|
||||
\subsection{Dataset}
|
||||
We conduct experiments on 3 open source projects hosted in GitHub: Rails, Elasticsearch and Angular.js. This ensures a variety of programming language and application areas which is needed to increase the generalizability of our work to some extent. The dataset used in our experiments come from two sources: GHTorrent dump released in **** and our own crawled data from GitHub up to *****.
|
||||
Table \ref{tab:dataset} lists the key statistics of three projects in our dataset:
|
||||
|
||||
\noindent\textbf{Project profile}: We focused on original projects (i.e., not forked from others) written in three of the most popular languages on GitHub: Ruby, Java and JavaScript [https://github.com/blog/2047-language-trends-on-github]. The three projects also cover different application areas: Rails is used to build websites, Elasticsearch acts as a search server and Angular.js is outstanding JS framework for front-end development. They are hosted in GitHub at an early time which tends to reflect that their development teams are more familiar with GitHub so that their activity data is more \hl{valuable}.
|
||||
|
||||
\noindent\textbf{Hotness indicators}: Open source software projects rely on the voluntary efforts of thousands of software developers \cite{Shah:2006}. More contributors means more participation which produces more collaboration data. Before contribution, developers usually ``star'' or fork on an existing repository. Starring is similar to a bookmarking system which informs the user of the latest activity of starred projects. Obviously, the number of star, fork and contributor indicates the hotness of a specific project. And the three projects is selected for their relatively high hotness indicators.
|
||||
|
||||
\noindent\textbf{Application data}: There are two main types of development models in GitHub: \emph{fork and pull model} which is widely used in open source projects and \emph{shared repository model} that is more prevalent with private projects and all of them can \hl{use pull-requests}. All of the pull-requests and comments we collected are from projects that use the former model before \hl{2016-01-01}.
|
||||
|
||||
\begin{table*}[ht]
|
||||
\centering
|
||||
\caption{Dataset of our experiments}
|
||||
\begin{tabular}{r l c c c c c c}
|
||||
|
||||
\toprule
|
||||
\textbf{Projects} &\textbf{Language} &\textbf{Hosted\_at} &\textbf{\#Star} &\textbf{\#Fork} &\textbf{\#Contributor} &\textbf{\#Pull-request} &\textbf{\#Comment} \\
|
||||
\midrule
|
||||
\textbf{Rails} & Ruby & 2009-05-20 & 33906 & 13789 & 3194 & 14648 & 75102 \\
|
||||
\textbf{Elasticsearch } & Java & 2010-02-08 & 20008 & 6871 & 753 &6315 & 38930 \\
|
||||
\textbf{Angular.js} & JavaScript & 2010-01-06 & 54231 & 26930 & 1557 & 6376 & 33335 \\
|
||||
\bottomrule
|
||||
. \end{tabular}
|
||||
\label{tab:dataset}
|
||||
\end{table*}
|
||||
|
||||
|
||||
Among the entire set, we manually labeled of * pull-requests and ** comments which was later used to automatically label the left data using our \textbf{3}C algorithm.
|
||||
|
||||
\subsection{Taxonomy Definition}
|
||||
%参考\cite{Georgios2014 和实验经济学的方法
|
||||
Previous work has studied on changes faced by reviewer [ ] and issues introduced by pull-request submitter [], which provide valuable hits for academic researchers and suggestions for industry designers. Inspired by their work, we decide to observe in more detail what reviewers talk about in pull-request reviews rather than just in perspectives of technique and non-technique.
|
||||
|
||||
We conducted a \hl{use case} to determine taxonomy scheme which is developed manually and through an \hl{iterative process} of reading review comments random collected from three projects hosted in GitHub. As Figure *!!!!! shows, this process uses the following steps:
|
||||
|
||||
\begin{enumerate}
|
||||
\item \emph{Preparation phase}: The second author first choose three projects ( ) hosted in GitHub \hl{according to project hotness (e.g. \#star, \#fork,\#contributor), and development maturity (e.g. project age, pull-request usage)}. Then, the second author select randomly review comments from the three projects and label each comment with a descriptive message.
|
||||
\item \emph{Execution phase}: Given the previously labeled comments, the first author divide them into different groups based on their descriptive messages. Unlike the aggregation process in card sort \cite{Bacchelli:2013}, we initially default all the comments to be in the same group and iteratively divide the group. Based on a rigorous analysis of the existing literature [] and our own experience with working with and analyzing the pull-based model during the \hl{last 2 years}, we first identified two main groups: technical and non-technical which in turn are divided to more specific groups and finally each comment will be put into the \hl{corresponding group(s)}. After this process, we recognized \hl{more than} ten fine grained categories in our taxonomy scheme.
|
||||
\item \emph{Analysis phase}: On the base of recognition result, the second author abstract taxonomy hierarchies and deduce general categories which forms a multi-level (2 level actually) taxonomy scheme which includes \hl{4 categories} in the first level and \hl{15 subcategories} in the second. With this scheme, the first and second authors together with other \hl{10} participants argument its correctness and convergence and agree on the final decision.
|
||||
\end{enumerate}
|
||||
|
||||
|
||||
Table \ref{tab:taxonomy} is the final taxonomy scheme, for each of these subcategories we include description together with an example comments.
|
||||
|
||||
|
||||
\begin{table*}[ht]
|
||||
\begin{threeparttable}
|
||||
\centering
|
||||
\caption{Taxonomy Scheme}
|
||||
\begin{tabular}{r l p{11cm}}
|
||||
\toprule
|
||||
\textbf{1-level categories} & \textbf{2-level subcategories} & \textbf{Description(Example)} \\
|
||||
\midrule
|
||||
\multirow{12}*{\tabincell{c}{\textbf{Solution correctness}\\(L1-1)}}
|
||||
&\cellcolor{lightgray}Code style &\cellcolor{lightgray}Points out extra blank line, improper intention, inconsistent naming convention, etc. \\
|
||||
&\cellcolor{lightgray}(L21) &\cellcolor{lightgray}e.g., \emph{``scissors: this blank line''}\\
|
||||
|
||||
&Code correction &Figures out wrong function call and program logic error,etc.\\
|
||||
&(L2-2) &e.g., \emph{``I think this block add some duplication, which is should be supported each time you change `first' method.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Code improvement &\cellcolor{lightgray}Suggestion to enhance functionality, improve performance, etc.\\
|
||||
&\cellcolor{lightgray}(L2-3) &\cellcolor{lightgray}e.g., \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
|
||||
|
||||
&Alternative implementation &Propose a alternative solution better than the current one\\
|
||||
&(L2-4) &none \\
|
||||
|
||||
&\cellcolor{lightgray}Demanding test case &\cellcolor{lightgray}Demand submitter to provide test case for changed codes\\
|
||||
&\cellcolor{lightgray}(L2-5) &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
|
||||
|
||||
&Test feedback &Need to test the change before make a decision or feedback failing test\\
|
||||
&(L2-6) &e.g., \emph{``seems like the tests fail on this one:[http link]''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{PR appropriateness}\\((L1-2))}}
|
||||
&\cellcolor{lightgray}Pull-request acception &\cellcolor{lightgray}Satisfied with the pull-request and agree to merge it\\
|
||||
&\cellcolor{lightgray}(L2-7) &\cellcolor{lightgray}e.g., \emph{``PR looks good to me. Can you squash the two\dots''}\\
|
||||
|
||||
&Pull-request rejection &Reject to merge the pull-request for duplicate proposal, undesired feature, fake bug report, etc.\\
|
||||
&(L2-8) &e.g., \emph{``Closing in favor of \#17233.''}or e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
|
||||
|
||||
&\cellcolor{lightgray}Pull-request haze &\cellcolor{lightgray}Confused with the purpose of the pull-request and demand for use case\\
|
||||
&\cellcolor{lightgray}(L2-9) &\cellcolor{lightgray}e.g., \emph{``Can you provide a use case for this change?''}\\
|
||||
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Project management}\\(L1-3)}}
|
||||
&Project roadmap &States which branch or version this pull-request should be merged into, cares about the status of the pull-request (inactivity, stale), etc.\\
|
||||
&(L2-10) &e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
|
||||
|
||||
&\cellcolor{lightgray}Reviewer assignment &\cellcolor{lightgray}ping other one(s) to review this pull-request\\
|
||||
&\cellcolor{lightgray}(L2-11) &\cellcolor{lightgray}e.g., \emph{``/cc @fxn can you take a look please?''}\\
|
||||
|
||||
&Development norm &Talks about undescriptive description or commit message, needs to squash or rebase the commit contained in the pull-request, etc.\\
|
||||
&(L2-12) &e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
|
||||
%%%%%%
|
||||
\midrule
|
||||
\multirow{6}*{\tabincell{c}{\textbf{Social interaction}\\(L1-4)}}
|
||||
&\cellcolor{lightgray}Polite response &\cellcolor{lightgray}Thanks for what other people do, apologize for their own actions, etc.\\
|
||||
&\cellcolor{lightgray}(L2-13) &\cellcolor{lightgray}e.g.,\emph{``thank you :yellow\_heart:''}\\
|
||||
|
||||
&Positive reaction & Agree with others’ opinion, compliment others’ work, etc.\\
|
||||
&(L2-14) &e.g.,\emph{``:+1: nice one @cristianbica.''}\\
|
||||
|
||||
&\cellcolor{lightgray}Other interaction &\cellcolor{lightgray}Other social interaction\\
|
||||
&\cellcolor{lightgray}(L2-15) &\cellcolor{lightgray}e.g., \emph{``I can do it for you if you want.''} \\
|
||||
|
||||
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\begin{tablenotes}
|
||||
\item[1] \emph{Note: a word beginning and ending with a colon, such as ``:scissors:'', is Markdown syntax for an emoji on GitHub}
|
||||
\end{tablenotes}
|
||||
\label{tab:taxonomy}
|
||||
\end{threeparttable}
|
||||
\end{table*}
|
||||
|
||||
For brevity, we show short comments, or only the key part of long ones, as examples. However, we would like to point out that it is common for a single comment to address multiple topics, as in the following review comments:
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C1: ``Thanks for your contribution! This looks good to me but maybe you could split your patch in two different commits? One for each issue.''}
|
||||
|
||||
\vspace{1em}
|
||||
\emph{C2: ``Looks good to me :+1: /cc @steveklabnik @fxn''}
|
||||
\vspace{1em}
|
||||
|
||||
In the first comment C1, the reviewer first thanks the contributor (L2-13), then expresses approval of the pull-request (L2-7), followed by a request to split the commit (L2-12).
|
||||
In C2, the reviewer indicates that he agrees with the change (L2-7), thinks highly of the work (L2-14; :+1: is the Markdown syntax for the ``thumbs-up'' emoji), and finally pings other reviewers to ask for more advice (L2-11). Speaking of reviewer assignment, reviewers also delegate a code review when they are not familiar with the change under review.
|
||||
|
||||
\subsection{Manual Labeling}
|
||||
For each project, we randomly sample 200 pull-requests per year from 2013 to 2015, restricted to pull-requests whose comment count is greater than 0 and less than 20. Overall, 1,800 distinct pull-requests and 5,645 review comments are sampled.
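The following is a minimal sketch of this sampling step, assuming the crawled pull-requests are held in a pandas DataFrame; the column names \texttt{project}, \texttt{year} and \texttt{comment\_count} are illustrative assumptions.
\begin{verbatim}
import pandas as pd

def sample_prs(prs, per_year=200,
               years=(2013, 2014, 2015), seed=1):
    """Sample PRs with 0 < comment_count < 20,
    200 per project and year."""
    eligible = prs[(prs["comment_count"] > 0) &
                   (prs["comment_count"] < 20) &
                   (prs["year"].isin(years))]
    return (eligible
            .groupby(["project", "year"],
                     group_keys=False)
            .apply(lambda g: g.sample(
                n=min(per_year, len(g)),
                random_state=seed)))
\end{verbatim}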
|
||||
|
||||
According to the taxonomy scheme, we manually classify the sampled review comments. To provide a convenient work environment and to ensure the quality of the manual classification, we build an Online Label Platform (OLP) that offers a web-based interface and reduces the extra labeling burden. OLP is deployed on the public network, so participants are free to label at any place and at any time that suits them.
|
||||
|
||||
%[!!!! Add a screenshot of the web-based labeling platform !!!!]
|
||||
|
||||
As Figure *** shows, the main interface of OLP, which organizes a pull-request and its review comments together, contains three sections:
|
||||
\begin{enumerate}
|
||||
\item \emph{Section 1}: This section displays the title, description and submitter of a pull-request. In addition, its hyperlink on GitHub is shown, so that participants can jump to the original webpage for a more detailed view and inspection if needed.
|
||||
\item \emph{Section 2}: All the review comments of a pull-request are listed in this section, ordered by creation time. The author, the author's role (pull-request submitter or reviewer), the comment type (inline comment or general comment) and the creation time of each comment appear on top of its content.
|
||||
\item \emph{Section 3}: Beside each comment, there are candidate labels corresponding to the 2-level categories, which can be selected by clicking the checkbox next to each label. Using checkboxes means that multiple labels can be assigned to one comment. Participants can also report other labels that are not in our list.
|
||||
\end{enumerate}
|
||||
As shown, we only label each comment with 2-level categories; the 1-level categories are derived automatically according to the taxonomy hierarchy (a minimal sketch of this derivation follows). Table \ref{tab:label_sta} reports the statistics of the manual classification result.
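Assuming the hierarchy of Table \ref{tab:taxonomy} (L2-1 to L2-6 under L1-1, L2-7 to L2-9 under L1-2, L2-10 to L2-12 under L1-3, and L2-13 to L2-15 under L1-4), the derivation can be sketched as follows.
\begin{verbatim}
# 2-level label -> 1-level label, per the taxonomy
L2_TO_L1 = {}
for i in range(1, 7):   L2_TO_L1["L2-%d" % i] = "L1-1"
for i in range(7, 10):  L2_TO_L1["L2-%d" % i] = "L1-2"
for i in range(10, 13): L2_TO_L1["L2-%d" % i] = "L1-3"
for i in range(13, 16): L2_TO_L1["L2-%d" % i] = "L1-4"

def derive_l1(l2_labels):
    """1-level labels from the assigned 2-level labels."""
    return sorted({L2_TO_L1[label] for label in l2_labels})

derive_l1(["L2-13", "L2-7", "L2-12"])
# -> ['L1-2', 'L1-3', 'L1-4']
\end{verbatim}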
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Manual classification result}
|
||||
\begin{tabular}{r l c c c}
|
||||
\toprule
|
||||
\textbf{{1-level}} &\textbf{{2-level}} &\textbf{{Rails}} &\textbf{{Elasticsearch}} &\textbf{Angular.js}\\
|
||||
\midrule
|
||||
\multirow{6}*{L1-1}
|
||||
&L2-1 &0 &0 &0\\
|
||||
&L2-2 &0 &0 &0\\
|
||||
&L2-3 &0 &0 &0\\
|
||||
&L2-4 &0 &0 &0\\
|
||||
&L2-5 &0 &0 &0\\
|
||||
&L2-6 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-2}
|
||||
&L2-7 &0 &0 &0\\
|
||||
&L2-8 &0 &0 &0\\
|
||||
&L2-9 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-3}
|
||||
&L2-10 &0 &0 &0\\
|
||||
&L2-11 &0 &0 &0\\
|
||||
&L2-12 &0 &0 &0\\
|
||||
\midrule
|
||||
\multirow{3}*{L1-4}
|
||||
&L2-13 &0 &0 &0\\
|
||||
&L2-14 &0 &0 &0\\
|
||||
&L2-15 &0 &0 &0\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:label_sta}
|
||||
\end{table}
|
||||
|
||||
%%%% Add some statistics about this table
|
||||
|
||||
\subsection{Automatic Classification}
|
||||
Based on the manual classification result, we build the \textbf{C}ombined \textbf{C}omment \textbf{C}lassifier (3\textbf{C}), which automatically classifies review comments using rule-based and machine learning (ML) techniques. The ML-based classification of review comments is performed with scikit-learn\footnote{http://scikit-learn.org/}, in particular with the Support Vector Machine (SVM) algorithm. As already mentioned, a single review comment often addresses multiple topics, hence one of the goals of 3C is to perform multi-label classification. To this end, we construct rules and classifiers through a one-vs-all strategy for each category of both the high-level and the low-level taxonomy.
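The following is a minimal sketch of such a one-vs-all setup with scikit-learn; \texttt{comments} and \texttt{labels} are placeholders for the manually labeled data, and the exact SVM configuration of 3C is not fixed here.
\begin{verbatim}
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# comments: list of preprocessed comment strings
# labels:   binary indicator matrix, one column
#           per category (multi-label)
clf = make_pipeline(
    TfidfVectorizer(),       # tf-idf text features
    OneVsRestClassifier(     # one binary SVM per class
        SVC(kernel="linear", probability=True)))
clf.fit(comments, labels)
proba = clf.predict_proba(
    ["Can you provide a use case for this change?"])
\end{verbatim}
The \texttt{predict\_proba} output of such a pipeline plays the role of the probability vector described below.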
|
||||
|
||||
\begin{figure*}[ht]
|
||||
\centering
|
||||
\includegraphics[width=16cm]{3c.png}
|
||||
\caption{Overview of 3C}
|
||||
\label{fig:3c}
|
||||
\end{figure*}
|
||||
|
||||
Figure \ref{fig:3c} illustrates the overview of 3C. 3C is a two-stage process: the first stage makes use of the comment text, and the second stage makes use of additional features.
|
||||
|
||||
\noindent\underline{\textbf{Stage 1:} }
|
||||
The classification in this stage mainly makes use of the comment text and outputs a vector of probabilities, one for each class (or category).
|
||||
For example, suppose \emph{CM} is a comment and there are $n$ classes ($C_{1}, C_{2}, \dots, C_{n}$); the classification of \emph{CM} in stage 1 proceeds as follows:
|
||||
First, each inspection rule ($R_{1}, R_{2}, \dots, R_{n}$) is applied to \emph{CM}. If $R_{i}$ is confirmed, the $i$-th item of the probability vector is set to 1, and the SVM text classifiers $TC_{1}, \dots, TC_{i-1}, TC_{i+1}, \dots, TC_{n}$ are used to predict the probabilities for the remaining classes $C_{1}, \dots, C_{i-1}, C_{i+1}, \dots, C_{n}$, respectively. Finally, the probability vector is output. The following are the key steps in this stage.
|
||||
|
||||
\noindent\textbf{Rule inspection:} Given a review comment to be classified, 3C first scans its original text and applies each inspection rule in turn.
|
||||
As mentioned, each inspection rule corresponds to a specific class, so a class label is assigned to a comment once its inspection rule is confirmed.
|
||||
An inspection rule is a set of matching patterns consisting of discriminating term(s) and regular expressions. The recognition behavior of matching patterns involves a trade-off: patterns tuned for high recall tend to have low precision, and vice versa. For this reason, our inspection rules are constructed so that they recognize as many comments as possible at a relatively high precision and leave the unrecognized comments to the ML text classifiers. In other words, the recognition result produced by an inspection rule is quite reliable.
|
||||
!!!!! take an example !!!!!
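For illustration only (the patterns below are assumptions rather than the rules actually used by 3C), an inspection rule for the reviewer-assignment category (L2-11) could be a small set of regular expressions:
\begin{verbatim}
import re

# Hypothetical inspection rule for L2-11
# (reviewer assignment): the comment pings
# someone and asks for a review.
REVIEWER_ASSIGNMENT_RULE = [
    re.compile(r"/cc\s+@\w+", re.IGNORECASE),
    re.compile(r"@\w+.*\b(take a look|review)\b",
               re.IGNORECASE),
]

def rule_confirmed(text,
                   patterns=REVIEWER_ASSIGNMENT_RULE):
    """True if any pattern of the rule fires."""
    return any(p.search(text) for p in patterns)

rule_confirmed("/cc @fxn can you take a look please?")
# -> True
\end{verbatim}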
|
||||
|
||||
Before being processed by the ML text classifiers, the comment text is preprocessed; the preprocessing consists of several specific transformations and feature extraction.
|
||||
|
||||
\noindent\textbf{Text transformations:}
|
||||
Reviewers tend to include source code, hyperlinks and quotations of others' talk in review comments to clearly express their opinions, support their points and reply to other people.
|
||||
These behaviors do promote the review process, but at the same time they bring a great challenge for comment classification.
|
||||
Words in these text blocks contribute little to the classification and may even introduce interference. Hence, we transform each of them into a single-word indicator to reduce vocabulary interference while preserving the reference information.
|
||||
!!!!! take an example ??!!!!!
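For illustration only (the indicator tokens are assumptions), such a transformation could replace fenced code blocks, hyperlinks and quoted replies with single-word indicators:
\begin{verbatim}
import re

def transform(text):
    """Replace code, links and quoted talk with
    single-word indicators."""
    text = re.sub(r"```.*?```", " CODEBLOCK ",
                  text, flags=re.DOTALL)
    text = re.sub(r"https?://\S+", " HYPERLINK ", text)
    text = re.sub(r"^>.*$", " QUOTEDREPLY ",
                  text, flags=re.MULTILINE)
    return text
\end{verbatim}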
|
||||
|
||||
\noindent\textbf{Feature extraction:}
|
||||
The transformed text is tokenized and stemmed to its root form \cite{Porter:1997}. \hl{We filter out punctuation from the word tokens but do not remove English stop words, because we believe some common words are important in review comments (for example, \dots).} To apply ML algorithms to comment classification, we need to extract a set of features from the comment text. We adopted the tf-idf model[], which can be conveniently computed with tools provided by scikit-learn.
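A minimal sketch of this step, assuming \texttt{transformed\_comments} holds the transformed comment texts and using the Porter stemmer from NLTK:
\begin{verbatim}
import re
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()

def tokenize_and_stem(text):
    # keep alphabetic tokens (punctuation dropped),
    # stem to root form; stop words are kept
    return [stemmer.stem(t)
            for t in re.findall(r"[a-z]+", text.lower())]

vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem,
                             stop_words=None)
X = vectorizer.fit_transform(transformed_comments)
\end{verbatim}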
|
||||
|
||||
\noindent\underline{\textbf{Stage 2:}} The classification in this stage is based on composed features. In addition to the probability vector generated in \emph{stage 1}, we also consider the other comment-level features listed in Table \ref{tab:factors}.
|
||||
|
||||
\noindent\textbf{Comment\_length:} total number of characters contained in the comment text after preprocessing. Long comments are likely to \hl{argue about pull-request appropriateness and code improvement}.
|
||||
|
||||
\noindent\textbf{Comment\_type:} binary, indicating whether a comment is an inline comment or a general (issue) comment. Inline comments tend to talk about solution details, while general comments talk more about other \hl{``high-level'' matters}.
|
||||
|
||||
\noindent\textbf{Link\_inclusion:} binary, indicating if a comment includes hyperlinks. Hyperlinks are usually used to provide evidence when someone insists on a point of view, or to offer guidelines when they want to help other people.
|
||||
|
||||
\noindent\textbf{Ping\_inclusion:} binary, indicating if a comment includes a ping (in the form of ``@username'').
|
||||
|
||||
\noindent\textbf{Code\_inclusion:} binary, indicating if a comment includes source code. Solution-detail-related comments tend to contain source code.
|
||||
|
||||
\noindent\textbf{Ref\_inclusion:} binary, indicating if a comment includes a quotation of others' talk. Quoting others' talk indicates a reply to someone, which probably reflects one's further suggestion or disagreement.
|
||||
|
||||
\noindent\textbf{Sim\_pr\_title:} similarity between the comment text and the pull-request title (measured as the number of common words divided by the number of words in their union, i.e., the Jaccard similarity).
|
||||
|
||||
\noindent\textbf{Sim\_pr\_desc:} similarity between the comment text and the pull-request description (measured in the same way as Sim\_pr\_title).
|
||||
Comments with high similarity to the title or description of the pull-request are likely to discuss solution details or the value the change has brought (a sketch of how these comment-level features can be computed follows).
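A minimal sketch of computing these comment-level features; the regular expressions are illustrative assumptions rather than the exact patterns used by 3C.
\begin{verbatim}
import re

def jaccard(a, b):
    """Common words divided by words in the union."""
    wa = set(a.lower().split())
    wb = set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def comment_features(text, is_inline, pr_title, pr_desc):
    return {
        "comment_length": len(text),
        "comment_type": int(is_inline),
        "link_inclusion":
            int(bool(re.search(r"https?://\S+", text))),
        "ping_inclusion":
            int(bool(re.search(r"@\w+", text))),
        "code_inclusion": int("`" in text),
        "ref_inclusion":
            int(bool(re.search(r"^>", text, re.MULTILINE))),
        "sim_pr_title": jaccard(text, pr_title),
        "sim_pr_desc": jaccard(text, pr_desc),
    }
\end{verbatim}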
|
||||
|
||||
Together with the probability vector passed from \emph{stage 1}, these features are composed to form a new feature vector that is processed by the prediction models. As in \emph{stage 1}, there are $n$ binary prediction models, one for each category. The prediction models generate a new probability vector that represents \hl{how likely it is that a comment falls into a specific category}. After iterating over the probability vector, a comment is labeled with class $C_{i}$ if the $i$-th item is greater than 0.5. If all the items of the probability vector are less than 0.5, the class label corresponding to the largest item is assigned to the comment. Finally, each comment processed by 3\textbf{C} is marked with at least one class label.
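A minimal sketch of this labeling step, assuming \texttt{probs} is the stage-2 probability vector of a comment and \texttt{classes} holds the category labels:
\begin{verbatim}
def assign_labels(probs, classes, threshold=0.5):
    """Take every class above the threshold, or fall
    back to the single most probable class."""
    labels = [c for c, p in zip(classes, probs)
              if p > threshold]
    if not labels:  # nothing exceeded 0.5
        best = max(range(len(probs)),
                   key=probs.__getitem__)
        labels = [classes[best]]
    return labels

assign_labels([0.2, 0.7, 0.6],
              ["L2-7", "L2-11", "L2-14"])
# -> ['L2-11', 'L2-14']
\end{verbatim}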
|
||||
\subsection{Evaluation of 3C}
|
||||
The performance of 3\textbf{C} is evaluated using 10-fold cross-validation, i.e., the review comments are split into 10 sets, of which 9 sets are used to train the classifiers and the remaining set is used for the performance test; this process is repeated 10 times.
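A minimal sketch of such an evaluation with scikit-learn, reusing the \texttt{clf}, \texttt{comments} and \texttt{labels} placeholders from the earlier sketches:
\begin{verbatim}
from sklearn.model_selection import cross_validate

scores = cross_validate(
    clf, comments, labels, cv=10,
    scoring=("precision_macro", "recall_macro"))
print(scores["test_precision_macro"].mean(),
      scores["test_recall_macro"].mean())
\end{verbatim}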
|
||||
|
||||
Table \ref{tab:auto_class} reports the precision and recall of the classification results \hl{on the 1-level categories}.
|
||||
|
||||
% Compare with rule-only and ML-only variants: Precision, Recall, F-measure, and their averages
|
||||
|
||||
|
||||
|
||||
%%%% Use the 1-level category names in the table
|
||||
% \textbf{Solution correctness} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
% \textbf{PR appropriateness} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
% \textbf{Project management} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
% \textbf{Social interaction} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
%%%% Use the 1-level category names in the table
|
||||
|
||||
\begin{table}[ht]
|
||||
\centering
|
||||
\caption{Classification performance of 3C}
|
||||
\begin{tabular}{r c c c c c c c c}
|
||||
\toprule
|
||||
\multirow{2}*{\textbf{Class}}
|
||||
&\multicolumn{2}{c}{\textbf{Rails}} &\multicolumn{2}{c}{\textbf{Elasticsearch}} &\multicolumn{2}{c}{\textbf{Angular.js}} &\multicolumn{2}{c}{\textbf{Ave.}}\\
|
||||
&Prec.&Rec. &Prec.&Rec. &Prec.&Rec. &Prec.&Rec.\\
|
||||
\midrule
|
||||
\textbf{L1-1} &0.83 &0.81 &0.83 &0.85 &0.77 &0.74 &0.81 &0.80\\
|
||||
\textbf{L1-2} &0.71 &0.70 &0.88 &0.86 &0.79 &0.83 &0.79 &0.80\\
|
||||
\textbf{L1-3} &0.85 &0.74 &0.75 &0.71 &0.81 &0.69 &0.80 &0.71\\
|
||||
\textbf{L1-4} &0.95 &0.95 &0.95 &0.89 &0.92 &0.87 &0.94 &0.90\\
|
||||
\midrule
|
||||
\textbf{Ave.} &0.83 &0.80 &0.85 &0.83 &0.82 &0.78 &\textbf{0.83} &\textbf{0.80}\\
|
||||
\bottomrule
|
||||
\end{tabular}
|
||||
\label{tab:auto_class}
|
||||
\end{table}
|
||||
|
||||
\section{Experiment and Results}
|
||||
|
||||
\subsection{Research Questions}
|
||||
|
||||
|
||||
\subsection{Statistics}
|
||||
% Distribution of each comment type across pull-requests
|
||||
|
||||
|
||||
\subsection{Co-occurrence}
|
||||
% Co-occurrence of different comment types
|
||||
\subsection{Factors}
|
||||
|
||||
\begin{table*}[ht]
\centering
\caption{Dataset of our experiments}
\begin{tabular}{r c c c c c c}
\toprule
\textbf{Statistic} &\textbf{Mean} &\textbf{St. Dev.} &\textbf{Min} &\textbf{Median} &\textbf{Max} &\textbf{Histogram}\\
\midrule
\textbf{Project level metrics}\hspace{1em} \\
Project\_age & & & & & & \\
Team\_size & & & & & & \\
Pr\_growth & & & & & & \\
\midrule
\textbf{Pull-request level metrics}\hspace{1em} \\
Churn & & & & & & \\
N\_files & & & & & & \\
Src\_touched & & & & & & \\
Config\_touched & & & & & & \\
Test\_touched & & & & & & \\
\midrule
\textbf{Contributor level metrics}\hspace{1em} \\
Core\_team & & & & & & \\
Prev\_prs\_local & & & & & & \\
Prev\_prs\_global & & & & & & \\
\midrule
\textbf{Comment level metrics}\hspace{1em} \\
Comment\_length & & & & & & \\
Comment\_type & & & & & & \\
Sim\_pr\_title & & & & & & \\
Sim\_pr\_desc & & & & & & \\
Code\_inclusion & & & & & & \\
Ref\_inclusion & & & & & & \\
Link\_inclusion & & & & & & \\
Ping\_inclusion & & & & & & \\
\bottomrule
\end{tabular}
\label{tab:factors}
\end{table*}
|
||||
|
||||
|
||||
% Effect of each factor on negative comments
|
||||
\noindent\underline{\textbf{Project level}}
|
||||
|
||||
\noindent\textbf{Project\_age:} the time from project creation on GitHub to the pull-request creation in months, which is used as a proxy for maturity.
|
||||
|
||||
\noindent\textbf{Team\_size:} the number of active integrators, who decide whether to accept the new contributions, during the three months prior to pull-request creation.
|
||||
|
||||
\noindent\textbf{Pr\_growth:} Average number of pull requests submitted per day, prior to the examined pull request.
|
||||
|
||||
\noindent\underline{\textbf{Pull-request level}}
|
||||
|
||||
\noindent\textbf{Churn:} total number of lines added and deleted by the pull-request. Bigger code changes may be more complex, and require longer code evaluation.
|
||||
|
||||
\noindent\textbf{N\_files:} total number of files changed in the pull-request, which is a signal of pull-request size.
|
||||
|
||||
\noindent\textbf{Src\_touched:} binary, measuring if the pull-request touched at least one source code file (documentation files are regarded as source code files).
|
||||
|
||||
\noindent\textbf{Config\_touched:} binary, measuring if the pull-request touched at least one configuration file.
|
||||
|
||||
\noindent\textbf{Test\_touched:} binary, measuring if the pull-request touched at least one test file.
|
||||
|
||||
\noindent\underline{\textbf{Contributor level}}
|
||||
|
||||
\noindent\textbf{Core\_team:} binary, indicating whether the submitter is a member of the core development team (core contributor) or an outside contributor.
|
||||
|
||||
\noindent\textbf{Prev\_prs\_local:} Number of pull requests submitted to a specific project by a developer, prior to the examined pull request.
|
||||
|
||||
\noindent\textbf{Prev\_prs\_global:} Number of pull requests submitted by a developer on GitHub, prior to the examined pull request.
|
||||
|
||||
\section{Threats to Validity}
|
||||
% Manual labeling:
|
||||
% Automatic classification:
|
||||
\section{Related Work}
|
||||
\subsection{Code review}
|
||||
% (refer to the paper by the Thai authors)
|
||||
% Traditional code review
|
||||
% Modern code review
|
||||
\subsection{Pull-request}
|
||||
% (refer to the senior labmate's work)
|
||||
% Some studies on pull-requests
|
||||
|
||||
\subsection{Classification on collaboratively generated data}
|
||||
% (the 2008 paper and the one on mobile app reviews)
|
||||
% Other studies on issue classification
|
||||
|
||||
\section{Conclusion}
|
||||
The conclusion goes here. this is more of the conclusion
|
||||
|
||||
% conference papers do not normally have an appendix
|
||||
|
||||
|
||||
% use section* for acknowledgement
|
||||
\section*{Acknowledgment}
|
||||
The research is supported by the National Natural Science Foundation of China (Grant Nos. 61432020, 61472430 and 61502512) and the National Grand R\&D Plan (Grant No. 2016YFB1000805).
|
||||
% Add another grant number, the one provided by Prof. Wang
|
||||
|
||||
The authors would like to thank...
|
||||
more thanks here
|
||||
|
||||
|
||||
\begin{thebibliography}{1}
|
||||
|
||||
\bibitem{Yu:2015}
|
||||
Yu, Y., Wang, H., Yin, G., \& Wang, T. (2015). Reviewer Recommendation for Pull-Requests in GitHub: What Can We Learn from Code Review and Bug Assignment?
|
||||
|
||||
\bibitem{Barr:2012}
|
||||
E. T. Barr, C. Bird, P. C. Rigby, A. Hindle, D. M. German, P. Devanbu, Cohesive and isolated development with branches, in: FASE, Springer, 2012, pp. 316–331.
|
||||
|
||||
\bibitem{Gousios:2014}
|
||||
G. Gousios, M. Pinzger, A. v. Deursen, An exploratory study of the pull-based software development model, in: ICSE, ACM, 2014, pp. 345–355.
|
||||
|
||||
\bibitem{Vasilescu:B}
|
||||
Vasilescu, B., Yu, Y., Wang, H., Devanbu, P., \& Filkov, V. (2015). Quality and Productivity Outcomes Relating to Continuous Integration in GitHub. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE).
|
||||
|
||||
\bibitem{Tsay:2014a}
|
||||
Tsay, J., Dabbish, L., \& Herbsleb, J. (2014). Let’s talk about it: evaluating contributions through discussion in GitHub. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, 144–154.
|
||||
|
||||
\bibitem{Gousios:2014b}
|
||||
Gousios, G., Storey, M.-A., \& Bacchelli, A. (2016). Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective. Proceedings of the 38th International Conference on Software Engineering (ICSE '16), 285–296.
|
||||
|
||||
\bibitem{Marlow:2013}
|
||||
Marlow, J., Dabbish, L. and Herbsleb, J. 2013. Impression formation in online peer production: activity traces and personal profiles in github. Proceedings of the 2013 conference on Computer supported cooperative work (New York, NY, USA, 2013), 117–128.
|
||||
|
||||
\bibitem{Veen:2015}
|
||||
Veen E V D, Gousios G, Zaidman A. Automatically Prioritizing Pull Requests[C]// Mining Software Repositories. IEEE, 2015:357-361.
|
||||
|
||||
\bibitem{Thongtanunam:2015}
|
||||
Thongtanunam, P., Tantithamthavorn, C., Kula, R. G., et al. Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2015, 141–150.
|
||||
|
||||
\bibitem{jiang:2015}
|
||||
Jiang, J., He, J.-H., \& Chen, X.-Y. CoreDevRec: Automatic Core Member Recommendation for Contribution Evaluation. Journal of Computer Science and Technology, 2015, 30(5), 998–1016.
|
||||
|
||||
\bibitem{Tsay:2014b}
|
||||
Tsay J, Dabbish L, Herbsleb J. Influence of social and technical factors for evaluating contribution in GitHub[C]// ICSE. 2014:356-366.
|
||||
|
||||
\bibitem{Bacchelli:2013}
|
||||
A. Bacchelli and C. Bird. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering, pages 712–721. IEEE Press, 2013.
|
||||
|
||||
\bibitem{Shah:2006}
|
||||
Shah, S. K. (2006). Motivation, Governance, and the Viability of Hybrid Forms in Open Source Software Development. Management Science, 52(7), 1000–1014. http://doi.org/10.1287/mnsc.1060.0553
|
||||
|
||||
\bibitem{Porter:1997}
|
||||
M. F. Porter, ``An algorithm for suffix stripping,'' in Readings in Information Retrieval, K. Sparck Jones and P. Willett, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1997, pp. 313–316. [Online]. Available: http://dl.acm.org/citation.cfm?id=275537.275705
|
||||
|
||||
\bibitem{Yu:2014}
|
||||
Yu, Y., Wang, H., Yin, G., \& Ling, C. X. (2014). Who should review this pull-request: Reviewer recommendation to expedite crowd collaboration. Proceedings - Asia-Pacific Software Engineering Conference, APSEC, 1, 335–342. http://doi.org/10.1109/APSEC.2014.57
|
||||
|
||||
\bibitem{Thongtanunam:2016}
|
||||
Thongtanunam, P., McIntosh, S., Hassan, A. E., \& Iida, H. (2016). Review participation in modern code review: An empirical study of the Android, Qt, and OpenStack projects. Empirical Software Engineering. http://doi.org/10.1007/s10664-016-9452-6
|
||||
|
||||
\bibitem{Beller:2014}
|
||||
Beller, M., Bacchelli, A., Zaidman, A., \& Juergens, E. (2014). Modern code reviews in open-source projects: which problems do they fix? Proceedings of the 11th Working Conference on Mining Software Repositories - MSR 2014, 202–211. http://doi.org/10.1145/2597073.2597082
|
||||
|
||||
\bibitem{Fagan:1999}
|
||||
M. E. Fagan. Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal, 38(2-3):258–287, June 1999.
|
||||
|
||||
\bibitem{Kollanus:2009}
|
||||
S. Kollanus and J. Koskinen. Survey of software inspection research. The Open Software Engineering Journal, 3(1):15–34, 2009.
|
||||
|
||||
\end{thebibliography}
|
||||
|
||||
|
||||
|
||||
|
||||
% that's all folks
|
||||
%\end{CJK*}
|
||||
\end{document}