starlee 2017-10-12 19:57:44 +08:00
parent 00dca29521
commit 5f6eeaf735
5 changed files with 99 additions and 95 deletions


@ -9,7 +9,7 @@ His work interests include open source software engineering, data mining, and kn
is an associate professor in the College of Computer at National University of Defense Technology.
He received his Ph.D. degree in Computer Science from the National University of Defense Technology (NUDT) in 2016.
He has visited UC Davis supported by CSC scholarship.
His research findings have been published on MSR, FSE, IST, ICSME APSEC and SEKE.
His research findings have been published in MSR, FSE, IST, and ICSME.
His current research interests lie in software engineering, including mining software repositories
and analyzing social coding networks.


@ -60,10 +60,13 @@ based on an investigation of developers and projects at Microsoft.
They found that although defect finding is still the main motivation for code reviews,
defect-related comments comprise only a small proportion.
Their study focused on commercial projects and
the reviewers of the these projects are mainly the employees at Microsoft.
It is very different from the projects using pull-based model in GitHub,
which exist in a more transparent and public environment
and the code reviewers of such projects come from the community.
the reviewers of these projects are mainly the employees at Microsoft.
However, projects using the pull-based model on GitHub are developed in a more transparent and open environment
and code reviews are performed by community users.
Community reviewers may work for different organizations, use the project for various purposes,
and bring their own considerations to the code review process.
Therefore, review topics in the pull-based development model may differ
from those in the commercial development model.
Tsay \etal~\cite{Tsay:2014a} explored issues raised around code contributions in GitHub.
They reported that reviewers discuss both
the appropriateness of the contribution proposal and the correctness of the implemented solution.


@ -368,77 +368,6 @@ This binary feature indicates whether a comment includes a ping
(in the form of ``@user-name'').
\texttt{\small{Code\_inclusion:}}
% \textit{Code\_inclusion:}
This binary feature denotes whether a comment includes code elements.
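As a minimal sketch of how the two binary features above might be extracted, the Python snippet below uses regular expressions. The patterns are assumptions made for illustration only: a ping is taken to be an ``@'' followed directly by a GitHub-style user name, and code elements are taken to be text wrapped in Markdown backticks. The helper names \texttt{has\_ping} and \texttt{has\_code} are hypothetical and do not come from the original implementation.

```python
import re

# Assumption: a ping is "@" followed by a GitHub-style user name, and the "@"
# is not preceded by a word character or dot (so e-mail addresses are skipped).
PING_PATTERN = re.compile(r"(?<![\w.])@[A-Za-z0-9][A-Za-z0-9-]*")

# Assumption: code elements appear as Markdown inline code (`...`) or as a
# fenced block opened by three backticks.
CODE_PATTERN = re.compile(r"`[^`\n]+`|```")


def has_ping(comment: str) -> int:
    """Return 1 if the comment mentions a user, else 0."""
    return int(PING_PATTERN.search(comment) is not None)


def has_code(comment: str) -> int:
    """Return 1 if the comment contains a Markdown code element, else 0."""
    return int(CODE_PATTERN.search(comment) is not None)


if __name__ == "__main__":
    sample = "/cc @fxn can you take a look please?"
    print(has_ping(sample), has_code(sample))
```

Each function returns 0 or 1 so that the result can be appended directly to a feature vector; a real extractor would likely need more careful rules for code detection than backticks alone.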
@ -485,5 +414,5 @@ based on this automatically classified dataset.
In total, we classified 147,367 comments of 27,339 pull-requests.
Based on this dataset, we conducted a preliminary quantitative study.
We first explored the distribution of review comments on each category
and then we reported some of the interesting findings derived from the distribution.
and then we reported some of the findings derived from the distribution.


@ -3,6 +3,77 @@
\subsection{RQ1: What is the taxonomy for review comments on pull-requests?}
\begin{table*}[ht]
\scriptsize
\caption{Complete taxonomy}
\begin{tabular}{r r p{9.8cm}}
% \toprule
\hline
\rowcolor[HTML]{000000}
&&\\
\rowcolor[HTML]{000000}
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-1 Categories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Level-2 Subcategories}}} &
\multirow{-2}*{{\color[HTML]{FFFFFF}\textbf{Description \& Example}}}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{Code Improvement}\\(L1-1)}}
&\cellcolor{lightgray} &\cellcolor{lightgray}Points out extra blank lines, improper indentation, inconsistent naming conventions, etc. \\
&\cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Style Checking\\(L2-1)}} &\cellcolor{lightgray}e.g., \emph{``:scissors: this blank line''}\\
&\multirow{3}*{\tabincell{r}{Defect Detecting\\(L2-2)}} & Identifies runtime program errors, evolvability defects, etc.\\
& &e.g., \emph{``the default should be `false`''} and \emph{``let's extract this into a constant. No need to initialize it on every call.''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Asks the submitter to provide test cases for the changed code, report test results, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Code Testing\\(L2-3)}} &\cellcolor{lightgray}e.g., \emph{``this PR will need a unit test, I'm afraid, before it can be merged.''}\\
\hline
\multirow{6}*{\tabincell{r}{\textbf{PR Decision-making}\\(L1-2)}}
&\multirow{2}*{\tabincell{r}{Value Affirming\\(L2-4)}} &Is satisfied with the pull-request and agrees to merge it\\
& &e.g., \emph{``PR looks good to me. Can you \dots''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Refuses to merge the pull-request because of a duplicate proposal, an undesired feature, etc.\\
& \cellcolor{lightgray}\multirow{-3}*{\tabincell{r}{Solution Disagreeing\\(L2-5)}} & \cellcolor{lightgray} e.g., \emph{``I do not think this is a feature we'd like to accept. \dots''}\\
&\multirow{3}*{\tabincell{r}{Further Questioning\\(L2-6)}} & Is confused about the purpose of the pull-request and asks for more details or use cases\\
& &e.g., \emph{``Can you provide a use case for this change?''}\\
%%%%%%
% \midrule
\hline
\multirow{6}*{\tabincell{r}{\textbf{Project Management}\\(L1-3)}}
&\cellcolor{lightgray} &\cellcolor{lightgray} States what type of changes a specific version is expected to merge, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Roadmap Managing\\(L2-7)}} &\cellcolor{lightgray}e.g., \emph{``Closing as 3-2-stable is security fixes only now''}\\
&\multirow{2}*{\tabincell{r}{Reviewer Assigning\\(L2-8)}} &Pings other developers to review the pull-request\\
& &e.g., \emph{``/cc @fxn can you take a look please?''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray}Asks the submitter to squash or rebase the commits, reformulate the commit message, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Convention Checking\\(L2-9)}} &\cellcolor{lightgray}e.g., \emph{``\dots Can you squash the two commits into one? \dots''}\\
%%%%%%
% \midrule
\hline
\multirow{4}*{\tabincell{r}{\textbf{Social Interaction}\\(L1-4)}}
&\multirow{3}*{\tabincell{r}{Politely Responding\\(L2-10)}} & Thanks others for their work, apologizes for mistakes, etc.\\
& &e.g., \emph{``Thank you. This feature was already proposed and it was rejected. See \#xxx''}\\
&\cellcolor{lightgray} &\cellcolor{lightgray} Agrees with others' opinions, compliments others' work, etc.\\
&\cellcolor{lightgray}\multirow{-2}*{\tabincell{r}{Contribution Encouraging\\(L2-11)}} &\cellcolor{lightgray}e.g., \emph{``:+1: nice one @cristianbica.''}\\
%%%%%%
% \midrule
\hline
\textbf{Others} &\multicolumn{2}{l}{Short sentences without clear or exact meaning, such as \textit{``@xxxx will do''} and \textit{``The same here :)''}} \\
\bottomrule
\multicolumn{3}{l}{\emph{Note: a word beginning and ending with a colon, such as ``:scissors:'', is GitHub's Markdown syntax for an emoji}}
\end{tabular}
\label{tab:taxonomy}
\end{table*}
Table \ref{tab:taxonomy} shows the complete taxonomy.
We identified four Level-1 categories,
namely \textit{Code Correctness (L1-1)},
@ -65,13 +136,20 @@ its large-scale sampling, thorough discussion and multiple validation.
Therefore, we exclude them from our analysis and do not adjust our taxonomy.
\begin{framed}
\noindent
\textbf{RQ1:} {}
\textit{From the qualitative study, we identified a two-level taxonomy for
\textbf{In Summary.}
From the qualitative study, we identified a two-level taxonomy for
review comments which consists of 4 categories in Level 1
and 11 sub-categories in Level 2.}
\end{framed}
and 11 sub-categories in Level 2.
% \begin{framed}
% \noindent
% \textbf{RQ1:} {}
% \textit{From the qualitative study, we identified a two-level taxonomy for
% review comments which consists of 4 categories in Level 1
% and 11 sub-categories in Level 2.}
% \end{framed}
\subsection{RQ2: Is it possible to automatically classify
review comments according to the defined taxonomy?}
@ -414,18 +492,16 @@ it is not sufficient to differentiate the two types of comments.
We plan to address the issue by extending the manually labeled dataset
and introducing sentiment analysis.
\begin{framed}
\noindent
\textbf{RQ2:}
\textit{
\textbf{In Summary.}
Compared with the baseline,
\TSHC achieves higher performance
in terms of the weighted average F-measure, namely
0.78 in Rails, 0.82 in Elasticsearch,
and 0.75 in Angular.js,
which indicates \TSHC is applicable
in practical automatic classification.}
\end{framed}
in practical automatic classification.
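For readers unfamiliar with the metric, the following Python sketch shows one common way to compute a weighted average F-measure. It assumes that each class's F-measure is the harmonic mean of its precision and recall and that classes are weighted by their number of true instances (support); this is the usual definition, but the text here does not confirm that it is the exact formula used in the evaluation. The function name and the input layout are hypothetical.

```python
from typing import Iterable, Tuple


def weighted_f_measure(per_class: Iterable[Tuple[float, float, int]]) -> float:
    """Average per-class F-measures, weighted by class support.

    per_class holds one (precision, recall, support) triple per category.
    Assumption: support is the number of true instances of that category.
    """
    rows = list(per_class)
    total = sum(support for _, _, support in rows)
    if total == 0:
        return 0.0
    weighted_sum = 0.0
    for precision, recall, support in rows:
        denom = precision + recall
        # F-measure is defined as 0 when precision and recall are both 0.
        f1 = 2 * precision * recall / denom if denom > 0 else 0.0
        weighted_sum += f1 * support
    return weighted_sum / total


if __name__ == "__main__":
    # Two illustrative categories with made-up scores, not results from the study.
    print(weighted_f_measure([(1.0, 1.0, 3), (0.5, 0.5, 1)]))
```

Weighting by support means that frequent categories dominate the score, so a classifier can reach a high weighted average while still performing poorly on rare categories.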
@ -478,7 +554,7 @@ which has the larger data size and therefore makes our findings more convincing.
\subsubsection{Cost-free Comments are Less frequent }
\subsubsection{Cost-free comments are less frequent}
It is surprising that, although inspecting code style (L2-1)
and testing code (L2-3) cost reviewers very little effort,
since they do not require much understanding of the code changes,
@ -588,7 +664,7 @@ This reality justifies the necessary of manual code review
even though an increasing number of automatic tools are emerging.
\subsubsection{Convention are not always maintained}
\subsubsection{Conventions are sometimes overlooked}
\begin{figure*}[htbp]
\centering
@ -601,7 +677,7 @@ even although an increasing number of automatic tools are coming into being.
Although most projects set specific development conventions
which do not require too much effort to maintain,
they are, somehow, ignored by contributors which can be seen from the following examples.
they are sometimes overlooked by contributors, as can be seen from the following examples.
\textbf{Example\_1:}
@ -660,14 +736,10 @@ Another alternative solution for GitHub is to provide automatic reviewing tools
that can be configured with predefined convention rules
and triggered when a new pull-request is submitted.
\begin{framed}
\noindent
\textbf{RQ3:}
\textit{
\textbf{In Summary.}
Most comments discuss code correction and social interaction.
External contributors are more likely to
break project conventions in their early contributions,
and their pull-requests might contain potential issues,
even though they have passed the tests.
}
\end{framed}

Binary file not shown.