minor refinement
This commit is contained in:
parent
eeb3d63b80
commit
640a64eb2a
369
4_rq2.tex
|
@ -568,6 +568,95 @@ we identify \hl{**} metrics that can be computed at pull request submission t
|
|||
The identified metrics are classified into the following three categories:
|
||||
|
||||
|
||||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent\textbf{Project-level characteristics.}
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Maturity.}
|
||||
Previous studies used the metric \texttt{proj\_age},
|
||||
\ie the time elapsed from when the project was first hosted on GitHub to the pull request submission,
|
||||
as an indicator of the project maturity~\cite{Tsay2014Influence,yu16det,Rahman2014An}.
|
||||
However,
|
||||
a project does not necessarily adopt the pull request model from the outset.
|
||||
We also use the metric \texttt{prmodel\_age} to indicate
|
||||
how long the project has been using the pull request development model.
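As a rough illustration of the two maturity metrics, the following Python sketch derives \texttt{proj\_age} and \texttt{prmodel\_age} from timestamps. It rests on assumptions that are not stated above: ages are expressed in days, and \texttt{prmodel\_age} is counted from the creation time of the project's first pull request. The function and parameter names are hypothetical.

```python
from datetime import datetime


def age_in_days(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in days (the unit is an assumption)."""
    return (end - start).total_seconds() / 86400.0


def maturity_metrics(project_created_at: datetime,
                     first_pr_created_at: datetime,
                     pr_submitted_at: datetime) -> dict:
    """Compute both maturity metrics for one pull request.

    proj_age:    time since the project was hosted on GitHub.
    prmodel_age: time since the project's first pull request (assumed proxy
                 for the adoption of the pull request model).
    """
    return {
        "proj_age": age_in_days(project_created_at, pr_submitted_at),
        "prmodel_age": age_in_days(first_pr_created_at, pr_submitted_at),
    }
```

By construction, \texttt{prmodel\_age} never exceeds \texttt{proj\_age} under these assumptions, which is why the two metrics can diverge for projects that adopted pull requests late.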
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Workload.}
|
||||
The discussion of issues and pull requests can take days to months to conclude.
|
||||
At any given time,
|
||||
many open issues and pull requests
|
||||
might be discussed simultaneously.
|
||||
Prior studies have characterized project integrators' workload using two metrics:
|
||||
\texttt{open\_tasks}~\cite{yu16det} and \texttt{team\_size}~\cite{Tsay2014Influence,Gousios:2014,yu16det},
|
||||
which are the number of open issues and open pull requests at the pull request submission and
|
||||
the number of active core team members during the last three months, respectively.
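The two workload metrics can be sketched as follows. This is a minimal Python illustration under stated assumptions: each issue or pull request is represented as an (opened, closed) pair with \texttt{None} for tasks that are still open, core team activity is given as (member, timestamp) pairs, and ``three months'' is approximated by a 90-day window. All names are hypothetical.

```python
from datetime import datetime, timedelta


def open_tasks(tasks, at: datetime) -> int:
    """Count issues and pull requests that are open at time `at`.

    tasks: iterable of (opened_at, closed_at) pairs; closed_at is None
    when the task has not been closed.
    """
    return sum(
        1
        for opened_at, closed_at in tasks
        if opened_at <= at and (closed_at is None or closed_at > at)
    )


def team_size(core_events, at: datetime, window_days: int = 90) -> int:
    """Count distinct core team members active in the window before `at`.

    core_events: iterable of (member, timestamp) pairs.
    window_days: 90 days is an assumed stand-in for "three months".
    """
    window_start = at - timedelta(days=window_days)
    return len({member for member, ts in core_events if window_start <= ts < at})
```

Both functions take the pull request submission time as \texttt{at}, so that only information available at submission is used.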
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Popularity.}
|
||||
In measuring project popularity,
|
||||
the metric \texttt{stars},
|
||||
\ie the number of stars the project has received,
|
||||
was commonly used in prior studies~\cite{Bor17Und,Tsay2014Influence}.
|
||||
In addition,
|
||||
we consider three other popularity-related metrics:
|
||||
\texttt{forks}, \texttt{prs}, and \texttt{contributors},
|
||||
which are the number of forks, the number of pull requests, and the number of contributors
|
||||
of the project, respectively.
|
||||
|
||||
|
||||
% \vspace{0.2em}
|
||||
% \textit{Activeness.}
|
||||
% We further measure the activeness of the project
|
||||
% by including four metrics:
|
||||
% \texttt{stars\_3M}, \texttt{forks\_3M}, \texttt{prs\_3M}, and \texttt{contributors\_3M},
|
||||
% which are the new stars, new forks, new pull requests, and active contributors in the project in the last three months, respectively.
|
||||
|
||||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent\textbf{Submitter-level characteristics.}
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Experience.}
|
||||
Developers' experience before they submit the pull request has been analyzed in prior studies~\cite{Gousios:2014,jiang2013will}.
|
||||
This measure can be computed from two perspectives:
|
||||
project-level experience and community-level experience.
|
||||
The former measures the number of previous pull requests
|
||||
that have been submitted to a specific project (\texttt{prev\_prs\_proj}) and their acceptance rate (\texttt{prev\_acc\_proj}).
|
||||
The latter measures the number of previous pull requests
|
||||
that have been submitted to GitHub (\texttt{prev\_prs}) and their acceptance rate (\texttt{prev\_acc}).
|
||||
When calculating acceptance rate,
|
||||
the determination of whether the pull request was integrated
|
||||
through other mechanisms than GitHub's merge button follows the heuristics defined in previous studies~\cite{Gousios:2014,zhou2019fork}.
|
||||
We also use two metrics \texttt{first\_pr\_proj} and
|
||||
\texttt{first\_pr} to represent whether the pull request is the first one submitted by a developer to a specific project and GitHub, respectively.
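The experience metrics above can be illustrated with a short Python sketch. It assumes that a developer's history is available as (project, submitted\_at, accepted) tuples, that the \texttt{accepted} flag already reflects the merge heuristics cited above, and that the acceptance rate is set to 0 when the developer has no earlier pull requests. The function names and the input layout are hypothetical.

```python
def acceptance_rate(prs) -> float:
    """Share of accepted pull requests; 0.0 for an empty history (assumption)."""
    if not prs:
        return 0.0
    return sum(1 for _, _, accepted in prs if accepted) / len(prs)


def experience_metrics(history, project, submitted_at) -> dict:
    """Compute the experience metrics for one pull request.

    history: iterable of (project, submitted_at, accepted) tuples describing
    all pull requests of the submitter; only those strictly before
    `submitted_at` count as previous experience.
    """
    prior = [pr for pr in history if pr[1] < submitted_at]
    prior_proj = [pr for pr in prior if pr[0] == project]
    return {
        "prev_prs": len(prior),
        "prev_acc": acceptance_rate(prior),
        "prev_prs_proj": len(prior_proj),
        "prev_acc_proj": acceptance_rate(prior_proj),
        "first_pr": not prior,
        "first_pr_proj": not prior_proj,
    }
```

Note that \texttt{first\_pr} implies \texttt{first\_pr\_proj} in this formulation, since a developer with no earlier pull requests on GitHub has none in the target project either.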
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Standing.}
|
||||
A dichotomous metric \texttt{core\_team},
|
||||
which indicates whether the pull request submitter is a core team member of the project,
|
||||
was commonly used as a signal of the developer's standing within the project~\cite{Tsay2014Influence,yu16det}.
|
||||
Furthermore,
|
||||
a continuous metric \texttt{followers},
|
||||
\ie the number of GitHub users that are following the pull request submitter,
|
||||
was used to represent the developer's standing within the community~\cite{Tsay2014Influence,Gousios:2014,yu16det}.
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Previous interaction.}
|
||||
This metric (\texttt{prev\_interaction}) is the total number of events,
|
||||
\eg commenting on issues and pull requests,
|
||||
prior to the pull request submission
|
||||
that the developer has participated in within the project~\cite{Tsay2014Influence,yu16det}.
|
||||
|
||||
|
||||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent\textbf{Patch-level characteristics.}
|
||||
|
||||
|
@ -621,89 +710,6 @@ and \textit{Doc} changing documentation files.
|
|||
This metric (\texttt{activity\_type}) is determined by checking the names and extensions of changed files.
|
||||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent\textbf{Submitter-level characteristics.}
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Experience.}
|
||||
Developers' experience before they submit the pull request has been analyzed in prior studies~\cite{Gousios:2014,jiang2013will}.
|
||||
This measure can be computed from two perspectives:
|
||||
project-level experience and community-level experience.
|
||||
The former measures the number of previous pull requests
|
||||
that have been submitted to a specific project (\texttt{prev\_prs\_proj}) and their acceptance rate (\texttt{prev\_acc\_proj}).
|
||||
The latter measures the number of previous pull requests
|
||||
that have been submitted to GitHub (\texttt{prev\_prs}) and their acceptance rate (\texttt{prev\_acc}).
|
||||
When calculating acceptance rate,
|
||||
the determination of whether the pull request was integrated
|
||||
through other mechanisms than GitHub's merge button follows the heuristics defined in previous studies~\cite{Gousios:2014,zhou2019fork}.
|
||||
We also use two metrics \texttt{first\_pr\_proj} and
|
||||
\texttt{first\_pr} to represent whether the pull request is the first one submitted by a developer to a specific project and GitHub, respectively.
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Standing.}
|
||||
A dichotomous metric \texttt{core\_team},
|
||||
which indicates whether the pull request submitter is a core team member of the project,
|
||||
was commonly used as a signal of the developer's standing within the project~\cite{Tsay2014Influence,yu16det}.
|
||||
Furthermore,
|
||||
a continuous metric \texttt{followers},
|
||||
\ie the number of GitHub users that are following the pull request submitter,
|
||||
was used to represent the developers' standing within the community~\cite{Tsay2014Influence,Gousios:2014,yu16det}.
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Previous interaction.}
|
||||
This metric (\texttt{prev\_interaction}) is the total number of events,
|
||||
\eg commenting on issues and pull requests,
|
||||
prior to the pull request submission
|
||||
that the developer has participated in within the project~\cite{Tsay2014Influence,yu16det}.
|
||||
|
||||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent\textbf{Project-level characteristics.}
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Maturity.}
|
||||
Previous studies used the metric \texttt{proj\_age},
|
||||
\ie the period of time from the time the project was hosted on GitHub to the pull request submission time,
|
||||
as an indicator of the project maturity~\cite{Tsay2014Influence,yu16det,Rahman2014An}.
|
||||
However,
|
||||
a project does not necessarily use the pull request model in the first place.
|
||||
We also use the metric \texttt{prmodel\_age} to indicate
|
||||
how long a project has adopted the pull request development model.
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Workload.}
|
||||
The discussion of issues and pull requests might cost days to months to come to an end.
|
||||
At any given time,
|
||||
a bunch of open issues and pull requests
|
||||
might be discussed simultaneously.
|
||||
Prior studies have characterized project integrators' workload using two metrics:
|
||||
\texttt{open\_tasks}~\cite{yu16det} and \texttt{team\_size}~\cite{Tsay2014Influence,Gousios:2014,yu16det},
|
||||
which are the number of open issues and open pull requests at the pull request submission and
|
||||
the number of active core team members during the last three months, respectively.
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Popularity.}
|
||||
In measuring project popularity,
|
||||
the metric \texttt{stars},
|
||||
\ie the number of stars the project has got,
|
||||
was commonly used in prior studies~\cite{Bor17Und,Tsay2014Influence}.
|
||||
In addition,
|
||||
we also considered three other popularity-related metrics:
|
||||
\texttt{forks}, \texttt{prs}, and \texttt{contributors},
|
||||
which are the number of forks, the number of pull requests, and the number of contributors
|
||||
of the project, respectively.
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
\textit{Activeness.}
|
||||
We further measure the activeness of the project
|
||||
by including four metrics:
|
||||
\texttt{stars\_3M}, \texttt{forks\_3M}, \texttt{prs\_3M}, and \texttt{contributors\_3M},
|
||||
which are the new stars, new forks, new pull requests, and active contributors in the project in the last three months, respectively.
|
||||
|
||||
|
||||
\subsubsection{Comparative exploration}
|
||||
|
@ -732,8 +738,8 @@ $H_{0}$: duplicate pull requests exhibit a value of metric $m$ equal to that one
|
|||
\texttt{proj\_age}, \texttt{prmodel\_age},
|
||||
\hl{\texttt{open\_tasks}}
|
||||
\texttt{open\_issues}, \texttt{open\_prs}, \texttt{team\_size},
|
||||
\texttt{forks}, \texttt{stars}, \texttt{prs}, \texttt{contributors},
|
||||
\texttt{forks\_3M}, \texttt{stars\_3M}, \texttt{prs\_3M}, \texttt{contributors\_3M}\}
|
||||
\texttt{forks}, \texttt{stars}, \texttt{prs}, \texttt{contributors}
|
||||
\}
|
||||
|
||||
|
||||
\vspace{0.2em}
|
||||
|
@ -761,6 +767,56 @@ are different in terms of all metrics except for that they have similar number o
|
|||
\toprule
|
||||
\multicolumn{2}{r}{\textbf{Metric}} &\tabincell{c}{\textbf{Effect}\\\textbf{size}} &\tabincell{c}{\textbf{\textit{Adjusted}}\\\textbf{\textit{p-value}}}\\
|
||||
|
||||
|
||||
\midrule
|
||||
\multicolumn{4}{@{}l}{\textbf{Project-level characteristics}}\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Maturity}
|
||||
& \texttt{proj\_age}&0.166 & 2.95e-21 ***\\
|
||||
& \texttt{prmodel\_age}& -0.108 &2.34e-11 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Workload}
|
||||
& \texttt{open\_tasks}&0.177 &2.00e-37 ***\\
|
||||
& \texttt{team\_size}& 0.257& 1.98e-41 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Popularity}
|
||||
& \texttt{forks}& -0.327& 3.06e-61 ***\\
|
||||
& \texttt{watchers}& -0.289& 9.77e-47 ***\\
|
||||
& \texttt{prs}& 0.187& 5.34e-19 ***\\
|
||||
& \texttt{contributors}& -0.238& 3.71e-49 ***\\
|
||||
|
||||
% \cdashline{1-4}[0.8pt/2pt]
|
||||
% \multirow{4}{*}{Activeness}
|
||||
% & \texttt{forks\_3M}& -0.315& 6.39e-51 ***\\
|
||||
% & \texttt{watchers\_3M}& -0.178& 2.07e-12 ***\\
|
||||
% & \texttt{prs\_3M}& 0.280& 7.11e-85 ***\\
|
||||
% & \texttt{contributors\_3M}& -0.115 & 2.31e-07 ***\\
|
||||
|
||||
\midrule
|
||||
\multicolumn{4}{@{}l}{\textbf{Submitter-level characteristics}}\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Experience}
|
||||
& \texttt{prev\_prs\_proj}&0.281 & 5.46e-192 ***\\
|
||||
& \texttt{prev\_prs}& 0.227& 1.82e-94 ***\\
|
||||
& \texttt{first\_pr\_proj}& -0.435 & 3.21e-149 ***\\
|
||||
& \texttt{first\_pr}& -0.200 & 3.91e-33 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Standing}
|
||||
& \texttt{core\_team}& 0.385&2.40e-117 ***\\
|
||||
& \texttt{followers}&-0.002 & 8.78e-21 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{1}{*}{Interaction}
|
||||
% & \texttt{followings}& -0.012 &3.21e-06 ***\\
|
||||
% & \texttt{watch\_proj}&0.124 &1.28e-13 ***\\
|
||||
& \texttt{prev\_interaction}&0.163 & 9.81e-108 ***\\
|
||||
|
||||
|
||||
\midrule
|
||||
\multicolumn{4}{@{}l}{\textbf{Patch-level characteristics}}\\
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
|
@ -781,54 +837,7 @@ are different in terms of all metrics except for that they have similar number o
|
|||
&\texttt{change\_type}& 0.055& 9.73e-4 ***\\
|
||||
&\texttt{activity\_type}& 0.055& 9.73e-4 ***\\
|
||||
|
||||
\midrule
|
||||
\multicolumn{4}{@{}l}{\textbf{Submitter-level characteristics}}\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Experience}
|
||||
& \texttt{prev\_prs\_proj}&0.281 & 5.46e-192 ***\\
|
||||
& \texttt{prev\_prs}& 0.227& 1.82e-94 ***\\
|
||||
& \texttt{first\_pr\_proj}& -0.435 & 3.21e-149 ***\\
|
||||
& \texttt{first\_pr}& -0.200 & 3.91e-33 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Standing}
|
||||
& \texttt{core\_team}& 0.385&2.40e-117 ***\\
|
||||
& \texttt{followers}&-0.002 & 8.78e-21 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{3}{*}{Connection}
|
||||
& \texttt{followings}& -0.012 &3.21e-06 ***\\
|
||||
& \texttt{watch\_proj}&0.124 &1.28e-13 ***\\
|
||||
& \texttt{prev\_interaction}&0.163 & 9.81e-108 ***\\
|
||||
|
||||
\midrule
|
||||
\multicolumn{4}{@{}l}{\textbf{Project-level characteristics}}\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Maturity}
|
||||
& \texttt{proj\_age}&0.166 & 2.95e-21 ***\\
|
||||
& \texttt{prmodel\_age}& -0.108 &2.34e-11 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{3}{*}{Workload}
|
||||
& \texttt{open\_issues}&0.177 &2.00e-37 ***\\
|
||||
& \texttt{open\_prs}& 0.011& 4.71e-32 ***\\
|
||||
& \texttt{team\_size}& 0.257& 1.98e-41 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Popularity}
|
||||
& \texttt{forks}& -0.327& 3.06e-61 ***\\
|
||||
& \texttt{watchers}& -0.289& 9.77e-47 ***\\
|
||||
& \texttt{prs}& 0.187& 5.34e-19 ***\\
|
||||
& \texttt{contributors}& -0.238& 3.71e-49 ***\\
|
||||
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Activeness}
|
||||
& \texttt{forks\_3M}& -0.315& 6.39e-51 ***\\
|
||||
& \texttt{watchers\_3M}& -0.178& 2.07e-12 ***\\
|
||||
& \texttt{prs\_3M}& 0.280& 7.11e-85 ***\\
|
||||
& \texttt{contributors\_3M}& -0.115 & 2.31e-07 ***\\
|
||||
|
||||
\bottomrule
|
||||
\end{tabularx}
|
||||
|
@ -924,85 +933,53 @@ For project-level metrics,
|
|||
\renewcommand{\arraystretch}{1.15}
|
||||
\centering
|
||||
\caption{\color{red}{Statistical models for the likelihood of duplicate pull requests}}
|
||||
\begin{tabularx}{\textwidth}{@{}l r Y Y Y Y Y Y Y Y Y@{}}
|
||||
\begin{tabularx}{\textwidth}{@{}r Y Y Y Y Y Y Y Y Y@{}}
|
||||
\toprule
|
||||
|
||||
|
||||
&& \multicolumn{3}{c}{\textbf{Model 1}}& \multicolumn{3}{c}{\textbf{Model 2}} & \multicolumn{3}{c}{\textbf{Model 3}}\\
|
||||
& & \multicolumn{3}{c}{response: \textit{is\_dup} = 1}& \multicolumn{3}{c}{response: \textit{is\_dup} = 1} & \multicolumn{3}{c}{response: \textit{is\_dup} = 1}\\
|
||||
\cmidrule(r){3-5} \cmidrule(r){6-8} \cmidrule(r){9-11}
|
||||
&& Coeffs. & Errors & Signif. & Coeffs. & Errors & Signif.& Coeffs. & Errors & Signif.\\
|
||||
& \multicolumn{3}{c}{\textbf{Model 1}}& \multicolumn{3}{c}{\textbf{Model 2}} & \multicolumn{3}{c}{\textbf{Model 3}}\\
|
||||
& \multicolumn{3}{c}{response: \textit{is\_dup} = 1}& \multicolumn{3}{c}{response: \textit{is\_dup} = 1} & \multicolumn{3}{c}{response: \textit{is\_dup} = 1}\\
|
||||
\cmidrule(r){2-4} \cmidrule(r){5-7} \cmidrule(r){8-10}
|
||||
& Coeffs. & Errors & Signif. & Coeffs. & Errors & Signif.& Coeffs. & Errors & Signif.\\
|
||||
|
||||
|
||||
\midrule
|
||||
% \multicolumn{4}{@{}l}{\textbf{Project characteristics}}\\
|
||||
\texttt{open\_tasks}& & & & & & & & &\\
|
||||
\texttt{team\_size}& & & & & & & & &\\
|
||||
\texttt{watchers}& & & & & & & & &\\
|
||||
|
||||
\midrule
|
||||
% \multicolumn{4}{@{}l}{\textbf{Patch characteristics}}\\
|
||||
\multirow{3}{*}{Size}
|
||||
&\texttt{commits} & & & & & & & & &\\
|
||||
& \texttt{files} & & & & & & & & & \\
|
||||
& \texttt{churn}& & & & & & & & &\\
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
|
||||
\multirow{2}{*}{Text}
|
||||
&\texttt{title\_len}& & & & & & & & &\\
|
||||
& \texttt{desc\_len}& & & & & & & & &\\
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
Hotness &\texttt{hotness}& & & & & & & & &\\
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
Reference& \texttt{issue\_tag}& & & & & & & & &\\
|
||||
\cdashline{1-4}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Type}
|
||||
&\texttt{change\_type}& & & & & & & & &\\
|
||||
&\texttt{activity\_type}& & & & & & & & &\\
|
||||
|
||||
\midrule
|
||||
% \multicolumn{4}{@{}l}{\textbf{Submitter characteristics}}\\
|
||||
% \cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Experience}
|
||||
& \texttt{prev\_prs\_proj}& & & & & & & & &\\
|
||||
& \texttt{prev\_prs}& & & & & & & & &\\
|
||||
& \texttt{first\_pr\_proj}& & & & & & & & &\\
|
||||
& \texttt{first\_pr}& & & & & & & & &\\
|
||||
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{3}{*}{Status}
|
||||
& \texttt{core\_team}& & & & & & & & &\\
|
||||
& \texttt{followers}& & & & & & & & &\\
|
||||
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{3}{*}{Connection}
|
||||
& \texttt{followings}& & & & & & & & &\\
|
||||
& \texttt{watch\_proj}& & & & & & & & &\\
|
||||
& \texttt{prev\_interaction}& & & & & & & & &\\
|
||||
\texttt{prev\_prs}& & & & & & & & &\\
|
||||
\texttt{prev\_prs\_acc}& & & & & & & & &\\
|
||||
\texttt{first\_pr\_proj}& & & & & & & & &\\
|
||||
\texttt{core\_team}& & & & & & & & &\\
|
||||
\texttt{followers}& & & & & & & & &\\
|
||||
\texttt{prev\_interaction}& & & & & & & & &\\
|
||||
|
||||
\midrule
|
||||
% \multicolumn{4}{@{}l}{\textbf{Project characteristics}}\\
|
||||
% \cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{2}{*}{Maturity}
|
||||
& \texttt{proj\_age}& & & & & & & & &\\
|
||||
& \texttt{prmodel\_age}& & & & & & & & &\\
|
||||
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{3}{*}{Workload}
|
||||
& \texttt{open\_issues}& & & & & & & & &\\
|
||||
& \texttt{open\_prs}& & & & & & & & &\\
|
||||
& \texttt{team\_size}& & & & & & & & &\\
|
||||
|
||||
\midrule
|
||||
% \multicolumn{4}{@{}l}{\textbf{Patch characteristics}}\\
|
||||
\texttt{commits} & & & & & & & & &\\
|
||||
\texttt{files} & & & & & & & & & \\
|
||||
\texttt{churn}& & & & & & & & &\\
|
||||
|
||||
\texttt{title\_len}& & & & & & & & &\\
|
||||
\texttt{desc\_len}& & & & & & & & &\\
|
||||
\texttt{hotness}& & & & & & & & &\\
|
||||
\texttt{issue\_tag}& & & & & & & & &\\
|
||||
\texttt{change\_type}& & & & & & & & &\\
|
||||
\texttt{activity\_type}& & & & & & & & &\\
|
||||
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Popularity}
|
||||
& \texttt{forks}& & & & & & & & &\\
|
||||
& \texttt{watchers}& & & & & & & & &\\
|
||||
& \texttt{prs}& & & & & & & & &\\
|
||||
& \texttt{contributors}& & & & & & & & &\\
|
||||
|
||||
\cdashline{1-11}[0.8pt/2pt]
|
||||
\multirow{4}{*}{Activeness}
|
||||
& \texttt{forks\_3M}& & & & & & & & &\\
|
||||
& \texttt{watchers\_3M}& & & & & & & & &\\
|
||||
& \texttt{prs\_3M}& & & & & & & & &\\
|
||||
& \texttt{contributors\_3M}& & & & & & & & &\\
|
||||
|
||||
\midrule
|
||||
|
||||
\multicolumn{2}{r}{Area Under the ROC Curve:} &\multicolumn{3}{c}{0.661}& \multicolumn{3}{c}{0.845} & \multicolumn{3}{c}{0.871}\\
|
||||
\multicolumn{1}{r}{Area Under the ROC Curve:} &\multicolumn{3}{c}{0.661}& \multicolumn{3}{c}{0.845} & \multicolumn{3}{c}{0.871}\\
|
||||
|
||||
\bottomrule
|
||||
\end{tabularx}
|
||||
|
|
|
@ -78,7 +78,7 @@ if they can fulfil developers' actual demands in maintaining awareness.
|
|||
|
||||
|
||||
\vspace{0.5em}
|
||||
\noindent \textbf{A mismatch between awareness information and actual outcomes/***.}
|
||||
\noindent \textbf{A mismatch between awareness *** actual information exchange.}
|
||||
Maintaining awareness is dual.
|
||||
Intuitively,
|
||||
it means developers need to \textit{gather external information} to stay aware of others' activities.
|
||||
|
@ -90,7 +90,9 @@ This hinders other developers' ability to gather adequate contextual information
|
|||
Although prior work~\cite{Treude2010Awareness,Arora2016Supporting,Calefato2012Social} has extensively studied
|
||||
how to help developers track work and get information,
|
||||
more research attention should be paid to encouraging developers to share awareness information.
|
||||
% Is not sharing due to personality? Or due to working style?
|
||||
For example,
|
||||
it would be interesting to investigate
|
||||
whether developers' willingness to share information is affected by the characteristics of collaboration mechanisms and communication tools.
|
||||
|
||||
|
||||
Obviously,
|
||||
|
|
|
@ -0,0 +1,127 @@
|
|||
setwd("~/Desktop")
|
||||
|
||||
#########################################################
|
||||
# load data
|
||||
rq2_metrics <- read.csv("rq2_metrics.csv")
|
||||
summary(rq2_metrics)
|
||||
# replace NA values with 0
# (index the column directly: the data[rows,]$col form fails when no row is NA)
rq2_data = rq2_metrics
rq2_data$prev_pr_acc_proj[is.na(rq2_data$prev_pr_acc_proj)] = 0
rq2_data$prev_prs_acc[is.na(rq2_data$prev_prs_acc)] = 0
rq2_data$open_issues[is.na(rq2_data$open_issues)] = 0
|
||||
|
||||
summary(rq2_data)
|
||||
|
||||
# is it really appropriate to simply drop rows where prev_prs_acc_proj is null?
|
||||
rq2_m <- rq2_data[complete.cases(rq2_data),]
|
||||
summary(rq2_m)
|
||||
#########################################################
|
||||
|
||||
|
||||
#########################################################
|
||||
# categorical metrics
|
||||
rq2_m$is_dup = as.logical(rq2_m$is_dup)
|
||||
rq2_m$file_type = as.factor(rq2_m$file_type)
|
||||
rq2_m$first_pr_proj = as.logical(rq2_m$first_pr_proj)
|
||||
rq2_m$first_pr = as.logical(rq2_m$first_pr)
|
||||
rq2_m$core_team = as.logical(rq2_m$core_team)
|
||||
rq2_m$watch_proj = as.logical(rq2_m$watch_proj)
|
||||
rq2_m$issue_tag = as.logical(rq2_m$issue_tag)
|
||||
rq2_m$prj_id = as.factor(rq2_m$prj_id)
|
||||
rq2_m$change_type = as.factor(rq2_m$change_type)
|
||||
|
||||
#########################################################
|
||||
summary(rq2_m)
|
||||
|
||||
rq2_m1 = subset(rq2_m, loc<100000 & desc_len<10000)
|
||||
|
||||
|
||||
library(pROC)
|
||||
library(car)
|
||||
library(lme4)
|
||||
|
||||
#step models----------------
|
||||
#proj level---------------
|
||||
gbg_1 = glmer(formula = is_dup ~
|
||||
|
||||
log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
+ log(hotness+0.5)
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
|
||||
vif(gbg_1)
|
||||
summary(gbg_1)
|
||||
prob=predict(gbg_1, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
#proj level and submitter level---------------
|
||||
gbg_2 = glmer(formula = is_dup ~
|
||||
|
||||
log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
+ log(hotness+0.5)
|
||||
|
||||
+ log(prev_pullreqs+0.5)
|
||||
+ log(prev_prs_acc + 0.5)
|
||||
+ first_pr_proj
|
||||
+ log(followers+0.5)
|
||||
+ core_team
|
||||
+ log(prior_interaction+0.5)
|
||||
|
||||
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
vif(gbg_2)
|
||||
summary(gbg_2)
|
||||
prob=predict(gbg_2, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
|
||||
#proj level, submitter level and PR level---------------
|
||||
gbg_3 = glmer(formula = is_dup ~
|
||||
log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
+ log(hotness+0.5)
|
||||
|
||||
+ log(prev_pullreqs+0.5)
|
||||
+ log(prev_prs_acc + 0.5)
|
||||
+ first_pr_proj
|
||||
+ log(followers+0.5)
|
||||
+ core_team
|
||||
+ log(prior_interaction+0.5)
|
||||
|
||||
|
||||
+ log(commits+0.5) + log(files_changed+0.5) + log(loc+0.5)
|
||||
+ log(title_len+0.5)
|
||||
+ log(desc_len + 0.5)
|
||||
+ log(title_len+desc_len)
|
||||
+ issue_tag
|
||||
+ change_type
|
||||
+ file_type
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
vif(gbg_3)
|
||||
summary(gbg_3)
|
||||
prob=predict(gbg_3, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
|
Binary file not shown.
|
@ -0,0 +1,127 @@
|
|||
setwd("~/Desktop")
|
||||
|
||||
#########################################################
|
||||
# load data
|
||||
rq2_metrics <- read.csv("rq2_metrics.csv")
|
||||
summary(rq2_metrics)
|
||||
# replace NA values with 0
# (index the column directly: the data[rows,]$col form fails when no row is NA)
rq2_data = rq2_metrics
rq2_data$prev_pr_acc_proj[is.na(rq2_data$prev_pr_acc_proj)] = 0
rq2_data$prev_prs_acc[is.na(rq2_data$prev_prs_acc)] = 0
rq2_data$open_issues[is.na(rq2_data$open_issues)] = 0
|
||||
|
||||
summary(rq2_data)
|
||||
|
||||
# is it really appropriate to simply drop rows where prev_prs_acc_proj is null?
|
||||
rq2_m <- rq2_data[complete.cases(rq2_data),]
|
||||
summary(rq2_m)
|
||||
#########################################################
|
||||
|
||||
|
||||
#########################################################
|
||||
# categorical metrics
|
||||
rq2_m$is_dup = as.logical(rq2_m$is_dup)
|
||||
#rq2_m$pr_type = as.factor(rq2_m$pr_type)
|
||||
rq2_m$file_type = as.factor(rq2_m$file_type)
|
||||
rq2_m$first_pr_proj = as.logical(rq2_m$first_pr_proj)
|
||||
rq2_m$first_pr = as.logical(rq2_m$first_pr)
|
||||
rq2_m$core_team = as.logical(rq2_m$core_team)
|
||||
rq2_m$watch_proj = as.logical(rq2_m$watch_proj)
|
||||
rq2_m$issue_tag = as.logical(rq2_m$issue_tag)
|
||||
rq2_m$prj_id = as.factor(rq2_m$prj_id)
|
||||
rq2_m$change_type = as.factor(rq2_m$change_type)
|
||||
|
||||
#########################################################
|
||||
summary(rq2_m)
|
||||
|
||||
rq2_m1 = subset(rq2_m, loc<100000 & desc_len<10000)
|
||||
|
||||
|
||||
library(pROC)
|
||||
library(car)
|
||||
library(lme4)
|
||||
|
||||
#step models----------------
|
||||
#proj level---------------
|
||||
gbg_1 = glmer(formula = is_dup ~
|
||||
log(proj_age+0.5)
|
||||
+ log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
|
||||
vif(gbg_1)
|
||||
summary(gbg_1)
|
||||
prob=predict(gbg_1, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
#proj level and submitter level---------------
|
||||
gbg_2 = glmer(formula = is_dup ~
|
||||
log(proj_age+0.5)
|
||||
+ log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
#+ log(forks_3M+0.5)
|
||||
#+ log(watchers_3M+0.5)
|
||||
#+ log(pullreqs_3M+0.5)
|
||||
|
||||
+ log(prev_pullreqs+0.5)
|
||||
+ first_pr_proj
|
||||
#+ log(prev_pullreqs_proj + 0.5)
|
||||
+ log(prev_pr_acc_proj + 0.5)
|
||||
+ log(followers+0.5)
|
||||
+ core_team
|
||||
+ log(prior_interaction+0.5)
|
||||
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
vif(gbg_2)
|
||||
summary(gbg_2)
|
||||
prob=predict(gbg_2, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
|
||||
#proj level, submitter level and PR level---------------
|
||||
gbg_3 = glmer(formula = is_dup ~
|
||||
log(proj_age+0.5)
|
||||
+ log(open_tasks+0.5)
|
||||
+ log(team_size+0.5)
|
||||
+ log(watchers+0.5)
|
||||
|
||||
+ log(prev_pullreqs+0.5)
|
||||
+ log(prev_prs_acc + 0.5)
|
||||
+ first_pr_proj
|
||||
+ log(followers+0.5)
|
||||
+ core_team
|
||||
+ log(prior_interaction+0.5)
|
||||
|
||||
+ log(hotness+0.5)
|
||||
+ log(commits+0.5) + log(files_changed+0.5) + log(loc+0.5)
|
||||
+ log(title_len+desc_len)
|
||||
+ issue_tag
|
||||
+ change_type
|
||||
+ file_type
|
||||
|
||||
+ (1|prj_id),
|
||||
data=rq2_m1,
|
||||
family="binomial"
|
||||
)
|
||||
vif(gbg_3)
|
||||
summary(gbg_3)
|
||||
prob=predict(gbg_3, type=c("response"))
|
||||
rq2_m1$prob=prob
|
||||
a = roc(is_dup ~ prob, data = rq2_m1)
|
||||
a
|
||||
|
||||
|