\section{Discussion}
\label{sec:ds}
Based on our analysis results and findings,
we now discuss recommendations and implications
for OSS practitioners.

{\color{hltext}

\subsection{Main findings}
% coordination
% drive-by pr (not have the patience to check)

\subsubsection{Awareness breakdown}

Maintaining awareness in globally distributed development is a significant concern~\cite{Treude2010Awareness,Gutwin2004Group}.
Developers need to pursue \textit{``an understanding of the activities of others, which provides a context for your own activity''}~\cite{dourish1992awareness}.
As the most popular collaborative development platform,
GitHub has centralized information about project statuses and developer activities
and made it transparent and visible~\cite{Dabbish2012Social},
which helps developers maintain awareness with less effort~\cite{Kalliamvakou2015Open}.
However,
awareness breakdowns still occur and result in duplicate work.
Our analysis of the specific contexts in which duplicates are produced,
as shown in Section~\ref{ss:contextanalysis},
reveals
three mismatches that lead to awareness breakdown.

\vspace{0.5em}
\noindent \textbf{A mismatch between awareness requirements and actual activities.}
In most community-based OSS projects,
developers are self-organized~\cite{Crowston2005Coordination,bird2008latent}
and are allowed to work on any part of the code according to their individual interests and available time~\cite{Gutwin2004Group, lakhani2003hackers}.
Awareness requirements therefore arise
whenever a developer works on an issue,
because other developers might assign themselves to the same issue.
However,
our findings show that
some contributors do not invest sufficient effort in awareness activities
({\footnotesize Section~\ref{ss:contextanalysis}: \textit{\textbf{Not searching for existing work}}, \textit{\textbf{Overlooking linked pull requests}}, and \textit{\textbf{Overlooking existing claims}}}).
We conjecture that this is due to the volunteer nature of OSS participation.
For some developers,
especially casual and one-time contributors,
a major motivation to make contributions is to \textit{``scratch their own itch''}~\cite{pinto2016more,lee2017understanding}.
When they encounter a problem,
they code a patch to fix it and send the patch back to the community.
Some of them do not even care about the final outcome of their pull requests~\cite{steinmacher2018almost}.
It might therefore be hard to get them to spend more time maintaining awareness
of other developers.
Automatic awareness tools can mitigate this problem.
Prior research has proposed approaches to automatically detect duplicates at pull request submission time~\cite{li2017detecting,ren2019identifying} and to identify ongoing features in forks~\cite{zhou2018identifying}.
Furthermore,
we advocate future research on seamlessly integrating awareness tools into developers' development environments
and on designing intelligent, non-intrusive notification mechanisms.

\vspace{0.5em}
\noindent \textbf{A mismatch between awareness mechanisms and actual demands.}
% passive and active
Currently,
GitHub provides developers with a wide range of mechanisms,
\eg following developers and watching projects~\cite{Dabbish2012Social},
to maintain a general awareness of project status.
% in terms of historical, current, and even upcoming events.
However,
developers can be overwhelmed by the large volume of incoming events in popular projects
({\footnotesize Section~\ref{ss:contextanalysis}: \textit{\textbf{Missing notifications}}}).
It is also impractical for developers to always maintain an overall awareness of a project
due to multitasking~\cite{vasilescu2016sky} and turnover~\cite{Lin2017Developer}.
Usually,
developers need to obtain on-demand awareness around a specific task
whenever they decide to submit a pull request,
\ie gather task-centric information to find out who else is interested in the same task.
Currently,
the main mechanisms to meet this demand
are querying the issue and pull request lists and reading through the discussion history.
However,
these mechanisms provide less support than expected,
owing to information mixture
({\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Overlooking linked pull requests}} and \textbf{\textit{Overlooking existing claims}}})
and other technical problems
({\footnotesize Section~\ref{ss:contextanalysis}:
\textbf{\textit{Disappointing search functionality}} and \textbf{\textit{Diversity of natural language usage}}}).
Awareness mechanisms would be most useful
if they fulfilled developers' actual demands in maintaining awareness.

\vspace{0.5em}
\noindent \textbf{A mismatch between awareness maintenance and actual information exchange.}
Maintaining awareness is
% dual.
bidirectional.
Intuitively,
it means that developers need to \textit{gather external information} to stay aware of others' activities.
From a global perspective, however,
it also means that developers should actively \textit{share information about their own work} so that others can gather it.
Our findings show that some developers do not announce their plans in a timely manner
({\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Implementing without claiming first}}})
or do not link their work to the associated issue
({\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Lack of links}}}).
This hinders other developers' ability to gather adequate contextual information.
Although prior work~\cite{Treude2010Awareness,Arora2016Supporting,Calefato2012Social} has extensively studied
how to help developers track work and obtain information,
more research attention should be paid to encouraging developers to share awareness information.
For example,
it would be interesting to investigate
whether developers' willingness to share information is affected by the characteristics of collaboration mechanisms and communication tools.

Clearly,
awareness tools are important for OSS developers to stay aware of each other.
However,
no tool or mechanism can entirely prevent awareness breakdowns.
A better understanding of the importance of group awareness
and better use of available technologies can
help developers ensure that their individual contributions do not cause accidental duplicates.

\subsubsection{The paradox of duplicates}
Generally speaking,
both project integrators and contributors wish to avoid duplicate pull requests,
because duplicates can waste their time and effort,
as shown in Section~\ref{rq1}.
This is also reflected in their comments,
\eg
\textit{``it probably makes sens to just center around a single effort''},
\textit{``No need to do the same thing in two PRs''},
and \textit{``Oops! Sorry, did not mean to double up''}.
However,
once duplicates have been produced,
they may still offer some potential value,
% except for the negative impacts,
as shown in Section~\ref{beyond}.

% From the maintainers' perspective
\vspace{0.5em}
\noindent \textbf{Redundancy vs. alternative.}
In many cases,
duplicate pull requests change essentially the same code
and only introduce unnecessary redundancy.
In other cases, however,
duplicates that take different approaches
provide alternative solutions,
as a developer put it:
\textit{``The pull requests are different, so maybe it is good there are two.''}
In such cases,
project integrators have a better chance of accepting the better patch.
However,
this comes at a price.
Integrators have to invest more time in comparing the details of duplicates
in order to clearly identify the differences between them.
Avoiding wasted effort on handling duplicates
while making the most of the additional value that each duplicate provides
is a trade-off that integrators should be aware of.

% For contributors
\vspace{0.5em}
\noindent \textbf{Competition vs. collaboration.}
At first sight,
authors of duplicate pull requests
compete to get their own patches accepted.
For example,
one contributor tried to persuade integrators to choose their pull request rather than the other one:
\textit{``Pick me! Pick me! I was first! :)''}.
Nevertheless,
we found some cases in which
the authors of duplicates worked together towards a better patch by means of mutual assessment and
negotiated incorporation,
as shown in Section~\ref{beyond}.
According to developers,
such collaboration is also an opportunity for both authors
to learn from each other's strengths
(\textit{``I looked at some awesome code that xxx wrote to fix this issue and it was so simple, I just did not fully understand the issue I was fixing.''}).
% and fosters a friendly collaborative environment.
Standing up for their own patches
while seeking collaboration and learning
is a trade-off that authors of duplicates should be aware of.

\subsubsection{Decision-making in Either/Or situations}

% Previous studies on pull request acceptance suggest that
% there are many more factors that influence contribution evaluation.

In general pull request evaluation,
integrators answer a \textit{Yes/No} question, \textit{``whether to accept this pull request?''},
and the factors they consider mainly relate to the individual pull request.
When deciding between duplicates, in contrast,
integrators answer an \textit{Either/Or} question, \textit{``whether to choose this duplicate pull request or the other one?''}.
Our findings about integrators' choices between duplicates,
as presented in Section~\ref{rq3},
show that when selecting between duplicates,
integrators consider additional factors,
and some factors have effects opposite to
those observed in general pull request evaluation.

\vspace{0.5em}
\noindent \textbf{Community fairness matters:}
Our findings confirm the results of previous studies that
pull requests with accurate and high-quality implementations, as well as those submitted by experienced developers,
are more likely to be accepted~\cite{yu16det,gousios2015work,zou2019how,Kononenko2018Studying}.
However,
the submitter's standing and the social connection between the submitter and integrators
do not affect the acceptance of duplicate pull requests as strongly as observed in prior work~\cite{Tsay2014Influence,yu16det}.
Instead,
we observe that a more objective factor, \ie the arrival order of duplicates,
has a strong effect.
Both our regression model
% (predictor \texttt{early\_arrival})
and manual observation
% (reason \textbf{First-come, First-served})
show that
duplicate pull requests arriving earlier than their counterparts are more likely to be accepted.
% This factor belongs to neither social metric nor technical metric.
Respecting the arrival order resembles a default community rule
that developers usually follow to manage duplicates~\cite{sun2011towards,sun2010discriminative,stackexchange-dup}.
% A similar treatment \textit{``first-in, first-out''} was also observed
% when integrators prioritize their work on pull request evaluation~\cite{gousios2015work}.
This might indicate that integrators focus more on technical metrics and objective differences
when deciding between duplicates,
in order to ensure fairness within the community~\cite{Jensen2007Role, baysal2012secret}.

% explaining rejection big challenge

\vspace{0.5em}
\noindent \textbf{Effort investment matters:}
% revision; in_line comments
Prior work on pull request acceptance has suggested that highly discussed pull requests are less likely to be accepted~\cite{Tsay2014Influence}.
In contrast, our model shows that
duplicate pull requests with more inline comments
% (predictor \texttt{comments\_inline})
and revisions
% (predictor \texttt{revisions})
have higher chances of acceptance.
A higher number of review comments and revisions might indicate that
the pull request was not perfect and that integrators requested changes to improve it.
However,
it also reflects that the pull request has been thoroughly reviewed
and that both the integrators and the submitter have invested considerable effort.
If the pull request does not have fatal drawbacks and the other duplicate does not provide significant benefits,
integrators should prefer the thoroughly reviewed pull request to the shallowly reviewed one.
After all,
the top challenge faced by integrators is time~\cite{gousios2015work},
and it is reasonable for them to avoid investing further redundant effort in the shallowly reviewed duplicates.
Moreover,
asking contributors for more work to improve their pull requests might be difficult~\cite{gousios2015work,steinmacher2018almost}.
Thus,
choosing the thoroughly discussed duplicate pull request of higher maturity
% (\ie higher \texttt{revisions})
might be a safe decision.

}

\subsection{Suggestions for contributors}
To avoid unintentional duplicate pull requests,
contributors may follow a set of best practices
when working in the pull-based development model.

\vspace{0.5em}
\noindent \textbf{Adequate checking:}
{\color{hltext}
Many duplicates were produced because
contributors did not check adequately
whether anyone else was already working on the same thing ({\footnotesize Section~\ref{ss:contextanalysis}:
\textbf{\textit{Not searching for existing work}}, \textbf{\textit{Overlooking linked pull requests}}, and \textbf{\textit{Overlooking existing claims}}}).
We recommend that
contributors perform at least three kinds of checks:
\textit{i)} reading through the whole discussion of an issue and checking whether anyone has claimed the issue;
\textit{ii)} examining each of the pull requests linked to an issue and checking whether any of them is ongoing work to solve the issue; and
\textit{iii)} performing searches with different keywords against open and closed pull requests and issues,
and carefully checking whether similar work already exists.
}
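
The third kind of check can be partly scripted.
The Python sketch below is only an illustration under stated assumptions:
the helper \texttt{build\_duplicate\_queries} is hypothetical,
the repository name and the alternative keywords (\eg synonyms describing the same problem) are supplied by the contributor,
and the function merely composes query strings with GitHub's search qualifiers \texttt{repo:}, \texttt{is:pr}/\texttt{is:issue} and \texttt{is:open}/\texttt{is:closed}, to be pasted into the search box.

```python
from __future__ import annotations

from itertools import product


def build_duplicate_queries(repo: str, keywords: list[str]) -> list[str]:
    """Compose GitHub search queries for one problem described in several ways.

    Hypothetical helper: every keyword is searched as an exact phrase in both
    pull requests and issues, in both open and closed states (4 queries each).
    """
    queries = []
    for keyword, kind, state in product(keywords, ("pr", "issue"), ("open", "closed")):
        # Qualifiers follow GitHub's search syntax: repo:, is:pr/is:issue, is:open/is:closed.
        queries.append(f'repo:{repo} is:{kind} is:{state} "{keyword}"')
    return queries
```

For instance, two phrasings of the same bug yield eight queries, so closed pull requests and issues are not forgotten.
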

\vspace{0.5em}
\noindent \textbf{Timely completion:}
Many OSS developers contribute to projects in their spare time,
and some of them switch between multiple tasks.
As a result,
it might be difficult for them to complete an individual task in a timely fashion.
However,
we still suggest that contributors complete each piece of work promptly and in a sensible order, \eg one item at a time, to shorten the period during which their work remains local.
This makes their work publicly visible earlier,
which can, to some extent, prevent others from submitting duplicates
({\color{hltext}
{\footnotesize Section~\ref{ss:contextanalysis}:
\textbf{\textit{Overlong local work}}
}}).

\vspace{0.5em}
\noindent \textbf{Precise context:}
Providing complete and clear textual information for submitted pull requests
helps other contributors retrieve these pull requests and understand them accurately and comprehensively
({\color{hltext}
{\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Diversity of natural language usage}}
}}).
In addition,
if a pull request solves a tracked issue,
adding the issue reference to the pull request description,
\eg \textit{``fix \#[issue\_number]''},
% helps to aggregate related activities concerning the same issue.
{\color{hltext}
can prevent some duplicates by increasing other developers' awareness of the work
%other developers would have a chance to find the linked pull request
({\footnotesize Section~\ref{ss:contextanalysis}:
\textbf{\textit{Lack of links}}
}).
}
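
As an illustration of the \textit{``fix \#[issue\_number]''} convention,
the following Python sketch checks whether a pull request description links an issue through one of the closing keywords that GitHub documents (\eg \textit{close}, \textit{fixes}, \textit{resolved}).
The function names are hypothetical, and the sketch deliberately ignores cross-repository references;
a bot or a pre-submission check could use \texttt{needs\_link\_reminder} to prompt contributors who forgot the link.

```python
from __future__ import annotations

import re

# Keywords that GitHub documents for linking a pull request to an issue.
CLOSING_KEYWORDS = ("close", "closes", "closed", "fix", "fixes", "fixed",
                    "resolve", "resolves", "resolved")

# "<keyword> #<number>", case-insensitive; the leading \b rejects words such
# as "prefix". Cross-repository references (owner/repo#N) are not handled.
LINK_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)\b",
    re.IGNORECASE,
)


def linked_issues(description: str) -> list[int]:
    """Return the issue numbers that a PR description links with a closing keyword."""
    return [int(number) for number in LINK_PATTERN.findall(description)]


def needs_link_reminder(description: str) -> bool:
    """True when the description links no issue through a closing keyword."""
    return not linked_issues(description)
```
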

\vspace{0.5em}
\noindent \textbf{Early declaration:}
{\color{hltext}
Zhou \etal~\cite{zhou2019fork} have suggested that
claiming an issue upfront is associated with a lower chance of redundant work.
Based on our findings, we re-emphasize the importance of early declaration ({\color{hltext}
{\footnotesize Section~\ref{ss:contextanalysis}:
\textbf{\textit{Implementing without claiming first}}
}}).
If contributors decide to submit a pull request for an issue,
they had better declare their intentions before coding,
instead of reporting their patches after the work is done.
Compared with late reporting,
early declaration
broadcasts contributors' intentions to the community in time to get the attention of interested parties,
so that they can avoid accidental duplicate work}
and the core team has a chance to coordinate contributors' work,
as an integrator stated:
\textit{``@xxx, btw, it is a good idea to comment on an issue when you start working on it, so we can coordinate better and avoid duplication of effort''}.

{\color{hltext}
\vspace{0.5em}
\noindent \textbf{Argue for their PRs:}
Integrators might consider various factors when choosing between duplicates,
as shown in Section~\ref{rq3}.
Contributors should therefore actively argue for their own duplicate pull requests
by explicitly stating the strengths of their patches,
especially if they have proposed a different approach or provided additional benefits.
Authors of duplicates can review each other's patches and discuss the differences between them
rather than waiting for an official statement from the core team.
This can provide a solid basis for integrators to make informed decisions about which duplicate to accept.
Moreover,
if the value of a duplicate pull request has been explicitly stated,
even if the pull request is eventually closed,
its useful parts are more likely to be noticed and cherry-picked by integrators,
as shown in Section~\ref{beyond}.
}

\subsection{Suggestions for the core team}
The core team of an OSS project,
acting as the integrator and maintainer of the project,
is responsible for establishing contribution standards
and coordinating contributors' development activities.
To ensure the long-term survival of the project,
the core team may also follow some best practices.

\vspace{0.5em}
\noindent \textbf{Evident guidelines:}
{\color{hltext}
OSS projects,
especially popular projects,
usually have contribution guidelines
telling contributors how to report an issue, submit a pull request, \etc
We found that many of these guidelines warn contributors not to submit duplicate issues and pull requests.
Since
many developers said that they did not check, or forgot to check, for existing work
({\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Not searching for existing work}}, \textbf{\textit{Overlooking linked pull requests}}, and \textbf{\textit{Overlooking existing claims}}
}),
projects can make the warning more visible (\eg by highlighting it in bold) and
easier to follow (\eg by clearly itemizing the steps to take when checking for existing issues and pull requests).
}

\vspace{0.5em}
\noindent \textbf{Explaining decisions:}
Integrators must choose between duplicate pull requests,
which means that they have to reject someone's contribution.
Contributors whose pull requests have been rejected
would likely appreciate feedback and an explanation of why their work was rejected, rather than having their pull requests simply closed.
% After all,
% they have devoted time and energy to their contribution.
{\color{hltext}
However,
in nearly 50\% of our qualitative samples, decisions were made without any explanation (as shown in Table~\ref{tab:reason}).
Even worse,
we found that a curt explanation (\eg \textit{``Thanks for your PR but this fix is already merged in \#20610''}) can upset the contributor
(\textit{``'already' implies I submitted my PR later than that, rather than nearly a year earlier ;) But at least it's fixed.''}).
In that case, the integrator had to offer an additional apology (\textit{``sorry, sometimes a PR falls in the cracks and a newer one gets the attention. We have improved the process in hopes to avoid this but we still have a big backlog in which these things are present.''}) to mitigate the negative effect.
Future work could use controlled experiments to examine the effectiveness of this suggestion.
}

\subsection{Suggestions for the design of platforms}
Online collaboration platforms such as GitHub have designed and provided numerous mechanisms and tools to support OSS development.
However,
the practical problem of duplicate contributions indicates that these platforms still need to be improved.

\vspace{0.5em}
\noindent \textbf{Claim button:}
% As discussed in Section~\ref{ss:contextanalysis},
% in the pull-based development model,
% contributors first finish local work and then submit a pull request for online evaluation,
% which can be described as the \textit{behave-then-report} workflow.
% This workflow makes other contributors aware of a contributor's activities mostly after her/his work is done.
% Perhaps platforms can come up with a new mechanism
% to achieve early disclosure of contributors' intended work.
% In our opinion,
% the \textit{declare-then-behave} workflow is probably a workable solution;
% it requires contributors to first declare their intention and then conduct concrete work before finally submitting a pull request.
% This is similar to the \textit{pullrequest-then-commit} workflow
% discussed in Section~\ref{ss:contextanalysis}.
% GitHub can make it a platform-supporting workflow and provide corresponding functionalities.
{\color{hltext}
% As discussed in Section~\ref{ss:contextanalysis},
% in current behave-then-report workflow (\ie contributors first finish local work and then submit a pull request for online evaluation),
% developers are usually \hl{behindhand} aware of others' intention after the work is already finished.
To help developers maintain awareness of each other more efficiently,
we envision a new mechanism called \textit{Claim}, described as follows.
For a GitHub issue,
each developer who is interested in it
can click the \textit{Claim} button on the issue page to declare that s/he is going to work on the issue.
The usernames of all claimers of the issue
are listed together below the \textit{Claim} button.
Every time the \textit{Claim} button is clicked,
an empty pull request is automatically generated
and linked to the claimer's username in the issue claimer list.
Moreover,
when clicking the \textit{Claim} button, claimers can describe how they plan to fix the issue in a displayed input box.
The reported plan then serves as the description of the empty pull request.
Subsequently,
claimers update the empty pull request by pushing code
until they produce a complete patch.
All important updates on the empty pull request,
\eg newly pushed commits,
are displayed in the claimer list.
On the one hand,
this mechanism makes it more convenient for developers to
share their intentions and activities with a single click.
On the other hand,
developers can efficiently catch and track other developers' intentions and activities
by checking the issue claimer list.
}
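
To make the envisioned workflow concrete, the Python sketch below models the \textit{Claim} mechanism as a small in-memory data structure.
It is a hypothetical illustration rather than an existing GitHub API:
the classes \texttt{Issue} and \texttt{EmptyPullRequest} and the textual rendering of the claimer list are our assumptions,
and storage, permissions and notifications are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EmptyPullRequest:
    """Placeholder pull request generated automatically by a Claim click (hypothetical)."""
    author: str
    plan: str = ""                      # text typed into the Claim input box
    commits: list[str] = field(default_factory=list)

    def push(self, commit_sha: str) -> None:
        """Record a pushed commit, an 'important update' shown in the claimer list."""
        self.commits.append(commit_sha)


@dataclass
class Issue:
    number: int
    title: str
    claims: list[EmptyPullRequest] = field(default_factory=list)

    def claim(self, user: str, plan: str = "") -> EmptyPullRequest:
        """Simulate one click on the Claim button by `user`."""
        pull_request = EmptyPullRequest(author=user, plan=plan)
        self.claims.append(pull_request)
        return pull_request

    def claimer_list(self) -> list[str]:
        """Render the list shown below the Claim button, one row per claim."""
        rows = []
        for pr in self.claims:
            status = f"{len(pr.commits)} commit(s)"
            if pr.commits:
                status += f", latest {pr.commits[-1]}"
            rows.append(f"@{pr.author}: {pr.plan or '(no plan given)'} [{status}]")
        return rows
```

For example, if one developer claims an issue with a plan and pushes a commit while another claims it without a plan, the claimer list shows both rows, so a third developer can see at a glance that the issue is already being worked on.
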

\vspace{0.5em}
\noindent \textbf{Duplicate detection:}
As contributors complained, \eg
% cmt_id: 2538323
\textit{``... I wish there has been some automated method to detect pending PR per file basis. This could save lot of work duplicacy. ...''}, or
\textit{``It's strange that GitHub isn't complaining about this, because it's an exact dup of \#5131 which was merged already''},
GitHub lacks an automatic tool for detecting duplicates.
Such a tool can help integrators detect duplicates in a timely manner
and avoid spending resources on evaluating duplicates separately.
Therefore,
following the example of Stack Overflow and Bugzilla, GitHub could recommend similar work when developers are creating pull requests, using various similarity measures, \eg based on titles and code changes.
The features discussed in Section~\ref{ss:metrics} can also be integrated to enhance the recommendation system.
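
A minimal sketch of such a recommender, written in Python, is shown below.
It assumes that each pull request is represented only by its title and the set of files it changes, and scores candidates with a weighted Jaccard similarity;
the data class, function names, weight and threshold are hypothetical choices for illustration
and are neither the features of Section~\ref{ss:metrics} nor the detectors proposed in prior work~\cite{li2017detecting,ren2019identifying}.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    number: int
    title: str
    changed_files: frozenset[str]


def title_tokens(title: str) -> set[str]:
    """Lower-cased alphanumeric tokens (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: set[str] | frozenset[str], b: set[str] | frozenset[str]) -> float:
    """Size of the intersection over size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pull_requests(new: PullRequest, existing: list[PullRequest],
                          title_weight: float = 0.5,
                          threshold: float = 0.4) -> list[tuple[int, float]]:
    """Rank existing PRs by weighted title and changed-file similarity.

    The weight and threshold are illustrative defaults, not tuned values.
    """
    candidates = []
    for pr in existing:
        score = (title_weight * jaccard(title_tokens(new.title), title_tokens(pr.title))
                 + (1 - title_weight) * jaccard(new.changed_files, pr.changed_files))
        if score >= threshold:
            candidates.append((pr.number, round(score, 3)))
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Comparing file names is a deliberately coarse proxy for code changes; finer-grained signals could be added as further weighted terms.
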

{\color{hltext}
\vspace{0.5em}
\noindent \textbf{Reference reminder:}
Since developers might overlook pull requests linked to issues
({\color{hltext}
{\footnotesize Section~\ref{ss:contextanalysis}: \textbf{\textit{Overlooking linked pull requests}}}}),
platforms can actively remind developers of existing pull requests linked to the same issue at pull request submission time.
The goal of this functionality is similar to that of the duplicate detection tool,
but it can be implemented in a more straightforward way.
For example,
when developers are filling in a pull request and adding a reference to the associated GitHub issue,
a pop-up box can be displayed next to the issue reference
to list the existing pull requests linked to that issue.
}
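
The reminder mostly requires an index from issues to the pull requests that reference them.
The Python sketch below assumes that pull request descriptions are available as plain strings keyed by pull request number,
and it recognizes only same-repository references of the form \texttt{\#N};
\texttt{index\_links} and \texttt{reminder} are hypothetical names, not platform APIs.

```python
from __future__ import annotations

import re
from collections import defaultdict

# A plain same-repository issue reference such as "#123"; references like
# "owner/repo#123" are excluded by the lookbehind (a simplifying assumption).
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def index_links(pull_requests: dict[int, str]) -> dict[int, set[int]]:
    """Map each issue number to the PRs whose descriptions reference it."""
    links: dict[int, set[int]] = defaultdict(set)
    for pr_number, description in pull_requests.items():
        for issue in ISSUE_REF.findall(description):
            links[int(issue)].add(pr_number)
    return dict(links)


def reminder(draft_description: str, links: dict[int, set[int]]) -> list[str]:
    """Messages to show next to each issue reference typed into a new PR."""
    messages = []
    for issue in sorted({int(n) for n in ISSUE_REF.findall(draft_description)}):
        existing = sorted(links.get(issue, ()))
        if existing:
            prs = ", ".join(f"#{n}" for n in existing)
            messages.append(f"Issue #{issue} already has linked pull requests: {prs}")
    return messages
```
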

\vspace{0.5em}
\noindent \textbf{Duplicate comparison:}
As discussed in Section~\ref{rq3},
integrators consider several factors
when choosing between duplicate pull requests.
Platforms can support duplicate comparison to make this selection process more efficient.
For example,
platforms can automatically extract several features of the compared duplicates,
\eg the inclusion of test code and the contributors' experience,
and display these features side by side to
clearly show the differences between duplicate pull requests
and speed up the selection process.
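
The following Python sketch illustrates such a comparison view.
It assumes that the caller already knows each duplicate's creation time, changed files and the author's number of previously merged pull requests, which we use as a hypothetical proxy for experience;
test code is detected with a simple path heuristic, and arrival order is included because our findings in Section~\ref{rq3} show that it matters.
All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DuplicateSummary:
    number: int
    created_at: str          # ISO-8601 UTC timestamp, e.g. "2019-03-01T10:00:00Z"
    changed_files: list[str]
    prior_merged_prs: int    # hypothetical proxy for the contributor's experience

    @property
    def includes_tests(self) -> bool:
        # Heuristic: a changed path containing "test" is treated as test code.
        return any("test" in path.lower() for path in self.changed_files)


def comparison_table(duplicates: list[DuplicateSummary]) -> str:
    """Render the selected features of duplicates as a plain-text table."""
    # Same-format UTC timestamps sort correctly as strings.
    earliest = min(duplicates, key=lambda d: d.created_at).number
    rows = [
        ("feature", [f"#{d.number}" for d in duplicates]),
        ("arrived first", ["yes" if d.number == earliest else "no" for d in duplicates]),
        ("includes tests", ["yes" if d.includes_tests else "no" for d in duplicates]),
        ("prior merged PRs", [str(d.prior_merged_prs) for d in duplicates]),
        ("files changed", [str(len(d.changed_files)) for d in duplicates]),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(label.ljust(width) + " | " + " | ".join(cells)
                     for label, cells in rows)
```
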

{\color{hltext}

\vspace{0.5em}
\noindent \textbf{Online incorporation:}
{\color{hltext} As presented in Section~\ref{beyond},
integrators might promote the incorporation of duplicate pull requests.}
Currently,
the typical way to incorporate a pull request $PR_{i}$ into another pull request $PR_{j}$ is as follows:
\textit{i)}
adding the repository containing the head branch of $PR_{i}$ as a remote of the local repository of $PR_{j}$,
\textit{ii)}
fetching the remote branch to the local repository,
\textit{iii)}
cherry-picking the needed commits from, or rebasing onto, the remote branch,
and
\textit{iv)}
updating $PR_{j}$ by pushing the changes from the local repository to the head branch of $PR_{j}$.
% Sometimes the author of $PR_{j}$ might be asked to
% add the author name of $PR_{i}$ in the latest commit message or project changelog to give credit
% for the incorporated codes.
Sometimes developers also need to update the commit message or project changelog
to give credit for the incorporated code.
This process can be too complex for newcomers to undertake.
Moreover,
it is tedious when the changes to incorporate are trivial.
GitHub could support online incorporation of duplicate pull requests.
For example,
it could allow developers to pick the needed code by clicking buttons in the UI,
and give credit for the picked code
by automatically updating the commit message and changelog.
% Therefore,
% GitHub can support online incorporation of pull requests
% so that developers can easily pick the needed codes from the displayed diff snippets.
% Furthermore,
% the incorporated codes can be squashed into a separate commit to make it more effective for credit record.
}
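
For concreteness, the four steps correspond to a short sequence of standard git commands.
The Python helper below only assembles, and does not execute, that sequence;
\texttt{incorporation\_commands} is a hypothetical name, the remote name, URL, branch names and commit identifiers are illustrative inputs,
and it assumes that the local clone's \texttt{origin} hosts the head branch of $PR_{j}$.
It also adds an explicit \texttt{git checkout} of that branch, which the four steps leave implicit.

```python
from __future__ import annotations

import shlex


def incorporation_commands(remote: str, remote_url: str, head_branch_i: str,
                           commits: list[str], head_branch_j: str) -> list[str]:
    """Assemble (but do not run) git commands that copy commits of PR_i into PR_j.

    Assumes the commands run in a local clone whose `origin` hosts the head
    branch of PR_j, and that `commits` lists the SHAs of PR_i to pick.
    """
    q = shlex.quote  # guard against spaces or shell metacharacters in inputs
    return [
        # i) register the repository holding PR_i's head branch as a remote
        f"git remote add {q(remote)} {q(remote_url)}",
        # ii) fetch PR_i's head branch into the local repository
        f"git fetch {q(remote)} {q(head_branch_i)}",
        # switch to PR_j's head branch (implicit in the four steps)
        f"git checkout {q(head_branch_j)}",
        # iii) cherry-pick the needed commits; -x notes the source SHA in each message
        "git cherry-pick -x " + " ".join(q(sha) for sha in commits),
        # iv) update PR_j by pushing its head branch
        f"git push origin {q(head_branch_j)}",
    ]
```

By default, \texttt{git cherry-pick} keeps the original author of each picked commit, and \texttt{-x} appends the source commit identifier to the message, which partly covers the credit-giving step; changelog updates remain manual.
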