inor

2017-06-14 20:27:16 +08:00 · 2017-06-14 20:27:16 +08:00 · b50a5f0a8a
parent 6db3608dea
commit b50a5f0a8a
3 changed files with 30 additions and 18 deletions
--- a/2_background.tex
+++ b/2_background.tex
@@ -156,25 +156,32 @@ and the subsequent pull-requests are referred to the \textit{duplicate pull-requ
 \end{figure}


-For example, Figure \ref{fig:example_dup_prs} 
-shows two duplicate pull-requests 
-both of which intend to \hl{[....]}
+For example, Figure~\ref{fig:example_dup_prs} 
+shows a duplicate pull-request (\textit{Rails \#11869}) 
+and its master pull-request (\textit{Rails \#11496}). 
+Both of them aim to fix the same problem with associations 
+that are based on a null relationship. 
 In \gh, in addition to commits (\ie file changes), 
 contributors also need to provide a summary title and a detailed description 
 to elaborate on the submitted pull-request. 

-
-%%%%
-%%% Most of the time, the same problem is expressed in similar sentences but with different wording; the same root error may even cause different failures;
-%%%  
-%%%%
-
-
-
-However, GitHub does not provide an explicit way to 
-mark a pull-request as duplicate to another one. 
-In our study, the test dataset of duplicates is recognized by analysing review comments  
-which is elaborated in Section~\ref{sec:experiment}
+From the figure, we can see that the titles and descriptions 
+of these two pull-requests share some words, 
+which means that natural language text can be used to measure their similarity. 
+Textual similarity has indeed been applied in many previous 
+studies~\cite{Runeson2007Detection,Wang2008,Nguyen2012Duplicate,Lazar2014Improving} 
+to detect duplicate content in software development (\eg bug reports).
+However, it is natural and common for different people 
+to describe the same thing in different words, 
+as the above two duplicate pull-requests illustrate: 
+they do not share many words. 
+Compared with bug reports, however, 
+pull-requests contain more information, such as the diff of file changes (\ie commits). 
+Developers are likely to edit the same set of files 
+to fix the same bug or add the same feature. 
+Therefore, in addition to textual information, 
+we also take diff information into consideration 
+and investigate combining the two to better detect duplicate pull-requests. 
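To make this intuition concrete, the following Python sketch contrasts a bag-of-words cosine similarity over titles and descriptions with a Jaccard overlap of the changed file paths. It is only an illustration under our own assumptions: the function names, the tokenization, the choice of these two measures, and the example pull-requests and file paths are hypothetical and are not taken from Rails \#11869 or Rails \#11496.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_similarity(text_a, text_b):
    """Cosine similarity of raw term-frequency (bag-of-words) vectors."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diff_similarity(files_a, files_b):
    """Jaccard similarity of the sets of changed file paths."""
    a, b = set(files_a), set(files_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pull-requests: same fix, different wording, overlapping files.
pr_a = {
    "text": "Fix has_many through association with nil owner",
    "files": ["lib/associations/association.rb", "test/associations_test.rb"],
}
pr_b = {
    "text": "Handle null foreign key when loading association",
    "files": ["lib/associations/association.rb", "test/associations_test.rb", "CHANGELOG.md"],
}

print(round(text_similarity(pr_a["text"], pr_b["text"]), 2))   # 0.13
print(round(diff_similarity(pr_a["files"], pr_b["files"]), 2))  # 0.67
```

Because the two descriptions share only one word, the textual score stays low, while the shared files still yield a high diff score; this gap is what motivates combining both signals.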

 % \begin{itemize}
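 % 	%Explain that the later a duplicate is identified, the more it hurts the contributor's continued contributions [whether to add this RQ depends on the actual experimental data; if it is added, this urgency must be raised in the intro before introducing automatic detection]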
@@ -189,7 +196,6 @@ which is elaborated in Section~\ref{sec:experiment}
 % Use that sequence diagram to explain this [could go in the approach section, where the data collection process is described]


-
 % !!!!!!!! RQ
 % RQ0: Temporal distribution
 % RQ1: Effectiveness of Title, Desc, FileD, and LineD individually
--- a/3_approach.tex
+++ b/3_approach.tex
@@ -10,7 +10,13 @@ We determine the similarity between pull-requests from two perspectives:
 Text similarity is calculated based on the \hl{natural language text}, 
 while diff similarity is calculated by comparing the file changes 
 contained in different pull-requests.
-Finally, the combined similarity will be used to retrieve potential targe pull-requests.
+Finally, the combined similarity is used to retrieve potential target pull-requests. 
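The retrieval step can be sketched in Python as follows, assuming that a text similarity and a diff similarity in $[0, 1]$ have already been computed between the new pull-request and each earlier one. The linear combination with weight \texttt{alpha}, the top-$k$ cut-off, and all names and numbers are hypothetical choices for illustration; they are not necessarily the combination used in our approach.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    number: int      # hypothetical pull-request number
    text_sim: float  # similarity of titles/descriptions, assumed in [0, 1]
    diff_sim: float  # similarity of file changes, assumed in [0, 1]


def combined_similarity(c, alpha=0.5):
    """Weighted sum; alpha is a hypothetical weight on text similarity."""
    return alpha * c.text_sim + (1 - alpha) * c.diff_sim


def retrieve_targets(candidates, k=3, alpha=0.5):
    """Rank earlier pull-requests by combined similarity and keep the top k."""
    ranked = sorted(candidates, key=lambda c: combined_similarity(c, alpha), reverse=True)
    return ranked[:k]


example = [
    Candidate(101, text_sim=0.15, diff_sim=0.67),
    Candidate(102, text_sim=0.40, diff_sim=0.00),
    Candidate(103, text_sim=0.05, diff_sim=0.10),
]

top = retrieve_targets(example, k=2)
print([c.number for c in top])  # [101, 102]
```

Setting \texttt{alpha} to 1 reduces the ranking to text similarity alone (candidate 102 would then come first), which is one simple way to compare the contribution of each signal.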
+
+[However, GitHub does not provide an explicit way to 
+mark a pull-request as a duplicate of another one. 
+In our study, the duplicates in the test dataset are identified by analysing review comments, 
+as elaborated in Section~\ref{sec:experiment}.]
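Since there is no explicit duplicate marker, one simple way to collect candidate duplicate pairs is to search review comments for phrases that point to another pull-request. The Python sketch below assumes that reviewers write phrases such as ``duplicate of \#N''; the regular expression and the function name are hypothetical, and this is not necessarily the procedure elaborated in Section~\ref{sec:experiment}.

```python
import re

# Hypothetical pattern: matches "duplicate of #123", "duplicates #123", "dup of #123".
DUP_PATTERN = re.compile(r"\b(?:duplicate|dup)s?\s+(?:of\s+)?#(\d+)", re.IGNORECASE)


def find_duplicate_references(comments):
    """Return the pull-request numbers that review comments point to as originals."""
    refs = []
    for comment in comments:
        refs.extend(int(number) for number in DUP_PATTERN.findall(comment))
    return refs


comments = [
    "Closing as duplicate of #1234.",
    "LGTM, thanks!",
    "This duplicates #42",
]
print(find_duplicate_references(comments))  # [1234, 42]
```

Such a keyword heuristic also matches negated phrases (e.g. ``not a duplicate of \#5''), so its matches would still need manual inspection before entering a test dataset.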
+
 In the following sections, we will elaborate each step in detail.

 \begin{figure}[ht]
--- a/9_ref.bib
+++ b/9_ref.bib
@@ -9,7 +9,7 @@
 }
@inproceedings{Li2017,
  title={Automatic Classification of Review Comments in Pull-based Development Model},
-  author={ Zhixing Li, Yue Yu, Gang Yin, Tao Wang, Qiang Fan and Huaimin Wang},
+  author={Li, Zhixing and Yu, Yue and Yin, Gang and Wang, Tao and Fan, Qiang and Wang, Huaimin},
  booktitle={International Conference on Software Engineering and Knowledge Engineering},
  year={2017},
 }