The Approach Of The Software Artifact Summarization Paradigm
Our approach consists of the following steps. First, we collected a corpus of 127 code fragments extracted directly from the official Eclipse and NetBeans FAQs. Second, we hired human annotators to suggest summary lines, i.e., Gold Summary Lines (annotation). Third, we introduced crowdsourcing (a data-driven approach) as a problem-solving model for extracting code features; it had not previously been employed in the software artifact summarization paradigm. Fourth, we trained two classifiers, Support Vector Machines (SVM) and Naive Bayes (NB), on the code fragments. Fifth, we evaluated the effectiveness of these classifiers using statistical measures such as Accuracy, Precision, Recall, F-Score, True Positive Rate (TPR), False Positive Rate (FPR), Receiver Operating Characteristic (ROC), and Area Under the Curve (AUC). Finally, we performed a feature selection analysis to rank the selected features and determine their importance. The sections below discuss these steps one by one.
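The evaluation measures named above can be made concrete with a small sketch. It is not the evaluation code used in this work: the function names are hypothetical, the encoding of labels (1 for a gold summary line, 0 otherwise) is an assumption, and the ROC curve is summarized only through its AUC, computed with the standard pairwise-ranking definition.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels; 1 is assumed to mark a summary line."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall, F-Score, TPR and FPR from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # Recall and TPR are the same quantity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_score": safe_div(2 * precision * recall, precision + recall),
        "tpr": recall,
        "fpr": safe_div(fp, fp + tn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Given a classifier's hard predictions and its confidence scores for the same lines, `classification_metrics(y_true, y_pred)` returns the threshold-dependent measures, while `auc(y_true, scores)` summarizes ranking quality across all thresholds.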
5.2.1 Corpus Creation

We need a corpus of code fragments to train our classifiers and to judge their effectiveness. Ideally, such a corpus would already be available or would have been created by experts in the field of code fragment summarization. Previously, Ying et al. collected a corpus of 70 code fragments from the official Eclipse FAQ. Since last year, eight additional code fragments have been added to the Eclipse FAQ,