Dwi H. Widyantoro
Department of Informatics, School of Electrical Engineering and Informatics (STEI), Bandung Institute of Technology, Jalan Ganesha 10, Bandung, 40132

Published : 2 Documents


A Multiclass-based Classification Strategy for Rhetorical Sentence Categorization from Scientific Papers
Widyantoro, Dwi H.; Khodra, Masayu L.; Trilaksono, Bambang Riyanto; Aziz, E. Aminudin
Journal of ICT Research and Applications Vol 7, No 3 (2013)
Publisher : ITB Journal Publisher, LPPM ITB

Rapid identification of content structures in a scientific paper is of great importance, particularly for those who actively engage in frontier research. This paper presents a multi-classifier approach to identifying such structures by classifying rhetorical sentences in scientific papers. The approach rests on the observation that no single classifier is the best performer for all rhetorical categories of sentences. Therefore, our approach learns which classifiers are good at which categories, assigns the classifiers to those categories, and applies only the right classifier when classifying a given category. This paper employs k-fold cross validation over the training data to obtain the category-classifier mapping and then re-learns the classification model of the corresponding classifier using the full training data for that particular category. This approach has been evaluated for identifying sixteen different rhetorical categories on sentences collected from the ACL-ARC paper collection. The experimental results show that the multi-classifier approach can significantly improve the classification performance over multi-label classifiers.
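The selection procedure described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two toy classifiers (`KeywordClassifier`, `LengthClassifier`), the one-vs-rest framing, the use of F1 as the selection score, the strided fold split, and all function names are hypothetical choices made here. The abstract does not say which classifiers or which selection metric the paper used.

```python
from collections import Counter


class KeywordClassifier:
    """Toy binary classifier (hypothetical): positive if the sentence contains
    a word seen in more positive than negative training sentences."""

    def fit(self, sentences, labels):
        pos, neg = Counter(), Counter()
        for s, y in zip(sentences, labels):
            (pos if y else neg).update(set(s.lower().split()))
        self.cues = {w for w in pos if pos[w] > neg[w]}
        return self

    def predict(self, sentences):
        return [any(w in self.cues for w in s.lower().split()) for s in sentences]


class LengthClassifier:
    """Toy binary classifier (hypothetical): positive if the sentence is at
    least as long as the mean length of positive training sentences."""

    def fit(self, sentences, labels):
        lengths = [len(s.split()) for s, y in zip(sentences, labels) if y]
        self.threshold = sum(lengths) / len(lengths) if lengths else float("inf")
        return self

    def predict(self, sentences):
        return [len(s.split()) >= self.threshold for s in sentences]


def f1(y_true, y_pred):
    """F1 score for boolean labels; 0.0 when there are no true positives."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cv_f1(make_clf, sentences, labels, k):
    """Mean F1 over k folds; fold i holds out items i, i+k, i+2k, ..."""
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(sentences), k))
        train = [i for i in range(len(sentences)) if i not in held_out]
        test = sorted(held_out)
        clf = make_clf().fit([sentences[i] for i in train], [labels[i] for i in train])
        preds = clf.predict([sentences[i] for i in test])
        scores.append(f1([labels[i] for i in test], preds))
    return sum(scores) / k


def learn_mapping(candidates, sentences, categories, k=3):
    """For each category, pick the candidate with the best cross-validated F1,
    then refit that candidate on the full training data for that category."""
    models = {}
    for cat in sorted(set(categories)):
        labels = [c == cat for c in categories]  # one-vs-rest labels (assumption)
        best = max(candidates, key=lambda name: cv_f1(candidates[name], sentences, labels, k))
        models[cat] = (best, candidates[best]().fit(sentences, labels))
    return models


# Candidate pool: stand-ins for whatever real classifiers one would compare.
CANDIDATES = {"keyword": KeywordClassifier, "length": LengthClassifier}
```

Calling `learn_mapping(CANDIDATES, sentences, categories)` on a list of sentences and their category labels returns, per category, the name of the selected classifier and a model refit on all training data. At prediction time, only the model mapped to a category is consulted for that category.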
Rhetorical Sentences Classification Based on Section Class and Title of Paper for Experimental Technical Papers
Helen, Afrida; Purwarianti, Ayu; Widyantoro, Dwi H.
Journal of ICT Research and Applications Vol 9, No 3 (2015)
Publisher : ITB Journal Publisher, LPPM ITB

Rhetorical sentence classification is an interesting approach for making extractive summaries, but the technique needs further development because the performance of automatic rhetorical sentence classification is still poor. Rhetorical sentences are sentences that contain rhetorical words or phrases. They appear not only in the contents of a paper but also in its title. In this study, features related to section class and title class that were proposed in previous research were developed further. Our method uses different techniques to achieve automatic section class extraction, for which we introduce new, format-based features. Furthermore, we propose automatic rhetorical phrase extraction from the title. The corpus was a collection of technical-experimental scientific papers. Our method uses the Support Vector Machine (SVM) algorithm and the Naïve Bayesian algorithm for classification. The four categories used were Problem, Method, Data, and Result. It was hypothesized that these features would improve classification accuracy compared to previous methods. The F-measure for these categories reached up to 14%.