Articles

Architecture Models and Methods in Medical Question Answering Systems (Model dan Metoda Arsitektur pada Sistem Tanya Jawab Medis)
Suwarningsih, Wiwin; Supriana, Iping; Purwarianti, Ayu
INKOM Journal Vol 8, No 2 (2014)
Publisher : Pusat Penelitian Informatika - LIPI

DOI: 10.14203/j.inkom.303

Abstract

In this paper, we survey several studies on question answering systems in the medical domain (medical question answering, MedQuAn). A MedQuAn system processes questions posed as natural-language text and returns relevant answers. The paper examines the conceptual modules of MedQuAn: a question answering system consists of three distinct core components, each with its own methods and approaches. The three core components are question classification, document retrieval, and answer extraction. The outcome of this survey is a contribution to future research in the MedQuAn domain, particularly medical question answering systems for the Indonesian language.

A Survey: User Trust in Information on Social Media (Sebuah Survey: Tingkat Kepercayaan Pengguna Terhadap Informasi di Sosial Media)
Pramiyati, Titin; Supriana, Iping; Purwarianti, Ayu
Jurnal Sistem Informasi Vol 7, No 1 (2015): April
Publisher : Major of Information Systems Faculty of Computer Science Sriwijaya University

Abstract

Trustworthy information can be identified from the confidence level (trust) or the reputation of its source. Today, most people use information drawn from social media, yet finding a reliable source of information remains difficult. This paper discusses the results of determining the level of trust placed in information presented on social media. The social media used as sources of information in this research were Facebook, Google+, Twitter, and LinkedIn. This is a descriptive study that examines how social media users behave with respect to the trust level of information sources. Respondents were divided into two clusters, civilians and military officers, and were asked which social media carry trustworthy information. The supporting data were gathered through a survey, conducted with a personally administered questionnaire whose questions were written for this study and distributed directly to respondents. This kind of survey is adequate for a survey of limited scope. The confidence level was measured using graphical and numerical measurements, complemented by a chi-squared hypothesis test. The data analysis found that Twitter and Google+ were chosen as the most trustworthy sources of information.
Keywords: information trust level; graphical measurement; numerical measurement; chi-squared hypothesis test

Challenges and Opportunities in Question Generation (Tantangan dan Peluang pada Question Generation)
Suwarningsih, Wiwin; Supriana, Iping; Purwarianti, Ayu
Jurnal Sistem Informasi Vol 6, No 2 (2014): Oktober
Publisher : Major of Information Systems Faculty of Computer Science Sriwijaya University

Abstract

In this paper, we review the current state of the art in question generation (QG). QG is the task of generating reasonable questions from a sentence or text in natural language. We examine the conceptual outline of question generation, which falls into three categories: syntax-based, semantic-based, and template-based. Question generation systems in the syntactic category often use semantic elements, and vice versa, while template-based systems use several levels of syntactic and/or semantic information. The outcome of this survey is a review of challenges and opportunities for future research: (a) challenges in lexical-semantic and syntactic issues, (b) the use of the Vauquois triangle and shallow parsing as alternatives, and (c) syntactic representation with phrase structure trees.
Keywords: question generation, lexical, syntax, sentence transformation, Vauquois triangle

A Novel Part-of-Speech Set Developing Method for Statistical Machine Translation
Sujaini, Herry; Kuspriyanto, Kuspriyanto; Akhmad Arman, Arry; Purwarianti, Ayu
TELKOMNIKA Telecommunication, Computing, Electronics and Control Vol 12, No 2: June 2014
Publisher : Universitas Ahmad Dahlan

Abstract

Part of speech (PoS) is one of the features that can be used to improve the quality of statistical machine translation. Typically, the PoS set of a language is determined from the grammar of that language or adopted from the PoS set of another language. This work aims to formulate a model for automatically developing a PoS set as a linguistic factor to improve the quality of machine translation. The method uses a word similarity approach in which we cluster the words contained in a corpus. The resulting classes are then defined as the PoS set for the given language. We evaluated the computationally defined PoS set with the MOSES machine translation system, comparing its results against those of an SMT system that uses manually generated PoS sets, and assessed the systems with the BLEU method. The evaluation uses English as the source language and Indonesian as the target language.

Detailed Analysis of Extrinsic Plagiarism Detection System Using Machine Learning Approach (Naive Bayes and SVM)
Alfikri, Zakiy Firdaus; Purwarianti, Ayu
TELKOMNIKA Indonesian Journal of Electrical Engineering Vol 12, No 11: November 2014
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/telkomnika.v12i11.6652

Abstract

In this report we propose a detailed analysis method for a plagiarism detection system using a machine learning approach. We used Naive Bayes and Support Vector Machine (SVM) as learning algorithms. The learning features used in the method are word similarity, fingerprint similarity, latent semantic analysis (LSA) similarity, and word pairs. These features were selected to draw on the state-of-the-art detailed analysis methods (word similarity, fingerprinting, and LSA) and to combine the strengths of each method in detecting plagiarism. Several experiments were conducted to test how well the proposed method detects many kinds of plagiarism. The experiments used test data containing cases of literal plagiarism, partial literal plagiarism, paraphrased plagiarism, plagiarism with changed sentence structure, and translated plagiarism. The test data also contain cases of non-plagiarism on different topics and non-plagiarism on the same topic. The experiments using SVM showed an average accuracy of 92.86% (reaching 95.71% without the word similarity feature), while those using Naive Bayes showed an average accuracy of 54.29% (reaching 84.29% without the word pair features).
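
Two of the features named in the abstract above can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions here are assumptions: word similarity is taken as the Jaccard overlap of word sets, and fingerprint similarity as the Jaccard overlap of character k-grams (a simplification that keeps every k-gram instead of hashing and selecting a subset). The function names and the choice of `k=5` are hypothetical.

```python
def word_similarity(a, b):
    """Jaccard overlap of the lowercase word sets of two passages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def fingerprints(text, k=5):
    """Set of character k-grams of the whitespace-normalized, lowercase text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def fingerprint_similarity(a, b, k=5):
    """Jaccard overlap of the two fingerprint sets (assumed definition)."""
    prints_a, prints_b = fingerprints(a, k), fingerprints(b, k)
    union = prints_a | prints_b
    return len(prints_a & prints_b) / len(union) if union else 0.0


suspicious = "the cat sat"
source = "the cat ran"
features = [word_similarity(suspicious, source),
            fingerprint_similarity(suspicious, source)]
```

A vector such as `features` could then be passed, together with the other features, to a classifier like Naive Bayes or SVM.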

Rhetorical Sentences Classification Based on Section Class and Title of Paper for Experimental Technical Papers
Helen, Afrida; Purwarianti, Ayu; Widyantoro, Dwi H.
Journal of ICT Research and Applications Vol 9, No 3 (2015)
Publisher : ITB Journal Publisher, LPPM ITB

DOI: 10.5614/itbj.ict.res.appl.2015.9.3.5

Abstract

Rhetorical sentence classification is an interesting approach for making extractive summaries, but the technique still needs development because the performance of automatic rhetorical sentence classification is still poor. Rhetorical sentences are sentences that contain rhetorical words or phrases. They appear not only in the contents of a paper but also in the title. In this study, features related to section class and title class that were proposed in previous research were further developed. Our method uses different techniques to achieve automatic section class extraction, for which we introduce new, format-based features. Furthermore, we propose automatic rhetorical phrase extraction from the title. The corpus we used was a collection of technical-experimental scientific papers. Our method uses the Support Vector Machine (SVM) algorithm and the Naïve Bayesian algorithm for classification. The four categories used were Problem, Method, Data, and Result. It was hypothesized that these features would improve classification accuracy compared to previous methods. The F-measure for these categories reached up to 14%.

Supervised Entity Tagger for Indonesian Labor Strike Tweets using Oversampling Technique and Low Resource Features
Purwarianti, Ayu; Madlberger, Lisa; Ibrahim, Mochammad
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 14, No 4: December 2016
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v14i4.3876

Abstract

We propose an entity tagger for Indonesian tweets sent during labor strike events using supervised learning methods. The aim of the tagger is to extract the date, the location, and the person or organization involved in the strike. We used SMOTE (Synthetic Minority Oversampling Technique) as an oversampling technique and conducted several experiments on Twitter data to evaluate different settings with varying machine learning algorithms and training data sizes. To test the low resource features, we also ran experiments on the system without the word list feature and without word normalization. Our results indicate that treating different types of machine learning algorithms differently, with low resource features, can lead to a good accuracy score. We tried the Naïve Bayes, C4.5, Random Forest, and SMO (Sequential Minimal Optimization) algorithms, using Weka as the machine learning tool. For Naïve Bayes, because the data distribution is based on the class probability, the best accuracy was achieved by removing duplicate data. For C4.5 and Random Forest, SMOTE gave higher accuracy than the original data and the data with duplicates removed. For SMO, there was no significant difference among the various training data sizes.
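
The core idea of SMOTE, as used in the abstract above, can be sketched in a few lines: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbors. This is a minimal pure-Python illustration, not the Weka implementation the authors used; the function name, the parameter defaults, and the sample points are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbors (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, closest to the base sample first.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        # Random position on the segment from base towards the neighbor.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical 2-D feature vectors for an under-represented class.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies.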

Indonesian Essay Answer Scoring Using the SVM-LSA Method with Generic Features (PENILAIAN ESAI JAWABAN BAHASA INDONESIA MENGGUNAKAN METODE SVM - LSA DENGAN FITUR GENERIK)
Adhitia, Rama; Purwarianti, Ayu
Jurnal Sistem Informasi Vol 5 No 1 (2009): Jurnal Sistem Informasi (Journal of Information System)
Publisher : Faculty of Computer Science Universitas Indonesia

DOI: 10.21609/jsi.v5i1.260

Abstract

This paper examines a solution to automatic essay answer scoring that combines the support vector machine (SVM), as an automatic text classification technique, with LSA, as a means of handling synonymy and polysemy among index terms. Unlike conventional essay scoring systems, which use index terms as features, the scoring process here uses generic features that allow a scoring model to be tested on a variety of different questions. With these generic features, no retraining is needed when scoring essay answers to several questions. The features are the percentage of keyword occurrences, the similarity between the essay answer and the reference answer, the percentage of key idea occurrences, the percentage of wrong idea occurrences, and the percentage of keyword synonym occurrences. The test results also show that the proposed method achieves higher scoring accuracy than other methods, such as SVM or LSA with index terms as machine learning features.
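
Three of the generic features listed in the abstract above can be sketched as follows. The abstract does not define them precisely, so this is an assumed reading: a keyword or synonym percentage is the share of listed terms that occur in the answer, and the similarity to the reference answer is a plain bag-of-words cosine standing in for the LSA-based similarity. All function names and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def term_percentage(answer_terms, terms):
    """Share of the given terms that occur in the answer, from 0.0 to 1.0."""
    if not terms:
        return 0.0
    return sum(1 for term in terms if term in answer_terms) / len(terms)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def generic_features(answer, reference, keywords, synonyms):
    """Question-independent features computed for one essay answer."""
    tokens = tokenize(answer)
    return {
        "keyword_pct": term_percentage(set(tokens), keywords),
        "synonym_pct": term_percentage(set(tokens), synonyms),
        "reference_similarity": cosine_similarity(tokens, tokenize(reference)),
    }


features = generic_features(
    answer="photosynthesis turns light into energy",
    reference="photosynthesis converts light into chemical energy",
    keywords=["photosynthesis", "light", "chemical"],
    synonyms=["turns", "transforms"],
)
```

Because none of these values depends on the vocabulary of a particular question, the same trained classifier can in principle score answers to different questions, which is the property the abstract attributes to generic features.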

Implementation of Indonesian Dictation (IMPLEMENTASI PENDIKTEAN BAHASA INDONESIA)
Purwarianti, Ayu; Firdaud, Hari Bagus
Jurnal Ilmu Komputer dan Informasi Vol 4, No 1 (2011): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v4i1.152

Abstract

This paper presents the results of research on building a real-time dictation application for Indonesian. Building a dictation application raises several problems, such as voice commands, out-of-vocabulary (OOV) words, noise, and fillers. This research focuses on handling voice commands and OOV words in dictated speech. Voice dictation is a further development of real-time speech recognition, with additional methods to handle the problems stated above. To handle voice commands, a module is added that checks the decoding output of the speech recognition system. To handle OOV words, a spelling-handling module is added, which runs after spelling status has been declared. The voice command model and the letter model are added to the dictionary and used in training the n-gram language model. In testing, we evaluated the speech recognition system, the voice command handling, and the spelling module as a strategy for dealing with OOV words. The speech recognition module achieved an accuracy of 70%. For the voice command handling module, testing showed that voice commands were handled well. For the spelling module, testing showed that only 20 of the 26 letters were recognized successfully.