Articles

Design And Implementation of Document Similarity Search System For WEB-Based Medical Journal Management

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 5, No 1 (2011): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— Document similarity can be used as a reference for finding other, similar information, reducing the time needed to retrieve related documents. Document similarity search is usually implemented as a "related articles" feature. Similarity between documents can be measured with the cosine measure, after preprocessing the documents to be compared. The indexing and measurement processes take a relatively long execution time. The problem with performing indexing and similarity measurement in a web-based application is its limited execution time, so these processes require their own programming techniques. The purpose of this research is to design and build software that gives a web-based management system for Indonesian-language medical journals the capability to find documents similar to the one currently being read. The result of this research is a mechanism based on JavaScript auto-reload and session cookies that breaks the indexing and similarity measurement down into several small steps, so the process can be performed in a web-based application over a relatively large number of documents. Measuring similarity with the cosine measure on the Indonesian-language medical journal "Media Medika Indonesiana" achieved a fairly high accuracy of 90%. Keywords— document similarity, cosine measure, web-based application.
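The abstract names the cosine measure over preprocessed documents. A minimal sketch of that measure over raw term-frequency vectors (the paper's actual preprocessing and indexing steps are not shown here, and whitespace tokenization is an assumption):

```python
from collections import Counter
import math

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical documents score 1.0 and documents with no shared terms score 0.0, which is why a similarity threshold can rank "related articles".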

Klasifikasi Posting Twitter Kemacetan Lalu Lintas Kota Bandung Menggunakan Naive Bayesian Classification

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 6, No 1 (2012): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— Every day the Twitter server receives a very large volume of tweet data, which can be mined for specific purposes; one of these is visualizing traffic congestion in a city. The naive Bayes classifier is an approach based on Bayes' theorem that combines prior knowledge with new knowledge, making it a simple classification algorithm that can nevertheless achieve high accuracy. This research evaluates the ability of the naive Bayes classifier to classify tweets containing information about traffic congestion in Bandung. In testing, the application produced its lowest accuracy of 78% on a sample of 100 records and its highest accuracy of 91.60% on a sample of 13,106 records. Testing with the Rapid Miner 5.1 software yielded a lowest accuracy of 72% on 100 records and a highest accuracy of 93.58% on 13,106 records for naive Bayesian classification, while the support vector machine method yielded a lowest accuracy of 92% on 100 records and a highest accuracy of 99.11% on 13,106 records. Keywords— Twitter, tweet, classification, naive Bayesian classification, support vector machine
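The classifier the abstract describes is a multinomial naive Bayes over tweet words. A toy sketch, assuming whitespace tokenization and Laplace smoothing (the paper's exact feature handling is not stated):

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)                       # class frequencies
        self.word_counts = {c: Counter() for c in self.priors}
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts for w in self.word_counts[c]}

    def predict(self, doc):
        n_docs = sum(self.priors.values())
        def log_prob(c):
            total = sum(self.word_counts[c].values())
            lp = math.log(self.priors[c] / n_docs)          # log prior
            for w in doc.lower().split():                   # log likelihoods
                lp += math.log((self.word_counts[c][w] + 1)
                               / (total + len(self.vocab)))
            return lp
        return max(self.priors, key=log_prob)
```

The class with the highest posterior (prior times smoothed word likelihoods, computed in log space) wins, which is why the method stays accurate even with very simple features.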

Optimal Solution Of Minmax 0/1 Knapsack Problem Using Dynamic Programming

International Journal of Informatics and Communication Technology (IJ-ICT) Vol 2, No 1 (2013)
Publisher : Institute of Advanced Engineering and Science


Abstract

The knapsack problem arises when seeking an optimal selection of objects to be placed into a container with limited space and capacity. In the problem of loading goods into a container, the selection of items to be shipped must satisfy a minimum total weight or volume while not exceeding the container's maximum capacity. The knapsack variants discussed so far only maximize usage without exceeding a specified capacity limit, so they cannot be applied directly to this problem. This study develops a dynamic programming algorithm to solve the MinMax 0/1 knapsack, an extension of the 0/1 knapsack with both minimum and maximum constraints. The results show that applying the MinMax 0/1 knapsack generates optimal solutions for the problem of loading goods into containers, making better use of the available container space than the loading of goods as performed by PT DFI.
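One common way to phrase the MinMax 0/1 knapsack as a dynamic program is to track, for each exact total weight, the best achievable value, then restrict the answer to weights inside [min, max]. A sketch under that formulation (the paper's exact objective may differ):

```python
def minmax_knapsack(weights, values, w_min, w_max):
    """0/1 knapsack whose chosen total weight must lie in [w_min, w_max].

    dp[w] = best value achievable with total weight exactly w, or None
    if no subset of items sums to w.  Returns (best value, its weight).
    """
    dp = [None] * (w_max + 1)
    dp[0] = 0
    for wt, val in zip(weights, values):
        # iterate weights backwards so each item is used at most once (0/1)
        for w in range(w_max, wt - 1, -1):
            if dp[w - wt] is not None:
                cand = dp[w - wt] + val
                if dp[w] is None or cand > dp[w]:
                    dp[w] = cand
    feasible = [(dp[w], w) for w in range(w_min, w_max + 1) if dp[w] is not None]
    return max(feasible) if feasible else None
```

Unlike the plain 0/1 knapsack, solutions whose weight falls below `w_min` are filtered out at the end, which is the "Min" half of the constraint.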

Payload Attribution Using Winnowing Multi Hashing Method

International Journal of Information and Network Security (IJINS) Vol 2, No 5 (2013)
Publisher : Institute of Advanced Engineering and Science


Abstract

Payload attribution is the process of identifying the sources and destinations of all packets that appeared on a network and contained a given excerpt of a payload. The method can be used to make investigations of internet crime (cybercrime) more efficient, for example tracing who is responsible for unauthorized access, illegal content, deliberate spreading of viruses, data forgery, and other cybercrimes. The payload is the actual data that a packet carries to its destination. The aim of using the Winnowing Multi Hashing (WMH) method is to fingerprint the payload while evaluating the resulting false positive rate; block-boundary and hash-window values that yield low false positive rates in WMH are recommended as reference values. This method can serve as a solution to the problem of the storage size required for network forensic activity.
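The core of winnowing is selecting a sparse subset of k-gram hashes: slide a window of w consecutive hashes and keep the minimum in each window. A single-hash sketch of that selection step (WMH repeats it with several hash functions; `k` and `w` here are illustrative, not the paper's recommended values):

```python
def winnow(payload: bytes, k: int = 4, w: int = 5):
    """Winnowing fingerprint selection over one hash function.

    Hash every k-gram of the payload, then record the minimum hash in each
    window of w consecutive hashes (rightmost occurrence on ties).  Returns
    a set of (hash, position) fingerprints.
    """
    kgram_hashes = [hash(payload[i:i + k]) for i in range(len(payload) - k + 1)]
    fingerprints = set()
    for i in range(len(kgram_hashes) - w + 1):
        window = kgram_hashes[i:i + w]
        lo = min(window)
        j = max(n for n in range(w) if window[n] == lo)  # rightmost minimum
        fingerprints.add((lo, i + j))
    return fingerprints
```

Adjacent windows usually share their minimum, so far fewer fingerprints than k-grams are stored, which is what shrinks the storage needed for forensic queries; larger windows store less but raise the false positive rate the abstract measures.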

Analisis Fitur Kalimat untuk Peringkas Teks Otomatis pada Bahasa Indonesia

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 5, No 2 (2011): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— Automatic Text Summarization (ATS) is a technique for creating a summary of a document automatically, using a computer application to extract the most important information from the original document. Features are required to weight sentences, among them Log-TFISF (term frequency–inverse sentence frequency), sentence location, sentence overlap, title overlap, and relative sentence length. This research analyzes these five features in order to determine per-feature weights that produce a coherent summary. The five features were implemented in an automatic text summarization system for Indonesian developed using the relative importance of topics method. Experiments show that the sentence location feature has the highest F-measure, 0.46, followed by sentence overlap, title overlap, relative sentence length, and Log-TFISF with values of 0.42, 0.42, 0.35, and 0.32. The relative weights of the features, from largest to smallest, are sentence location, sentence overlap, title overlap, relative sentence length, and Log-TFISF with values of 0.25, 0.22, 0.22, 0.19, and 0.12. Implementing these relative weights in the ATS yields an accuracy of 70.62%, which is 2.86% more accurate than the 67.72% obtained without relative weights. Keywords— Automatic Text Summarization (ATS), Log-TFISF, sentence location, sentence overlap, title overlap, sentence relative length, bahasa Indonesia
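Combining the reported relative weights into a per-sentence score is a weighted sum over normalized feature values. A sketch using the weights from the abstract (the dictionary keys are my own names; the paper's feature normalization is not given here):

```python
# Relative feature weights reported in the abstract.
WEIGHTS = {
    "sentence_location": 0.25,
    "sentence_overlap": 0.22,
    "title_overlap": 0.22,
    "relative_length": 0.19,
    "log_tfisf": 0.12,
}

def sentence_score(features: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of normalized (0..1) sentence feature values.

    Sentences are then ranked by this score and the top ones form the summary.
    """
    return sum(weights[name] * features.get(name, 0.0) for name in weights)
```

Because the five weights sum to 1.0, a sentence scoring near 1.0 is strong on every feature, and sentence location dominates ties, matching its highest F-measure.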

KLASTERING DOKUMEN MENGGUNAKAN HIERARCHICAL AGGLOMERATIVE CLUSTERING : Prosiding Seminar Nasional Sistem dan Teknologi Informasi (SNASTI) 2010

Publikasi Eksternal 2010
Publisher : Publikasi Eksternal


Abstract

The document retrieval process over a document database often returns a very large number of documents, many of which are not relevant to the desired document. Clustering the documents in the database before retrieval is one way to find relevant documents. This study clusters documents using hierarchical agglomerative clustering algorithms, with an emphasis on documents written in Indonesian, since the domestic need for information is increasing. The relationship between documents is measured by the similarity between them. The algorithm was tested on documents from the UII SNATI publications from 2004–2009. The experimental results show that the algorithm can be applied to group documents written in Indonesian, and that selecting appropriate keywords increases the quality of information retrieval, reflected in a recall of 0.6 and a precision of 0.5. Presented at Seminar Nasional Sistem dan Teknologi Informasi (SNASTI), 10 December 2010.
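Agglomerative clustering starts with each document in its own cluster and repeatedly merges the most similar pair until the best similarity drops below a threshold. A sketch over a precomputed similarity matrix, using single linkage as one illustrative choice (the paper's linkage criterion is not specified here):

```python
def agglomerative(sim, threshold):
    """Single-linkage agglomerative clustering over a similarity matrix.

    sim[i][j] is the similarity between documents i and j.  Repeatedly merge
    the two most similar clusters; stop when no pair reaches the threshold.
    """
    clusters = [[i] for i in range(len(sim))]       # one cluster per document
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: similarity of the closest member pair
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters
```

The merge history forms the dendrogram; cutting it at a similarity threshold yields the flat document groups used before retrieval.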

Penerapan Metode Support Vector Machine pada Sistem Deteksi Intrusi secara Real-time

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 8, No 1 (2014): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— An intrusion detection system detects attacks or intrusions in a network or computer system, generally by comparing network traffic patterns with known attack patterns or by finding abnormal patterns in network traffic. The growth of internet activity has increased the number of data packets that must be analyzed to build attack or normal patterns, raising the possibility that the system cannot detect intrusions that use new techniques; a system is therefore needed that can build a pattern or model automatically. This research aims to build an intrusion detection system that can create a model automatically and detect intrusions in a real-time environment, using the support vector machine, a data mining method, to classify network traffic audit data into three classes: normal, probe, and DoS. The audit data are produced by preprocessing network packet captures obtained from Tshark. Based on the test results, the system can help a system administrator build a model or pattern automatically, with high accuracy, a high attack detection rate, and a low false positive rate; the system can also run in a real-time environment. Keywords— intrusion detection, classification, preprocessing, support vector machine
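The paper classifies audit records with a support vector machine; as a loose dependency-free stand-in that shows the same learn-a-linear-boundary idea, here is a simple perceptron (explicitly not the paper's SVM) trained on toy two-feature audit records. The feature names (duration, packet rate) and data are invented for illustration:

```python
def train_perceptron(X, y, epochs=20):
    """Train a binary linear classifier (labels y in {+1, -1}).

    A perceptron stand-in for the paper's SVM: same linear decision function
    w.x + b, but without SVM's margin maximization.  Multiclass (normal /
    probe / DoS) would wrap this in a one-vs-rest scheme.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # update only on misclassified (or boundary) examples
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Training amounts to nudging the separating hyperplane toward each mistake; on linearly separable audit data it converges, which is the behavior the automatic model-building in the paper relies on (with SVM adding a maximum-margin guarantee).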

Analisis Sentimen Twitter untuk Teks Berbahasa Indonesia dengan Maximum Entropy dan Support Vector Machine

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 8, No 1 (2014): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— Sentiment analysis in this research classifies textual documents into two classes: positive and negative sentiment. Opinion data were obtained from the social network Twitter using queries in Indonesian. This research aims to determine public sentiment toward particular objects as expressed on Twitter in Indonesian, supporting efforts to conduct market research on public opinion. The collected data were preprocessed and POS-tagged to generate a classification model through a training process. Sentiment-bearing words were collected using a dictionary-based approach; the dictionary produced in this research contains 18,069 words. The Maximum Entropy algorithm is used for the POS tagger, and the algorithm used to build the classification model from the training data is the support vector machine, with unigram features weighted by TF-IDF. The classifier achieved an accuracy of 86.81% under 7-fold cross validation with the sigmoid kernel; manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords— sentiment analysis, classification, maximum entropy POS tagger, support vector machine, twitter
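The unigram TF-IDF features feeding the SVM can be sketched directly: term frequency in a tweet times the log inverse document frequency over the corpus. A minimal version, assuming whitespace tokenization (the paper's preprocessing and exact TF-IDF variant are not reproduced here):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Unigram TF-IDF vectors: tf(t, d) * log(N / df(t)).

    Returns one sparse dict {term: weight} per document; terms appearing
    in every document get weight 0 (log of 1), as expected for TF-IDF.
    """
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))  # document freq
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()}
            for toks in tokenized]
```

These sparse vectors are what a kernel SVM (sigmoid, in the paper's best run) consumes; distinctive words like a rare sentiment term get the largest weights.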

Peramalan KLB Campak Menggunakan Gabungan Metode JST Backpropagation dan CART

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 8, No 1 (2014): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— Forecasting measles outbreaks (KLB) in an area is needed to prevent the widespread occurrence of cases. The approach taken in this study predicts the incidence of measles using a combination of a backpropagation artificial neural network (ANN) and CART: the backpropagation ANN forecasts the measles time-series data, and the CART method then determines whether an area is an outbreak or non-outbreak area. The backpropagation ANN is one of the most commonly used forecasting methods and can achieve better accuracy than other ANN methods, while CART is a popular binary-tree method for classification that produces classification models or rules. The results of this study show that the window size used for the backpropagation forecast affects forecasting accuracy; the best window size differs per attribute and directly influences the forecasting results. The ANN forecasts the time series using a sliding window with 90.01% accuracy, and CART determines outbreak versus non-outbreak areas with 83.33% accuracy. Keywords— KLB, measles, forecasting, backpropagation, CART
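The sliding-window step the abstract emphasizes turns a time series into supervised (input window, next value) pairs for the backpropagation network. A sketch of that transformation (the network itself and the paper's chosen window sizes are not shown):

```python
def sliding_windows(series, window):
    """Convert a time series into (input window, next value) training pairs.

    Each pair feeds the backpropagation network: the window is the input
    vector, the following value is the forecasting target.  The choice of
    `window` is exactly the per-attribute tuning knob the study examines.
    """
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]
```

For example, monthly case counts `[3, 5, 8, 13, 21]` with a window of 3 yield two training pairs, so longer windows trade training-set size for more context per prediction.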

Pengelompokan Berita Indonesia Berdasarkan Histogram Kata Menggunakan Self-Organizing Map

IJCCS - Indonesian Journal of Computing and Cybernetics Systems Vol 8, No 1 (2014): IJCCS
Publisher : Indonesian Computer, Electronics, and Instrumentation Support Society (IndoCEISS)


Abstract

Abstract— News is an information resource that people await every day, and readers want news in their preferred categories. If a computer can cluster news automatically, readers can more easily find news in the desired category; automatic clustering of news articles is attractive because organizing them manually takes considerable time and cost. The purpose of this research is to build an application that groups news articles using the Self-Organizing Map algorithm. News articles are used as input data, which the system processes for clustering through preprocessing, feature extraction, clustering, and visualization steps. The developed system displays the clustering results of the Self-Organizing Map algorithm and visualizes them with smoothed data histograms in the form of an island map of the news articles. In addition, the system can display the document collection for each of five news categories per year, along with the word histogram of frequently occurring words in each article. The system was tested by entering news articles; the system processed them and produced cluster assignments for the entered articles. Keywords— Indonesian news clustering, clustering based on word histograms, news clustering using SOM
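At its core, SOM training finds the grid node whose weight vector best matches each input (the word-histogram vector of an article) and pulls that node and its grid neighbors toward the input, with learning rate and neighborhood radius shrinking over time. A minimal sketch, assuming inputs normalized to [0, 1] (grid size, epochs, and decay schedule here are illustrative, not the paper's settings):

```python
import math
import random

def train_som(data, grid_w, grid_h, epochs=20, lr0=0.5):
    """Minimal Self-Organizing Map on a grid_w x grid_h node grid.

    For each input, move the best-matching unit (BMU) and its grid
    neighbors toward the input; learning rate and radius decay per epoch.
    Returns the trained node weight vectors (row-major grid order).
    """
    dim = len(data[0])
    rng = random.Random(42)
    nodes = [[rng.random() for _ in range(dim)] for _ in range(grid_w * grid_h)]
    pos = [(i % grid_w, i // grid_w) for i in range(grid_w * grid_h)]
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)
        radius = max(grid_w, grid_h) / 2 * (1 - e / epochs) + 0.5
        for x in data:
            bmu = bmu_index(nodes, x)
            for n in range(len(nodes)):
                d = math.dist(pos[n], pos[bmu])          # distance on the grid
                if d <= radius:
                    theta = math.exp(-d * d / (2 * radius * radius))
                    nodes[n] = [w + lr * theta * (v - w)
                                for w, v in zip(nodes[n], x)]
    return nodes

def bmu_index(nodes, x):
    """Index of the node whose weights are closest to input x."""
    return min(range(len(nodes)),
               key=lambda n: sum((w - v) ** 2 for w, v in zip(nodes[n], x)))
```

After training, mapping each article to its BMU gives the cluster assignment, and counting hits per node (smoothed) produces the island-map visualization the abstract describes.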