Sentimental Analysis on Turkish Blogs via Ensemble Classifier

ŞEKER Ş. E. , Al-Naami K.

PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON DATA MINING, United States of America, 1 July 2013, vol.1, no.1, pp.10-16

  • Volume number: 1
  • Country of publication: United States of America
  • Page numbers: pp.10-16


Sentiment analysis of web-mined data has a growing impact on many studies. The sentimental influence of web content is one of the questions that content creators and publishers are most curious about. In this study, we investigate the impact of comments collected from five different Turkish web sites, with more than 2 million comments in total. The web sites are newspapers, movie reviews, an e-marketing web site and a literature web site. We merge all the comments into a single file. Each comment also has a like or dislike count, which we use as the ground truth for the impact of the comment, that is, its sentiment. We try to correlate the text of a comment with its like / dislike grade. We use three classifiers: a support vector machine, k-nearest neighbors and a C4.5 decision tree. On top of them, we add an ensemble classifier based on majority voting. For feature extraction from the text, we use the term frequency-inverse document frequency (TF-IDF) approach and keep only the top features ranked by information gain. The results show a correlation of about 56% between the blogs and comments and their like / dislike scores under our classification model.
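
The pipeline described in the abstract (TF-IDF weighting, feature ranking by information gain, and a majority vote over several classifiers) can be illustrated with a minimal sketch. This is not the authors' implementation. The TF-IDF variant (length-normalized term frequency, natural-log inverse document frequency), the presence/absence form of information gain, the toy documents, and all function names are assumptions chosen for illustration. The three base classifiers are not implemented here, so `majority_vote` simply takes their predicted labels as input.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (assumed variant)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Entropy reduction from splitting the documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (with_term, without_term) if part)
    return entropy(labels) - remainder

def top_features(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab,
                  key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]

def majority_vote(predictions):
    """predictions: one label list per classifier; returns the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical toy data: tokenized comments with like / dislike labels.
docs = [["great", "movie"], ["bad", "movie"], ["great", "book"], ["bad", "book"]]
labels = ["like", "dislike", "like", "dislike"]

weights = tf_idf(docs)
selected = top_features(docs, labels, k=2)

# Hypothetical outputs of three base classifiers on three new comments.
svm_pred = ["like", "dislike", "like"]
knn_pred = ["like", "like", "dislike"]
tree_pred = ["dislike", "dislike", "like"]
ensemble = majority_vote([svm_pred, knn_pred, tree_pred])
```

With three voters and two classes a tie cannot occur, which is one practical reason to use an odd number of base classifiers in a majority-voting ensemble.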