
dc.date.accessioned 2017-10-26T15:21:19Z
dc.date.available 2017-10-26T15:21:19Z
dc.date.issued 2017
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/63208
dc.description.abstract Social media are increasingly being used as sources in mainstream news coverage. However, because news updates so rapidly, it is easy to fall into the trap of accepting everything as true. Spam content usually refers to information that goes viral and skews users' views on a subject. Despite recent advances in spam analysis methods, extracting accurate and useful information from tweets remains a challenging task. This paper introduces a new approach for classifying tweets as spam or non-spam using a Cost-Sensitive Classifier that incorporates Random Forest. The approach consisted of three phases: preprocessing, classification, and evaluation. In the preprocessing phase, tweets were first annotated manually, and four different sets of features were then extracted from them. In the classification phase, four machine learning algorithms were first cross-validated to determine the best base classifier for spam detection. The class imbalance problem was then addressed by resampling and by incorporating arbitrary misclassification costs into the learning process. In the evaluation phase, the trained algorithm was tested on unseen tweets. Experimental results showed that the proposed approach helped mitigate overfitting and reduced classification error, achieving an overall accuracy of 89.14% in training and 76.82% in testing. en
dc.language en es
dc.subject spam classification en
dc.subject twitter en
dc.subject topic discovering en
dc.subject cost-sensitive classifier en
dc.subject random forest en
dc.title Cost-Sensitive Classifier for Spam Detection on News Media Twitter Accounts (revised April 2017) en
dc.type Conference object es
sedici.identifier.uri http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/SLMDI/SLMDI-07.pdf es
sedici.creator.person Tur, Georvic es
sedici.creator.person Homsi, Masun Nabhan es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Sociedad Argentina de Informática e Investigación Operativa (SADIO) es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
sedici.rights.uri http://creativecommons.org/licenses/by-sa/3.0/
sedici.date.exposure 2017-09
sedici.relation.event Simposio Latinoamericano de Manejo de Datos e Información (SLMDI) - JAIIO 46 (Córdoba, 2017) es
sedici.description.peerReview peer-review es
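The abstract names two generic ingredients of the classification phase, resampling to counter class imbalance and misclassification costs, but gives neither the cost values nor the resampling scheme. The following minimal pure-Python sketch illustrates those ideas under stated assumptions: random oversampling of the minority class, and a minimum-expected-cost decision rule applied to class-probability estimates such as a Random Forest could output. This is one common way to use a cost matrix and may differ from how the paper incorporates costs into the learning process. The cost matrix and the names `oversample`, `min_expected_cost` and `accuracy` are hypothetical; this is not the authors' implementation.

```python
import random
from collections import Counter

# Hypothetical 2x2 cost matrix: COST[true_class][predicted_class].
# Class 0 = non-spam, class 1 = spam. Values are illustrative only,
# not taken from the paper.
COST = [
    [0.0, 1.0],  # true non-spam: flagging it as spam costs 1
    [5.0, 0.0],  # true spam: missing it costs 5
]


def oversample(samples, labels, seed=0):
    """Randomly duplicate samples of smaller classes until every class
    is as frequent as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def min_expected_cost(probs, cost=COST):
    """Return the class index with the lowest expected misclassification cost.

    probs[t] is the estimated probability that the true class is t.
    Expected cost of predicting p is sum_t probs[t] * cost[t][p].
    """
    n = len(cost)
    expected = [sum(probs[t] * cost[t][p] for t in range(n)) for p in range(n)]
    return min(range(n), key=lambda p: expected[p])


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these illustrative costs, `min_expected_cost([0.75, 0.25])` returns 1 (spam): predicting non-spam has an expected cost of 0.25 × 5 = 1.25, versus 0.75 × 1 = 0.75 for predicting spam, even though a plain argmax over the probabilities would choose non-spam.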



Except where otherwise noted, this item is licensed under Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0).