
dc.date.accessioned 2018-10-04T18:55:32Z
dc.date.available 2018-10-04T18:55:32Z
dc.date.issued 2018
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/69676
dc.description.abstract The volume of data in today’s applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario imposes scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually more complex to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation. en
dc.format.extent 23-28 es
dc.language en es
dc.subject big data es
dc.subject big data, imbalanced classification, preprocessing, SMOTE, spark en
dc.title SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data en
dc.type Conference object es
sedici.identifier.isbn 978-950-34-1659-4 es
sedici.creator.person Basgall, María José es
sedici.creator.person Hasperué, Waldo es
sedici.creator.person Naiouf, Marcelo es
sedici.creator.person Fernández, Alberto es
sedici.creator.person Herrera, Francisco es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Facultad de Informática es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
sedici.date.exposure 2018-06
sedici.relation.event VI Jornadas de Cloud Computing & Big Data (JCC&BD) (La Plata, 2018) es
sedici.description.peerReview peer-review es
sedici.relation.isRelatedWith http://sedici.unlp.edu.ar/handle/10915/69464 es
sedici.relation.isRelatedWith http://sedici.unlp.edu.ar/handle/10915/71652 es
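
The abstract describes SMOTE as creating synthetic instances from the neighborhood of each minority-class example. As a rough illustration only, the following single-machine Python sketch shows that classical interpolation step. It is not the SMOTE-BD Spark implementation from the paper; the function names `k_nearest` and `smote`, the parameters `n_new`, `k`, and `seed`, and the brute-force neighbor search are all assumptions made for this example.

```python
import math
import random


def k_nearest(points, index, k):
    """Return indices of the k points closest to points[index] (Euclidean)."""
    target = points[index]
    others = [i for i in range(len(points)) if i != index]
    # Brute-force search: fine for a sketch, not for Big Data volumes.
    others.sort(key=lambda i: math.dist(target, points[i]))
    return others[:k]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    Hypothetical helper for illustration; not the paper's distributed design.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # pick a minority example
        j = rng.choice(k_nearest(minority, i, k))  # pick one of its neighbors
        gap = rng.random()                         # position on the segment
        base, neighbor = minority[i], minority[j]
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


if __name__ == "__main__":
    minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for sample in smote(minority_class, n_new=3, k=2, seed=42):
        print(sample)
```

Each synthetic point lies on the line segment between a minority example and one of its nearest minority neighbors, so the new points stay inside the region already occupied by that class.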



Except where otherwise noted, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)