dc.date.accessioned 2022-02-02T17:59:55Z
dc.date.available 2022-02-02T17:59:55Z
dc.date.issued 2021
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/130348
dc.description.abstract Classification algorithms are widely used in several areas: finance, education, security, medicine, and more. Another use of these algorithms is to support feature extraction techniques. These techniques use classification algorithms to determine the best subset of attributes that support an acceptable prediction. Currently, a large amount of data is being collected and, as a result, databases are becoming increasingly larger and distributed processing becomes a necessity. In this sense, Spark, and in particular its Spark ML library, is one of the most widely used frameworks for performing classification tasks in large databases. Given that some feature extraction techniques need to execute a classification algorithm a significant number of times, with a different subset of attributes in each run, the performance of these algorithms should be known beforehand so that the overall feature extraction process is carried out in the shortest possible time. In this work, we carry out a comparative study of four Spark ML classification algorithms, measuring predictive power and execution times as a function of the number of attributes in the training dataset. en
dc.format.extent 311-320 es
dc.language en es
dc.subject Big Data es
dc.subject Machine learning es
dc.subject Classification Models es
dc.subject Apache Spark es
dc.subject Spark ML es
dc.title Comparative Study of the Performance of the Classification Algorithms of the Apache Spark ML Library en
dc.type Conference object es
sedici.identifier.isbn 978-987-633-574-4 es
sedici.creator.person Camele, Genaro es
sedici.creator.person Hasperué, Waldo es
sedici.creator.person Ronchetti, Franco es
sedici.creator.person Quiroga, Facundo Manuel es
sedici.description.note Workshop: WBDMD - Base de Datos y Minería de Datos es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Red de Universidades con Carreras en Informática es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
sedici.date.exposure 2021-10
sedici.relation.event XXVII Congreso Argentino de Ciencias de la Computación (CACIC) (virtual format, October 4 to 8, 2021) es
sedici.description.peerReview peer-review es
sedici.relation.isRelatedWith http://sedici.unlp.edu.ar/handle/10915/129809 es
sedici.relation.bookTitle Memorias del Congreso Argentino en Ciencias de la Computación - CACIC 2021 es


Except where explicitly stated otherwise, this item is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.