Comparative Study of the Performance of the Classification Algorithms of the Apache Spark ML Library

Camele, Genaro; Hasperué, Waldo; Ronchetti, Franco; Quiroga, Facundo Manuel

dc.date.accessioned	2022-02-02T17:59:55Z
dc.date.available	2022-02-02T17:59:55Z
dc.date.issued	2021
dc.identifier.uri	http://sedici.unlp.edu.ar/handle/10915/130348
dc.description.abstract	Classification algorithms are widely used in many areas, including finance, education, security, and medicine. They are also used to support feature extraction techniques, which rely on classification algorithms to determine the best subset of attributes for an acceptable prediction. Today, large amounts of data are being collected; as a result, databases are becoming increasingly large and distributed processing has become a necessity. In this context, Spark, and in particular its Spark ML library, is one of the most widely used frameworks for performing classification tasks on large databases. Because some feature extraction techniques must run a classification algorithm many times, each time with a different subset of attributes, the performance of these algorithms should be known in advance so that the overall feature extraction process completes in the shortest possible time. In this work, we carry out a comparative study of four Spark ML classification algorithms, measuring predictive power and execution time as a function of the number of attributes in the training dataset.	en
dc.format.extent	311-320	es
dc.language	en	es
dc.subject	Big Data	es
dc.subject	Machine learning	es
dc.subject	Classification Models	es
dc.subject	Apache Spark	es
dc.subject	Spark ML	es
dc.title	Comparative Study of the Performance of the Classification Algorithms of the Apache Spark ML Library	en
dc.type	Conference object	es
sedici.identifier.isbn	978-987-633-574-4	es
sedici.creator.person	Camele, Genaro	es
sedici.creator.person	Hasperué, Waldo	es
sedici.creator.person	Ronchetti, Franco	es
sedici.creator.person	Quiroga, Facundo Manuel	es
sedici.description.note	Workshop: WBDMD - Databases and Data Mining (Base de Datos y Minería de Datos)	es
sedici.subject.materias	Computer Science	es
sedici.description.fulltext	true	es
mods.originInfo.place	Red de Universidades con Carreras en Informática	es
sedici.subtype	Conference object	es
sedici.rights.license	Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
sedici.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
sedici.date.exposure	2021-10
sedici.relation.event	XXVII Congreso Argentino de Ciencias de la Computación (CACIC) (virtual format, October 4 to 8, 2021)	es
sedici.description.peerReview	peer-review	es
sedici.relation.isRelatedWith	http://sedici.unlp.edu.ar/handle/10915/129809	es
sedici.relation.bookTitle	Memorias del Congreso Argentino en Ciencias de la Computación - CACIC 2021	es

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
