dc.date.accessioned | 2022-02-02T17:59:55Z | |
dc.date.available | 2022-02-02T17:59:55Z | |
dc.date.issued | 2021 | |
dc.identifier.uri | http://sedici.unlp.edu.ar/handle/10915/130348 | |
dc.description.abstract | Classification algorithms are widely used in several areas: finance, education, security, medicine, and more. Another use of these algorithms is to support feature extraction techniques. These techniques use classification algorithms to determine the best subset of attributes that supports an acceptable prediction. Currently, a large amount of data is being collected and, as a result, databases are becoming increasingly large and distributed processing becomes a necessity. In this sense, Spark, and in particular its Spark ML library, is one of the most widely used frameworks for performing classification tasks on large databases. Given that some feature extraction techniques need to execute a classification algorithm a significant number of times, with a different subset of attributes in each run, the performance of these algorithms should be known beforehand so that the overall feature extraction process is carried out in the shortest possible time. In this work, we carry out a comparative study of four Spark ML classification algorithms, measuring predictive power and execution times as a function of the number of attributes in the training dataset. | en |
dc.format.extent | 311-320 | es |
dc.language | en | es |
dc.subject | Big Data | es |
dc.subject | Machine learning | es |
dc.subject | Classification Models | es |
dc.subject | Apache Spark | es |
dc.subject | Spark ML | es |
dc.title | Comparative Study of the Performance of the Classification Algorithms of the Apache Spark ML Library | en |
dc.type | Conference object | en |
sedici.identifier.isbn | 978-987-633-574-4 | es |
sedici.creator.person | Camele, Genaro | es |
sedici.creator.person | Hasperué, Waldo | es |
sedici.creator.person | Ronchetti, Franco | es |
sedici.creator.person | Quiroga, Facundo Manuel | es |
sedici.description.note | Workshop: WBDMD - Base de Datos y Minería de Datos (Databases and Data Mining) | es |
sedici.subject.materias | Computer Science | en |
sedici.description.fulltext | true | es |
mods.originInfo.place | Red de Universidades con Carreras en Informática | es |
sedici.subtype | Conference object | en |
sedici.rights.license | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
sedici.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
sedici.date.exposure | 2021-10 | |
sedici.relation.event | XXVII Congreso Argentino de Ciencias de la Computación (CACIC) (virtual format, October 4 to 8, 2021) | es |
sedici.description.peerReview | peer-review | es |
sedici.relation.isRelatedWith | http://sedici.unlp.edu.ar/handle/10915/129809 | es |
sedici.relation.bookTitle | Memorias del Congreso Argentino en Ciencias de la Computación - CACIC 2021 | es |
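The abstract's motivation rests on wrapper-style attribute selection, where a classifier is retrained once per candidate attribute subset, so the per-run cost of the classifier dominates the total. The sketch below illustrates that cost structure with greedy forward selection in plain Python, timing every classifier run and recording the subset size. It is a hypothetical illustration, not the authors' method: the forward-selection strategy, the toy nearest-centroid classifier (standing in for a Spark ML classifier), and all function names are assumptions, and Spark ML is not used.

```python
import time
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in classifier (NOT Spark ML): nearest centroid on the
# selected attribute indices; returns accuracy on the test split.
def nearest_centroid_accuracy(train_X: Sequence[Sequence[float]], train_y: Sequence[int],
                              test_X: Sequence[Sequence[float]], test_y: Sequence[int],
                              features: List[int]) -> float:
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def dist(x: Sequence[float], label: int) -> float:
        return sum((x[f] - centroids[label][i]) ** 2 for i, f in enumerate(features))

    correct = sum(1 for x, y in zip(test_X, test_y) if min(centroids, key=lambda l: dist(x, l)) == y)
    return correct / len(test_y)

# Hypothetical greedy forward selection: each round retrains the classifier once
# per remaining attribute, so a full pass over n attributes can cost up to
# n + (n - 1) + ... + 1 = n(n + 1) / 2 classifier runs.
def forward_selection(train_X, train_y, test_X, test_y, n_features: int
                      ) -> Tuple[List[int], float, List[Tuple[int, float, float]]]:
    selected: List[int] = []
    remaining = list(range(n_features))
    best_score = 0.0
    runs: List[Tuple[int, float, float]] = []  # (subset size, accuracy, seconds)
    while remaining:
        scored = []
        for f in remaining:
            subset = selected + [f]
            start = time.perf_counter()
            acc = nearest_centroid_accuracy(train_X, train_y, test_X, test_y, subset)
            runs.append((len(subset), acc, time.perf_counter() - start))
            scored.append((acc, f))
        # Best accuracy wins; ties go to the lower attribute index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_score:  # stop when no attribute improves accuracy
            break
        best_score = acc
        selected.append(f)
        remaining.remove(f)
    return selected, best_score, runs

# Toy data (invented for illustration): attribute 0 separates the classes,
# attribute 1 is noise.
train_X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]]
train_y = [0, 0, 1, 1]
test_X = [[0.05, 3.0], [0.95, 3.0]]
test_y = [0, 1]
selected, score, runs = forward_selection(train_X, train_y, test_X, test_y, n_features=2)
print(selected, score, [r[0] for r in runs])
```

Grouping the recorded `runs` by subset size gives execution time as a function of the number of attributes, which is the quantity the study measures for each Spark ML classifier.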