
dc.date.accessioned 2021-09-23T13:33:03Z
dc.date.available 2021-09-23T13:33:03Z
dc.date.issued 2021
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/125448
dc.description.abstract This paper presents FDR²-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to the proposal is analyzing the data in a dual way (vertical and horizontal), combining feature selection, which generates dense clusters of data, with uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model’s predictive quality is kept within a range determined by a user-defined threshold. Its robustness is built on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, relying on fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with prediction quality values within around 1% of the baseline. en
dc.language en es
dc.subject Big data es
dc.subject Data reduction es
dc.subject Classification es
dc.subject Preprocessing techniques es
dc.subject Apache Spark es
dc.title FDR²-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems en
dc.type Articulo es
sedici.identifier.uri https://www.mdpi.com/2079-9292/10/15/1757 es
sedici.identifier.other doi:10.3390/electronics10151757 es
sedici.identifier.issn 2079-9292 es
sedici.creator.person Basgall, María es
sedici.creator.person Naiouf, Marcelo es
sedici.creator.person Fernández, Alberto es
sedici.subject.materias Ciencias Informáticas es
sedici.description.fulltext true es
mods.originInfo.place Instituto de Investigación en Informática es
sedici.subtype Articulo es
sedici.rights.license Creative Commons Attribution 4.0 International (CC BY 4.0)
sedici.rights.uri http://creativecommons.org/licenses/by/4.0/
sedici.description.peerReview peer-review es
sedici.relation.journalTitle Electronics es
sedici.relation.journalVolumeAndIssue vol. 10, no. 15 es



Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution 4.0 International (CC BY 4.0)