
dc.date.accessioned 2023-03-01T17:36:10Z
dc.date.available 2023-03-01T17:36:10Z
dc.date.issued 2023
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/149456
dc.description.abstract One of the main challenges in automatic email classification arises when a relatively large number of classes must be handled and those classes are highly imbalanced. This happens even when unlabeled text collections are available, because manual labeling is costly. All automatic text classification strategies are, to a greater or lesser extent, sensitive to class imbalance. The most widely used approaches for learning from imbalanced datasets consist of resampling techniques, either undersampling or oversampling the data. However, existing techniques still have open problems. In this work we present a new proposal that balances the classes of the dataset by retrieving unlabeled instances (e-mails) that are similar to those of the minority classes. It is shown that, for the dataset used, this is a valid, viable and competitive strategy with respect to the resampling strategies currently used to learn from imbalanced email databases. en
dc.format.extent 415-425 es
dc.language en es
dc.subject imbalanced data es
dc.subject automatic classification es
dc.subject information retrieval es
dc.title Instance retrieval from non-labeled data as a strategy for automatic classification of imbalanced e-mail datasets es
dc.type Conference object es
sedici.identifier.isbn 978-987-1364-31-2 es
sedici.creator.person Fernández, Juan Manuel es
sedici.creator.person Errecalde, Marcelo Luis es
sedici.description.note XIX Workshop base de datos y Minería de datos (WBDMD) es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Red de Universidades con Carreras en Informática es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
sedici.date.exposure 2022-10
sedici.relation.event XXVIII Congreso Argentino de Ciencias de la Computación (CACIC) (La Rioja, October 3-6, 2022) es
sedici.description.peerReview peer-review es
sedici.relation.isRelatedWith http://sedici.unlp.edu.ar/handle/10915/149102 es
sedici.relation.bookTitle Libro de actas - XXVIII Congreso Argentino de Ciencias de la Computación - CACIC 2022 es
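
The abstract describes balancing a dataset by retrieving unlabeled e-mails that resemble the minority classes. This record does not say how the paper performs the retrieval, so the following is only a minimal sketch of the general idea under stated assumptions: bag-of-words term counts, cosine similarity, a query built by summing the term counts of each minority class, and a hypothetical `min_similarity` cutoff. The function names and the toy e-mails are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one e-mail (lowercased alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def balance_by_retrieval(labeled, unlabeled, min_similarity=0.2):
    """Grow each minority class with its most similar unlabeled e-mails.

    labeled: list of (text, label); unlabeled: list of text.
    Each class is topped up toward the size of the largest class.
    min_similarity is an assumed cutoff, not a value from the paper.
    """
    by_class = {}
    for text, label in labeled:
        by_class.setdefault(label, []).append(text)
    target = max(len(texts) for texts in by_class.values())

    pool = [(text, vectorize(text)) for text in unlabeled]
    used = set()  # an unlabeled e-mail is assigned to at most one class
    augmented = list(labeled)

    # Serve the smallest classes first so they get the first pick of the pool.
    for label, texts in sorted(by_class.items(), key=lambda kv: len(kv[1])):
        needed = target - len(texts)
        if needed <= 0:
            continue
        # Query vector: summed term counts of the class's labeled e-mails.
        query = Counter()
        for text in texts:
            query.update(vectorize(text))
        ranked = sorted(
            ((cosine(query, vec), i) for i, (_, vec) in enumerate(pool) if i not in used),
            reverse=True,
        )
        for score, i in ranked[:needed]:
            if score < min_similarity:
                break  # remaining candidates are even less similar
            used.add(i)
            augmented.append((pool[i][0], label))
    return augmented


# Toy data, invented for illustration only.
labeled = [
    ("project meeting agenda", "work"),
    ("meeting notes for the project", "work"),
    ("team meeting rescheduled", "work"),
    ("invoice payment overdue", "billing"),
]
unlabeled = [
    "invoice payment due",
    "payment for invoice 42 received",
    "lunch meeting tomorrow",
    "holiday photos",
]
balanced = balance_by_retrieval(labeled, unlabeled)
print(Counter(label for _, label in balanced))
```

Retrieved instances receive the label of the class whose query matched them, so the cutoff trades added volume against label noise. A real pipeline would likely use TF-IDF or embedding-based retrieval and validate the retrieved labels.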



Except where explicitly stated otherwise, this item is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.