dc.date.accessioned | 2023-03-01T17:36:10Z | |
dc.date.available | 2023-03-01T17:36:10Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | http://sedici.unlp.edu.ar/handle/10915/149456 | |
dc.description.abstract | One of the main challenges in automatic email classification arises when there is a relatively large number of classes and the classes are highly imbalanced. This happens even when unlabeled text collections are available, because manual labeling is costly. All automatic text classification strategies are, to a greater or lesser extent, sensitive to class imbalance. The most widely used approaches for learning from imbalanced datasets consist of resampling techniques, either undersampling or oversampling the data. However, existing techniques still have unsolved problems. In this work we present a new proposal that balances the classes of the dataset by retrieving unlabeled instances (e-mails) that are similar to those of the minority classes. We show that, for the dataset used, it is a valid, viable, and competitive strategy compared with the resampling strategies currently used to learn from imbalanced email databases. | en |
dc.format.extent | 415-425 | es |
dc.language | en | es |
dc.subject | imbalanced data | es |
dc.subject | automatic classification | es |
dc.subject | information retrieval | es |
dc.title | Instance retrieval from non-labeled data as a strategy for automatic classification of imbalanced e-mail datasets | es |
dc.type | Conference object | es |
sedici.identifier.isbn | 978-987-1364-31-2 | es |
sedici.creator.person | Fernández, Juan Manuel | es |
sedici.creator.person | Errecalde, Marcelo Luis | es |
sedici.description.note | XIX Workshop base de datos y Minería de datos (WBDMD) | es |
sedici.subject.materias | Computer Science | es |
sedici.description.fulltext | true | es |
mods.originInfo.place | Red de Universidades con Carreras en Informática | es |
sedici.subtype | Conference object | es |
sedici.rights.license | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | |
sedici.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
sedici.date.exposure | 2022-10 | |
sedici.relation.event | XXVIII Congreso Argentino de Ciencias de la Computación (CACIC) (La Rioja, October 3 to 6, 2022) | es |
sedici.description.peerReview | peer-review | es |
sedici.relation.isRelatedWith | http://sedici.unlp.edu.ar/handle/10915/149102 | es |
sedici.relation.bookTitle | Libro de actas - XXVIII Congreso Argentino de Ciencias de la Computación - CACIC 2022 | es |