dc.date.accessioned 2014-11-04T21:03:01Z
dc.date.available 2014-11-04T21:03:01Z
dc.date.issued 2014
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/42290
dc.description.abstract Author Profiling is the task of predicting characteristics of the author of a text, such as age, gender, personality, native language, etc. This is a task of growing importance due to its potential applications in security, crime and marketing, among others. One of the main difficulties in this field is the lack of reliable text collections (corpora) to train and test automatically derived classifiers, particularly for specific languages such as Spanish. Although some recent data sets were generated for the PAN competitions, these documents contain a lot of “noise” that prevents researchers from drawing more general conclusions about this task when more formal documents are used. In this context, this work proposes and describes SpanText, a data collection of formal texts in the Spanish language which is, as far as we know, the first collection with these characteristics for the author profiling task. In addition, an experimental study is carried out in which the difference in performance obtained with formal and informal texts is clearly established, opening interesting research lines toward a deeper understanding of the particularities that each type of document poses for the author profiling task. en
dc.language en es
dc.subject author profiling en
dc.subject natural language processing en
dc.subject Spanish text corpus en
dc.title A Spanish text corpus for the author profiling task en
dc.type Conference object es
sedici.creator.person Villegas, María Paula es
sedici.creator.person Garciarena Ucelay, María José es
sedici.creator.person Errecalde, Marcelo Luis es
sedici.creator.person Cagnina, Leticia es
sedici.description.note XI Workshop Bases de Datos y Minería de Datos es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Red de Universidades con Carreras de Informática (RedUNCI) es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
sedici.date.exposure 2014-10
sedici.relation.event XX Congreso Argentino de Ciencias de la Computación (Buenos Aires, 2014) es
sedici.description.peerReview peer-review es


Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)