
dc.date.accessioned 2021-04-07T13:35:07Z
dc.date.available 2021-04-07T13:35:07Z
dc.date.issued 2020
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/116420
dc.description.abstract Nowadays, there is a large number of available historical documentary collections that have not yet been exploited for information extraction. Many efforts are being made to digitize these volumes and make them available on digital platforms. However, various obstacles arise when processing their content. Due to the deterioration of the documents and other factors, such as different dialects and language variants, the quality of the digitizations is usually low. NLP tools make it possible to improve the quality of these texts. The current proposal consists of employing NLP tools, particularly neural language models, to process the output of different OCR mechanisms. Important improvements in text quality are expected, as has been the case in many related tasks. The ultimate purpose of this work is to use the resulting digitized texts in information retrieval (IR) and information extraction (IE) platforms. en
dc.format.extent 125-128 es
dc.language en es
dc.subject OCR post-processing es
dc.subject Neural language models es
dc.subject Information retrieval es
dc.title Language modeling tools for massive historical OCR post-processing en
dc.type Conference object es
sedici.identifier.uri http://49jaiio.sadio.org.ar/pdfs/agranda/AGRANDA-15.pdf es
sedici.identifier.issn 2683-8966 es
sedici.creator.person Xamena, Eduardo es
sedici.creator.person Maguitman, Ana Gabriela es
sedici.subject.materias Computer Science es
sedici.description.fulltext true es
mods.originInfo.place Sociedad Argentina de Informática es
sedici.subtype Conference object es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/
sedici.date.exposure 2020-10
sedici.relation.event VI Simposio Argentino de Ciencia de Datos y GRANdes DAtos (AGRANDA 2020) - JAIIO 49 (virtual edition) es
sedici.description.peerReview peer-review es


Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).