dc.date.accessioned | 2021-04-07T13:35:07Z | |
dc.date.available | 2021-04-07T13:35:07Z | |
dc.date.issued | 2020 | |
dc.identifier.uri | http://sedici.unlp.edu.ar/handle/10915/116420 | |
dc.description.abstract | Many historical documentary collections are available today but have not yet been exploited for information extraction. Considerable effort is going into digitizing these volumes and making them available on digital platforms. However, processing their content raises several obstacles: because of document deterioration and other factors, such as differing dialects and language variants, the quality of the digitized text is usually low. NLP tools can improve the quality of these texts. This proposal applies NLP tools, particularly neural language models, to post-process the output of different OCR mechanisms. Substantial improvements in text quality are expected, as has been the case in many related tasks. The ultimate purpose of this work is to use the resulting digitized texts in information retrieval (IR) and information extraction (IE) platforms. | en |
dc.format.extent | 125-128 | es |
dc.language | en | es |
dc.subject | OCR post-processing | es |
dc.subject | Neural language models | es |
dc.subject | Information retrieval | es |
dc.title | Language modeling tools for massive historical OCR post-processing | en |
dc.type | Conference object | es |
sedici.identifier.uri | http://49jaiio.sadio.org.ar/pdfs/agranda/AGRANDA-15.pdf | es |
sedici.identifier.issn | 2683-8966 | es |
sedici.creator.person | Xamena, Eduardo | es |
sedici.creator.person | Maguitman, Ana Gabriela | es |
sedici.subject.materias | Computer Science | es |
sedici.description.fulltext | true | es |
mods.originInfo.place | Sociedad Argentina de Informática | es |
sedici.subtype | Conference object | es |
sedici.rights.license | Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) | |
sedici.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ | |
sedici.date.exposure | 2020-10 | |
sedici.relation.event | VI Simposio Argentino de Ciencia de Datos y GRANdes DAtos (AGRANDA 2020) - JAIIO 49 (Modalidad virtual) | es |
sedici.description.peerReview | peer-review | es |