Extraction of geographic entities from biological textual sources

Acuña-Chaves, Moises A.; Araya-Monge, José E.

2017

Document type: Conference object

Abstract

This work explores and applies entity extraction techniques to identify and encode the geographic locations mentioned in the geographic distribution section of botanical documents, such as the manual of plant species of Costa Rica. Achieving this requires combining several technologies. One is Natural Language Processing (NLP), which supports entity extraction through the use of gazetteers; another is rules (regular expressions, deterministic automata, context-free grammars). In addition to identification and encoding, the work presents an algorithm that binds the extracted place names to authoritative sources such as gazetteers. The algorithm identifies the place names and enriches the input text with additional information extracted from the semi-structured paragraphs in which the distribution is described. The values of interest are the world distribution and the Costa Rica distribution. Once these values are identified, the information can be processed and used in diverse applications, such as geographic information systems, and may also be of interest to other research projects. The evaluation consists of manually judging a randomly selected sample of the results to determine whether the algorithm yields useful data. Each sampled result is judged on its world and Costa Rica distribution against the source context, using three possible labels: GOOD, BAD, and UNKNOWN; the goal is the lowest possible percentage of BAD judgments. The algorithm performs relatively well at geocoding and binding the world distribution, but more work is needed for the Costa Rica distribution.
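The extraction and binding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the gazetteer entries, their codes, and the assumption that a distribution paragraph labels its parts "World:" and "Costa Rica:" are all hypothetical. A regular-expression rule splits the paragraph into the two distributions, a greedy longest-match lookup binds word sequences to gazetteer entries (after lowercasing and stripping diacritics), and a small helper computes the BAD percentage used as the evaluation target.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    name: str   # authorized place name
    code: str   # illustrative identifier (hypothetical, not from the paper)
    level: str  # "country" or "province"


# Hypothetical miniature gazetteer keyed by normalized (accent-free,
# lowercase) names; a real system would load an authoritative source.
GAZETTEER = {
    "mexico": GazetteerEntry("Mexico", "MX", "country"),
    "panama": GazetteerEntry("Panama", "PA", "country"),
    "costa rica": GazetteerEntry("Costa Rica", "CR", "country"),
    "guanacaste": GazetteerEntry("Guanacaste", "CR-G", "province"),
    "limon": GazetteerEntry("Limón", "CR-L", "province"),
}
MAX_WORDS = max(len(key.split()) for key in GAZETTEER)

# Hypothetical rule: the paragraph labels its parts "World:" and
# "Costa Rica:"; each body runs until the next label or the end of text.
SECTION_RE = re.compile(
    r"(World|Costa Rica)\s*:\s*(.*?)(?=(?:World|Costa Rica)\s*:|\Z)",
    re.DOTALL,
)


def normalize(text: str) -> str:
    """Lowercase and strip diacritics so 'Limón' matches 'limon'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def split_distribution(paragraph: str) -> dict[str, str]:
    """Apply the section rule and return {label: body text}."""
    return {label: body.strip() for label, body in SECTION_RE.findall(paragraph)}


def link_places(text: str) -> list[GazetteerEntry]:
    """Greedy longest-match lookup of word n-grams in the gazetteer."""
    tokens = re.findall(r"[a-z]+", normalize(text))
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "costa rica" wins over shorter spans.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            entry = GAZETTEER.get(" ".join(tokens[i:i + n]))
            if entry is not None:
                found.append(entry)
                i += n
                break
        else:  # no gazetteer match starts at token i
            i += 1
    return found


def geocode_distribution(paragraph: str) -> dict[str, list[str]]:
    """Bind each distribution section to gazetteer codes."""
    return {
        label: [entry.code for entry in link_places(body)]
        for label, body in split_distribution(paragraph).items()
    }


def bad_percentage(judgments: list[str]) -> float:
    """Percentage of BAD labels in a sample judged GOOD/BAD/UNKNOWN."""
    return 100.0 * judgments.count("BAD") / len(judgments) if judgments else 0.0
```

For example, `geocode_distribution("World: Mexico to Panama. Costa Rica: Guanacaste and Limón.")` returns `{'World': ['MX', 'PA'], 'Costa Rica': ['CR-G', 'CR-L']}`. The sketch only links the names it finds; it does not expand a range such as "Mexico to Panama" into the countries in between, which a fuller system would need to handle.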

General information

Presentation date: September 2017

Publication date: 2017

Document language: English

Event: Simposio Latinoamericano de Manejo de Datos e Información (SLMDI) - JAIIO 46 (Córdoba, 2017)

Originating institution: Sociedad Argentina de Informática e Investigación Operativa (SADIO)

Keywords: extraction techniques; Natural Language Processing

Subjects: Computer Science

Full text: PDF (494.8 KB)

External link: www.clei2017-46jaiio.sadio.org.ar/...

Created: October 30, 2017

Available in SEDICI since: October 30, 2017

Please use this identifier (URI) to cite or link to this item:

http://sedici.unlp.edu.ar/handle/10915/63263

This item appears in the following collection(s):

46 Jornadas Argentinas de Informática e Investigación Operativa (JAIIO) y 43 CLEI → III Simposio Argentino de GRANdes DAtos (AGRANDA-JAIIO)-Simposio Latinoamericano de Manejo de Datos e Información (SLMDI-CLEI)

Except where otherwise noted, this item is published under the Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0).