We present work in progress on word normalization for user-generated content. The approach is simple and reduces the amount of manual annotation required by more classical approaches. First, orthographic variants of a word, mostly abbreviations, are grouped together. From these manually grouped examples, we learn an automated classifier that, given a previously unseen word, determines whether it is an orthographic variant of a known word or an entirely new word. To do so, we compute the similarity between the unseen word and all known words, and classify the new word as an orthographic variant of its most similar word. The classifier applies a string similarity measure based on the Levenshtein edit distance. To improve the accuracy of this measure, we assign each edit operation an error-based cost. This cost-assignment scheme aims to maximize the distance between similar strings that are variants of different words. The custom similarity measure achieves an accuracy of .68, a substantial improvement over the .54 obtained with the plain Levenshtein distance.
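The core idea, a Levenshtein distance with per-operation costs plus a nearest-neighbour decision, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the actual error-based costs are learned from annotated data, whereas here the cost functions (e.g. cheap vowel insertion, reflecting that abbreviations often drop vowels) are hypothetical placeholders.

```python
def weighted_levenshtein(s, t, sub_cost, ins_cost, del_cost):
    """Edit distance from s to t where each operation has its own cost.

    sub_cost(a, b), ins_cost(c), del_cost(c) are callables; in the paper's
    setting these would encode error-based costs learned from grouped
    variants (the functions used here are illustrative only).
    """
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):            # delete all of s
        d[i][0] = d[i - 1][0] + del_cost(s[i - 1])
    for j in range(1, n + 1):            # insert all of t
        d[0][j] = d[0][j - 1] + ins_cost(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(s[i - 1]),        # deletion
                d[i][j - 1] + ins_cost(t[j - 1]),        # insertion
                d[i - 1][j - 1] + (0.0 if s[i - 1] == t[j - 1]
                                   else sub_cost(s[i - 1], t[j - 1])),
            )
    return d[m][n]

def most_similar(unseen, known_words, distance):
    # Nearest-neighbour step: the unseen word is classified as a
    # variant of the known word at minimal distance.
    return min(known_words, key=lambda w: distance(unseen, w))

# Hypothetical cost scheme: inserting a vowel is cheap, since
# abbreviations in user-generated content frequently drop vowels.
cheap_vowel_ins = lambda c: 0.2 if c in "aeiou" else 1.0
dist = lambda a, b: weighted_levenshtein(
    a, b,
    sub_cost=lambda x, y: 1.0,
    ins_cost=cheap_vowel_ins,
    del_cost=lambda c: 1.0,
)

print(most_similar("tmrrow", ["tomorrow", "today"], dist))  # tomorrow
```

With uniform unit costs the function reduces to the standard Levenshtein distance; the gain reported in the abstract comes from replacing those uniform costs with error-based ones.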