Error Detection and Correction in Annotated Corpora

Error Detection and Correction in Annotated Corpora
Author: Markus Dickinson
Publisher:
Total Pages:
Release: 2005
Genre: Computational linguistics
ISBN:


Download Error Detection and Correction in Annotated Corpora Book in PDF, Epub and Kindle

Abstract: Building on work showing the harmfulness of annotation errors for both the training and evaluation of natural language processing technologies, this thesis develops a method for detecting and correcting errors in corpora with linguistic annotation. The so-called variation n-gram method relies on the recurrence of identical strings with varying annotation to find erroneous mark-up. We show that the method is applicable for varying complexities of annotation. The method is most readily applied to positional annotation, such as part-of-speech annotation, but can be extended to structural annotation, both for tree structures---as with syntactic annotation---and for graph structures---as with syntactic annotation allowing discontinuous constituents, or crossing branches. Furthermore, we demonstrate that the notion of variation for detecting errors is a powerful one, by searching for grammar rules in a treebank which have the same daughters but different mothers. We also show that such errors impact the effectiveness of a grammar induction algorithm and subsequent parsing. After detecting errors in the different corpora, we turn to correcting such errors, through the use of more general classification techniques. Our results indicate that the particular classification algorithm is less important than understanding the nature of the errors and altering the classifiers to deal with these errors. With such alterations, we can automatically correct errors with 85% accuracy. By sorting the errors, we can relegate over 20% of them into an automatically correctable class and speed up the re-annotation process by effectively categorizing the others.

Automated Grammatical Error Detection for Language Learners, Second Edition

Automated Grammatical Error Detection for Language Learners, Second Edition
Author: Claudia Leacock
Publisher: Springer Nature
Total Pages: 154
Release: 2022-06-01
Genre: Computers
ISBN: 3031021533


Download Automated Grammatical Error Detection for Language Learners, Second Edition Book in PDF, Epub and Kindle

It has been estimated that over a billion people are using or learning English as a second or foreign language, and the numbers are growing not only for English but for other languages as well. These language learners provide a burgeoning market for tools that help identify and correct learners' writing errors. Unfortunately, the errors targeted by typical commercial proofreading tools do not include those aspects of a second language that are hardest to learn. This volume describes the types of constructions English language learners find most difficult: constructions containing prepositions, articles, and collocations. It provides an overview of the automated approaches that have been developed to identify and correct these and other classes of learner errors in a number of languages. Error annotation and system evaluation are particularly important topics in grammatical error detection because there are no commonly accepted standards. Chapters in the book describe the options available to researchers, recommend best practices for reporting results, and present annotation and evaluation schemes. The final chapters explore recent innovative work that opens new directions for research. It is the authors' hope that this volume will continue to contribute to the growing interest in grammatical error detection by encouraging researchers to take a closer look at the field and its many challenging problems.

Automated Grammatical Error Detection for Language Learners

Automated Grammatical Error Detection for Language Learners
Author: Claudia Leacock
Publisher: Springer Nature
Total Pages: 127
Release: 2010-05-11
Genre: Computers
ISBN: 3031021371


Download Automated Grammatical Error Detection for Language Learners Book in PDF, Epub and Kindle

It has been estimated that over a billion people are using or learning English as a second or foreign language, and the numbers are growing not only for English but for other languages as well. These language learners provide a burgeoning market for tools that help identify and correct learners' writing errors. Unfortunately, the errors targeted by typical commercial proofreading tools do not include those aspects of a second language that are hardest to learn. This volume describes the types of constructions English language learners find most difficult -- constructions containing prepositions, articles, and collocations. It provides an overview of the automated approaches that have been developed to identify and correct these and other classes of learner errors in a number of languages. Error annotation and system evaluation are particularly important topics in grammatical error detection because there are no commonly accepted standards. Chapters in the book describe the options available to researchers, recommend best practices for reporting results, and present annotation and evaluation schemes. The final chapters explore recent innovative work that opens new directions for research. It is the authors' hope that this volume will contribute to the growing interest in grammatical error detection by encouraging researchers to take a closer look at the field and its many challenging problems. Table of Contents: Introduction / History of Automated Grammatical Error Detection / Special Problems of Language Learners / Language Learner Data / Evaluating Error Detection Systems / Article and Preposition Errors / Collocation Errors / Different Approaches for Different Errors / Annotating Learner Errors / New Directions / Conclusion

Computational Methods for Corpus Annotation and Analysis

Computational Methods for Corpus Annotation and Analysis
Author: Xiaofei Lu
Publisher: Springer
Total Pages: 192
Release: 2014-07-08
Genre: Language Arts & Disciplines
ISBN: 9401786453


Download Computational Methods for Corpus Annotation and Analysis Book in PDF, Epub and Kindle

In the past few decades the use of increasingly large text corpora has grown rapidly in language and linguistics research. This was enabled by remarkable strides in natural language processing (NLP) technology, technology that enables computers to automatically and efficiently process, annotate and analyze large amounts of spoken and written text in linguistically and/or pragmatically meaningful ways. It has become more desirable than ever before for language and linguistics researchers who use corpora in their research to gain an adequate understanding of the relevant NLP technology to take full advantage of its capabilities. This volume provides language and linguistics researchers with an accessible introduction to the state-of-the-art NLP technology that facilitates automatic annotation and analysis of large text corpora at both shallow and deep linguistic levels. The book covers a wide range of computational tools for lexical, syntactic, semantic, pragmatic and discourse analysis, together with detailed instructions on how to obtain, install and use each tool in different operating systems and platforms. The book illustrates how NLP technology has been applied in recent corpus-based language studies and suggests effective ways to better integrate such technology in future corpus linguistics research. This book provides language and linguistics researchers with a valuable reference for corpus annotation and analysis.

Corpus Annotation

Corpus Annotation
Author: Roger Garside
Publisher: Routledge
Total Pages: 304
Release: 1997
Genre: Computers
ISBN:


Download Corpus Annotation Book in PDF, Epub and Kindle

This is a text which surveys the growing field of research known as corpus annotation - an electronic collection of texts. Corpus annotation is a central resource in linguisticsi̧nformation technology and the processing of human language. The book seeks to show the nature of language and the most effective means of analysing it. A bibliography lists relevant e-mail addresses and Web sites.

Automated Grammatical Error Detection for Language Learners

Automated Grammatical Error Detection for Language Learners
Author: Claudia Leacock
Publisher: Morgan & Claypool Publishers
Total Pages: 172
Release: 2014-02-01
Genre: Computers
ISBN: 1627050140


Download Automated Grammatical Error Detection for Language Learners Book in PDF, Epub and Kindle

It has been estimated that over a billion people are using or learning English as a second or foreign language, and the numbers are growing not only for English but for other languages as well. These language learners provide a burgeoning market for tools that help identify and correct learners' writing errors. Unfortunately, the errors targeted by typical commercial proofreading tools do not include those aspects of a second language that are hardest to learn. This volume describes the types of constructions English language learners find most difficult: constructions containing prepositions, articles, and collocations. It provides an overview of the automated approaches that have been developed to identify and correct these and other classes of learner errors in a number of languages. Error annotation and system evaluation are particularly important topics in grammatical error detection because there are no commonly accepted standards. Chapters in the book describe the options available to researchers, recommend best practices for reporting results, and present annotation and evaluation schemes. The final chapters explore recent innovative work that opens new directions for research. It is the authors' hope that this volume will continue to contribute to the growing interest in grammatical error detection by encouraging researchers to take a closer look at the field and its many challenging problems.

Handbook of Linguistic Annotation

Handbook of Linguistic Annotation
Author: Nancy Ide
Publisher: Springer
Total Pages: 1440
Release: 2017-06-16
Genre: Language Arts & Disciplines
ISBN: 9402408819


Download Handbook of Linguistic Annotation Book in PDF, Epub and Kindle

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling, creating an annotation language, building a corpus and evaluating it for correctness. Essential reading for both computer scientists and linguistic researchers.Linguistic annotation is an increasingly important activity in the field of computational linguistics because of its critical role in the development of language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format through both the manual and automatic annotation process, evaluation, and iterative improvement of annotation accuracy. The second part of the book includes case studies of annotation projects across the spectrum of linguistic annotation types, including morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time and event and spatial analyses, and discourse level analyses including discourse structure, co-reference, etc. Each case study addresses the various phases and processes discussed in the chapters of part one.

The Cambridge Handbook of Learner Corpus Research

The Cambridge Handbook of Learner Corpus Research
Author: Sylviane Granger
Publisher: Cambridge University Press
Total Pages: 1199
Release: 2015-10-01
Genre: Language Arts & Disciplines
ISBN: 1316432149


Download The Cambridge Handbook of Learner Corpus Research Book in PDF, Epub and Kindle

The origins of learner corpus research go back to the late 1980s when large electronic collections of written or spoken data started to be collected from foreign/second language learners, with a view to advancing our understanding of the mechanisms of second language acquisition and developing tailor-made pedagogical tools. Engaging with the interdisciplinary nature of this fast-growing field, The Cambridge Handbook of Learner Corpus Research explores the diverse and extensive applications of learner corpora, with 27 chapters written by internationally renowned experts. This comprehensive work is a vital resource for students, teachers and researchers, offering fresh perspectives and a unique overview of the field. With representative studies in each chapter which provide an essential guide on how to conduct learner corpus research in a wide range of areas, this work is a cutting-edge account of learner corpus collection, annotation, methodology, theory, analysis and applications.

Automatic Treatment and Analysis of Learner Corpus Data

Automatic Treatment and Analysis of Learner Corpus Data
Author: Ana Díaz-Negrillo
Publisher: John Benjamins Publishing Company
Total Pages: 322
Release: 2013-12-15
Genre: Language Arts & Disciplines
ISBN: 9027270953


Download Automatic Treatment and Analysis of Learner Corpus Data Book in PDF, Epub and Kindle

This book is a critical appraisal of recent developments in corpus linguistics for the analysis of written and spoken learner data. The twelve papers cover an introductory critical appraisal of learner corpus data compilation and development (section 1); issues in data compilation, annotation and exchangeability (section 2); automatic approaches to data identification and analysis (section 3); and analysis of learner corpus data in the light of recent models of data analysis and interpretation, especially recent automatic approaches for the identification of learner language features (section 4). This collection is aimed at students and researchers of corpus linguistics, second language acquisition studies and quantitative linguistics. It will significantly advance learner corpus research in terms of methodological innovation and will fill in an important gap in the development of multidisciplinary approaches (for learner corpus studies).