
☢️ Corpus


Corpus/Corpora

Text corpus. From Wikipedia, the free encyclopedia. Digital collections of natural language data.

In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated. Annotated, they have been used in corpus linguistics for statistical hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory.

Overview

A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus).

Applications

Corpora are the main knowledge base in corpus linguistics.
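As a rough illustration of what "checking occurrences" in a corpus can mean in practice, here is a minimal Python sketch. The three-sentence corpus, the simple regex tokenizer, and the function names `tokenize` and `count_occurrences` are all assumptions made for this example; they do not come from the article, and real corpus tools use far more careful tokenization and annotation.

```python
import re
from collections import Counter

# Hypothetical toy monolingual corpus: a few unannotated English sentences.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A corpus is a collection of texts.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_occurrences(documents):
    """Return a Counter of token frequencies across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

counts = count_occurrences(corpus)
print(counts["the"], counts["cat"], counts["corpus"])
```

Frequency counts like these are only a starting point; an annotated corpus would also carry labels such as part-of-speech tags alongside each token.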