ELAN - The Language Archive

Using CLAN
Warning: After installing a new version of CLAN for use with old data, you will need to get a new version of the MOR grammar and run MOR, POST, and CHECK again on your old data to make sure they work with the newer format. Alternatively, you may wish to continue using old versions of CLAN with old versions of corpora. However, CHILDES data on the web are always updated to run with new versions of CLAN. For Windows:

Laurence Anthony's AntConc: Older Versions
All previous releases of AntConc, as well as a development version, are available for download. <.exe> files are for Windows, <.zip> files are for Macintosh OS X, and <.tar.gz> files are for Linux.

WebCorp: The Web as Corpus
WebCorp Live lets you access the Web as a corpus: a large collection of texts from which examples of real language use can be extracted. Have you tried WebCorp LSE?

Santa Barbara Corpus of Spoken American English
Parts 1-4 of the Santa Barbara Corpus of Spoken American English (SBCSAE) are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps that correlate transcription and audio at the level of individual intonation units.
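Timestamps aligned at the level of intonation units make it possible to go from a point in the audio to the transcribed unit, or from a word in the transcript to its stretch of audio. The sketch below illustrates that idea under stated assumptions: the `IntonationUnit` record, its fields (start and end in seconds, speaker, text), and the helper functions are all hypothetical and do not reproduce the SBCSAE's actual file format. The toy data is invented. Lookups return lists because conversational speech can overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IntonationUnit:
    """One transcribed intonation unit with an assumed time span in seconds."""
    start: float
    end: float
    speaker: str
    text: str


def units_at(units: list[IntonationUnit], t: float) -> list[IntonationUnit]:
    """Return every unit whose [start, end) span contains time t.

    A list is returned because overlapping speech can yield several units.
    """
    return [u for u in units if u.start <= t < u.end]


def spans_containing(units: list[IntonationUnit], word: str) -> list[tuple[float, float]]:
    """Return the audio spans of units whose text contains `word` (case-insensitive)."""
    w = word.lower()
    return [(u.start, u.end) for u in units if w in u.text.lower().split()]


# Hypothetical toy data, not taken from the SBCSAE.
units = [
    IntonationUnit(0.00, 1.25, "A", "so what happened"),
    IntonationUnit(1.25, 2.10, "B", "well"),
    IntonationUnit(1.90, 3.40, "A", "did you go"),
]

print([u.speaker for u in units_at(units, 2.0)])  # ['B', 'A']
print(spans_containing(units, "Well"))            # [(1.25, 2.1)]
```

A linear scan keeps the sketch simple; for long recordings, an interval index would be the usual next step.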

ELISA - English Language Interview Corpus as a Second-Language Learning Application
The ELISA corpus is being developed at the University of Tuebingen (Dept. of Applied English Linguistics, AEL) and the University of Surrey (Dept. of Languages and Translation Studies, LTS) as a resource for language learning and teaching and for interpreter training. It contains interviews with native speakers of English, who talk about their professional careers (e.g. in tourism, politics, the media, or environmental education). We are very grateful to all speakers for their kind contributions. This demo website contains selected materials from the ELISA corpus.

ELFA Project – University of Helsinki
The ELFA corpus was completed in 2008, and development work on it continues. Altogether, the corpus contains 1 million words of transcribed spoken academic ELF (English as a lingua franca), approximately 131 hours of recorded speech.

VOICE - Project - 'Lingua Franca corpus'
In the early 21st century, English in the world finds itself in an “unstable equilibrium”. On the one hand, the majority of the world's English users are not native speakers of the language but use it as an additional language, a convenient means for communicative interactions that cannot be conducted in their mother tongues. On the other hand, linguistic descriptions have so far focused predominantly on English as it is spoken and written by its native speakers. VOICE seeks to redress the balance by providing a sizeable, computer-readable corpus of English as it is spoken by this non-native-speaking majority of users in different contexts.

Geoffrey Sampson: SUSANNE Scheme - Parsed Corpus
The Need for Grammatical Taxonomy
Since the 1990s, the exciting growth area in linguistics has been corpus linguistics: studying how English and other languages are used in real life, through analysis of large electronic samples (“corpora”) of spoken or written usage. In 2004, together with my colleague Diana McCarthy, I edited an anthology of papers illustrating the diverse strengths of modern corpus linguistics. Many findings of corpus linguistics shed new light on the nature of language as a human ability. But corpus analysis is also crucial for enabling computers to process human language.

WordSmith main page
Windows software for finding word patterns, published by Lexical Analysis Software and Oxford University Press since 1996.

COLT: The Bergen Corpus of London Teenage Language
A language research project at the University of Bergen, funded by the Faculty of Arts and the Norwegian Research Council, in co-operation with the HIT Centre, University of Bergen (from 2001: the Department of Culture, Language and Information Technology, Aksis). The Bergen Corpus of London Teenage Language (COLT) is the first large English corpus focusing on the speech of teenagers. It was collected in 1993 and consists of the spoken language of 13- to 17-year-old teenagers from different boroughs of London. The complete corpus, half a million words, has been orthographically transcribed and word-class tagged, and it is a constituent of the British National Corpus.
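Several of the tools listed here (AntConc, WordSmith, WebCorp) centre on the concordance: every occurrence of a search word shown with its surrounding context, the keyword-in-context (KWIC) format. The following is a minimal KWIC sketch for illustration only. It is not how any of these tools is implemented, and the function names, the case-insensitive matching, and the naive regex tokeniser are assumptions.

```python
import re


def kwic(text: str, node: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, node, right context) for each occurrence of `node`.

    Matching is case-insensitive; `width` is the number of tokens shown on
    each side. The regex tokeniser is a naive assumption for illustration.
    """
    tokens = re.findall(r"[\w']+", text)
    target = node.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def print_concordance(hits: list[tuple[str, str, str]]) -> None:
    # Right-align the left context so the node words line up in one column.
    lw = max((len(left) for left, _, _ in hits), default=0)
    for left, tok, right in hits:
        print(f"{left:>{lw}} [{tok}] {right}")


sample = "The corpus is large. A corpus of speech is a spoken corpus."
print_concordance(kwic(sample, "corpus", width=2))
```

Returning tuples rather than formatted strings keeps the search separate from the display, so the same hits could be sorted by left or right context, as concordancers commonly allow.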

Helsinki Corpus (HC): AD 850-1710
The Helsinki Corpus of English Texts is a structured multi-genre diachronic corpus that includes periodically organized text samples from Old, Middle, and Early Modern English. Each sample is preceded by a list of parameter codes giving information on the text and its author. The corpus is particularly useful for studying change in linguistic features over a long diachrony. It can be used as a diagnostic corpus giving general information on the occurrence of forms, structures, and lexemes in different periods of English. This information can be supplemented by evidence from more specialized and focused historical corpora. An XML version of the Helsinki Corpus is also available.
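Using a corpus diagnostically across periods means comparing how often a form occurs in each period. Because the periods contain different amounts of text, raw counts are usually normalised to a rate per fixed number of words. The sketch below assumes samples have already been reduced to hypothetical `(period_label, text)` pairs; it does not parse the Helsinki Corpus's real files or parameter codes, and the period labels and sentences are invented toy data.

```python
import re
from collections import Counter


def per_period_frequency(samples, form: str, per: int = 10_000) -> dict[str, float]:
    """Relative frequency of `form` in each period, per `per` running words.

    `samples` is an iterable of (period_label, text) pairs. Dividing by each
    period's word count makes periods of different sizes comparable.
    """
    hits = Counter()
    totals = Counter()
    target = form.lower()
    for period, text in samples:
        tokens = re.findall(r"[\w']+", text.lower())
        totals[period] += len(tokens)
        hits[period] += tokens.count(target)
    # Skip periods with no words to avoid dividing by zero.
    return {p: hits[p] * per / n for p, n in totals.items() if n}


# Invented toy samples with hypothetical period labels.
samples = [
    ("EModE", "thou art here and thou art there"),
    ("EModE", "you are welcome"),
    ("ME", "thou shalt go"),
]
freqs = per_period_frequency(samples, "thou", per=100)
print({p: round(f, 1) for p, f in freqs.items()})  # {'EModE': 20.0, 'ME': 33.3}
```

With real data, `per` would typically be 10,000 or 1,000,000 words, and exact string matching would need to allow for the spelling variation common in historical texts.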