
Corpus of Historical American English (COHA)

Related: Corpus Sites

Acronym Finder

Shakespeare corpus
The source texts came from the Online Library of Liberty; their original source is the OUP edition of 1916. You get 37 plays, plus all the speeches of all the characters: for example, the whole play Hamlet and, separately, all the speeches of Prince Hamlet, all the speeches of Horatio, etc. There is also a list of the plays and their dates. All the files are saved in 16-bit Unicode. The plays are in the root of 3 folders (comedies, historical, tragedies) as appropriate.
Mike Scott mike (at)
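
A minimal Python sketch of how files laid out this way might be read. It assumes that "16-bit Unicode" means UTF-16 with a byte-order mark, that the plays are stored as .txt files, and that the folder names match the three listed above. The function names read_play and list_plays are hypothetical and are not part of the corpus distribution.

```python
from pathlib import Path

# Folder names taken from the description above; the .txt extension is an assumption.
GENRES = ("comedies", "historical", "tragedies")


def read_play(path):
    """Return the text of one file, decoding it as UTF-16 (assumed encoding)."""
    return Path(path).read_text(encoding="utf-16")


def list_plays(root):
    """Map each genre folder to the sorted names of the .txt files directly inside it."""
    return {
        genre: sorted(p.name for p in (Path(root) / genre).glob("*.txt"))
        for genre in GENRES
    }
```

Opening such files without naming the encoding would usually produce decoding errors or garbled text on systems that default to UTF-8, which is why the encoding is passed explicitly here.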

ELFA Project – University of Helsinki
Description of the ELFA corpus project
The ELFA corpus was completed in 2008, and its development work is ongoing. The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic/polylogic events, such as seminars, thesis defences, and conference discussions, which have been given an emphasis in the data (67%). As for the disciplinary domains, the ELFA corpus is composed of social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%).
[Figure: Distribution of disciplinary domains in the ELFA corpus. Source: Elina Ranta; see also Mauranen, Hynninen & Ranta (2010), English as an academic lingua franca: The ELFA project.]
As a general principle, all data in the corpus is authentic in the sense that it was not elicited for research purposes but occurred naturally.

Online Etymology Dictionary

English to French, Italian, German & Spanish Dictionary

BASE (British Academic Spoken English) and BASE Plus Collections
Overview of BASE
The British Academic Spoken English (BASE) project took place at the Universities of Warwick and Reading between 2000 and 2005, under the directorship of Hilary Nesi (Warwick), with Paul Thompson (Reading). Natalie Snodgrass and Sarah Creer were employed as research assistants, and Tim Kelly was the video producer of the project. Lou Burnard (Oxford University) and Adam Kilgarriff (Lexicography MasterClass Ltd) acted as consultants. The BASE Corpus consists of 160 lectures and 40 seminars recorded in a variety of departments (video-recorded at the University of Warwick and audio-recorded at the University of Reading). It contains 1,644,942 tokens in total (lectures and seminars). The corpus has been deposited in the Oxford Text Archive and is catalogued by the Arts and Humanities Data Service.
Overview of BASE Plus
BASE Plus is a larger collection of British Academic Spoken English data held at the Centre for Applied Linguistics.

ELISA - English Language Interview Corpus as a Second-Language Learning Application
The ELISA corpus is being developed at the University of Tuebingen (Dept of Applied English Linguistics, AEL) and the University of Surrey (Dept of Languages and Translation Studies, LTS) as a resource for language learning and teaching, and for interpreter training. It contains interviews with native speakers of English, who talk about their professional careers (e.g. in tourism, politics, the media or environmental education). We are very grateful to all speakers for their kind contributions (more information, acknowledgements, availability and copyright). You can use our Concordancer (written in Perl) on text versions of all corpus files.

Diccionario de la lengua española - Vigésima segunda edición

Corpora, Collections, Data Archives

1. British National Corpus (BNC) [100 million words; 1990s British English, spoken and written]: Many different web sites give free but limited access to the corpus. Access is limited because of copyright: you cannot expand the concordance context to read more of the surrounding text, and you cannot read the entire source texts (only snippets).
   BNCweb: User-friendly, free interface (limited features without a paid licence).
   JustTheWord: The most accessible site for students from non-English-speaking backgrounds (and the most pedagogically useful), because it immediately gives you a list of collocations for your search word or phrase instead of concordances. Results are categorized by POS-based patterns and by approximate sense clusters, and graph bars indicate how common each combination is. Results are based on an 80K-word subset of the BNC.
2. Corpus of Contemporary American English (COCA) [450 million words; 20 million words of American English each year from 1990 to 2012].