Open Language Archives Community


USA language map… wow! | Language Trainers USA Blog I stumbled upon a fascinating site recently, the webmaster of which draws linguistic maps displaying different languages and dialects across continents. I always knew that there were plenty of different American English dialects across our country, but I didn’t expect quite so much detail on a map of the USA. Click on the image below to see the full-sized version. Be warned, however: the full-sized .png is huge (3567×1878, 301 KB), so you might want to open it, right-click to save it to your computer, and view it that way. So here it is, a map of indigenous languages, dialects and accents in the USA. It’s amazing to see just how many Native American languages and dialects remain around the nation.

New Corpora | Linguistic Data Consortium Chinese Discussion Forums: BOLT Chinese Discussion Forums, developed by LDC; 1,597,500 discussion forum threads in Chinese harvested from the Internet, in HTML and XML formats. New Arabic Treebank release: Arabic Treebank – Weblog, developed by LDC; Arabic weblog data with part-of-speech, morphology, gloss and syntactic tree annotation, including 243,117 source tokens before clitics were split and 308,996 tree tokens after clitics were split. English and Spanish Blogs: NewSoMe Corpus of Opinion in Blogs, compiled at Barcelona Media; 108 English documents and 191 Spanish documents consisting of blogs annotated manually for opinion, including topic, segment, cue, subjectivity, polarity and intensity. Multilingual Dependency Treebanks: 2006 CoNLL Shared Task - Ten Languages; dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing.

Santa Barbara Corpus of Spoken American English | Department of Linguistics - UC Santa Barbara Parts 1-4 of the Santa Barbara Corpus of Spoken American English (SBCSAE) are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps that correlate transcription and audio at the level of individual intonation units. Access: All transcriptions in the Santa Barbara Corpus parts 1-4 can be downloaded for free. To access individual conversations and other discourse segments in the Santa Barbara Corpus, you may select the audio file and transcription you wish to download by consulting the Contents and Summaries. To download the audio files in WAV (recommended) or MP3 format, do the following: Select the transcription you want (e.g. Alternatively, you can do the following: Select a transcription (e.g. Part 1: LDC Catalog No. SBCSAE by John W. Description Contents & Summaries SBC001 Actual Blacksmithing SBC002 Lambada SBC006 Cuz

IndoEuropean Origins - GeoCurrents Mismodeling Indo-European Origin and Expansion: Bouckaert, Atkinson, Wade and the Assault on Historical Linguistics Dear Readers, As GeoCurrents passed through its August slowdown, plans were made for a series on the Summer Olympics. Thanks to the efforts of Chris Kremer, we have gathered statistics—and made maps—relating Olympic medal count by country to population and GDP, both overall and in regard to specific categories of competition. The series, however, has been put on hold by the … Quentin Atkinson’s Nonsensical Maps of Indo-European Expansion The website that accompanies “Mapping the Origins and Expansion of the Indo-European Language Family” (August 24 Science), maintained by co-author Quentin D. Why the Indo-European Debate Matters—And Matters Deeply As expected, we have received a few complaints from friends, acquaintances, and Facebook-followers in regard to the current Indo-European series. The Hazards of Formal Geographical Modeling in Bouckaert et al.

WALS Online - ELISA - English Language Interview Corpus as a Second-Language Learning Application The ELISA corpus is being developed at the University of Tuebingen (Dept of Applied English Linguistics, AEL) and the University of Surrey (Dept of Languages and Translation Studies, LTS) as a resource for language learning and teaching, and interpreter training. It contains interviews with native speakers of English, who talk about their professional careers (e.g. in tourism, politics, the media or environmental education). We are very grateful to all speakers for their kind contributions. This demo website contains selected materials from the ELISA corpus (more information, acknowledgements, availability and copyright). You can use our Concordancer (written in Perl) on text versions of all corpus files.

ISLRF: Institut Supérieur des Langues de la République Française A RESOURCE centre, a place of RESEARCH and an organiser of CONFERENCES for language immersion. The creation of the ISLRF: The Institut Supérieur des Langues de la République Française (ISLRF) is a non-profit higher-education institution created in 1997 by five networks of schools teaching in so-called regional languages: SKOLIOU DIWAN / DIWAN SCHOOLS, which in 2013-2014 enrolled 3733 pupils from preschool through the final year of secondary school in 51 establishments run by the association Diwan Breizh (including 4 preschools and elementary schools, 6 middle schools and 1 high school). ESCOLES LA BRESSOLA / LA BRESSOLA SCHOOLS, which in 2012-2013 enrolled 762 pupils from preschool through middle school in 7 establishments run by the association La Bressola (including 6 preschools and elementary schools and 1 middle school). SEASKA / IKASTOLA SCHOOLS, which in 2012-2013 enrolled 3072 pupils from preschool through the final year of secondary school in 31 establishments run by the association SEASKA (including 27 preschools and elementary schools, 3 middle schools and 1 high school).

The Slipnet - a dynamic semantic network The Slipnet could be envisaged as our long-term memory. It is a network of concepts (nodes) connected by conceptual relations (links). Correspondingly, links have an ever-changing conceptual distance, reflecting the program's current estimate of the closeness of the two concepts they connect. In the beginning, there is only an activation-free Slipnet. Like most semantic networks, Copycat's Slipnet has different 'classes' of links, which are sometimes used to focus on a certain relationship.
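The structure described in this excerpt can be sketched roughly as follows. This is only an illustrative sketch, not Copycat's actual code: the names Node, Link, Slipnet, kind, base_distance and label are hypothetical, and the rule that an active label concept halves a link's distance is an assumption chosen purely to show a distance that changes over time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A concept in the network; activation is assumed to lie in [0, 1]."""
    name: str
    activation: float = 0.0  # the network starts out activation-free


@dataclass
class Link:
    """A conceptual relation between two nodes, tagged with a link class."""
    source: Node
    target: Node
    kind: str                     # hypothetical class label, e.g. "lateral"
    base_distance: float          # hypothetical resting conceptual distance
    label: Optional[Node] = None  # optional concept that names the relation

    def distance(self) -> float:
        # Assumption: the more active the label concept, the closer the
        # two linked concepts are considered (down to half the base value).
        if self.label is None:
            return self.base_distance
        return self.base_distance * (1.0 - 0.5 * self.label.activation)


@dataclass
class Slipnet:
    """Container for nodes and links, with lookup by link class."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add_node(self, name: str) -> Node:
        node = Node(name)
        self.nodes[name] = node
        return node

    def add_link(self, source: Node, target: Node, kind: str,
                 base_distance: float, label: Optional[Node] = None) -> Link:
        link = Link(source, target, kind, base_distance, label)
        self.links.append(link)
        return link

    def links_of_kind(self, kind: str) -> List[Link]:
        # Focus on one class of relationship, as the excerpt mentions.
        return [link for link in self.links if link.kind == kind]
```

With this sketch, a link's distance() equals its base_distance while its label node is inactive and shrinks as the label's activation rises, which is one simple way to model an "ever-changing conceptual distance".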

ELFA Project – University of Helsinki Description of the ELFA corpus project: The ELFA corpus was completed in 2008 and its development work is ongoing. Altogether, the corpus contains 1 million words of transcribed spoken academic ELF (approximately 131 hours of recorded speech). The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic/polylogic events, such as seminars, thesis defences, and conference discussions, which have been given an emphasis in the data (67%). As for the disciplinary domains, the ELFA corpus is composed of social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%). Figure: Distribution of disciplinary domains in the ELFA corpus. Source: Elina Ranta; see also Mauranen, Hynninen & Ranta (2010) English as an academic lingua franca: The ELFA project. Recommended citation: ELFA 2008.