
Open Language Archives Community


USA language map… wow! | Language Trainers USA Blog Source: muturzikin.com I stumbled upon this fascinating site recently, muturzikin.com, whose webmaster draws linguistic maps displaying different languages and dialects across continents. I always knew that there were plenty of different American English dialects across our country, but I didn’t expect quite so much detail on a map of the USA. Click on the image below to see the full-sized version. Be warned, however: the full-sized .png is huge (3567×1878, 301Kb), so you might want to open it, then right-click -> save it to your computer and view it that way. So here it is, a map of indigenous languages, dialects and accents in the USA: It’s amazing to see just how many Native American languages and dialects remain around the nation.

LDC - Linguistic Data Consortium New Corpora | Linguistic Data Consortium Chinese Discussion Forums: BOLT Chinese Discussion Forums: developed by LDC, 1,597,500 discussion forum threads in Chinese harvested from the Internet, in HTML and XML formats New Arabic Treebank release: Arabic Treebank – Weblog: developed by LDC, Arabic weblog data with part-of-speech, morphology, gloss and syntactic tree annotation, including 243,117 source tokens before clitics were split and 308,996 tree tokens after clitics were split. English and Spanish Blogs: NewSoMe Corpus of Opinion in Blogs: compiled at Barcelona Media, 108 English documents and 191 Spanish documents consisting of blogs annotated manually for opinion including topic, segment, cue, subjectivity, polarity and intensity. Multilingual Dependency Treebanks: 2006 CoNLL Shared Task - Ten Languages: dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing.

Santa Barbara Corpus of Spoken American English | Department of Linguistics - UC Santa Barbara Parts 1-4 of the Santa Barbara Corpus of Spoken American English (SBCSAE) are now available, for a total of approximately 249,000 words. The Santa Barbara Corpus includes transcriptions, audio, and timestamps which correlate transcription and audio at the level of individual intonation units. Access: All transcriptions in the Santa Barbara Corpus parts 1-4 can be downloaded for free by clicking here. To access individual conversations and other discourse segments in the Santa Barbara Corpus, you may select the audio file and transcription you wish to download by consulting the Contents and Summaries. To download the audio files in WAV (recommended) or MP3 format, do the following: Select the transcription you want (e.g. Alternatively, you can do the following: Select a transcription (e.g. Part 1: LDC Catalog No. SBCSAE by John W. Description Contents & Summaries SBC001 Actual Blacksmithing SBC002 Lambada SBC006 Cuz

IndoEuropean Origins - GeoCurrents Mismodeling Indo-European Origin and Expansion: Bouckaert, Atkinson, Wade and the Assault on Historical Linguistics Dear Readers, As GeoCurrents passed through its August slowdown, plans were made for a series on the Summer Olympics. Thanks to the efforts of Chris Kremer, we have gathered statistics—and made maps—relating Olympic medal count by country to population and GDP, both overall and in regard to specific categories of competition. The series, however, has been put on hold by the … Quentin Atkinson’s Nonsensical Maps of Indo-European Expansion The website that accompanies “Mapping the Origins and Expansion of the Indo-European Language Family” (August 24 Science), maintained by co-author Quentin D. Why the Indo-European Debate Matters—And Matters Deeply As expected, we have received a few complaints from friends, acquaintances, and Facebook-followers in regard to the current Indo-European series. The Hazards of Formal Geographical Modeling in Bouckaert et al.

Moses - Main/HomePage Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. All you need is a collection of translated texts (parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest probability translation among the exponential number of choices. News: 5 October 2017 Moses v 4.0 has been released! Features: Moses offers two types of translation models: phrase-based and tree-based. Moses features factored translation models, which enable the integration of linguistic and other information at the word level. Moses allows the decoding of confusion networks and word lattices, enabling easy integration with ambiguous upstream tools, such as automatic speech recognizers or morphological analyzers. The Experiment Management System makes using Moses much easier. Get started: The released software includes a command line executable which can be used for decoding.
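The search step mentioned in the Moses summary can be illustrated with a toy example. The sketch below is not Moses code and does not use its API: it assumes a tiny, made-up German-English phrase table (`PHRASE_TABLE`), translates strictly left to right with no reordering and no language model, and uses dynamic programming in place of the beam search a real decoder would use. The function name `decode_monotone` and all probabilities are hypothetical.

```python
import math

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
# The entries and probabilities are invented for illustration only.
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}


def decode_monotone(source_words, phrase_table, max_phrase_len=3):
    """Return (translation, probability) for the best monotone segmentation,
    or None if the sentence cannot be covered by the phrase table."""
    n = len(source_words)
    # best[i] holds (log probability, target phrases) for the first i source words.
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            if best[start] is None:
                continue  # no translation covers the first `start` words
            phrase = tuple(source_words[start:end])
            for target, prob in phrase_table.get(phrase, []):
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [target])
    if best[n] is None:
        return None
    score, targets = best[n]
    return " ".join(targets), math.exp(score)


result = decode_monotone("das haus ist klein".split(), PHRASE_TABLE)
print(result[0])  # the house is small
```

Even in this small case the two-word phrase "das haus" wins over translating "das" and "haus" separately, which is the kind of choice a phrase-based system has to weigh across many possible segmentations.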

WALS Online - ELISA - English Language Interview Corpus as a Second-Language Learning Application The ELISA corpus is being developed at the University of Tuebingen (Dept of Applied English Linguistics, AEL) and the University of Surrey (Dept of Languages and Translation Studies, LTS) as a resource for language learning and teaching, and interpreter training. It contains interviews with native speakers of English. They talk about their professional careers (e.g. in tourism, politics, the media or environmental education). We are very grateful to all speakers for their kind contributions. This demo website contains selected materials from the ELISA corpus (more information, acknowledgements, availability and copyright). You can use our Concordancer (written in Perl) on text versions of all corpus files.
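For readers unfamiliar with what a concordancer does, here is a minimal keyword-in-context (KWIC) sketch. It is not the ELISA Concordancer (which the site says is written in Perl); the function name `concordance`, the `width` parameter and the sample sentence are all assumptions made for illustration, and the sample text is invented rather than taken from the corpus.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, not from the ELISA corpus.
sample = (
    "I started my career in tourism. Later my career took me into politics, "
    "and a career in the media followed."
)
for line in concordance(sample, "career", width=20):
    print(line)
```

Aligning every hit on the keyword makes it easy to scan the words that typically occur before and after it, which is the main use of a concordance in language teaching.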

ISLRF : Institut Supérieur des Langues de la République Française Resource centre, research venue and organiser of conferences on language immersion. The creation of the ISLRF: The Institut Supérieur des Langues de la République Française (ISLRF) is a non-profit, association-run higher-education institution created in 1997 by five networks of schools teaching in so-called regional languages: SKOLIOU DIWAN / DIWAN SCHOOLS, which in 2013-2014 enrolled 3733 pupils from nursery school to the final year of secondary school in 51 establishments run by the Diwan Breizh association (including 4 nursery and primary schools, 6 collèges and 1 lycée). ESCOLES LA BRESSOLA / LA BRESSOLA SCHOOLS, which in 2012-2013 enrolled 762 pupils from nursery school to collège in 7 establishments run by the La Bressola association (including 6 nursery and primary schools and 1 collège). SEASKA / IKASTOLA SCHOOLS, which in 2012-2013 enrolled 3072 pupils from nursery school to the final year of secondary school in 31 establishments run by the SEASKA association (including 27 nursery and primary schools, 3 collèges and 1 lycée).

FP7 : ICT : Language Technologies : EU-BRIDGE Please note that the project factsheets will no longer be updated. All information relevant to the project can be found on the CORDIS factsheet. This is updated on a regular basis with public deliverables, etc. EU-BRIDGE - Bridges Across the Language Divide Challenge: The project aims at developing automatic transcription and translation technology that will permit the development of innovative multimedia captioning and translation services of audiovisual documents between European and non-European languages. Objectives and Innovation: EU-Bridge partners aim to develop above state-of-the-art speech and machine translation capabilities in view of new and more challenging business use cases. The target group of the project: The prospective users of the project are European companies operating in an audiovisual market (in particular TV captioning and translation).
