Text analysis

The Perseus ANNIS environment enables syntactic searches in annotated Greek and Latin texts. However, the query syntax is neither simple nor self-evident. For the patient (or impatient?) http://www.ffzg.unizg.hr/klafil/dokuwiki/doku.php/z:perseus-annis

z:perseus-annis [Klafil]

What is ANNIS? ANNIS2 is an open-source, versatile, web-browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632, "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more.

Overview - ANNIS2 - a Linguistic Database for Exploring Information Structure

http://www.sfb632.uni-potsdam.de/annis/overtab.html
http://193.206.200.48:8080/pedecerto/home.jsp

Pede certo

Pede certo offers six different query systems for selecting verses from the corpus. In all cases the verses are presented with metrical notation (quantity marks, main caesurae, bucolic diaeresis, synaloepha, hiatus). Elenchi (Lists): this page offers a list of predefined selections based on various characteristics of the verses, which can be combined with one another (hexameter patterns, pentameter patterns, spondaic verses, hypermetric verses, etc.). A further query page is Cerca forma (search for a form).
http://hnk.ffzg.hr/hobs/default_en.html

Croatian Dependency Treebank: homepage

The Croatian Dependency Treebank is one of the tasks of project 0130418, "Development of Croatian Language Resources", supported by the Ministry of Science, Education and Sports of the Republic of Croatia. Goal: to build a syntactically annotated Croatian corpus of at least 100,000 tokens. Method: annotation will be based on dependency analysis of sentences from the corpus. The model of syntactic description and annotation is taken from the Prague Dependency Treebank.

GOLDVARB 2001 Users' Manual

http://courses.essex.ac.uk/lg/lg654/GoldVarb2001forPCmanual.htm

Goldvarb X

References Sankoff, David & Rousseau, Pascale (1979). Categorical contexts and variable rules. http://individual.utoronto.ca/tagliamonte/goldvarb.htm
http://www.ldc.upenn.edu/annotation/ This page describes tools and formats for creating and managing linguistic annotations. "Linguistic annotation" covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions (audio, video and/or physiological recordings) or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on. The focus is on tools which have been widely used for constructing annotated linguistic databases, and on the formats commonly adopted by such tools and databases. This page began as a set of links to systems for speech annotation, and the coverage of textual annotation is still inadequate.

Linguistic Annotation

http://www.dlib.org/dlib/may05/rydberg-cox/05rydberg-cox.html 1. Introduction. For the past three years, the Cultural Heritage Language Technologies consortium [1], situated at eight institutions in four countries [2], has received funding from the National Science Foundation and the European Commission International Digital Libraries program to engage in research about the most effective ways to apply technologies and techniques from the fields of computational linguistics, natural language processing, and information retrieval to challenges faced by students and scholars who are working with texts written in Greek, Latin, and Old Norse [3]. In its broadest terms, our work has focused on four primary areas: 1) providing access to primary source materials that are often rare and fragile, 2) helping readers understand texts written in difficult languages, 3) enabling researchers to conduct new types of scholarship, and 4) preserving digital resources for the future.

The Cultural Heritage Language Technologies Consortium

A Gentle Introduction to XML

http://www.tei-c.org/Vault/P4/doc/html/SG.html As originally published in previous editions of the Guidelines, this chapter provided a gentle introduction to "just enough" SGML for anyone to understand how the TEI used that standard. Since then, the Gentle Guide seems to have taken on a life of its own independent of the Guidelines, having been widely distributed (and flatteringly imitated) on the web. In revising it for the present draft, the editors have therefore felt free to reduce considerably its discussion of SGML-specific matters, in favour of a simple presentation of how the TEI uses XML. The encoding scheme defined by these Guidelines may be formulated either as an application of the ISO Standard Generalized Markup Language (SGML) or of the more recently developed W3C Extensible Markup Language (XML).
http://www.athel.com/mp.html The software is used as part of many corpus linguistics courses and is also widely used in ESL/EFL for vocabulary learning and language learning in general. MonoConc Pro is networkable and operates well under a variety of Windows environments (W95 and above). Available for an educational price of $85 for a single user licence.

Concordance software: MonoConc Pro MP2.2

The encoding scheme defined by these Guidelines is formulated as an application of the Extensible Markup Language (XML) (Bray et al. (eds.) (2006)). XML is widely used for the definition of device-independent, system-independent methods of storing and processing texts in electronic form. It is now also the interchange and communication format used by many applications on the World Wide Web. In the present chapter we informally introduce some of its basic concepts and attempt to explain to the reader encountering them for the first time how and why they are used in the TEI scheme.

v. A Gentle Introduction to XML - TEI P5: Guidelines for Electronic Text Encoding and Interchange
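
For readers meeting XML for the first time, the basic concepts mentioned above (elements, attributes, nesting) can be illustrated with a minimal sketch. The fragment below is a simplified, hypothetical example that borrows the TEI element names lg (line group) and l (verse line) but omits the TEI namespace and header; it is not a complete or valid TEI document. It is read here with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: TEI-style element names, but no
# namespace and no TEI header, so this is NOT a valid TEI document.
fragment = """<lg type="couplet">
  <l n="1">Arma gravi numero violentaque bella parabam</l>
  <l n="2">edere, materia conveniente modis.</l>
</lg>"""

# Parse the string into an element tree; root is the <lg> element.
root = ET.fromstring(fragment)

# Attributes are read with .get(); child elements are found with .findall().
group_type = root.get("type")
lines = [(l.get("n"), l.text) for l in root.findall("l")]

for number, text in lines:
    print(number, text)
```

The same markup can be processed by any XML-aware tool, which is the device-independence the chapter refers to.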

Collatinus

Version 10, December 2012. History: Analysis, and later Collatinus, were originally intended to produce documents on paper, and that is still often what I use it for. I began to refine it when I realized that many users were employing it for other purposes: to have immediate and unobtrusive lexical and morphological help when reading a Latin text; to carry out lexical and stylistic, or even lexicometric, research; to give students tasks of identification, extraction, and transformation. The current version of Collatinus is numbered 10 and dated 27 November 2012.

The Latin and Ancient Greek Dependency Treebanks

The Ancient Greek and Latin Dependency Treebanks are an attempt to create a linguistic genome: a large database of Classical texts where the morphological, syntactic, and lexical information for each sentence has been explicitly encoded. The point? To put linguistic research in Greek and Latin on a new quantitative foundation. To help drive a new generation of computational analysis. And above all, to get students and faculty both involved in the production of data that can be useful to the wider scholarly community.
TextSTAT is a simple program for analyzing texts. It reads text files (in various encodings) and HTML files (also directly from the Internet), and it produces word-frequency lists and concordances from these files. TextSTAT has its own web spider, with which you can assemble any number of pages from a given website into a TextSTAT corpus.

TextSTAT
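
To make the two outputs named in the TextSTAT description concrete, here is a minimal Python sketch of a word-frequency list and a keyword-in-context concordance. It is an illustration only and does not reflect how TextSTAT itself is implemented; the function names, the simple regular-expression tokenizer, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (a naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each word form to its number of occurrences."""
    return Counter(tokenize(text))


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens shown on each side of the keyword.
    """
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Sample text chosen for illustration (two Latin verse openings).
sample = (
    "Arma virumque cano, Troiae qui primus ab oris. "
    "Arma gravi numero violentaque bella parabam."
)

freqs = word_frequencies(sample)
kwic = concordance(sample, "arma")

for left, key, right in kwic:
    print(f"{left:>12} [{key}] {right}")
```

A real tool would also need to handle file encodings, HTML stripping, and language-specific tokenization, which this sketch leaves out.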