Text analysis


PHI Latin Texts. 2013. Articles. Visualizing and Analyzing the Hollywood Screenplay with ScripThreads. Eric Hoyt, University of Wisconsin-Madison; Kevin Ponto, University of Wisconsin-Madison; Carrie Roy, University of Wisconsin-Madison. Of all narrative textual forms, the motion picture screenplay may be the most perfectly predisposed to computational analysis.


Screenplays contain capitalized character names, indented dialogue, and other formatting conventions that enable an algorithmic approach to analyzing and visualizing film narratives. In this article, the authors introduce their new tool, ScripThreads, which parses screenplays, outputs statistical values that can be analyzed, and offers four different types of visualization, each with its own utility. The visualizations represent character interactions across time as a single 3D or 2D graph. Correcteur Orthographique de Latin.
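The screenplay conventions mentioned in the ScripThreads abstract (capitalized character names, indented dialogue) can be illustrated with a minimal sketch. This is not ScripThreads' actual parser: the regular expressions, the function names `characters_by_scene` and `co_occurrences`, and the sample script are all assumptions made for illustration, and real screenplays vary far more in layout.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed convention: scene headings start with INT. or EXT.
SCENE_RE = re.compile(r"^\s*(INT\.|EXT\.)")
# Assumed convention: a character cue is an indented, all-capitals line,
# optionally followed by a parenthetical such as (O.S.).
CUE_RE = re.compile(r"^\s{2,}([A-Z][A-Z .'\-]*[A-Z])\s*(\(.*\))?\s*$")

SAMPLE = """\
INT. KITCHEN - DAY

          ALICE
     Where is the map?

          BOB
     On the table.

EXT. HARBOR - NIGHT

          ALICE
     We sail at dawn.

          CAROL (O.S.)
     Not without me.
"""


def characters_by_scene(script):
    """Return one set of speaking characters per scene, in script order."""
    scenes = []
    for line in script.splitlines():
        if SCENE_RE.match(line):
            scenes.append(set())  # a scene heading opens a new scene
            continue
        cue = CUE_RE.match(line)
        if cue and scenes:
            scenes[-1].add(cue.group(1))
    return scenes


def co_occurrences(scenes):
    """Count how many scenes each pair of characters shares."""
    pairs = Counter()
    for cast in scenes:
        for a, b in combinations(sorted(cast), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    for pair, count in co_occurrences(characters_by_scene(SAMPLE)).items():
        print(pair, count)
```

Pair counts of this kind are one plausible input for a character-interaction graph; how ScripThreads itself computes its statistics is not stated in the excerpt.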

Correcteur Orthographique de Latin

Text Mechanic™ - Text Manipulation Tools. Dtm-Vic / Lebart. Last modified on 08/19/2013 11:42:16. Software DtmVic: exploratory statistical processing of complex data sets comprising both numerical and textual data.

Dtm-Vic / Lebart

Applications primarily concern the processing of responses to open-ended questions in socio-economic sample surveys. Special emphasis is placed on the complementary use of visualization techniques (Principal Component Analysis, Two-way and Multiple Correspondence Analysis) and clustering techniques (a hybrid method using both hierarchical clustering and k-means; Self-Organizing Maps (SOM)). Alpheios Texts. Digital Humanities 2012. Computational stylistics. Z:perseus-annis [Klafil] Annis² Corpus Search. Overview - ANNIS2 - a Linguistic Database for Exploring Information Structure. Pede certo. Index Thomisticus Treebank. Croatian Dependency Treebank: homepage.

Croatian Dependency Treebank is one of the tasks of project 0130418, "Development of Croatian Language Resources", supported by the Ministry of Science, Education and Sports of the Republic of Croatia. Goal: to build a syntactically annotated Croatian corpus of at least 100,000 tokens. Method.

Croatian Dependency Treebank: homepage

GOLDVARB 2001 Users' Manual.

Goldvarb X. References.

Goldvarb X

Linguistic Annotation. This page describes tools and formats for creating and managing linguistic annotations. "Linguistic annotation" covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions (audio, video and/or physiological recordings) or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on.

The focus is on tools which have been widely used for constructing annotated linguistic databases, and on the formats commonly adopted by such tools and databases. This page began as a set of links to systems for speech annotation, and the coverage of textual annotation is still inadequate. This page is no longer being actively maintained. This page is the home of the COCOSDA technical topic domain Corpus Annotation Tools.

Key. The Cultural Heritage Language Technologies Consortium.

The Cultural Heritage Language Technologies Consortium

1. Introduction. For the past three years, the Cultural Heritage Language Technologies consortium [1], situated at eight institutions in four countries [2], has received funding from the National Science Foundation and the European Commission International Digital Libraries program to engage in research on the most effective ways to apply technologies and techniques from computational linguistics, natural language processing, and information retrieval to challenges faced by students and scholars who are working with texts written in Greek, Latin, and Old Norse [3]. In its broadest terms, our work has focused on four primary areas: 1) providing access to primary source materials that are often rare and fragile, 2) helping readers understand texts written in difficult languages, 3) enabling researchers to conduct new types of scholarship, and 4) preserving digital resources for the future. 2. Providing Access to Rare and Fragile Material.

A Gentle Introduction to XML. As originally published in previous editions of the Guidelines, this chapter provided a gentle introduction to "just enough" SGML for anyone to understand how the TEI used that standard.

A Gentle Introduction to XML

Since then, the Gentle Guide seems to have taken on a life of its own independent of the Guidelines, having been widely distributed (and flatteringly imitated) on the web. In revising it for the present draft, the editors have therefore felt free to reduce considerably its discussion of SGML-specific matters, in favour of a simple presentation of how the TEI uses XML. The encoding scheme defined by these Guidelines may be formulated either as an application of the ISO Standard Generalized Markup Language (SGML) or of the more recently developed W3C Extensible Markup Language (XML). Concordance software: MonoConc Pro MP2.2. Tapor Tools Prototype. Association for Computational Linguistics.

v. A Gentle Introduction to XML - TEI P5: Guidelines for Electronic Text Encoding and Interchange

The encoding scheme defined by these Guidelines is formulated as an application of the Extensible Markup Language (XML) (Bray et al. (eds.) (2006)). XML is widely used for the definition of device-independent, system-independent methods of storing and processing texts in electronic form. It is now also the interchange and communication format used by many applications on the World Wide Web. In the present chapter we informally introduce some of its basic concepts and attempt to explain to the reader encountering them for the first time how and why they are used in the TEI scheme.
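As a rough illustration of what "an application of XML" means in practice, the sketch below reads a small TEI-style fragment with Python's standard `xml.etree.ElementTree` module. The fragment is an assumption made for this example and is not a complete, valid TEI document (it omits the header, among other things); the function name `verse_lines` is likewise hypothetical. The `<l>` element with an `n` attribute follows the TEI convention for numbered verse lines.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; the "tei" prefix is only a local alias for queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative fragment only: a real TEI file also needs a <teiHeader>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <lg>
        <l n="1">Arma virumque cano, Troiae qui primus ab oris</l>
        <l n="2">Italiam fato profugus Laviniaque venit</l>
      </lg>
    </body>
  </text>
</TEI>"""


def verse_lines(xml_text):
    """Return (line number, text) pairs for every <l> element."""
    root = ET.fromstring(xml_text)
    return [
        (line.get("n"), "".join(line.itertext()).strip())
        for line in root.iterfind(".//tei:l", TEI_NS)
    ]


if __name__ == "__main__":
    for number, text in verse_lines(SAMPLE):
        print(number, text)
```

Because the markup names the structure explicitly, the same file can be processed by any XML-aware tool, which is the device and system independence the chapter describes.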

Digital Classicist: index. Collatinus. Version 10, December 2012. Analysis, and later Collatinus, were originally intended to produce documents on paper, and that is still often what I use it for.

I began to refine it when I noticed that many users were employing it for other purposes: having immediate, unobtrusive lexical and morphological help while reading a Latin text; carrying out lexical and stylistic, or even lexicometric, research; and giving students tasks of identification, collection, and transformation. The current version of Collatinus is numbered 10 and dated 27 November 2012. It is the last one: Collatinus has now reached a critical mass, which may seem heavy to some.

The Latin and Ancient Greek Dependency Treebanks. The Ancient Greek and Latin Dependency Treebanks are an attempt to create a linguistic genome: a large database of Classical texts where the morphological, syntactic, and lexical information for each sentence has been explicitly encoded. The point? To put linguistic research in Greek and Latin on a new quantitative foundation.
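To make the idea of explicitly encoded morphological, syntactic, and lexical information concrete, here is a minimal sketch of how one annotated sentence might be held in memory and counted. The `Token` fields, the relation labels (SBJ, OBJ, PRED), and the example sentence are assumptions chosen for illustration; they do not reproduce the treebanks' actual file format or full tag set.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    id: int        # position in the sentence, starting at 1
    form: str      # the word as it appears in the text
    lemma: str     # dictionary headword (lexical information)
    pos: str       # part of speech (morphological information)
    head: int      # id of the governing token, 0 for the root
    relation: str  # syntactic relation to the head


# "puella rosam amat" (the girl loves the rose), annotated by hand.
SENTENCE = [
    Token(1, "puella", "puella", "noun", 3, "SBJ"),
    Token(2, "rosam", "rosa", "noun", 3, "OBJ"),
    Token(3, "amat", "amo", "verb", 0, "PRED"),
]


def root(sentence):
    """Return the token that has no governing head."""
    return next(t for t in sentence if t.head == 0)


def dependents(sentence, head_id):
    """Return the tokens governed directly by the given token id."""
    return [t for t in sentence if t.head == head_id]


def relation_counts(sentences):
    """Count syntactic relations across a collection of sentences."""
    return Counter(t.relation for s in sentences for t in s)


if __name__ == "__main__":
    print(root(SENTENCE).form)
    print([t.form for t in dependents(SENTENCE, 3)])
    print(relation_counts([SENTENCE]))
```

Counts like these, taken over a whole corpus, are the kind of quantitative evidence the treebank description has in mind.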

XQuery Introduction. XPath Introduction. TextSTAT.