
MALLET homepage

Related:  Data-driven history / DigHum

The Overview Project » How Overview turns Documents into Pictures. Overview produces intricate visualizations of large document sets. They are beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they’re plotted. There are two visualizations in the current prototype version of Overview, and both are based on document clustering. The first is the items plot, which grew out of the proof-of-concept system we presented a year ago. Every document is a dot. Similar documents get pulled together to form visible groups, that is, clusters. Overview also has a “tree” view. The tree view and the items plot show the same thing, just in different ways. Extracting Key Words: All of Overview’s clustering depends on grouping similar documents together, but what does that mean? But Overview doesn’t know any of this. Two documents are similar if they have overlapping sets of key words. Where do those documents go? The tree view finds not only clusters but sub-clusters.
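The idea that two documents are similar when their key-word sets overlap can be sketched in a few lines of Python. This is only an illustration, not Overview's actual method: the stop list, the `key_words` and `overlap` helpers, and the choice of Jaccard overlap as the measure are all assumptions made for the example.

```python
import re

# Hypothetical stop list; the excerpt does not say how Overview picks key words.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def key_words(text):
    """Return the set of lowercase alphabetic words in text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def overlap(doc_a, doc_b):
    """Jaccard overlap of the two key-word sets: shared words / all words."""
    a, b = key_words(doc_a), key_words(doc_b)
    if not a and not b:
        return 0.0  # two empty documents are treated as dissimilar here
    return len(a & b) / len(a | b)
```

A score near 1.0 means the two documents share almost all of their key words, and a score of 0.0 means they share none. A clustering step would then group documents whose pairwise scores are high.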

AD·VNVM·DATVM

JGibbLDA: A Java Implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling for Parameter Estimation and Inference

Processing.js

The Humanities and Critical Code Studies Lab | @ the University of Southern California

Snowball

Data Science Toolkit

US Universities with DH Minors/Majors

Topic Modeling Toolbox The first step in using the Topic Modeling Toolbox on a data file (CSV or TSV, e.g. as exported by Excel) is to tell the toolbox where to find the text in the file. This section describes how the toolbox converts a column of text from a file into a sequence of words. The process of extracting and preparing text from a CSV file can be thought of as a pipeline, where a raw CSV file goes through a series of stages that ultimately result in something that can be used to train the topic model. Here is a sample pipeline for the pubmed-oa-subset.csv data file:

    val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
    val tokenizer = { ... }
    val text = { source ~> ... }

The input data file (in the source variable) is a pointer to the CSV file you downloaded earlier, which we will pass through a series of stages that each transform, filter, or otherwise interact with the data.

    val source = CSVFile("your-csv-file.csv") ~> IDColumn(yourIdColumn) ~> Drop(1);
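The pipeline idea described above, where a CSV file passes through stages until each row becomes a sequence of words, can be approximated in plain Python. This is a rough sketch and not the toolbox's API: `read_rows`, `tokenize`, and `pipeline` are hypothetical names, the 1-based column numbering and the `drop` parameter are assumptions modeled on `IDColumn(1)` and `Drop(1)`, and the tokenizer stages (lowercasing, keeping alphanumeric runs, a minimum length of 3) are illustrative stand-ins for the stages that the excerpt does not preserve.

```python
import csv
import io
import re


def read_rows(csv_text, id_column, text_column, drop=0):
    """Yield (id, text) pairs. Columns are 1-based; drop skips leading rows."""
    reader = csv.reader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        if index < drop:
            continue  # e.g. skip a header row
        yield row[id_column - 1], row[text_column - 1]


def tokenize(text, min_length=3):
    """Lowercase the text, keep alphanumeric runs, and drop short tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) >= min_length]


def pipeline(csv_text, id_column, text_column, drop=0):
    """Map each document id to its list of tokens."""
    rows = read_rows(csv_text, id_column, text_column, drop)
    return {doc_id: tokenize(text) for doc_id, text in rows}


# Hypothetical three-column file: id, title, abstract.
sample = "id,title,abstract\n1,Foo,Gene expression in E. coli\n"
docs = pipeline(sample, id_column=1, text_column=3, drop=1)
```

Each stage is a small function, so stages can be swapped or reordered without touching the others, which is the main appeal of the pipeline framing.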

ITO - Road Fatalities USA This web site and the information it contains is provided as a public service by ITO World Ltd, using data supplied by the National Highway Traffic Safety Administration (NHTSA), U.S. Department of Transportation (DOT). ITO World Ltd makes no claims, promises or guarantees about the accuracy, completeness, or adequacy of the contents of this web site and expressly disclaims liability for errors and omissions in the contents of this web site. No warranty of any kind, implied, expressed or statutory, including but not limited to the warranties of non-infringement of third party rights, title, merchantability, fitness for a particular purpose and freedom from computer virus, is given with respect to the contents of this web site or its links to other Internet resources. Users of the service should note that the NHTSA/DOT makes no claims, promises or guarantees about the accuracy, completeness, or adequacy of the road fatality data used within this web site.

Humanidades Digitales - Formación Permanente (Digital Humanities - Continuing Education) - No academic prerequisites are required to enroll in the course. It is mainly intended for:
- Students trained in any area of the humanities who want to acquire technological skills to meet the new challenges of today's digital culture, round out their education, and open up new prospects both for academic research and for entering the job market.
- Researchers currently working on projects in different humanities disciplines (philology, art, history, philosophy...) who want to catch up on existing technological tools and approaches so they can apply them to their actual research projects.
- Professionals working in the humanities field (mainly GLAM: Galleries, Libraries, Archives, and Museums) who want to strengthen their digital skills so they can apply them in their own work.

About WordNet - WordNet

d3.js