
MALLET homepage


The Overview Project » How Overview turns Documents into Pictures

Overview produces intricate visualizations of large document sets. They are beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they’re plotted. There are two visualizations in the current prototype version of Overview, and both are based on document clustering. The first is the items plot, which grew out of the proof-of-concept system we presented a year ago. Every document is a dot. Similar documents get pulled together to form visible groups, that is, clusters. Overview also has a “tree” view. The tree view and the items plot show the same thing, just in different ways.

Extracting Key Words

All of Overview’s clustering depends on grouping similar documents together, but what does that mean? But Overview doesn’t know any of this. Two documents are similar if they have overlapping sets of key words. Where do those documents go? The tree view finds not only clusters but sub-clusters.
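The idea that two documents are similar when their key-word sets overlap can be illustrated with a small sketch. This is not Overview's actual scoring method, which the excerpt does not describe. It assumes a plain Jaccard overlap (shared key words divided by all distinct key words), and the function name and sample key words are hypothetical.

```python
def keyword_overlap(keywords_a, keywords_b):
    """Jaccard overlap of two key-word collections (an assumed similarity measure)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0  # two empty documents share nothing measurable
    return len(a & b) / len(a | b)


# Hypothetical key words for three documents.
doc_one = ["contract", "security", "invoice"]
doc_two = ["security", "invoice", "audit"]
doc_three = ["recipe", "flour"]

print(keyword_overlap(doc_one, doc_two))    # two shared words out of four distinct
print(keyword_overlap(doc_one, doc_three))  # no shared words
```

Under this measure, documents with a high score would be drawn close together in a plot, and documents with a score of zero would share no cluster.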

AD·VNVM·DATVM

JGibbLDA: A Java Implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling for Parameter Estimation and Inference

Data Science Toolkit

The Humanities and Critical Code Studies Lab | @ the University of Southern California

Snowball

ITO - Road Fatalities USA

This web site and the information it contains is provided as a public service by ITO World Ltd, using data supplied by the National Highway Traffic Safety Administration (NHTSA), U.S. Department of Transportation (DOT). ITO World Ltd makes no claims, promises or guarantees about the accuracy, completeness, or adequacy of the contents of this web site and expressly disclaims liability for errors and omissions in the contents of this web site. No warranty of any kind, implied, expressed or statutory, including but not limited to the warranties of non-infringement of third party rights, title, merchantability, fitness for a particular purpose and freedom from computer virus, is given with respect to the contents of this web site or its links to other Internet resources. Users of the service should note that the NHTSA/DOT makes no claims, promises or guarantees about the accuracy, completeness, or adequacy of the road fatality data used within this web site.

US Universities with DH Minors/Majors

Topic Modeling Toolbox

The first step in using the Topic Modeling Toolbox on a data file (CSV or TSV, e.g. as exported by Excel) is to tell the toolbox where to find the text in the file. This section describes how the toolbox converts a column of text from a file into a sequence of words. The process of extracting and preparing text from a CSV file can be thought of as a pipeline, where a raw CSV file goes through a series of stages that ultimately result in something that can be used to train the topic model. Here is a sample pipeline for the pubmed-oa-subset.csv data file:

val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
val tokenizer = { [...] }
val text = { source ~> [...] }

The input data file (in the source variable) is a pointer to the CSV file you downloaded earlier, which we will pass through a series of stages that each transform, filter, or otherwise interact with the data.

val source = CSVFile("your-csv-file.csv") ~> IDColumn(yourIdColumn) ~> Drop(1);
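The pipeline idea described above (a CSV file passed through stages until each row becomes a sequence of words) can be sketched in plain Python. This is an illustration only, not the Topic Modeling Toolbox API. The function names are hypothetical, and because the tokenizer stages did not survive in the excerpt, the choices here (lowercasing, splitting on non-alphanumerics, a minimum token length of 3) are assumptions. The `drop` parameter loosely mirrors `Drop(1)`, assumed here to skip a header row.

```python
import csv
import io
import re


def read_column(csv_text, id_column, text_column, drop=0):
    """Yield (id, text) pairs; columns are 1-based, `drop` skips leading rows."""
    rows = csv.reader(io.StringIO(csv_text))
    for index, row in enumerate(rows):
        if index < drop:
            continue
        yield row[id_column - 1], row[text_column - 1]


def tokenize(text, min_length=3):
    """Lowercase, split on non-alphanumeric runs, keep tokens of min_length or more."""
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [token for token in tokens if len(token) >= min_length]


def run_pipeline(csv_text, id_column, text_column, drop=0):
    """Chain the stages: read the chosen column, then tokenize each document."""
    return {
        doc_id: tokenize(text)
        for doc_id, text in read_column(csv_text, id_column, text_column, drop)
    }


# Hypothetical two-row file with a header line.
sample = "id,title\n1,Gene expression in E. coli\n2,A study of RNA\n"
result = run_pipeline(sample, id_column=1, text_column=2, drop=1)
print(result)
```

Each stage takes the output of the previous one, so a further filter (for example, removing very common terms) could be added as another function in the chain without changing the others.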

d3.js

Humanidades Digitales - Formación Permanente

No academic prerequisites are required to enroll in the course. It is aimed primarily at:
- Students with a background in various areas of the Humanities who wish to acquire technological knowledge in order to face the new challenges of today's digital culture, complement their training, and open up new prospects both for academic research and for entering the job market.
- Researchers currently working on projects in various humanities disciplines (philology, art, history, philosophy...) who wish to catch up on the existing technological tools and perspectives in order to apply them to their actual research projects.
- Professionals working in the humanities field (mainly GLAM: Galleries, Libraries, Archives and Museums) who wish to increase their digital skills in order to apply them in their own work.

About WordNet - WordNet - About WordNet

About Google+ Ripples - Google+ Help

Google+ Ripples creates an interactive graphic of the public shares of any public post or URL on Google+ to show you how it has rippled through the network and help you discover new and interesting people to follow. Ripples shows you: People who have reshared the link are displayed with their own circle. Inside each circle are the people who have reshared the link from that person (and so on). Circles are roughly sized based on the relative influence of that person. The comments users added when they reshared a link are displayed in the sidebar of Ripples. At the bottom of the Ripples page, you can play an animated version of the visualization that shows how the link was shared over time. Beneath the timeline on the Ripples page are statistics on the link. While Ripples displays a lot of cool information, you’re not actually seeing all the action that’s taken place. Not sure if a post is public?