
2014 text analysis


Math of Ideas: A Word is Worth a Thousand Vectors. Word vectors give us a simple and flexible platform for understanding text. A few diverse examples should help build your confidence in developing and deploying NLP systems and show what problems they can solve.
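As a rough illustration of the word-vector arithmetic this article is known for (king - man + woman = queen), here is a minimal sketch in plain Python. The three-dimensional vectors are invented for the example and are not real word2vec output; the names VECTORS, cosine and analogy are hypothetical helpers, not part of any library.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real word vectors are learned from text and have far more dimensions.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative, vectors=VECTORS):
    """Add the positive words, subtract the negative ones, and return
    the remaining vocabulary word closest to the resulting vector."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    # Exclude the query words themselves from the candidates.
    exclude = set(positive) | set(negative)
    scores = {w: cosine(target, v) for w, v in vectors.items() if w not in exclude}
    return max(scores, key=scores.get)


result = analogy(positive=["king", "woman"], negative=["man"])
print(result)
```

With these made-up vectors the nearest remaining word is "queen"; with vectors trained on real text the result depends on the corpus and the training settings.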

Math of Ideas: A Word is Worth a Thousand Vectors

By Chris Moody (Stitch Fix). Standard natural language processing (NLP) is a messy and difficult affair. It requires teaching a computer about English-specific word ambiguities as well as the hierarchical, sparse nature of words in sentences. At Stitch Fix, word vectors help computers learn from the raw text in customer notes. Our systems, composed of machines and human experts, need to recommend the maternity line when a customer says she's in her 'third trimester', identify a medical professional when she writes that she 'used to wear scrubs to work', and distill 'taking a trip' into a Fix for vacation clothing. The following example set the natural language community afire back in 2013: king - man + woman = queen. Let's review the new abilities that word vectors grant us. Word2vec - Tool for computing continuous distributed representations of words. Pinali / eTBLAST. eTBLAST is a similarity search engine that offers access to the following databases: NASA technical reports database. The eTBLAST server compares the user's text query against the databases using a (patented) search algorithm based on the sensitivity of the words entered.

Pinali / eTBLAST

Most users of the PubMed (Medline) database search by choosing one or two keywords to describe their topic, then scan through a long list of results. When they find an interesting abstract, they select it and look up its "Related articles" in the hope of finding the most relevant ones. If another article turns up, they take its related articles in the same way, and so on. eTBLAST makes all of this much easier by returning the best results in the very first search. eTBLAST sorts results by relevance, whereas PubMed sorts them by date. Social networks and big data: “sentiment analysis” explained to the layperson. A book lays out the basics of this kind of opinion analysis.

It is starting to gain ground in Italy too, in politics and beyond. The three founders of Voices from the Blogs are Luigi Curini, Stefano M. Iacus and Andrea Ceron. Listening in order to understand what users think. Opinion Mining Test Site. Word Cloud. Description: Word Clouds (also known as Tag Clouds) are a visualisation method that displays how frequently words appear in a given sample of text by making the size of each word proportional to its frequency.

Word Cloud

Each word, sized according to its frequency, is then typically arranged in a cluster or cloud. Alternatively, the words can be arranged in any format: horizontal lines, columns, or within a shape. Word Clouds can also be used to display words that have metadata assigned to them. For example, in a Word Cloud of all the world's countries, population could be assigned to each country's name to determine its size. Colour used on Word Clouds is usually meaningless and primarily aesthetic, but it could be used to categorise words or display another data variable.
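The frequency-to-size mapping described here can be sketched in a few lines of Python. This is only one plausible scaling (linear between a minimum and a maximum font size); the function name word_sizes and the size limits are assumptions made for the example, not taken from any particular word cloud tool.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    sizes = {}
    for word, count in counts.items():
        # If every word is equally frequent, give them all the largest size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


sizes = word_sizes("data data data cloud cloud word")
print(sizes)
```

To size words by metadata instead (for example country names by population), the same scaling could be applied to a dictionary of values in place of the counts.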

Typically, Word Clouds are used on websites or blogs to depict keyword or tag usage. Although simple and easy to understand, Word Clouds have some major flaws. Functions: Analysing Text, Comparisons, Distribution / Frequency, Proportions. Data Database. A phrase net is a data depiction tool used to display networks and related ideas.

Data Database

This would work with both qualitative and quantitative data sets, though it would be particularly helpful for analyzing texts. The software automatically finds connections between words, presumably based on a placement algorithm. The data conveyed through this tool is textual. The visuals show connections between the textual phrases. This visual is similar to a word cloud, but it allows more freedom in depicting information and was created specifically for textual information. A phrase net creator can be found on the Many Eyes website. Many Eyes Phrase Net Tool. Vega Live Editor. Phrase Net. When to use a phrase net: A phrase net diagrams the relationships between different words used in a text.

Phrase Net

It uses a simple form of pattern matching to provide multiple views of the concepts contained in a book, speech, or poem. The image below is a word graph made from an article taken from the IBM web site. The program has drawn a network of words, where two words are connected if they appear together in a phrase of the form "X and Y". For instance, "the" is connected to other words by thicker arrows.
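The "X and Y" matching can be sketched with a regular expression. This is an illustrative approximation, not the tool's actual implementation; phrase_pairs is a hypothetical helper, and the count of each pair stands in for the arrow thickness mentioned in the text.

```python
import re
from collections import Counter


def phrase_pairs(text, connector="and"):
    """Count (X, Y) word pairs that appear as 'X <connector> Y' in the text."""
    # The lookahead captures Y without consuming it, so in "a and b and c"
    # the word "b" can end one pair and start the next.
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(?=(\w+)\b)"
    )
    return Counter(pattern.findall(text.lower()))


edges = phrase_pairs("Salt and pepper, bread and butter, salt and pepper.")
for (x, y), weight in edges.most_common():
    print(f"{x} -> {y} (weight {weight})")
```

Each counted pair is one directed edge of the network; passing a different connector (for example "of" or "the") gives a different view of the same text.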

How phrase nets work: Phrase net analyzes a text by looking for pairs of words that fit particular patterns. After you specify a pattern, the program creates a network diagram of the words it finds as matches. Defining patterns: Matching different patterns gives different views of the text. There are three ways to specify a pattern. Filtering results: Not all matching words are shown in the visualization. In addition, if the network contains more than 50 words, it often becomes hard to read.
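One way to picture the filtering step is to keep only the most heavily connected words and drop every edge that touches a discarded word. The text does not say how the tool ranks words, so ranking by total edge weight is an assumption here, and filter_network and the sample edge counts are made up for the example.

```python
from collections import Counter


def filter_network(edges, max_words=50):
    """Keep only edges whose two words are both among the max_words
    words with the highest total edge weight."""
    weight = Counter()
    for (x, y), count in edges.items():
        weight[x] += count
        weight[y] += count
    keep = {word for word, _ in weight.most_common(max_words)}
    return {
        pair: count
        for pair, count in edges.items()
        if pair[0] in keep and pair[1] in keep
    }


# Hypothetical edge counts of the form (X, Y) -> number of matches.
edges = {("salt", "pepper"): 5, ("bread", "butter"): 3, ("cats", "dogs"): 1}
small = filter_network(edges, max_words=4)
print(small)
```

Lowering max_words trades completeness for legibility, which is the trade-off the 50-word remark points at.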