Within computational linguistics, the term refers to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree that shows their syntactic relation to each other and may also contain semantic and other information.

The term is also used in psycholinguistics when describing language comprehension. In this context, parsing refers to the way that human beings analyze a sentence or phrase (in spoken language or text) "in terms of grammatical constituents, identifying the parts of speech, syntactic relations, etc." The term is especially common when discussing which linguistic cues help speakers interpret garden-path sentences.

Traditional methods

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and was widely regarded as basic to the use and understanding of written language.
Related: Text Analytics
The Stanford NLP (Natural Language Processing) Group

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s.

Package contents

This package is a Java implementation of probabilistic natural language parsers: both highly optimized PCFG and lexicalized dependency parsers, and a lexicalized PCFG parser. As well as providing an English parser, the parser can be, and has been, adapted to work with other languages.
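The kind of probabilistic parsing described above can be sketched with a toy PCFG and a CKY/Viterbi search. Everything below (the grammar, the probabilities, the sentence) is invented for illustration; a real statistical parser like Stanford's estimates its rule probabilities from hand-parsed treebanks and handles far richer grammars.

```python
# Toy PCFG parsing via CKY with Viterbi (max-probability) search.
# All rules and probabilities are illustrative, not from a real treebank.
LEXICON = {                      # word -> [(symbol, probability)]
    "dogs": [("NP", 0.5)],
    "cats": [("NP", 0.5)],
    "chase": [("V", 1.0)],
}
RULES = {                        # (B, C) -> [(A, P(A -> B C))]
    ("V", "NP"): [("VP", 1.0)],
    ("NP", "VP"): [("S", 1.0)],
}

def cky(words):
    """Fill a chart: chart[i][j][symbol] = (best probability, backpointer)."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                    # lexical entries
        for sym, p in LEXICON.get(w, []):
            chart[i][i + 1][sym] = (p, w)
    for span in range(2, n + 1):                     # longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                # split point
                for B, (pb, _) in chart[i][k].items():
                    for C, (pc, _) in chart[k][j].items():
                        for A, pr in RULES.get((B, C), []):
                            p = pr * pb * pc
                            if p > chart[i][j].get(A, (0.0, None))[0]:
                                chart[i][j][A] = (p, (k, B, C))
    return chart

def best_tree(chart, i, j, sym):
    """Recover the most likely tree from the backpointers."""
    _, back = chart[i][j][sym]
    if isinstance(back, str):                        # lexical leaf
        return (sym, back)
    k, B, C = back
    return (sym, best_tree(chart, i, k, B), best_tree(chart, k, j, C))

words = "dogs chase cats".split()
chart = cky(words)
print(chart[0][len(words)]["S"][0])      # probability of the best parse: 0.25
print(best_tree(chart, 0, len(words), "S"))
```

The Viterbi step is what makes the parser "probabilistic": among all trees the grammar allows for a sentence, it keeps only the highest-probability analysis in each chart cell.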
Lesson 6 - Tuples, Lists, and Dictionaries

Introduction

Your brain still hurting from the last lesson? Never worry, this one will require a little less thought. Think about it: variables store one bit of information. But what if you need to store a long list of information which doesn't change over time?

The Solution - Lists, Tuples, and Dictionaries

For these problems, Python uses three different solutions: tuples, lists, and dictionaries. Lists are what they seem: a list of values.

Tuples

Tuples are pretty easy to make.

Code Example 1 - creating a tuple

months = ('January','February','March','April','May','June',\
'July','August','September','October','November','December')

Note that the '\' at the end of the first line carries that line of code over to the next line. Python then organises those values in a handy, numbered index, starting from zero, in the order that you entered them.

Table 1 - tuple indices

And that is tuples!

Lists

Lists are extremely similar to tuples. Clears things up?
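To make the lesson concrete, here is the tuple from Code Example 1 in runnable form, together with the zero-based indexing that Table 1 describes. The small list example at the end is my own addition, not part of the original lesson:

```python
# The months tuple from the lesson. Inside parentheses the trailing
# backslash is actually optional, but it matches the lesson's example.
months = ('January', 'February', 'March', 'April', 'May', 'June', \
          'July', 'August', 'September', 'October', 'November', 'December')

# Indexing starts at zero, in the order the values were entered.
print(months[0])     # January
print(months[11])    # December

# Lists use square brackets and, unlike tuples, can be changed in place.
todo = ['write code', 'test code']
todo.append('ship code')
print(todo)          # ['write code', 'test code', 'ship code']
```

Trying to change a tuple (for example, `months[0] = 'Jan'`) raises a `TypeError`, which is exactly the tuple/list distinction the lesson builds toward.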
Zipf's law

Zipf's law /ˈzɪf/, an empirical law formulated using mathematical statistics, refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power-law probability distributions. The law is named after the American linguist George Kingsley Zipf (1902–1950), who first proposed it (Zipf 1935, 1949), though the French stenographer Jean-Baptiste Estoup (1868–1950) appears to have noticed the regularity before Zipf. It was also noted in 1913 by the German physicist Felix Auerbach (1856–1933).

Motivation

Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.

Theoretical review

Formally, let:

Related laws
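The rank–frequency statement above can be written as f(k) ∝ 1/k^s, with s ≈ 1 in the classic form. A minimal sketch, assuming the classic exponent (the `zipf_expected` helper and the count of 1000 are illustrative):

```python
def zipf_expected(top_count, n_ranks, s=1.0):
    """Expected frequency at rank k under Zipf's law: f(k) = f(1) / k**s."""
    return [top_count / k ** s for k in range(1, n_ranks + 1)]

# If the most frequent word occurs 1000 times, rank 2 is expected about
# half as often and rank 3 about a third as often, as described above.
print(zipf_expected(1000, 4))   # [1000.0, 500.0, 333.33..., 250.0]
```

Raising `s` above 1 steepens the drop-off; the family of distributions mentioned above varies exactly this exponent.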
Web 3.0: When Web Sites Become Web Services

Today's Web has terabytes of information available to humans but hidden from computers. It is a paradox that information is stuck inside HTML pages, formatted in esoteric ways that are difficult for machines to process. The so-called Web 3.0, which is likely to be a precursor of the real semantic web, is going to change this. What we mean by 'Web 3.0' is that major web sites are going to be transformed into web services, and will effectively expose their information to the world. The transformation will happen in one of two ways.

The Amazon E-Commerce API - open access to Amazon's catalog

We have written here before about Amazon's visionary WebOS strategy. Why has Amazon offered this service completely free?

The rise of the API culture

The web 2.0 poster child, del.icio.us, is also famous as one of the first companies to open a subset of its web site functionality via an API.

Standardized URLs - the API without an API

So how do these services get around the fact that there is no API?
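The "standardized URL" idea can be sketched in a few lines: when a site exposes predictable URL patterns, a client can construct requests without any formal API. The domain and path layout below are hypothetical, loosely modeled on del.icio.us-style user/tag feeds:

```python
def feed_url(username, tag=None, base="https://example.com/feeds"):
    """Build a predictable feed URL in the del.icio.us style.

    The base URL and path scheme here are illustrative assumptions,
    not a real service's documented endpoints.
    """
    return f"{base}/{username}/{tag}" if tag else f"{base}/{username}"

print(feed_url("alice"))             # https://example.com/feeds/alice
print(feed_url("alice", "python"))   # https://example.com/feeds/alice/python
```

Because the URL structure is stable and guessable, it functions as a de facto API: the contract is the URL pattern itself rather than a published interface.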
Perl Weekly: A Free, Weekly Email Newsletter for the Perl Programming Language

The Big See

This page is for the SHARCNET and TAPoR text visualization project. Note that it is a work in progress, as this is an ongoing project. At the University of Alberta we picked up the project and gave a paper at the Chicago Colloquium on Digital Humanities and Computer Science with the title The Big See: Large Scale Visualization. The Big See is an experiment in high-performance text visualization. We are looking at how a text or corpus of texts could be represented if processing and the resolution of the display were not an issue. Most text visualizations, like word clouds and distribution graphs, are designed for the personal computer screen.

Project Goals

This project imagines possible paradigms for the visual representation of a text that could scale up to very high resolution displays (data walls), 3D displays, and animated displays.

Participants

Geoffrey Rockwell is a Professor of Philosophy and Humanities Computing at the University of Alberta.

Collocation Graphs in 3D Space

Research
Software: Web Content Mining, Screen Scraping

Commercial:
- AMI Enterprise Intelligence searches, collects, stores and analyses data from the web.
- Automation Anywhere, intelligent automation software to automate business & IT processes, including web data extraction and screen scraping.
- Bixolabs, an elastic web mining platform built with Bixo, Cascading & Hadoop for Amazon's cloud (EC2).
- Crawlera, a smart IP rotator that works around bot countermeasures, allowing more complex sites like Google to be crawled.

Free and open source:
- Bixo, an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop.

Related
blogs.perl.org — blogging the onion

Semantic Search Engine and Text Analysis