Book - Natural Language Toolkit

Related:  programming

K Means Clustering with Tf-idf Weights
Unsupervised learning algorithms in machine learning impose structure on unlabeled datasets. In Prof. Andrew Ng's inaugural ml-class from the pre-Coursera days, the first unsupervised learning algorithm introduced was k-means, which I implemented in Octave for programming exercise 7.

OpinionFinder 1.x
OpinionFinder 1.x relies on many external software packages (e.g. SUNDANCE, SCOL, BoosTexter) which are neither built nor supported by our group.

Python: Inverted Index for dummies
An Inverted Index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a document, and is generally used to allow fast full-text searches. The first step of Inverted Index creation is Document Processing. In our case this is word_index(), which consists of word_split(), normalization, and the deletion of stop words ("the", "then", "that"...).

    def word_split(text):
        word_list = []
        wcurrent = []    # characters of the word being built
        windex = None    # index of the last alphanumeric character seen
        for i, c in enumerate(text):
            if c.isalnum():
                wcurrent.append(c)
                windex = i
            elif wcurrent:
                # A separator ends the current word: record (start offset, word).
                word = u''.join(wcurrent)
                word_list.append((windex - len(word) + 1, word))
                wcurrent = []
        if wcurrent:
            # Flush the last word when the text ends without a separator.
            word = u''.join(wcurrent)
            word_list.append((windex - len(word) + 1, word))
        return word_list
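The document-processing steps named above (splitting, normalization, stop-word removal) and the resulting word-to-location mapping can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: it swaps the character loop for a regular expression, treats normalization as simple lowercasing, and uses only the three stop words quoted in the snippet. The helpers inverted_index(), add_document(), and search() are hypothetical names introduced here.

```python
import re

# Assumption: only the three stop words quoted in the snippet are filtered.
STOP_WORDS = {"the", "then", "that"}

# Runs of alphanumeric characters, approximating the isalnum() test in word_split().
WORD_RE = re.compile(r"[^\W_]+")


def word_index(text):
    """Return (start offset, word) pairs, lowercased, with stop words removed."""
    pairs = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()  # normalization assumed to be lowercasing
        if word not in STOP_WORDS:
            pairs.append((match.start(), word))
    return pairs


def inverted_index(text):
    """Map each word of one document to the list of offsets where it occurs."""
    index = {}
    for offset, word in word_index(text):
        index.setdefault(word, []).append(offset)
    return index


def add_document(global_index, doc_id, text):
    """Merge one document into a word -> {doc_id: [offsets]} mapping."""
    for word, offsets in inverted_index(text).items():
        global_index.setdefault(word, {})[doc_id] = offsets
    return global_index


def search(global_index, query):
    """Return ids of documents that contain every non-stop word in the query."""
    words = [word for _, word in word_index(query)]
    if not words:
        return set()
    matches = [set(global_index.get(word, {})) for word in words]
    return set.intersection(*matches)
```

To try it, call add_document() once per document on a shared dict, then pass a query string to search(); a query made up only of stop words returns an empty set.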

Introduction to Information Retrieval
This is the companion website for the following book: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Getting started — tweepy v1.4 documentation
Introduction
If you are new to Tweepy, this is the place to begin. The goal of this tutorial is to get you set up and rolling with Tweepy. We won't go into too much detail here, just some important basics.

Hello Tweepy

    import tweepy

    public_tweets = tweepy.api.public_timeline()
    for tweet in public_tweets:
        print(tweet.text)

How to Prioritize Work: 7 Practical Methods for When "Everything is Important"
One of the biggest struggles in the modern workplace is knowing how to prioritize work. Workloads are ballooning and everything feels important. However, the truth is that a lot of the work we do every day doesn't really need to be done.

simple web crawler / scraper tutorial using requests module in python
Let me show you how to use the Requests Python module to write a simple web crawler / scraper. So, let's define our problem first. In this page: I am publishing some programming problems. So, now I shall write a script to get the links (URLs) of the problems.

Open-source intelligence
Open sources for intelligence
OSINT includes a wide variety of information and sources: OSINT is distinguished from research in that it applies the process of intelligence to create tailored knowledge supportive of a specific decision by a specific individual or group.[3]
Definers for OSINT
OSINT is defined by both the U.S. Director of National Intelligence and the U.S.