
Applied


Think Stats PDF version.

Data Analysis with Open Source Tools

• The underlying properties of data
• The ways to represent the current status of the data
• The criteria to select relevant data and attributes
• The algorithms to analyze the selected data and attributes
• The ways to report the conclusions of the performed data analysis


The author, Philipp K. Janert, takes a designer's approach rather than an implementer's approach. That means you will gain important suggestions and tips for planning a data analysis, rather than instructions for building an entire or partial information infrastructure using open source tools like Python, R, PostgreSQL and Weka. For some developers, then, the lack of full programming constructs may be disappointing. However, I feel that Philipp K. [...] Although the implementer's approach is not fully covered, you'll be able to understand how analytical demands can be satisfied using the programming languages Python and R specifically, given their speed of execution, numerical analysis capabilities and cross-platform support.

21 Recipes for Mining Twitter

The three main aspects that I loved about this little gem were:

- The author does a great job of highlighting the Twitter API's main limitations (e.g. the maximum number of requests for each API call) and bugs (e.g. user ids being different in the '/search' API).


Solutions, in the form of functional code, are given. This information can save literally hours of debugging code or waiting for Twitter to lift restrictions imposed after exceeding some of the system's limits.

Mining the Social Web

Once you get comfortable with the basics, the author quickly moves from topic to topic, giving a good introduction to many aspects of how to mine data and generate useful conclusions.


Some of the examples include accessing your Twitter feed with OAuth, processing feeds to determine influence, using set-wise operations with Redis to determine which of your friends are also followers, storing data in CouchDB, using map-reduce to determine the most popular mentions and topics, natural language processing, and viewing data with various visualization tools.

Programming Collective Intelligence

About me and why I read this book

I've been programming professionally for ~7.5 years, mainly business applications and reporting, so I already have quite some love for data.


While I haven't used math much in my day jobs, I liked it (and was good at it) in high school, including taking extra classes, so I have learned basic statistics. Refreshing and advancing my data analytics skills is one of my goals this year, and reading this book was part of the plan.

About the book

The book introduces lots of algorithms that can be used to gain new insight into any kind of data one might come across. Each algorithm is illustrated with real-world application examples, and examples where applying it doesn't make sense are given too. In addition to the well-written, gradual introduction, the book has a concise algorithm reference at the end, so when one needs a quick refresher, there is no need to wade through the lengthy tutorials. The book was written in 2007, but is not dated.

Natural Language Processing with Python

The book has a matching website, www.nltk.org.


This book is addressed to a broad academic community. One audience is liberal arts students. The second audience is computer science students. The third audience is teachers and researchers worldwide. This book tries hard to be a high-quality introduction to natural language processing. Natural language processing itself is one of the great problems of computing. It is also a field of some interest and utility to linguists, critics, historians, students of language and rhetoric, and students of 20th-century philosophy.

I remember reading the philosopher Wittgenstein (his writings, vintage 1943), where he did thought experiments on putting words in a tray. The fourth audience for this book might be the programmer seeking an interesting opportunity: Is this a book that might help me write a project-specific text analysis engine? Would the NLTK be useful if I wanted to write a search engine?
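To give a rough sense of what the smallest possible search engine involves, here is a minimal sketch in plain Python. It is not taken from the book and does not use the NLTK API. The names `tokenize`, `build_index` and `search` and the sample documents are all made up for illustration, and a library tokenizer could presumably replace the simple regular expression used here.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build an inverted index: each token maps to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the documents matching the first token, then narrow down.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, invented for this illustration.
docs = {
    "a": "Natural language processing with Python",
    "b": "Mining the social web with Python",
    "c": "Data analysis with open source tools",
}
index = build_index(docs)
print(sorted(search(index, "with python")))  # prints ['a', 'b']
```

A project-specific engine would need much more than this (stemming, ranking, phrase queries), which is where a toolkit like the NLTK might earn its keep.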