Think Stats (PDF version)

Data Analysis with Open Source Tools

• The underlying properties of data
• The ways to represent the current status of the data
• The criteria to select relevant data and attributes
• The algorithms to analyze the selected data and attributes
• The ways to report the conclusions of the performed data analysis.

The author, Philipp K. Janert, takes a designer's approach rather than an implementer's. That means you will pick up important suggestions and tips for planning a data analysis, rather than instructions for building an entire or partial information infrastructure with open source tools like Python, R, PostgreSQL and Weka. As a result, some developers may find the lack of full programming constructs disappointing.
However, I feel that Philipp K. Janert's main goal is to share his own professional experience from real-world enterprise analytics projects, seen from a requirements perspective. This review was written for the O'Reilly Bloggers Review Program (oreilly.com/bloggers).

21 Recipes for Mining Twitter

The three main aspects that I loved about this little gem were:
- The author does a great job of highlighting the Twitter API's main limitations (e.g. the maximum number of requests for each API call) and bugs (e.g. user ids being different in the '/search' API).
Solutions, in the form of functional code, are given. This information can literally save hours of debugging code or of waiting for Twitter to lift the restrictions it imposes once you exceed some of the system's limits.

Mining the Social Web

Once you get comfortable with the basics, the author quickly moves from topic to topic, giving a good introduction to many aspects of how to mine data and draw useful conclusions.
Some of the examples include accessing your Twitter feed with OAuth, processing feeds to determine influence, using set-wise operations with Redis to determine which of your friends are also followers, storing data in CouchDB, using map-reduce to determine the most popular mentions and topics, natural language processing, and viewing data with various visualization tools.

Programming Collective Intelligence

About me and why I read this book

I've been programming professionally for ~7.5 years, mainly business applications and reporting, so I already have quite some love for data. While I haven't used math much in my day jobs, I liked it (and was good at it) in high school, where I also took extra classes, so I have learned basic statistics.
Refreshing and advancing my data analytics skills is one of my goals this year, and reading this book was part of the plan.

About the book

The book introduces lots of algorithms that can be used to gain new insight into any kind of data one might come across. The explanations are broken up into digestible chunks and are supported by great visualizations. Each of the algorithms is illustrated with real-world application examples, and examples where applying them doesn't make sense are given too.
Nonetheless, it is great to have actual code to play with; it is just that the initial reading and reviewing of it requires some extra effort.

Natural Language Processing with Python

The book has a matching website, www.nltk.org. This book is addressed to a broad academic community. One audience is liberal arts students. The second audience is computer science students. The third audience is teachers and researchers worldwide. This book tries hard to be a high-quality introduction to natural language processing.
Natural language processing itself is one of the great problems of computing. One of the enjoyable things about this book is that the authors carefully outline some of the great problems in computer science that are central to natural language processing. Natural language processing is also a field of some interest and utility to linguists, critics, historians, students of language and rhetoric, and students of 20th century philosophy. I remember reading the philosopher Wittgenstein (his writings vintage 1943), where he carried out thought experiments about putting words in a tray. Is this a book that might help me write a project-specific text analysis engine?