Online data collection/storage

An Introduction to Compassionate Screen Scraping. Screen scraping is the art of programmatically extracting data from websites. If you think it's useful: it is. If you think it's difficult: it isn't. And if you think it's easy to really piss off administrators with ill-considered scripts, you're damn right. This is a tutorial on not just screen scraping, but socially responsible screen scraping. It's an amalgam of getting the data you want and the Golden Rule, and reading it is going to make the web a better place.

We're going to be doing this tutorial in Python, and will use the httplib2 and BeautifulSoup libraries to make things as easy as possible. Websites crash. For my blog, the error reports I get are all generated by overzealous webcrawlers from search engines (perhaps the most ubiquitous species of screen scraper). This brings us to my single rule for socially responsible screen scraping: screen scraper traffic should be indistinguishable from human traffic. Cache fervently. The tutorial covers setup, the libraries, choosing a scraping target, and ending thoughts.
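The tutorial's own code isn't reproduced here, but a minimal sketch of the cache-first approach it describes might look like the following. It uses httplib2's built-in disk cache and BeautifulSoup for parsing; the URLs, the ".cache" directory name, and the one-second pause are illustrative choices, not taken from the article.

```python
import time

import httplib2
from bs4 import BeautifulSoup  # beautifulsoup4; older write-ups import BeautifulSoup directly

# Passing a directory name tells httplib2 to cache responses on disk and to
# send conditional (ETag / If-Modified-Since) requests on repeat visits, so
# unchanged pages are not downloaded twice.
http = httplib2.Http(".cache")

urls = [
    "http://example.com/page1",  # placeholder URLs
    "http://example.com/page2",
]

for url in urls:
    response, content = http.request(url)
    soup = BeautifulSoup(content, "html.parser")
    print(url, soup.title.string if soup.title else "(no title)")
    time.sleep(1)  # pause between requests so the traffic resembles a human reader
```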
Search engine for blogs and social networks. Dog blogs.

Advanced Methods for Conducting Online Behavioral Research. The Internet is revolutionizing the way psychologists conduct behavioral research. Studies conducted online are not only less error prone and labor intensive but also rapidly reach large numbers of diverse participants at a fraction of the cost of traditional methods.

In addition to improving the efficiency and accuracy of data collection, online studies provide automatic data storage and deliver immediate personalized feedback to research participants, a major incentive that can exponentially expand participant pools. Behavioral researchers can also track data on online behavioral phenomena, including instant messaging (IM), social networking, and other social media.

This book goes beyond the basics to teach readers advanced methods for conducting behavioral research on the Internet. It is designed for researchers and advanced graduate students in the behavioral sciences seeking greater technical detail about emerging research methods.

NVivo 10 research software for analysis and insight. Key features include:

- Import and analyze documents, images, PDFs, audio, video, spreadsheets, web pages, and Twitter and Facebook data
- Theme, case and in-vivo coding
- Review coding with coding stripes and highlights
- Merge NVivo for Mac projects
- Import and create transcripts
- Import information from reference management software
- Import notes directly from OneNote Online
- Autocode datasets
- Memos and annotations
- Matrix coding, coding, word frequency, text search and coding comparison queries
- Word trees and word clouds
- Export and share items
- Share your research by printing visualizations, text sources and node reference views
- Hierarchical visualizations, mind maps, explore diagrams and comparison diagrams
- Work with data in virtually any language
- Access the user interface in English, German, French and Spanish
- Import and analyze text
- Text search, word frequency and coding queries
- Charts, word clouds, word trees, explore and comparison diagrams
- Import articles from reference management software
- Connect to NVivo for Teams
- Relationship coding

Big Data | Neal Caren. Note: I’m slowly transitioning this to a new site. My Github page has the most recent information. I’m publishing a series of tutorials that teach the fundamentals of quantitative text analysis for social scientists. The emphasis is on application. How can you collect and analyze thousands of web pages or Tweets? What are the best practices for turning words into numbers? The tutorials are designed for people who may be familiar with a standard statistical program, such as Stata or SPSS, or perhaps a qualitative analysis program like NVivo, but who haven’t done any quantitative text analysis or used Python. Python is an open-source computer language that is quite popular amongst computer programmers. The tutorials are written in a cumulative fashion, following the flow of a workshop on collecting and analyzing data from the web that I lead at Carolina. Special thanks to Sarah Gaby for beta testing the posts.
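As a rough illustration of what "turning words into numbers" means in practice (this is not code from Caren's tutorials), a document-term matrix can be built from raw text with scikit-learn's CountVectorizer; the example documents below are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two invented example documents standing in for scraped web pages or tweets.
docs = [
    "Protest movements spread quickly online.",
    "Online movements rely on social media.",
]

# Count how often each word appears in each document, dropping English stop words.
vectorizer = CountVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs)  # sparse matrix: rows = documents, columns = words

print(vectorizer.get_feature_names_out())
print(matrix.toarray())  # word counts per document
```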

Using the Google Maps API. Inequality from Space. Case Studies.
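The Google Maps tutorial presumably involves geocoding addresses; as a hedged sketch (not taken from the case-study materials), the Geocoding web service can be queried directly. The helper function name and the API key are placeholders.

```python
import requests

def geocode(address, api_key):
    """Return (lat, lng) for an address via the Google Maps Geocoding API."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    if not results:
        return None  # no match found for this address
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# YOUR_API_KEY is a placeholder for a real Google Maps API key.
print(geocode("Chapel Hill, NC", "YOUR_API_KEY"))
```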