Civic Hacking with Python – Part 2 « syslogd.net. I apologize for the time it took before writing this part, but you know what they say: better late than never. In this part, I’ll explain how I extracted (scraped) the data from the Transports Quebec database mentioned in Part 1 using Python, Scrapy and a few other tools. This post is not a full-fledged tutorial on using Scrapy, but it should give you a place to start if you’d like to do something similar.
In order to jump directly to the good stuff, I’ll skip the Scrapy installation and assume you are already familiar with Python. If you are not familiar with Python, you’re missing a lot. Get started here, then use some Google-Fu and start hacking on your own project to keep learning. This small project was a quick and dirty drive-by experiment. This code is not: a complete solution, the best solution, production-ready, [insert your own similar statement here]. That said, let’s get started. If you visit the website, you’ll notice that the first page you get is an empty form like this one: Sphinx | Open Source Search Server. 12.1. Data source configuration options Data source type. Mandatory, no default value. Known types are mysql, pgsql, mssql, xmlpipe2, tsvpipe, and odbc. All other per-source options depend on source type selected by this option.
Example: type = mysql SQL server host to connect to. In the simplest case when Sphinx resides on the same host with your MySQL or PostgreSQL installation, you would simply specify "localhost". sql_host = localhost SQL server IP port to connect to. sql_port = 3306 SQL user to use when connecting to sql_host. sql_user = test SQL user password to use when connecting to sql_host. sql_pass = mysecretpassword SQL database (in MySQL terms) to use after the connection and perform further queries within. sql_db = test UNIX socket name to connect to for local SQL servers. On Linux, it would typically be /var/lib/mysql/mysql.sock.
sql_sock = /tmp/mysql.sock 12.1.8. mysql_connect_flags MySQL client connection flags. mysql_connect_flags = 32 # enable compression How to build a simple web crawler. If you're creating a search engine you'll need a way to collect documents. In this excerpt from Toby Segaran's Programming Collective Intelligence the author shows you how to set up a simple web crawler using existing tools.
I'll assume for now that you don't have a big collection of HTML documents sitting on your hard drive waiting to be indexed, so I'll show you how to build a simple crawler. It will be seeded with a small set of pages to index and will then follow any links on that page to find other pages, whose links it will also follow. This process is called crawling or spidering. To do this, your code will have to download the pages, pass them to the indexer (which you'll build in the next section), and then parse the pages to find all the links to the pages that have to be crawled next.
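The crawling loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: it uses Python 3's standard library (urllib.request and html.parser) instead of the urllib2 module the excerpt imports, and the names LinkParser, extract_links, fetch, crawl, fetch_page and index_page are hypothetical, chosen only for this sketch. The indexer is represented by an optional callback, since the excerpt builds it in a later section.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def fetch(url):
    """Download a page and return its text (assumes UTF-8 content)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, depth=2, fetch_page=fetch, index_page=None):
    """Breadth-first crawl starting from seeds, following links up to
    depth levels away. Returns the URLs fetched, in crawl order."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    crawled = []
    while queue:
        url, level = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that cannot be downloaded
        if index_page is not None:
            index_page(url, html)  # hand the page to the indexer
        crawled.append(url)
        if level >= depth:
            continue  # do not follow links beyond the depth limit
        for link in extract_links(url, html):
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return crawled
```

Passing the download step in as fetch_page keeps the loop easy to try without a network connection: for example, crawl(["http://example.com/"], fetch_page=pages.__getitem__) walks a dictionary named pages that maps URLs to HTML strings. The seen set is what stops the crawler from revisiting a page when two pages link to each other.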
For the examples in this chapter, I have set up a copy of several thousand files from Wikipedia, which will remain static at [...] >>> import urllib2 >>> contents=c.read() '<! [...] Automation - Workflow for academic research projects, one-step builds, and the Joel Test. (3) What are some of the best books I learn programming from. Programming Collective Intelligence: Building Smart Web 2.0 Applications (9780596529321): Toby Segaran. (3) Book Recommendations: What are some good introductory books on network theory, particularly in the social sciences. (3) What are some of the best books on Computer Science.
Millennium Villages Project. The Millennium Villages Project is a project of the Earth Institute at Columbia University, the United Nations Development Programme, and Millennium Promise. It is an approach to ending extreme poverty and meeting the Millennium Development Goals—eight globally endorsed targets that address the problems of poverty, health, gender equality, and disease. The Millennium Villages aim to promote an integrated approach to rural development. By improving access to clean water, sanitation and other essential infrastructure such as education, food production, basic health care, and by focusing on environmental sustainability, Millennium Villages claims to ensure that communities living in extreme poverty have a real, sustainable opportunity to lift themselves out of the poverty trap. Millennium Villages are divided into different types.
The project was initially funded through a combination of World Bank loans and private contributions, including $50 million from George Soros. Critics BuzzFeed, the Ad Model for the Facebook Era? On a Tuesday in February, Matt Stopera, a 24-year-old senior editor at a website called BuzzFeed, saw something hilarious on Twitter. It was an image from the archives of Sports Illustrated, circa 1991, of the diminutive TV nerd Urkel playing hoops with actor Will Smith and Indiana Pacer Reggie Miller. The next day at work, Stopera looked around the Internet for photographs in a similar spirit. He found one of Ginger Spice sitting at a bulky desktop computer and another of Arsenio Hall grinning at Bill Clinton, who was wailing on a saxophone.
He arranged the photos on a single page, wrote a pithy caption for each one, and listed photo credits where he could. At around 5 p.m., Stopera published “48 Pictures That Perfectly Capture the ’90s” on BuzzFeed. “These pictures are all that and a bag of chips!” he wrote at the top of the list. At a time when massively popular Internet sensations often seem random—irreplicable one-offs such as “Kony 2012”—Stopera produces reliable hits.
Upcoming Group Activities near San Diego | Lifecrowd. The Future of Magazines Should Look a Lot Like Spotify. By Hamish McKenzie On March 26, 2012 The options we have for reading magazine journalism in the digital format are pretty sad. We live in an era of self-driving cars, augmented reality, and we can keep a map of the entire planet in our pocket, but we are stuck reading magazine journalism the way it has always been presented: in dead print load-dumped onto unfeeling pages, tied up into inseparable bundles (even if they are digital). Tablet computers may well be the saviors of magazines, but even in the face of declining circulations, magazines are doing little to save themselves. Magazine reading on tablets is proving to be almost as cumbersome as it is on paper, with an anachronistic page-turning mentality baked into the apps and a copied-and-pasted design lifted directly from the versions you buy at the drug store.
But the worst part is the distribution. Take Apple’s Newsstand for the iPad. It’s okay. The first problem is that there is an app for each magazine. Break up the bundle. FamilyLeaf Brings Your Kin Together In Its Own Private Social Network. Facebook is on its way to having a billion members, but it’s not always making friends everywhere it goes. Two young men, both aged 19 and in the most recent crop of Y Combinator startups, think they’ve found a gap in the market that has yet to be served that well by the biggest social network: families.
FamilyLeaf was created by childhood friends Wesley Zhao and Ajay Mehta (last seen here spinning out a Y U NO yarn to gain entry into YC; it worked). And it was born out of a desire to have an easy-to-use online space for you and your relations that addresses some key “misuse” of sites like Facebook — something they say became especially apparent to the two of them after they left for college (respectively Wharton and NYU Stern, where they are now on a leave of absence). Yes, there are plenty of other sites that have tried to address the family social networking angle, but what’s attractive about FamilyLeaf is that for now, it’s free to use and very easy to get started. India. Check-In Needs To Work, But How Can We Fix It? Remember Highlight? That app that everyone thought was hot stuff back at SXSW?
I used it for a few days and then deleted it, discovering quite quickly that the app, despite some utility, was an absolute battery hog. But what Highlight did was prove that, given the proper scenario, check-in works and is important. What frustrates me most, however, is that we keep doing it wrong. Take this new app, Chkin.at, for example. This is not a new complaint and it won’t be the last time someone grumbles about the current state of discovery-style apps. Check-in becomes valuable when we don’t notice it. This, in turn, sets privacy advocates on edge because, in a sense, the app is telling people where you are without your explicit knowledge (although not without your explicit permission.) It seems the best apps are those that offer check-in after the fact. Most check-in apps get it wrong more than they get it right.