Cl and me

Scraping with Phantomjs. Getting involved in startup projects and the open data movement in Montreal, I began playing with scraping technologies to crawl the web, sanitize and structure data.

Scraping with Phantomjs

In this article, we'll show how to use CasperJS to fetch and save data. We'll also demo an example script that uses Tor for anonymity.

Scraping - Beyond the Basics — Data Wrangling Handbook 0.1 documentation. Dealing with JavaScript: JavaScript can be a pain for scrapers.

Scraping - Beyond the Basics — Data Wrangling Handbook 0.1 documentation

JavaScript is often used to alter the DOM on pages after they have been created.

A Useful Little Pattern for Web Scraping. I’ve been doing a lot of web and Google Spreadsheet scraping recently, and one situation I’ve run into a lot is that the schema of the data source doesn’t quite fit into the schema I’m trying to dump the data into.

A Useful Little Pattern for Web Scraping

The data source might expose someone’s full name, for instance, whereas I want to store the first and last name separately. I’ve developed a useful little coding pattern to help with that situation that I thought I’d share here.

Easy Web Scraping With Node.js. Web scraping is a technique used to extract data from websites using a computer program that acts as a web browser.
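The schema mismatch described in "A Useful Little Pattern for Web Scraping" (a source that exposes a full name when the target wants first and last name separately) can be handled with a small mapping table. The sketch below is only one possible approach and is not the pattern from the linked article; the field names, `FIELD_MAP`, and `convert` are hypothetical, and splitting on the first space is a simplifying assumption that will not suit every name.

```python
# Hypothetical mapping from target columns to (source field, transform).
# Splitting on the first space is an assumption, not a general rule for names.

def first_name(full_name):
    """Return everything before the first space."""
    return full_name.strip().split(" ", 1)[0]


def last_name(full_name):
    """Return everything after the first space, or "" if there is none."""
    parts = full_name.strip().split(" ", 1)
    return parts[1] if len(parts) > 1 else ""


# Each target column names the source field it reads and how to transform it.
FIELD_MAP = {
    "first_name": ("name", first_name),
    "last_name": ("name", last_name),
    "email": ("email", str.lower),
}


def convert(source_row):
    """Build a row in the target schema from a row in the source schema."""
    return {
        target: transform(source_row[source])
        for target, (source, transform) in FIELD_MAP.items()
    }


row = convert({"name": "Ada Lovelace", "email": "Ada@Example.org"})
print(row)
```

Keeping the mapping in one table means a new target column needs only one new entry, and the scraping loop itself stays unchanged.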

Easy Web Scraping With Node.js

The program requests pages from web servers in the same way a web browser does, and it may even simulate a user logging in to obtain access. It downloads the pages containing the desired data and extracts the data out of the HTML code. Once the data is extracted it can be reformatted and presented in a more useful way. In this article I'm going to show you how to write web scraping scripts in JavaScript using Node.js.

Mjhea0/node-express-ajax-craigslist.

Trivago.com - The world's top hotel price comparison site with over 700,000 hotels.

Why the Web Hasn't Birthed a Prettier Craigslist. You can check-out any time you like, But you can never leave!
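The extraction step described in the "Easy Web Scraping With Node.js" excerpt above (pull the desired data out of the HTML, then reformat it) can be sketched with Python's standard `html.parser` module. This is a minimal illustration rather than the linked article's code: `LinkExtractor`, `extract_links`, and the sample markup are made up for the example, and the download step is left out so the sketch runs without network access.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as a list of tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Made-up sample page; a real scraper would download this text first.
SAMPLE_HTML = """
<ul>
  <li><a href="/listing/1">Red bicycle</a></li>
  <li><a href="/listing/2">Oak <b>desk</b></a></li>
</ul>
"""

links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(f"{text} -> {href}")
```

Pages that build their content with JavaScript after loading would need a headless browser instead, since this parser only sees the HTML it is given.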

Why the Web Hasn't Birthed a Prettier Craigslist

If I told you these Eagles lyrics described a certain website, you'd probably think it was Facebook. After all, public exclamations of quitting Facebook are so common they have become a cliché, but by and large people still stay. The same is true on other networks that angered users with product or business changes. But even when a better alternative arrives, such as Path for Facebook or App.net for Twitter, we still don't see the users walk away.

Growth Hacker is the new VP Marketing at @andrewchen. The rise of the Growth Hacker: The new job title of “Growth Hacker” is integrating itself into Silicon Valley’s culture, emphasizing that coding and technical chops are now an essential part of being a great marketer.

Growth Hacker is the new VP Marketing at @andrewchen

Growth hackers are a hybrid of marketer and coder, one who looks at the traditional question of “How do I get customers for my product?” and answers with A/B tests, landing pages, viral factor, email deliverability, and Open Graph. On top of this, they layer the discipline of direct marketing, with its emphasis on quantitative measurement, scenario modeling via spreadsheets, and a lot of database queries. If a startup is pre-product/market fit, growth hackers can make sure virality is embedded at the core of a product. After product/market fit, they can help run up the score on what’s already working.

This isn’t just a single role – the entire marketing team is being disrupted.

Airbnb, a case study: Let’s use the case of Airbnb to illustrate this mindset. Looks simple, right?

How Does Airbnb Automatically Post on Craigslist?

Craigslist. Fri Jan 23 22:13:39 2015 UTC Overview: craigslist provides a bulk posting interface for easy submission of multiple new posts in a single request to our server.

craigslist

This document describes that interface and how it is used. New postings are submitted to the bulk interface in RSS format with additional craigslist-specific elements via HTTPS POST. craigslist returns another RSS document detailing which postings are valid or which have been posted. A description of the protocol, RSS submission and response formats, and a reference of acceptable values is below. Protocol:

Database normalization. Normalization involves refactoring a table into smaller (and less redundant) tables but without losing information; defining foreign keys in the old table referencing the primary keys of the new ones.

Database normalization

The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys. Edgar F. Codd, the inventor of the relational model (RM), introduced the concept of normalization and what we now know as the First normal form (1NF) in 1970.[1] Codd went on to define the Second normal form (2NF) and Third normal form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[3] Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form.[4] Most 3NF tables are free of insertion, update, and deletion anomalies.

Objectives

1. An update anomaly.
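The refactoring described in the normalization excerpt above can be illustrated with an in-memory SQLite database. This is a rough sketch under assumed data: the tables `orders_flat`, `customers`, and `orders` and their rows are invented for the example. The flat table repeats each customer's city on every order, so changing a city means touching several rows; after the split, the city lives in one row and orders reach it through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
-- Unnormalized: the customer's city is repeated on every order row.
CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer TEXT,
    city     TEXT,
    item     TEXT
);
INSERT INTO orders_flat VALUES
    (1, 'Ada', 'Montreal', 'lamp'),
    (2, 'Ada', 'Montreal', 'desk'),
    (3, 'Bob', 'Toronto',  'chair');

-- Normalized: customer attributes are stored once and referenced by key.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT UNIQUE,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    item        TEXT
);

INSERT INTO customers (name, city)
    SELECT DISTINCT customer, city FROM orders_flat;
INSERT INTO orders (order_id, customer_id, item)
    SELECT f.order_id, c.customer_id, f.item
    FROM orders_flat AS f
    JOIN customers AS c ON c.name = f.customer;
""")

# One UPDATE on one row now changes the city seen by all of Ada's orders.
conn.execute("UPDATE customers SET city = 'Quebec City' WHERE name = 'Ada'")

rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.item
    FROM orders AS o
    JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

In the flat table the same change would have to be applied to both of Ada's rows, and missing one of them would leave the data inconsistent, which is the kind of update anomaly that normalization is meant to prevent.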