
Use Node.js to Extract Data from the Web for Fun and Profit

Need to automate pulling some data from a web page?

Or want to mash up some unstructured data from a blog post with another data source? No API for getting at the data… !@#$@#$… No problem… Web scraping to the rescue. What is web scraping, you may be asking? Web scraping consists of programmatically (typically with no browser involved) retrieving the contents of web pages and extracting data from them. In this article, I’m going to show you a pretty powerful set of tools for scraping web content quickly and easily using JavaScript and Node.js.

I recently needed to be able to crawl a (modestly) large number of pages and sift through them looking for patterns. I have a guilty pleasure. If you don’t already have Node.js installed or haven’t updated it in a while.

$ npm install cheerio

Once Cheerio has been installed, we can get down to business.

$ node download.js

Let's look at the code for this example inline. This will download the contents of the specified URL and print it to the console.

An Absolute Beginner's Guide to Node.js

There's no shortage of Node.js tutorials out there, but most of them cover specific use cases or topics that only apply when you've already got Node up and running.

I see comments every once in a while that sound something like, "I've downloaded Node, now what?" This tutorial answers that question and explains how to get started from the very beginning.

What is Node.js?

A lot of the confusion for newcomers to Node comes from misunderstanding exactly what it is. The description on definitely doesn't help. An important thing to realize is that Node is not a webserver.

Installing Node

Node.js is very easy to install.

I've Installed Node, now what?

Once installed you'll have access to a new command called "node".

$ node
> console.log('Hello World');
Hello World
undefined

In the above example I typed "console.log('Hello World')" into the shell and hit enter.

The other way to run Node is by providing it a JavaScript file to execute.

hello.js

console.log('Hello World');

Integrating Node.js with PHP

Node.js is a server-side solution for building applications in JavaScript which, in the few years it has been around, has already become quite popular.

It is built on Chrome’s V8 JavaScript engine, and it is especially well suited to building real-time websites with push capabilities thanks to its event-driven architecture, in which I/O calls are asynchronous. This article aims to show you how you can start using Node to add real-time features to your PHP-based website. First, we shall look a bit more at what makes Node a good fit for real-time apps, before going on to demonstrate how to build a real-time news feed and incorporate it into your PHP website.

Thread-based vs Event-based

Traditionally PHP is served with Apache and the mod_php module. In Node, a single Node process typically serves every client in an event loop. In essence, Node can be viewed as an environment for building applications similar to Python’s Twisted or Ruby’s EventMachine.

Why Should I Use Node.js Then?