
Node.js


MatthewMueller/cheerio. Screen Scraping with Node.js. You may have used NodeJS as a web server, but did you know that you can also use it for web scraping? In this tutorial, we'll review how to scrape static web pages - and those pesky ones with dynamic content - with the help of NodeJS and a few helpful NPM modules. Web scraping has always had a negative connotation in the world of web development - and for good reason. In modern development, APIs are present for most popular services and they should be used to retrieve data rather than scraping. The inherent problem with scraping is that it relies on the visual structure of the page being scraped. Whenever that HTML changes - no matter how small the change may be - it can completely break your code.

Despite these flaws, it's important to learn a bit about web scraping and some of the tools available to help with this task. Note: If you can't get the information you require through an API or a feed, it's a good sign that the owner does not want that information to be accessible. Data mining local radio with Node.js. More harpsichord?! Seattle is lucky to have KINGFM, a local radio station dedicated to 100% classical music. As one of the few classical music fans in their twenties, I listen often enough. Over the past few years, I've noticed that when I tune to the station, I always seem to hear the plinky sound of a harpsichord.

Before I sent KINGFM an email admonishing them for playing so much of an instrument I dislike, I wanted to investigate whether my ears were deceiving me. Perhaps my own distaste for the harpsichord increased its impact in my memory. This article outlines the details of this investigation and especially the process of collecting the data.

If it ain't baroque... A harpsichord is in many ways similar to the piano. The harpsichord can sound tinny to modern ears. At the start of the 18th century, the newly invented fortepiano began to push both the harpsichord and its close relative, the clavichord, out of favor. These eras are: One exception is opera. Collecting the data Cheerio. V8 javascript VM and Node.js memory management options | O sNAp. Memory management behavior is one of the first topics I wanted to understand in node. This will be part one of two articles in which I intend to explore: Memory management / gc options in the V8 VM that runs node.js applications. Debugging / memory leak analysis for running node servers.

At a high level V8 uses a generational memory model with a copy collector and incremental mark and sweep. You've got control over the size of three different memory spaces: new, old, and code (although the old generation is further split into the map space (V8's hidden class construct), the large object space, the cell space, the old data space, and the old pointer space). Configuring V8 heap sizes Out of memory errors? --max_new_space_size (in kBytes): controls the size of the new generation. --max_old_space_size (in MBytes): controls the size of the old generation. --max_executable_size (in MBytes): the code space size. Controlling when GCs occur in V8: --gc_global, --gc_interval, --nouse-idle-notification, --expose-gc. Dominictarr/JSONStream.
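The effect of the heap flags above can be checked from inside a running process. The following is a minimal sketch, assuming a Node version that ships the core `v8` module (the excerpt predates it); the helper names `heapSummary` and `forceGcIfExposed` are made up for this illustration.

```javascript
// Node core module that exposes V8 heap statistics.
const v8 = require('v8');

// Hypothetical helper: report the heap limit and current usage in MB.
function heapSummary() {
  const stats = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    limitMB: toMB(stats.heap_size_limit), // influenced by --max_old_space_size
    totalMB: toMB(stats.total_heap_size),
    usedMB: toMB(stats.used_heap_size),
  };
}

// Hypothetical helper: global.gc only exists when node runs with --expose-gc.
function forceGcIfExposed() {
  if (typeof global.gc === 'function') {
    global.gc();
    return true;
  }
  return false;
}

console.log(heapSummary(), 'gc forced:', forceGcIfExposed());
```

Saved under a name of your choice (say `heap.js`), it could be run as `node --max_old_space_size=512 --expose-gc heap.js` to see the reported limit change with the flag.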

Style guide. Opinions are like assholes, everyone has got one. This one is mine. Punctuation: who cares? Punctuation is a bikeshed. Put your semicolons, whitespace, and commas where you like them. This post is concerned with higher-order style. Be obvious: Don't do something complex just to make your API simpler. For example, avoid chaining DSLs. This is bad: thing.when('something').then(doThing) It's not really obvious how when relates to doThing.

Chaining where you simply return this is acceptable. Be idiomatic, or not. If possible, make your code follow the APIs in node core. If you don't do this, you need documentation. If you can't follow an idiomatic API precisely, do something completely different. ALWAYS PASS ERR IN CALLBACK (an event listener is not a callback, so this doesn't apply in that case). If you have a function called createServer, it should return a server, and it should have a listen function. createServer should never start the server listening. "all you need is lambdas" -- John Lennon. Botsikas' Blog: Node.js modules cross platform compilation using gyp. Update: I have made a pull request where you can find the updated tools discussed in this article, located here. Node.js has been using waf (node-waf) to configure and build modules up to version 0.4. From v0.6 and on, the team has moved on to gyp (Generate Your Projects), which seems to be a bit more promising when it comes to cross platform compilation.
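The two conventions from the style guide excerpt above (err as the first callback argument, and a createServer that returns a server without listening) can be sketched with Node core only. This is one illustration, not the style guide author's own code; `parseJson` and the greeting parameter are hypothetical.

```javascript
const http = require('http');

// Error-first callback: err is always the first argument, null on success.
// parseJson is a hypothetical helper used only for this sketch.
function parseJson(text, callback) {
  let value;
  try {
    value = JSON.parse(text);
  } catch (err) {
    return callback(err);
  }
  // Called outside the try block so errors thrown by the callback are not swallowed.
  callback(null, value);
}

// createServer returns a server and leaves the decision to listen to the caller.
function createServer(greeting) {
  return http.createServer((req, res) => {
    res.end(greeting + '\n');
  });
}

parseJson('{"ok": true}', (err, value) => {
  if (err) return console.error('parse failed:', err.message);
  console.log('parsed:', value);
});

const server = createServer('hello'); // has a listen function, but is not listening yet
```

The caller starts the server explicitly, for example with `server.listen(8000)`, which keeps creation and startup separate.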

This post shows how to create a simple gyp file to build your own custom native node.js modules and provides some scripts to automate the project generation process. A bit of history: Gyp is a Google project that was created to support cross platform building of the open source Chromium project. The main target of this project is to “generate native Visual Studio, Xcode and SCons and/or make build files from a platform-independent input format”. The project is still in its early stages and there is little documentation on how to use it.

Node-waf vs gyp Node Module’s gyp file but this didn’t work on my tests. The node-gyp scripts. Alexander Luksidadi's Blog » ExpressJS without Jade? Use Underscore template! Many of you must have felt it a burden that Express recommends learning another template language (Jade). Don’t worry, you can code all your templates in HTML using underscoreJS! Oh yay? Let’s take a look at how you implement that in your express app. First install the express package and create your express app: $ npm install -g express $ express . Install your underscore package: $ npm install -d underscore If you edit Now, all you need to do is comment out 1 line and register underscorejs: Now, go to routes/index.js: $ vi routes/index.js Change the template name from ‘index’ to ‘index.html’: Next, go to the views directory and create layout.html And last, still in the views directory, create another file called index.html And there you go.. you can write your HTML code in peace =)

Things I wish I knew about MongoDB a year ago. I’ve used MongoDB for over a year at scale at both Heyzap and Bugsnag and I’ve found it to be a very capable database. As with all databases, there are some gotchas, and here is a summary of the things I wish someone had told me earlier. Selective counts are slow even if indexed: For example, when paginating a user's feed of activity, you might see something like, In MongoDB this count can take orders of magnitude longer than you would expect. There is an open ticket, currently slated for 2.4, so here’s hoping they’ll get it out. Until then you are left aggregating the data yourself. Inconsistent reads in replica sets: When you start using replica sets to distribute your reads across a cluster, you can get yourself in a whole world of trouble.

This is compounded if you have performance issues that cause the replication lag between a primary and its secondaries to increase to minutes or even hours in some cases. Range queries are indexed differently Mongo’s BSON ID is awesome Profiler. Why You Should Pay Attention to Node.Js. Projects, Applications, and Companies Using Node · joyent/node Wiki. Moshen/node-googlemaps. Vows « Asynchronous BDD for Node. NodeCloud - Node.js resources. Calmh/node-snmp-native. LearnBoost/engine.io. BinaryJS - Realtime binary streaming for the web using websockets. Everyauth. c9/architect. Express - node web framework. Blazing fast node.js: 10 performance tips from LinkedIn Mobile. In a previous post, we discussed how we test LinkedIn's mobile stack, including our Node.js mobile server. Today, we’ll tell you how we make this mobile server fast.

Here are our top 10 performance takeaways for working with Node.js: 1. Avoid synchronous code By design, Node.js is single-threaded. To allow a single thread to handle many concurrent requests, you can never allow the thread to wait on a blocking, synchronous, or long-running operation. A distinguishing feature of Node.js is that it was designed and implemented from top to bottom to be asynchronous. Unfortunately, it is still possible to make synchronous/blocking calls. Our initial logging implementation accidentally included a synchronous call to write to disk. 2. The Node.js http client automatically uses socket pooling: by default, this limits you to 5 sockets per host. 3. For static assets, such as CSS and images, use a standard webserver instead of Node.js. 4. 5. 6. 7. 8. 9. 10. Try it out. The Node Beginner Book » A comprehensive Node.js tutorial. Develop a RESTful API Using Node.js With Express and Mongoose...
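The first two tips can be sketched with Node core modules. This is an illustration under stated assumptions, not LinkedIn's code: the log path, the helper names `logAsync` and `logSync`, and the value 50 are all made up for the example. The default of 5 sockets per host applied to the Node versions of that era; later releases changed it.

```javascript
const fs = require('fs');
const http = require('http');
const os = require('os');
const path = require('path');

// Hypothetical log location, chosen only for this sketch.
const logFile = path.join(os.tmpdir(), 'app-sketch.log');

// Non-blocking: the event loop keeps serving requests while the write is pending.
function logAsync(message, callback) {
  fs.appendFile(logFile, message + '\n', callback);
}

// Blocking equivalent, shown for contrast: nothing else runs until the write returns.
function logSync(message) {
  fs.appendFileSync(logFile, message + '\n');
}

// A dedicated agent with a higher per-host socket cap for outgoing requests.
const agent = new http.Agent({ maxSockets: 50 });

logAsync('request handled', (err) => {
  if (err) console.error('log write failed:', err.message);
});
```

The agent is passed per request, for example `http.get({ host: 'example.com', agent }, onResponse)`, where `onResponse` stands for your own response handler.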