
The Web Robots Pages

In a nutshell: Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol. It works like this: a robot wants to visit a Web site URL. Before it does so, it first checks the site's /robots.txt and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site. There are two important considerations when using /robots.txt: robots can ignore your /robots.txt, and the file itself is publicly readable, so don't try to use /robots.txt to hide information.

The details: /robots.txt is a de-facto standard, is not owned by any standards body, and is not actively developed. The rest of this page gives an overview of how to use /robots.txt on your server, with some simple recipes. How to create a /robots.txt file, and where to put it? The short answer: in the top-level directory of your web server.
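A hedged illustration of the kind of simple recipe the page goes on to describe (the directory name /private/ and the robot name BadBot are placeholders, not taken from the page):

# Let all robots in, except for one illustrative directory
User-agent: *
Disallow: /private/

# Keep one particular (placeholder) robot out of the whole site
User-agent: BadBot
Disallow: /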

http://www.robotstxt.org/robotstxt.html

Related: SEO Webmaster Developer Tools

Introduction to Load Balancing Using Node.js - Part 1 by Ross Johnson Introduction At Mazira, a lot of what we develop takes the form of web services. While most of these are only used internally, it is still important that they are high-performance and resilient. These services need to be ready to churn through hundreds of gigabytes of documents at a moment's notice, for example if we need to reprocess one of our document clusters.

How to Look at Your Website the Way Google Does When you spend months or years on a website, not to mention thousands of dollars, it’s hard to step back and look at it objectively. Can you look at it through the eyes of your users? Can you look at it the way Google does? If you can look at your website the way Google does, you’ll probably discover areas in which your website needs work. So in that spirit, I’m going to teach you how you can see your website from Google’s perspective, and how you can then target the areas that need improvement.

K Means Clustering with Tf-idf Weights Unsupervised learning algorithms in machine learning impose structure on unlabeled datasets. In Prof. Andrew Ng's inaugural ml-class from the pre-Coursera days, the first unsupervised learning algorithm introduced was k-means, which I implemented in Octave for programming exercise 7.
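A rough sketch of the same k-means-over-tf-idf idea in Python rather than Octave, using scikit-learn in place of a hand-rolled implementation (the corpus and cluster count are placeholders):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus; in practice this would be the unlabeled document set
docs = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock markets fell sharply today",
    "investors are worried about the markets",
]

# Turn each document into an l2-normalized tf-idf vector
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Group the tf-idf vectors into k clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # cluster id per document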

Parallel Processing on the Pi (Bramble) Parallel processing on the Raspberry Pi is possible, thanks to the ultra-portable MPICH2 implementation of MPI (the Message Passing Interface). I was keen to try this out as soon as I managed to get hold of two of these brilliant little computers (yes, I'm a lucky boy). Here I'm going to show how I managed to get it all working and will display the results :) (Bramble was a name an ingenious Raspberry Pi forum member made up, not myself!) There are three ways in which you can install MPICH2 (in case one doesn't work for you): compiling and installing from source, installing my .deb package and then following the rest of the tutorial, or using the Python script file. Installing from source takes a while on the little Pi when not cross-compiling.
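Once MPICH2 is up on each Pi, a minimal way to confirm that message passing works across the bramble is a Python hello-world via the mpi4py bindings (assuming mpi4py is installed; the machinefile and process count below are illustrative):

# hello_mpi.py - every MPI process reports its rank
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()            # this process's id within the communicator
size = comm.Get_size()            # total number of processes launched
node = MPI.Get_processor_name()   # hostname of the Pi running this process

print(f"Hello from rank {rank} of {size} on {node}")

Launched across the nodes with something like: mpiexec -f machinefile -n 4 python3 hello_mpi.py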

Create a robots.txt file - Search Console Help In order to make a robots.txt file, you need access to the root of your domain. If you're unsure about how to access the root, you can contact your web hosting service provider. Also, if you know you can't access the root of the domain, you can use alternative blocking methods, such as password-protecting the files on your server and inserting meta tags into your HTML.
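On the crawling side, a well-behaved robot checks this file before fetching pages. A small sketch of that check with Python's standard-library urllib.robotparser (the domain and paths are placeholders):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder domain
rp.read()

# May a generic user agent fetch this (placeholder) URL?
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))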

scrollorama Disclaimer: This is an experimental, just-for-fun sort of project and hasn’t been thoroughly tested. Design and build your site, dividing your content into blocks. Embed scrollorama.js after jQuery and initialize the plugin, passing the blocks' class selector as a parameter. Target an element and animate its properties, and hook into the onBlockChange event.

Introduction to Information Retrieval This is the companion website for the following book: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Raspberry Pi Weather Station for schools When I first joined the Raspberry Pi Foundation, over a year ago now, one of my first assignments was to build a weather station around the Raspberry Pi. Thanks to our friends at Oracle (the large US database company), the Foundation received a grant not only to design and build a Raspberry Pi weather station for schools, but also to put together a whole education programme to go with it. Oracle were keen to support a programme where kids get the opportunity to take part in cross-curricular computing and science projects that cover everything from embedded IoT, through networking protocols and databases, to big data. The goals of the project were ambitious: between us we wanted to create a weather experiment where schools could gather and access weather data from over 1000 weather stations around the globe.

Using Noindex, Nofollow HTML Metatags: How to Tell Google Not to Index a Page in Search Indexing as many pages on your website as possible can be very tempting for marketers who are trying to boost their search engine authority. But, while it’s true that publishing more pages that are relevant for a particular keyword (assuming they’re also high quality) will improve your ranking for that keyword, sometimes there’s actually more value in keeping certain pages on your website out of a search engine’s index. ... Say what?! Stay with us, folks.
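For illustration, the metatags the title refers to sit in a page's <head> and ask compliant crawlers not to index the page and/or not to follow its links:

<!-- Do not index this page and do not follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- Keep the page indexable, but do not follow its links -->
<meta name="robots" content="index, nofollow">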

CSS3 transitions, transforms and animations Often used as part of an image gallery or to show additional information, this kind of effect can be done in JavaScript by gradually changing the padding of elements, but that often looks choppy on mobile devices, and frames can be missed if the animation is quick. CSS transitions plus transforms make this a simple effect to create. Have a look at a more complete example on the demos page.

simple web crawler / scraper tutorial using requests module in python Let me show you how to use the Requests Python module to write a simple web crawler / scraper. So, let's define our problem first: on a page of mine I am publishing some programming problems, and now I shall write a script to get the links (URLs) of those problems.
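A minimal sketch of that kind of script, using requests together with the standard-library HTMLParser to pull the links out of a page (the URL is a placeholder; the tutorial's own target page and link filtering differ):

import requests
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Placeholder URL; the tutorial scrapes the author's own problems page
response = requests.get("https://www.example.com/problems.html", timeout=10)
response.raise_for_status()

parser = LinkCollector()
parser.feed(response.text)

for link in parser.links:
    print(link)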
