
SEO & .htaccess


Easy PHP Blackhole Trap with WHOIS Lookup for Bad Bots

One of my favorite security measures here at Perishable Press is the site’s virtual Blackhole trap for bad bots.

The concept is simple: include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS lookup and records the event in the blackhole data file. Once added to the data file, bad bots are immediately denied access to your site. I call it the “one-strike” rule: bots have one chance to follow the robots.txt protocol, check the site’s robots.txt file, and obey its directives.
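The one-strike logic described above can be sketched in a few lines. This is an illustration in Python, not the article's PHP implementation: the trap directory name `/blackhole/`, the tab-separated record format, and the function names are all assumptions, and the WHOIS lookup is left out because it needs network access.

```python
from datetime import datetime, timezone
from pathlib import Path

TRAP_PREFIX = "/blackhole/"  # hypothetical trap directory, disallowed in robots.txt


def is_banned(ip: str, data_file: Path) -> bool:
    """Return True if the IP already appears in the blackhole data file."""
    if not data_file.exists():
        return False
    with data_file.open(encoding="utf-8") as fh:
        return any(line.split("\t", 1)[0] == ip for line in fh)


def record_hit(ip: str, user_agent: str, data_file: Path) -> None:
    """Append one tab-separated record for a bot that entered the trap."""
    stamp = datetime.now(timezone.utc).isoformat()
    # Collapse tabs and newlines so a hostile User-Agent cannot forge records.
    clean_agent = " ".join(user_agent.split())
    with data_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{ip}\t{stamp}\t{clean_agent}\n")


def handle_request(path: str, ip: str, user_agent: str, data_file: Path) -> int:
    """Return the HTTP status a request would receive under the one-strike rule."""
    if is_banned(ip, data_file):
        return 403  # already listed: denied everywhere on the site
    if path.startswith(TRAP_PREFIX):
        record_hit(ip, user_agent, data_file)  # first and only strike
        return 403
    return 200
```

A well-behaved crawler never requests anything under the trap directory, so it never gets listed. A bot that follows the hidden link is recorded once and refused on every later request, whichever page it asks for.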

Failure to comply results in immediate banishment. In five easy steps, you can set up your own Blackhole to trap bad bots and protect your site from evil scripts, bandwidth thieves, content scrapers, spammers, and other malicious behavior. The Blackhole is built with PHP and uses a bit of .htaccess to protect the blackhole directory.

Stupid htaccess Tricks

Welcome to Perishable Press!

This article, Stupid htaccess Tricks, covers just about every htaccess “trick” in the book, and is easily the site’s most popular offering. In addition to this htaccess article, you may also want to explore the rapidly expanding htaccess tag archive. Along with all things htaccess, Perishable Press also focuses on (X)HTML, CSS, PHP, JavaScript, security, and just about every other aspect of web design, blogging, and online success. If these topics are of interest to you, I encourage you to subscribe to Perishable Press for a periodic dose of online enlightenment ;)

General Information

.htaccess Definition

Apache server software provides distributed (i.e., directory-level) configuration via Hypertext Access files.

Commenting .htaccess Code

Comments are essential to maintaining control over any involved portion of code.

The htaccess Rules for all WordPress Permalinks

Update 2012/07/15: all code updated with the new .htaccess rules (changed in WP 3.0).

The code in this article should work with all versions of WordPress. I recently performed a series of tests on a fresh installation of WordPress 2.8.6 to determine the exact htaccess rewrite rules that WordPress writes to its htaccess file for various permalink configurations. Under the WP admin option menu, WordPress lists four choices for permalink structure, including "Default" and "Date and name based" (/%year%/%monthnum%/%day%/%postname%/). The "default" option is to not use permalinks.

For the test, I began with the common "date and name based" permalink configuration. The results indicate conclusively that WordPress uses the exact same set of htaccess rules for all permalink configurations. Without further ado, the htaccess rules for all WordPress permalinks are precisely either #1 or #2:

[ #1 ] If WordPress is installed in the root directory »

[ #2 ] If WordPress is installed in a subdirectory called "foo" »
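To make the two cases concrete, here is a small Python sketch that renders the rewrite block for a given install path. The helper name `wordpress_rewrite_rules` is hypothetical, and the rule text is the block that WordPress 3.0 and later is generally known to write, so treat it as an assumption and compare it against the .htaccess file your own installation generates.

```python
def wordpress_rewrite_rules(install_path: str = "/") -> str:
    """Render the WordPress permalink block for a given install path."""
    # Normalize "foo", "/foo", and "/foo/" to "/foo/"; the root stays "/".
    base = "/" + install_path.strip("/")
    if not base.endswith("/"):
        base += "/"
    lines = [
        "# BEGIN WordPress",
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        f"RewriteBase {base}",
        r"RewriteRule ^index\.php$ - [L]",
        # Only rewrite requests that do not match a real file or directory.
        "RewriteCond %{REQUEST_FILENAME} !-f",
        "RewriteCond %{REQUEST_FILENAME} !-d",
        f"RewriteRule . {base}index.php [L]",
        "</IfModule>",
        "# END WordPress",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(wordpress_rewrite_rules())       # case #1: root directory
    print(wordpress_rewrite_rules("foo"))  # case #2: subdirectory "foo"
```

In this sketch the two variants differ only in the `RewriteBase` value and the target of the final `RewriteRule`. Nothing in the block depends on the permalink structure chosen, which matches the test result described above.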

Better Robots.txt Rules for WordPress

Cleaning up my files during the recent redesign, I realized that several years had somehow passed since the last time I even looked at the site’s robots.txt file.

I guess that’s a good thing, but with all of the changes to site structure and content, it was time again for a delightful romp through robots.txt.

Robots.txt in 30 seconds

Primarily, robots directives disallow obedient spiders access to specified parts of your site. They can also explicitly “allow” access to specific files and directories. So basically they’re used to let Google, Bing et al know where they can go when visiting your site.

Robots.txt and WordPress

Running WordPress, you want search engines to crawl and index your posts and pages, but not your core WP files and directories.

User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-
Allow: /wp-content/uploads/
Sitemap:
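One quick way to sanity-check rules like these is to feed them to a robots.txt parser and ask which paths an obedient bot may fetch. The sketch below uses Python's standard `urllib.robotparser`. The host `example.com`, the agent name `ExampleBot`, and the helper names are placeholders, and the Sitemap line is omitted because its URL is not given above.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /feed/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /wp-
Allow: /wp-content/uploads/
"""


def build_parser(rules: str = RULES) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def is_allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Ask whether an obedient bot may fetch the given path."""
    # example.com is a placeholder host; only the path is matched against the rules.
    return build_parser().can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/2012/07/hello-world/", "/wp-admin/options.php", "/feed/"):
        print(path, is_allowed(path))
```

Posts and pages come back as allowed, while the core WordPress paths do not. Python's parser applies rules in file order, which differs from crawlers that use the longest matching rule, so this sketch is not a reliable check for the `Allow: /wp-content/uploads/` exception.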