mod_rewrite is a powerful Apache module that gives you control over the URLs displayed to visitors to your site.
Much has already been written by many people, with both examples and tutorials covering the basics. The aim of my addition, as always, is primarily to serve as my own notes and secondarily to explain why you would want to use mod_rewrite to retain your SEO rankings.
Let’s say you have a website that enjoys well-ranked pages and a lot of traffic from Google, or has a great many backlinks from other blogs or websites to particular pages on your site. One of the powerful aspects of Google Webmaster Tools is that it gives you an easy way to see which URLs link to particular pages on your website. When contemplating moving a website or restructuring an existing one, one of your first considerations should be keeping your new pages reachable via your old links. Essentially, we map the current (old) structure to the new (proposed) structure.
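If you have more than a handful of old URLs to preserve, it can help to keep the old-to-new mapping as data and generate the redirect rules from it. The Python sketch below is one way to do that and is not part of mod_rewrite itself; URL_MAP and rewrite_rule() are hypothetical names, and it assumes old paths are written without a leading slash (as rules in an .htaccess file see them) and contain no spaces.

```python
import re

# Hypothetical old-path -> new-URL mapping; fill it with the old URLs
# you found linked in Google Webmaster Tools.
URL_MAP = {
    "mstore/2wsub1": "http://www.domain.com.au/htc-hero-android-brown",
}


def rewrite_rule(old_path, new_url):
    """Build a permanent, case-insensitive redirect rule for one old path."""
    # re.escape() protects characters such as '.' that are special in regexes.
    return f"RewriteRule ^{re.escape(old_path)}(.*)$ {new_url} [R=301,NC,L]"


for old, new in URL_MAP.items():
    print(rewrite_rule(old, new))
```

On Python 3.7 or later, the single entry shown prints exactly the RewriteRule used in the first sample.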
Here’s our first sample. The old URL was http://www.domain.com.au/mstore/2wsub1 and our new URL includes keyword-rich content: http://www.domain.com.au/htc-hero-android-brown. We use the following mapping:
RewriteEngine On
RewriteRule ^mstore/2wsub1(.*)$ http://www.domain.com.au/htc-hero-android-brown [R=301,NC,L]
The above means: any request for /mstore/2wsub1, /mstore/2wsub1.html, /mstore/2wsub1.asp etc. will be redirected to http://www.domain.com.au/htc-hero-android-brown. The flags in square brackets control how the redirect behaves: R=301 sends a 301 (Moved Permanently) status, telling the requester, including search engines, that this is a permanent redirect; NC (no case) makes the match case-insensitive; and L (last) tells mod_rewrite that if this rule matches, it should redirect and not process any further rules.
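To see exactly which requests the first rule catches, its pattern can be tried out in Python. This is a rough sketch: it assumes Python's re module behaves like Apache's PCRE for simple constructs like these, maps the NC flag to re.IGNORECASE, and, like an .htaccess file, matches against the path without its leading slash. The function name redirect_target() is made up for illustration.

```python
import re

# Pattern from the first RewriteRule; the NC flag becomes re.IGNORECASE.
OLD_PRODUCT = re.compile(r"^mstore/2wsub1(.*)$", re.IGNORECASE)
NEW_URL = "http://www.domain.com.au/htc-hero-android-brown"


def redirect_target(path):
    """Return (status, location) if the rule would fire for `path`, else None.

    `path` has no leading slash, as in an .htaccess file.
    """
    if OLD_PRODUCT.match(path):
        return 301, NEW_URL  # R=301: permanent redirect
    return None  # no match: later rules (if any) would be tried


for path in ["mstore/2wsub1", "mstore/2wsub1.html", "MSTORE/2wsub1.asp",
             "mstore/2wsub10", "mstore/2wsub2"]:
    print(path, "->", redirect_target(path))
```

Note that mstore/2wsub10 is caught as well, because (.*) accepts any trailing characters. If your old site had such a page, a tighter pattern such as ^mstore/2wsub1(\.[a-z]+)?$ (allowing only an optional file extension) may be safer; test it against your own URLs first.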
After migrating a website, I advise monitoring both Google Webmaster Tools and your server statistics and logs for 404 errors, so you can correct any URLs you hadn’t originally mapped.
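For the server-log side of that check, a short script can list the paths that returned 404 most often. This Python sketch assumes your access log uses Apache's Common or Combined Log Format; the function name count_404s() is a placeholder.

```python
import re
from collections import Counter

# Request line and status code in Common/Combined Log Format, e.g.
# ... "GET /mstore/2wsub2 HTTP/1.1" 404 209 ...
REQUEST_STATUS = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def count_404s(lines):
    """Count how many times each requested path returned a 404."""
    misses = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match and match.group(2) == "404":
            misses[match.group(1)] += 1
    return misses
```

For example, opening your log file (say, a hypothetical access.log) and passing the file object to count_404s(), then calling .most_common(20) on the result, lists the 20 most frequently missed paths: good candidates for new RewriteRules.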
Migrating domain names & ensuring a single accessible URL
In this scenario, we are moving http://www.olddomain.com.au/ to http://www.coolnewdomain.com.au/ because your audience are Mac users and like anything with “cool” in the title ;). Additionally, we want to ensure that people are always directed to http://www.coolnewdomain.com.au if they type in http://coolnewdomain.com.au, and that search engines only index the www version, so that you do not have duplicate content listed.
# Hostnames are case-insensitive, so [NC] belongs on the host condition
RewriteCond %{HTTP_HOST} !^www\.coolnewdomain\.com\.au$ [NC]
RewriteRule (.*) http://www.coolnewdomain.com.au/$1 [R=301]
The above means: if you have NOT accessed the website using www.coolnewdomain.com.au, you will be redirected to http://www.coolnewdomain.com.au. $1 is a back-reference to the first captured group, the (.*) in the pattern, which holds everything after the domain name. So if you access http://olddomain.com.au/cool-product, you will be redirected to http://www.coolnewdomain.com.au/cool-product: the path is carried over and attached to the end of the new domain name. (This assumes olddomain.com.au still points at the same server, so its requests reach these rules.) Even if you have not migrated domain names, it is still wise to use this code on your existing site so that search engines only index one ‘form’ of your domain name. One last thing to note: we don’t use the [L] flag here, allowing mod_rewrite to continue looking for matches, as there’s still a chance it could find a match for URLs like our first example. Be aware, though, that the Apache documentation recommends almost always combining [R] with [L]; placing page-specific rules like the first example before this one, each with [L], gives a similar result without relying on rule processing continuing after a redirect.
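The condition-plus-rule pair can be modelled in a few lines of Python to show how $1 carries the path across. This is an illustrative sketch, not mod_rewrite itself: canonical_redirect() is a hypothetical helper, and, as in an .htaccess file, the path is passed without its leading slash.

```python
import re

# RewriteCond %{HTTP_HOST} !^www\.coolnewdomain\.com\.au$
CANONICAL_HOST = re.compile(r"^www\.coolnewdomain\.com\.au$")
# RewriteRule (.*) ... : whatever (.*) captures is available as $1
ANY_PATH = re.compile(r"(.*)")


def canonical_redirect(host, path):
    """Return the 301 target URL, or None if the host is already canonical."""
    if CANONICAL_HOST.match(host):
        return None  # the negated condition fails, so the rule is skipped
    dollar_1 = ANY_PATH.match(path).group(1)
    return "http://www.coolnewdomain.com.au/" + dollar_1


print(canonical_redirect("olddomain.com.au", "cool-product"))
print(canonical_redirect("www.coolnewdomain.com.au", "cool-product"))
```

The first call prints http://www.coolnewdomain.com.au/cool-product; the second prints None, because that request already uses the canonical host.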
Using mod_rewrite to block access to a website as an alternative to Basic Authentication
A real-life scenario: some developers I know created a development website, similar to http://dev.coolnewsite.com.au/, that was publicly accessible without authentication. They assumed that because they hadn’t advertised it, nobody knew about it, but just prior to going live, Google ‘found’ the site and indexed it. When the real site went live, whenever they searched Google for the site or its products, http://dev.coolnewsite.com.au/ was displayed. They wanted to block access to the dev site, but they didn’t want to lose the visitors and page rank the dev site had already gained. I tried placing a mod_rewrite rule like the one above before the authentication rule, but authentication takes effect before the rewrite rule (in an .htaccess file, per-directory rewrites run after authentication), so visitors were asked to authenticate rather than being redirected. Additionally, this wouldn’t have told Google about the new, live site. So here’s what I did:
RewriteCond %{REMOTE_ADDR} !^213\.206\.175\.212$
RewriteCond %{REMOTE_ADDR} !^124\.231\.17\.180$
RewriteCond %{REMOTE_ADDR} !^123\.168\.239\.32$
RewriteRule ^(.*)$ http://www.coolnewdomain.com.au/$1 [R=301,L]
PS: fictitious IP addresses provided ;)
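The above means: if your IP address is NOT 213.206.175.212 (dev A), 124.231.17.180 (dev B) or 123.168.239.32 (the customer), you are redirected to the equivalent URL on http://www.coolnewdomain.com.au/ with a permanent redirect [R=301], and no further rules are processed [L]. RewriteCond lines are combined with AND by default, so the redirect happens only when your address matches none of the three.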
This means search engines will learn about the real, live site the next time they visit, and anyone who finds the dev site through a search will be redirected to the equivalent URL on the live site, while the developers and customer can still access the dev site from their three locations. The obvious downside of the above code is that if you are not using a static IP address, you will need to update the addresses each time your IP changes. We will add authentication once our live site begins to rank above the dev site.
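The allow-list logic can be sketched the same way. In this Python model, which illustrates the logic rather than how Apache evaluates rules internally, each negated RewriteCond becomes a "does not match" test, and because consecutive conditions are ANDed by default, the redirect fires only when the visitor's address matches none of them. dev_site_response() is a hypothetical name, the addresses are the same fictitious ones, and the sample path is made up.

```python
import re

# The three RewriteCond patterns; dots are escaped so they match literally.
ALLOWED = [
    re.compile(r"^213\.206\.175\.212$"),  # dev A
    re.compile(r"^124\.231\.17\.180$"),   # dev B
    re.compile(r"^123\.168\.239\.32$"),   # customer
]


def dev_site_response(remote_addr, path):
    """Return (status, location) for outsiders, or None to serve the dev site."""
    # Each condition is negated (!) and the conditions are ANDed, so the rule
    # fires only if the address matches none of the allowed patterns.
    if all(not pattern.match(remote_addr) for pattern in ALLOWED):
        return 301, "http://www.coolnewdomain.com.au/" + path
    return None


print(dev_site_response("213.206.175.212", "products/htc-hero"))
print(dev_site_response("203.0.113.7", "products/htc-hero"))
```

The first call prints None (dev A sees the dev site); the second prints (301, 'http://www.coolnewdomain.com.au/products/htc-hero').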