Google Sitemaps - What, Why and How
By David Cusimano
A Google Sitemap ("GS" for short) enables Google's GoogleBot spider to easily know what to index on your website. It is basically a file that lists the web addresses of all the pages on your website. When Google's GoogleBot spider reads this list, it knows about every webpage specified in the GS. Two different formats are supported: text sitemaps and XML sitemaps. Both formats contain the addresses of all the webpages on your website. The XML version contains additional information about each webpage, such as its last modification date and approximately how often it is updated.
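To make the two formats concrete, here is a minimal Python sketch that writes the same page list both ways. The function names and the sample page are made up for illustration. The element names (urlset, url, loc, lastmod, changefreq) and the namespace come from the public sitemaps.org protocol rather than from this article, so check them against the current protocol before relying on them.

```python
from xml.etree import ElementTree as ET

# Namespace from the sitemaps.org protocol (an assumption, not from this article).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_text_sitemap(urls):
    """Text format: one full web address per line."""
    return "\n".join(urls) + "\n"


def build_xml_sitemap(pages):
    """XML format: one <url> entry per page, with optional extra fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        if "lastmod" in page:
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        if "changefreq" in page:
            ET.SubElement(url, "changefreq").text = page["changefreq"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical sample page for the placeholder DOMAIN.com website.
pages = [
    {"loc": "http://www.DOMAIN.com/", "lastmod": "2008-08-01", "changefreq": "weekly"},
]
print(build_text_sitemap([p["loc"] for p in pages]))
print(build_xml_sitemap(pages))
```

The text version carries only the addresses, while the XML version has room for the modification date and update frequency of each page.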
How Does a Google Sitemap Help Me?
In the absence of a GS file, the GoogleBot spider downloads a webpage from your website and scans through that page looking for any links that it contains to other webpages in your website. Google's GoogleBot spider then downloads all those newly found pages and repeats the process of scanning for links. It takes a lot of time to download and scan through pages. If you have a GS, the GoogleBot spider immediately knows about all the webpages on your website. Reading it is considerably faster than having to download and scan each page. A GS also helps if your webpages are poorly linked together or not linked at all. In that case, without a GS, some webpages may take a long time to be discovered, or may never be discovered. When you have a GS, that problem is eliminated.
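The link-following process described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how GoogleBot is actually built. The three-page site, the discover function and the LinkCollector class are all hypothetical, and the pages are held in memory so nothing is downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(start_url, fetch):
    """Follow links from start_url; fetch(url) returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: orphan.html exists, but no other page links to it.
SITE = {
    "http://www.DOMAIN.com/": '<a href="about.html">About</a>',
    "http://www.DOMAIN.com/about.html": '<a href="/">Home</a>',
    "http://www.DOMAIN.com/orphan.html": "<p>No page links here.</p>",
}
found = discover("http://www.DOMAIN.com/", SITE.get)
print(sorted(found))
```

In this sketch the crawl never reaches orphan.html because nothing links to it. Listing that address in a GS is what would make it known.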
Does Google.com Index Everything?
No, Google may not index everything that you specify. Google.com states, "we can't guarantee that URLs from your Sitemap will be added to the Google index." Even though Google.com does not guarantee that it will index everything that you specify in your GS, a GS should improve the chance that your webpages will be indexed sooner, since Google will know about them sooner. If Google does not know about your webpages, they may not get indexed.
If I Create a Google Sitemap, Will It Hurt Me?
No, a GS will not hurt you. Google.com states, "In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it." Google.com uses the information contained in your GS to learn about the structure of your website and to better schedule its search engine spider in the scanning (a.k.a. crawling or spidering) of your website.
How Do I Generate a Google Sitemap?
There are several tools available that you can use to create a GS. Google.com itself even provides a sitemap generator written in the Python programming language. There are also websites where you type in your website address and their spider scans your website to find all your webpages; however, such scanning is time consuming since every page on your website must be downloaded and scanned, and you must start the process yourself each time. If you want to make the process run faster or automate it, run generator software directly on your web server.
How Can I Automate Google Sitemap Generation?
Creating a GS can be an automated process. The simplest way is to install and use the sitemap.pl generator software. Once you install this software in your cgi-bin directory, it automatically generates the GS file each time the GS file is accessed. This is "set it and forget it" software: you can keep adding to your website without having to worry about updating your GS. The software works by scanning your website's hard drive looking for files to include in your GS. Because the hard drive is accessed directly, the GS is generated very quickly. The sitemap.pl software has been clocked at finding over 500 webpages per second, which is fast for any generator software.
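To show what scanning the hard drive can look like, here is a rough Python sketch of the same idea. It is not the sitemap.pl code (which is a separate program); the scan_site function, its parameters and the list of file extensions are assumptions made for this example.

```python
import os
from datetime import datetime, timezone
from urllib.parse import quote


def scan_site(doc_root, base_url, extensions=(".html", ".htm")):
    """Walk doc_root and return sorted (url, lastmod) pairs for page files."""
    entries = []
    for folder, _subdirs, files in os.walk(doc_root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue  # skip images, scripts and other non-page files
            path = os.path.join(folder, name)
            # Turn the file path into a web path relative to the site root.
            rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
            modified = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
            url = base_url.rstrip("/") + "/" + quote(rel)
            entries.append((url, modified.strftime("%Y-%m-%d")))
    return sorted(entries)


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory as if it were the site root.
    for url, lastmod in scan_site(".", "http://www.DOMAIN.com/"):
        print(url, lastmod)
```

Because the files are read straight from disk, no pages have to be downloaded, which is why this approach can be so much faster than a spider. The sketch assumes every matching file is a public page served at the same path under the website address.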
How Do I Tell Google.com About My Google Sitemap?
Once you have a GS, you need to tell Google.com about it. There are two methods. The first method is the simplest and the quickest to do. In your robots.txt file, include a line that says "sitemap:" followed by the website address of your GS file. For example, if the GS of the DOMAIN.com website is located at http://www.DOMAIN.com/sitemap.xml then its robots.txt file should contain a line that states, "sitemap: http://www.DOMAIN.com/sitemap.xml". The second method involves logging into Google.com Webmaster Tools at google.com/sitemaps, adding your site to the Sites Dashboard, and then submitting the web address of your GS file. Once you add your site, click the "Verify" link and follow the instructions, and you will gain access to additional statistics about your website and status information about the processing of your GS file.
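If you would rather script the robots.txt change than edit the file by hand, something like the following Python sketch could do it. The add_sitemap_line function and the sample robots.txt content are hypothetical. The check relies on the site_maps() method of Python's standard urllib.robotparser module, which needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content, adding a sitemap line if it is missing."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    existing = parser.site_maps() or []  # site_maps() returns None if there are none
    if sitemap_url in existing:
        return robots_txt  # already listed, nothing to change
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + "sitemap: " + sitemap_url + "\n"


# Hypothetical robots.txt content for the placeholder DOMAIN.com website.
original = "User-agent: *\nDisallow: /cgi-bin/\n"
updated = add_sitemap_line(original, "http://www.DOMAIN.com/sitemap.xml")
print(updated)
```

Running the function a second time on its own output leaves the file unchanged, so it is safe to call from a script that runs repeatedly.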
In Summary, What Do I Need to Do?
1. Use GS generator software.
2. Update your robots.txt file.
3. Add your site to Google Webmaster Tools.
4. Submit the address of your GS file.
A good first step in getting your webpages indexed is to have a GS. Automated GS generator software keeps your GS current as your website grows, which improves the chance of your webpages being indexed and showing up in search engine results.
Article Source: http://ezinearticles.com/?expert=David_Cusimano
http://EzineArticles.com/?Google-Sitemaps---What,-Why-and-How&id=1448725