In my last post, “How Search Engines Work”, I covered the difference between Googlebot crawling and indexing your website and why it matters for SEO. If you’re familiar with that post, or with the workings of search bots in general, you will already know that the bots need to be notified about accessible pathways into your website. This may be via links, sitemaps, robots.txt files and so on. So how do you know which of your pages are being indexed by Google? This beginner’s post covers the best general steps you can take if your website is new.
Ways of Getting Indexed by Google
I will go into more depth on two of the main techniques discussed in “How Search Engines Work” for ensuring that your new pages get indexed for search.
XML Sitemaps

Breaking it down, a sitemap really is what it says it is: a map of your site. More specifically, it is a map of links, which is exactly how Googlebot loves to traverse the web. It is the most direct way of alerting Google to pages it may not otherwise discover. There are many tools available online that will generate a sitemap for you, such as XML Sitemaps, offering basic to advanced attributes such as calculated priority.
When you are using these tools, it is important to set the change frequency attribute to a realistic value. You may have product pages that ‘never’ change, while your main blog page changes ‘daily’. By reporting accurate frequencies to Google, the search bots can crawl your site more efficiently, spending less time re-crawling known content and more time detecting your new content.
Often, generated sitemaps will require additional editing. To learn more about sitemap attributes and how to use them, I recommend having a read about the XML tag definitions. Once your sitemap has been generated and (optionally) edited, upload the .xml file to your website’s root domain. From there, all you need to do is log in to your Google Webmaster account and submit the sitemap’s URL.
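For reference, here is what a minimal hand-edited sitemap might look like. The URLs, frequencies and priorities below are placeholders; swap in your own pages:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/products/blue-widget</loc>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Note how the blog-style homepage is marked ‘daily’ while a stable product page is marked ‘monthly’, matching the advice above.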
In addition, many hosting and e-commerce platforms can automatically generate sitemaps whenever a new page or product page is created. This is particularly handy for large e-commerce sites with thousands of products being listed or edited each day.
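To give a feel for how this kind of automatic generation works under the hood, here is a minimal Python sketch using only the standard library. The page list and URLs are made up for the example; a real platform would pull them from its database:

```python
import xml.etree.ElementTree as ET

# The sitemap XML namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap XML string from (url, changefreq, priority) tuples."""
    ET.register_namespace("", NS)  # serialize without a namespace prefix
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{NS}}}priority").text = priority
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical page list standing in for a platform's product database.
sitemap = build_sitemap([
    ("https://www.example.com/", "daily", "1.0"),
    ("https://www.example.com/products/blue-widget", "monthly", "0.5"),
])
print(sitemap)
```

A platform would re-run something like this on every product change and write the result to sitemap.xml at the site root.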
Robots.txt

This is another direct way of telling Google about new content sections on your site, as well as notifying the search bots about parts of your site that you don’t want accessed. By using the robots exclusion protocol to give instructions to web robots, you can disallow them from accessing content that may be irrelevant or outdated.
As this file is uploaded publicly to your root domain, I strongly advise that you do not use robots.txt to block pages with sensitive information or pages that you want to hide from the public. Sensitive and private content should already sit behind your site’s own safeguards, such as members-only login areas.
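For example, a simple robots.txt that lets all bots crawl everything except an outdated archive section might look like this (the directory path and sitemap URL are placeholders):

```
User-agent: *
Disallow: /old-archive/

Sitemap: https://www.example.com/sitemap.xml
```

Pointing to your sitemap from robots.txt, as on the last line, is a handy way to combine both techniques in one file.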
To find out more about how to create a robots.txt file, visit http://www.robotstxt.org/robotstxt.html
How to Tell if Your Site Has Been Indexed
Popular websites will find that their pages are indexed within minutes. However, if your website is new or only updated occasionally, indexing may take anywhere from a few hours to a few days.
One easy way to check if your site has been indexed is to perform a site search in Google:
Simply type in ‘site’ and your domain address without the ‘www’, separated by a colon (no spaces). The results page will give you an estimate of how many pages from your site have been indexed.
If you want to check whether a specific page on your website has been indexed, perform a site search as above and add the title of the page you want to check.
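Assuming your domain is example.com and the page is titled “Blue Widget”, the two searches would look like this:

```
site:example.com
site:example.com Blue Widget
```

The first estimates your total indexed pages; the second narrows the results to the page you are checking.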
Overall, there are many techniques for getting your site indexed, but for most webmasters it really is a non-issue: most sites will now automatically ping the search engines once new content has been published (as is the case with WordPress-run sites). Checking your index status is, however, helpful for detecting crawl issues your site may have, so it is definitely worth knowing how to do.