December 21, 2008 17:00
What is a sitemap?
From sitemaps.org,
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
What is the basic format of the XML file?
Also from sitemaps.org, we have:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Only <urlset>, <url>, and <loc> are required. The other child elements of <url> (<lastmod>, <changefreq>, and <priority>) are optional.
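If you ever need to produce a file in this format by hand, here's a minimal Python sketch using only the standard library. The build_sitemap function and the shape of its input (a list of dicts keyed by element name) are my own assumptions for illustration; they aren't part of the sitemaps.org protocol or of any blogging platform.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")

# Serialize the sitemap namespace as the default namespace (no prefix).
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(entries):
    """Return sitemap XML as bytes.

    `entries` is a hypothetical input shape: an iterable of dicts, each with
    a required "loc" key and any of the optional keys in OPTIONAL_TAGS.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # Emit optional children only when the caller supplied them.
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/search?q=a&page=2"},  # only the required <loc>
])
print(sitemap_xml.decode("utf-8"))
```

One reason to build the document with an XML library rather than string concatenation is that characters such as & in a URL get entity-escaped for you, which the protocol requires.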
Here's a snippet from itscodingtime's sitemap:
<urlset>
  <url>
    <loc>http://www.itscodingtime.com/itscodingtime/post/URL-Regular-Expression-Validation.aspx</loc>
    <lastmod>2008-12-21</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>http://www.itscodingtime.com/itscodingtime/post/WLW-and-BlogEngineNET-Cant-view-Properties.aspx</loc>
    <lastmod>2008-12-14</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  ...
</urlset>
My content changes all the time. How do I keep my sitemap up to date?
If you run a blog, you're hopefully also using a good blogging platform like BlogEngine.NET. Because blog content changes often, maintaining a static sitemap file by hand quickly becomes impractical. Your best option is an HTTP handler that generates the sitemap on demand, like the one built into BlogEngine.NET.
Specifically, I'm talking about this line from your site's Web.config:
<add verb="*" path="sitemap.axd" type="BlogEngine.Core.Web.HttpHandlers.SiteMap, BlogEngine.Core" validate="false"/>
The sitemap.axd handler generates your sitemap dynamically each time it is requested, so when Google's search bot stops in for a visit it gets an up-to-date listing of all of your content.
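To make the "generate on request" idea concrete, here's a rough Python analogue written as a small WSGI application. To be clear, this is not how BlogEngine.NET implements its handler (that's C# inside ASP.NET). The POSTS list, BASE_URL, and the function names are all made up for illustration, and a real blog would query its post store instead of a hard-coded list.

```python
from datetime import date
from xml.sax.saxutils import escape

# Hypothetical stand-ins: a real blog would load posts from its data store.
BASE_URL = "http://www.example.com/post/"
POSTS = [
    {"slug": "URL-Regular-Expression-Validation", "modified": date(2008, 12, 21)},
    {"slug": "WLW-and-BlogEngineNET-Cant-view-Properties", "modified": date(2008, 12, 14)},
]


def render_sitemap(posts):
    """Build the sitemap XML from whatever posts exist right now."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for post in posts:
        loc = escape(BASE_URL + post["slug"] + ".aspx")
        lastmod = post["modified"].isoformat()
        lines.append("  <url>")
        lines.append(f"    <loc>{loc}</loc>")
        lines.append(f"    <lastmod>{lastmod}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines).encode("utf-8")


def app(environ, start_response):
    """WSGI entry point: answer /sitemap.axd with a freshly built sitemap."""
    if environ.get("PATH_INFO") == "/sitemap.axd":
        body = render_sitemap(POSTS)  # rebuilt on every request
        status, content_type = "200 OK", "application/xml; charset=utf-8"
    else:
        body = b"Not found"
        status, content_type = "404 Not Found", "text/plain"
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Because the XML is rebuilt on every request, a post added to POSTS shows up in the very next response with no file to regenerate. To try it locally, you can serve app with wsgiref.simple_server.make_server from the standard library.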
How do I tell the world about my sitemap?
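It's easy. You should have a robots.txt file in your site's root folder. You just need to add a sitemap line to it that gives the full URL of your sitemap httpHandler. Google and other search engines that support the directive read robots.txt when they crawl your site and, when they find your sitemap entry, they can follow it to discover your content. Listing a page in the sitemap helps crawlers find it, but it doesn't guarantee the page will end up in their indexes.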
For reference, here's what my robots.txt looks like:
User-agent: *
Disallow: /login.aspx
Disallow: /search.aspx
Disallow: /error404.aspx
Disallow: /archive.aspx

#Remove the '#' character below and replace example.com with your own website address.
sitemap: http://itscodingtime.com/sitemap.axd
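If you want to sanity-check a robots.txt like this one, here's a small sketch using Python's standard urllib.robotparser module. It feeds the rules above to the parser as a string, so no network access is needed. The post URL in the last check is a made-up example, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above, minus the comment line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.aspx
Disallow: /search.aspx
Disallow: /error404.aspx
Disallow: /archive.aspx

sitemap: http://itscodingtime.com/sitemap.axd
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sitemap URLs a crawler would discover from this file (Python 3.8+).
sitemaps = parser.site_maps()

# Disallowed paths are off limits to every user agent ("*").
login_allowed = parser.can_fetch("*", "http://itscodingtime.com/login.aspx")

# Hypothetical post URL, used only to show that other paths stay crawlable.
post_allowed = parser.can_fetch("*", "http://itscodingtime.com/post/example.aspx")

print(sitemaps, login_allowed, post_allowed)
```

The field name is matched case-insensitively by this parser, so the lowercase "sitemap:" in my file is picked up the same way "Sitemap:" would be.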
Conclusion
That's a quick overview of sitemaps and how they're used. Hope you found some value in it. A future post will go over how to inform Google of your sitemap rather than waiting for them to find you.
Sources:
- Google Sitemap and BlogEngine.NET
- sitemaps.org
- MSDN: <httpHandlers> Element
- HTTP Handlers and HTTP Modules in ASP.NET