8 Sitemap SEO Best Practices for Large Websites

Are sitemaps important for SEO? I get that question all the time. Sometimes, websites can go a long way without the details that we SEO consultants consider fundamental for a well-ranked site. I’ve even had to manage domains with no robots.txt, something that makes me shiver every time I think about it. But sooner or later, these mistakes or omissions catch up with your website and become a disadvantage.

So, what is a sitemap? Well, basically, a sitemap is a list of the URLs on your site that you want search engines to crawl and index. It doesn’t block anything (that’s the job of robots.txt and noindex tags); instead, it’s a set of hints that Googlebot reads to discover your pages and understand the best way to navigate your site. Therefore, it’s a vital part of your technical SEO strategy.

In this article, we’ll share 8 tips you should have in mind to craft a proper SEO sitemap. But first, there’s something we need to clarify.

What Kind of Sitemap Do Websites Need? HTML Sitemaps vs XML Sitemaps

 

Side-by-side comparison of HTML sitemaps vs. XML sitemaps

 

In your research, you may come across two different types of sitemaps: HTML sitemaps and XML sitemaps. 

One could say that both serve the same purpose: to point Google to the pages it should index, the ones that are vital to your site’s organic traffic.

You may recognize SEO HTML sitemaps as, simply put, footers. For example, this is our HTML sitemap at SEORadar:

 

SEORadar's HTML sitemap

 

Pretty simple and intuitive, right? 

An XML sitemap, on the other hand, is a plain XML file: a block of code that some tools, such as SEO plugins, display as a table. It’s not that useful for UX because it isn’t designed for browsing. That’s because XML sitemaps are for crawlers, not for users.

SEORadar's sitemap, generated by Rankmath and shown as a spreadsheet.

 

The main difference between HTML sitemaps and XML sitemaps is that the former are mostly designed for users, and the latter are mostly designed for crawler robots.

The HTML sitemap in our footer is there so users, like you, get a comprehensive glance at everything we have to offer. It’s divided into four columns, so you can scan the content and differentiate what’s relevant to you from what’s not. An XML sitemap, on the other hand, is not particularly user-friendly, but it’s great for Googlebot.

With this differentiation out of the way, let’s move forward.

8 Key Sitemap SEO Best Practices

There are certain protocols and standards you should keep in mind when creating your XML sitemaps in 2021. In this section, we’ll share them in the form of 8 actionable tips.

We suggest you:

  • Use a <lastmod> tag
  • Link your robots.txt and your sitemap
  • Only include canonical URLs
  • Keep your sitemap light
  • Create several sitemaps, if necessary
  • Automate your sitemap creation & updates
  • Submit your sitemap to Google Search Console
  • Use both an XML sitemap and an HTML sitemap

 

Use a <lastmod> tag

The most important tag your sitemap should include for each URL is <lastmod>. The <lastmod> tag holds the date when a given page was last updated.
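To make this concrete, here is a minimal Python sketch that writes a sitemap where every URL carries a <lastmod> date. The function name `build_sitemap` and the example.com URLs are placeholders for illustration, not part of any tool mentioned in this article.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # <lastmod> uses the W3C Datetime format; a plain YYYY-MM-DD date is valid.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages and dates, for illustration only.
print(build_sitemap([
    ("https://www.example.com/", date(2021, 6, 1)),
    ("https://www.example.com/saws/table-saws/", date(2021, 5, 20)),
]))
```

The date you write should reflect a real change to the page. Stamping every URL with today’s date on each build would defeat the purpose of the tag.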

This tag, along with its corresponding URL, could be considered the only piece of information in a sitemap that Googlebot really takes into account. John Mueller, Search Advocate at Google, has said as much.

Link your Robots.txt and your Sitemap

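Sitemaps work alongside the robots.txt file: robots.txt tells crawlers where they are allowed to go, and the sitemap lists the pages you actually want indexed. Your robots.txt should include a Sitemap line with the full URL of your sitemap, so any crawler that reads it can find the sitemap right away. The link only goes one way, because the sitemap format has no field that points back to robots.txt. What you should check instead is that the two files don’t contradict each other: don’t list URLs in your sitemap that your robots.txt blocks. This will make it easier for crawlers to understand what to index and what to avoid.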
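As a rough illustration, the Python sketch below builds a robots.txt that declares a sitemap through a `Sitemap:` line, and then reads the declaration back with the standard library’s `urllib.robotparser` (its `site_maps()` method needs Python 3.8 or later). The helper names, the `/cart/` path, and the example.com URLs are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_urls, disallowed_paths=()):
    """Return robots.txt text with one Sitemap line per sitemap URL."""
    lines = ["User-agent: *"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # an empty Disallow blocks nothing
    lines.append("")
    # Sitemap lines take a full, absolute URL.
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


def sitemaps_declared_in(robots_txt):
    """Return the sitemap URLs a robots.txt declares (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


# Hypothetical site, for illustration only.
robots = build_robots_txt(
    ["https://www.example.com/sitemap_index.xml"],
    disallowed_paths=["/cart/"],
)
print(robots)
print(sitemaps_declared_in(robots))
```

Running `sitemaps_declared_in` against your live robots.txt is a quick way to confirm that the declaration is actually there after a deploy.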

Only include canonical URLs

All the links you include in your sitemap should be canonical URLs, with no errors. In other words, each one should return a 200 status code: no redirects, no 404s, and no pages whose canonical tag points to a different URL.
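One way to apply this rule is to filter your crawl data before the sitemap is written. The Python sketch below assumes you already have, for each page, its status code and the value of its canonical tag (for example, from a crawler export). The `CrawledPage` type and `sitemap_candidates` function are hypothetical, and treating a page with no canonical tag as its own canonical is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    url: str
    status_code: int
    canonical_url: Optional[str] = None  # value of rel="canonical", if present


def sitemap_candidates(pages):
    """Keep only URLs that return 200 and are their own canonical."""
    keep = []
    for page in pages:
        if page.status_code != 200:
            continue  # redirects, 404s and server errors stay out
        # Assumption: a page without a canonical tag counts as canonical.
        if page.canonical_url not in (None, page.url):
            continue  # the canonical points elsewhere, so list that URL instead
        keep.append(page.url)
    return keep


# Hypothetical crawl results, for illustration only.
crawl = [
    CrawledPage("https://www.example.com/hammers/", 200,
                "https://www.example.com/hammers/"),
    CrawledPage("https://www.example.com/hammers/?sort=price", 200,
                "https://www.example.com/hammers/"),
    CrawledPage("https://www.example.com/old-saws/", 301),
    CrawledPage("https://www.example.com/screwdrivers/", 200),
]
print(sitemap_candidates(crawl))
```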

Keep it light

XML files are extremely easy to read for crawlers due to their small file size. 

Keep in mind that a single sitemap file shouldn’t exceed 50 megabytes (uncompressed) or 50,000 URLs.
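A quick check like the Python sketch below can tell you whether a sitemap is still within bounds. Besides the 50 MB size limit, it also checks the sitemap protocol’s limit of 50,000 URLs per file. The function name `check_sitemap_limits` and the shape of its return value are just choices made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def check_sitemap_limits(xml_text, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Report the URL count and byte size of a sitemap, and whether both fit."""
    raw = xml_text.encode("utf-8")
    root = ET.fromstring(raw)
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    return {
        "urls": url_count,
        "bytes": len(raw),
        "ok": url_count <= max_urls and len(raw) <= max_bytes,
    }


# A tiny hand-written sitemap, for illustration only.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/saws/</loc></url>"
    "</urlset>"
)
print(check_sitemap_limits(sample))
```

If the check fails, that is your cue to split the file into several sitemaps.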

Create several sitemaps, when necessary

If you’re creating a sitemap for a large domain, you may start to worry about the sheer size of that single file, containing every index-worthy URL on your site. Luckily, your site doesn’t need to have only one sitemap. Actually, most large sites, such as big e-commerce platforms, require more than one sitemap.

So, how should you map out your site? What if I told you you could have a sitemap sitemap?

 

 

Yes, you can and should have a sitemap that works as an index for the rest of your sitemaps. 

So, how should you create this “index sitemap”? 

As with everything in SEO, the best way to proceed is by segmenting. 

Imagine you are working for a client who has a hardware business, selling all kinds of hammers, screwdrivers, saws, etcetera. Generally, these kinds of accounts have massive e-commerce sites with lots of intricate technical SEO issues. Some of those problems are generally related to how Google chooses what to index.

It’s a known fact by now: Google doesn’t really follow a strict set of rules. Sometimes, it can choose to completely ignore the suggestions you make in your website’s HTML code. For example, it has been found that Google rewrites meta descriptions over 70% of the time.

70% of pages have their metadata rewritten by Google. Source: Search Engine Journal

 

Nonetheless, some parts of your site can directly impact the way you get indexed. For example, your robots.txt tells Google’s crawlers which parts of your site they are allowed to crawl in the first place.

To sum it up: what’s important is how you present your domain to the crawler. If you give it a sitemap that includes different sitemaps segmented by categories or subfolders, what you are actually doing is organizing what you want the crawler to perceive in a clear and extremely linear way. 

For example, our hardware client could have three subfolders in their domain, each with its own subcategories and product pages: Hammers, Screwdrivers, and Saws. You could create a sitemap XML file for each of these categories. In your Saws subfolder sitemap, you could logically include URLs to subcategories: Hole Saws, Reciprocating Saws, Table Saws, etcetera. Each of these subcategories would include its product pages.
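Here is a minimal Python sketch of what that index could look like for the hardware store. It writes a <sitemapindex> file that points to one sitemap per category. The file names and the example.com domain are invented for illustration; only the XML structure comes from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index XML document listing the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# One sitemap per subfolder of the hypothetical hardware store.
categories = ["hammers", "screwdrivers", "saws"]
category_sitemaps = [
    f"https://www.example.com/sitemaps/{name}.xml" for name in categories
]
print(build_sitemap_index(category_sitemaps))
```

Each category file is then a regular sitemap of its own, listing the category, subcategory, and product URLs for that subfolder.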

For pages with a heavy volume of videos, images, and other types of media, you could even create a media sitemap with the most important content on the site. Alternatively, you could also add those important pieces of content to your domain sitemap, if you think it would benefit your site, although it won’t be as effective as having a separate sitemap.
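For images specifically, one option is Google’s image sitemap extension, which lets you attach image URLs to each page entry. The Python sketch below is a simplified take on it that only emits the image location. The function name and all URLs are placeholders, and you should check Google’s documentation for the tags it currently supports.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize sitemap tags without a prefix and image tags as <image:...>.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages: iterable of (page_url, [image_url, ...]) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical product page with two photos, for illustration only.
print(build_image_sitemap([
    ("https://www.example.com/saws/table-saws/model-x/", [
        "https://www.example.com/img/model-x-front.jpg",
        "https://www.example.com/img/model-x-side.jpg",
    ]),
]))
```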

Automate the process

Sitemaps are pretty easy to create and upload to your root folder. But, as with many other vital components of your SEO strategy, they need constant updating.

In fact, a regularly updated sitemap is a great sign of a healthy domain. 

There are many tools you could use to create and update your sitemap automatically. Some content management systems (CMS) automatically create a sitemap for your blog, while others, such as WordPress, allow you to install plugins to create it on your own. Among these plugins, there’s the increasingly powerful Rankmath.

If no tool convinces you, of course, you could always just manually list your URLs in a spreadsheet and export them as an XML file.
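If you go the spreadsheet route, a small script can handle the conversion for you. The Python sketch below assumes you exported the spreadsheet as CSV with two columns named `url` and `lastmod`. Those column names, like the function name `csv_to_sitemap`, are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def csv_to_sitemap(csv_text):
    """Convert CSV text with 'url' and 'lastmod' columns into sitemap XML."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get("url") or "").strip()
        if not url:
            continue  # skip empty rows
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        last_modified = (row.get("lastmod") or "").strip()
        if last_modified:  # <lastmod> is optional in the protocol
            ET.SubElement(entry, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical spreadsheet export, for illustration only.
spreadsheet = (
    "url,lastmod\n"
    "https://www.example.com/,2021-06-01\n"
    "https://www.example.com/hammers/,\n"
)
print(csv_to_sitemap(spreadsheet))
```

Run on a schedule, a script like this keeps the XML file in sync with the spreadsheet, although a CMS plugin that updates the sitemap on publish is usually less work to maintain.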

Always submit your sitemap to Google Search Console

Google Search Console (GSC) is as impactful as it is underrated. I’ve known lots of SEO specialists who just use Google Analytics: well, they’re missing out. 

It’s true that Google Analytics offers a variety of tools that GSC doesn’t, mostly those that allow you to manage your SEM and all things paid traffic. Search Console, on the other hand, is the ideal SEO management tool. It provides specific reports with a focus on organic traffic, and it also allows you to set up how Google will crawl your site.

To get it done, you can submit your sitemap directly in Search Console by entering its URL in the Sitemaps report. This is a great way to make sure your sitemap gets through to Google.

But be careful! Remember that Google always has the final say on whether to index a page or not, even if you’ve added that URL to your XML sitemap. Still, it’s a great XML sitemap best practice to add all the URLs you want to get indexed.

Make sure the content on those pages is updated and relevant. As always, remember that Google follows the “content is king” law. Duplicate or stolen content can hurt how your pages get indexed and ranked.

Do add an HTML sitemap

Even if an XML Sitemap should be enough, there is a reason why you need to add an HTML sitemap to your site: User Experience.

Having a Sitemap in your footer makes it so much easier for users to find what they need in your domain. It also offers a sense of organized data that is always lovely to find. Also, HTML sitemaps generally consist of hyperlinks, which will take your users to those parts of your website where you want them to go.

We believe that UX is a vital part of SEO. At the end of the day, our job is to make sure your customers leave your page with their needs satisfied and wanting to come back. 

 

A list of our 8 Sitemap SEO best practices

Technical SEO, Beyond the Sitemap

While your sitemap is extremely important, there’s much more to technical SEO. SEO is a constant race to stay up to date, so you should always be checking for unexpected changes to your site.

When was the last time you reviewed your website’s code?

SEORadar is an SEO disaster prevention tool that scans your website’s code in real-time, looking for potential SEO errors. Sometimes, a misplaced tag can cause lasting harm to your SERP positioning. But searching for it manually can be time-consuming and inefficient.

With SEORadar, you can monitor changes to your website’s code automatically, and analyze their potential impact. Start your free trial and keep your codebase SEO-friendly, regardless of its size. 
