Sitemap index
A sitemap index is an XML file that lists multiple individual sitemap files rather than URLs directly. It is required when a single sitemap would exceed the protocol limits of 50,000 URLs or 50 MB uncompressed, which makes it the standard pattern for any site whose URL count puts it past those caps.
Long definition
The sitemap protocol caps individual sitemap files at 50,000 URLs and 50 MB uncompressed (per sitemaps.org). A sitemap index is the official way to scale past those limits: instead of one giant sitemap, you publish multiple smaller sitemaps and one index file pointing at them. The index itself can reference up to 50,000 child sitemaps, so the theoretical ceiling is 50,000 × 50,000 = 2.5 billion URLs across one sitemap index.
The XML structure is small:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-articles.xml</loc>
    <lastmod>2026-04-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-25</lastmod>
  </sitemap>
</sitemapindex>
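If you generate the index programmatically, a minimal sketch using Python's standard library could look like the following. The function name `build_sitemap_index` and the `(loc, lastmod)` input shape are illustrative assumptions, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(children):
    """Build a sitemap index document.

    children: iterable of (loc, lastmod) pairs, where lastmod is a
    W3C date string such as "2026-04-25", or None to omit the element.
    Returns the serialized XML as bytes.
    """
    # Setting xmlns as a plain attribute makes ElementTree emit it as the
    # default namespace declaration on the root element.
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in children:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap_index([
        ("https://example.com/sitemap-articles.xml", "2026-04-25"),
        ("https://example.com/sitemap-products.xml", "2026-04-25"),
    ])
    print(xml_bytes.decode("utf-8"))
```

Building the tree with an XML library rather than string concatenation means characters such as `&` in URLs are escaped for you.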
How to split your sitemaps in practice:
- By content type: sitemap-articles.xml, sitemap-products.xml, sitemap-categories.xml. Easiest to debug, easiest for crawlers to prioritize.
- By date: sitemap-2025.xml, sitemap-2026.xml. Useful for archive-heavy news sites.
- By section: one sitemap per major site section. Maps cleanly to teams and ownership.
- By size: automated chunking at 49,000 URLs per file with sequential naming. Good for fully dynamic sites.
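The size-based option can be sketched roughly as below. The helper names `chunk_urls` and `plan_sitemap_files`, and the `sitemap-0001.xml` naming scheme, are assumptions chosen for illustration; the 49,000 default simply leaves headroom under the 50,000-URL cap.

```python
from itertools import islice


def chunk_urls(urls, chunk_size=49_000):
    """Yield successive lists of at most chunk_size URLs."""
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def plan_sitemap_files(urls, base="sitemap", chunk_size=49_000):
    """Map sequential file names (hypothetical scheme) to their URL chunks."""
    return {
        f"{base}-{number:04d}.xml": chunk
        for number, chunk in enumerate(chunk_urls(urls, chunk_size), start=1)
    }


if __name__ == "__main__":
    all_urls = (f"https://example.com/item/{i}" for i in range(100_000))
    for name, chunk in plan_sitemap_files(all_urls).items():
        print(name, len(chunk))
```

Because `chunk_urls` consumes any iterable lazily, the URL source can be a database cursor or generator rather than a list held in memory.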
Submit the sitemap index URL to Google Search Console and Bing Webmaster Tools; they'll discover the children automatically. Also reference the index in robots.txt with a line such as Sitemap: https://example.com/sitemap.xml. Don't submit individual children separately; that's redundant and clutters the report.
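To confirm that robots.txt actually advertises the index, one option is a quick check with Python's standard `urllib.robotparser`. The wrapper name `sitemaps_in_robots` is an illustrative assumption; `RobotFileParser.site_maps()` is available from Python 3.8.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Allow: /\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    print(sitemaps_in_robots(sample))
```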
The <lastmod> field on the index entry is what tells Google "this child sitemap has changed since last crawl, refetch it." Updating lastmod correctly is how you trigger faster discovery of new content. A common bug: regenerating the index every hour without changing lastmod values, so Google never realizes anything changed.
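One way to keep lastmod honest is to change it only when a child sitemap's content actually changes. The sketch below assumes you persist a content hash per child between runs; the function name `refresh_lastmod` and the state layout are hypothetical, and the approach only works if the child files themselves do not embed a per-run timestamp.

```python
import hashlib
from datetime import date


def refresh_lastmod(previous, child_bodies, today=None):
    """Carry lastmod forward for unchanged children, bump it for changed ones.

    previous:     {loc: {"hash": str, "lastmod": str}} saved from the last run
    child_bodies: {loc: bytes} freshly generated child sitemap contents
    Returns a new state dict in the same shape as `previous`.
    """
    today = today or date.today().isoformat()
    updated = {}
    for loc, body in child_bodies.items():
        digest = hashlib.sha256(body).hexdigest()
        old = previous.get(loc)
        if old is not None and old["hash"] == digest:
            updated[loc] = old  # unchanged: keep the earlier lastmod
        else:
            updated[loc] = {"hash": digest, "lastmod": today}
    return updated


if __name__ == "__main__":
    state = refresh_lastmod({}, {"articles": b"v1", "products": b"v1"}, today="2026-04-24")
    state = refresh_lastmod(state, {"articles": b"v1", "products": b"v2"}, today="2026-04-25")
    for loc, entry in state.items():
        print(loc, entry["lastmod"])
```

In the usage example, only the child whose bytes changed between runs receives the newer date, so an hourly regeneration job stops rewriting lastmod for untouched files.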
Compression: gzipped sitemaps (.xml.gz) count by uncompressed size. A 50 MB uncompressed file compressed to 5 MB still hits the size cap. The 50,000-URL cap is hard regardless of file size.
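A rough pre-publish check for both caps might look like this. It measures the uncompressed size even when the input is gzipped; the function name `check_sitemap_limits` is an assumption, and the byte limit uses the 52,428,800-byte figure that sitemaps.org gives for 50 MB. It loads the whole file into memory, which is acceptable for a sketch but not tuned for production use.

```python
import gzip
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes
URL_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}url"


def check_sitemap_limits(data: bytes) -> dict:
    """Report uncompressed size and URL count for a sitemap (.xml or .xml.gz)."""
    # gzip streams start with the magic bytes 0x1f 0x8b.
    raw = gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data
    url_count = sum(1 for _ in ET.fromstring(raw).iter(URL_TAG))
    return {
        "uncompressed_bytes": len(raw),
        "url_count": url_count,
        "within_limits": len(raw) <= MAX_UNCOMPRESSED_BYTES
        and url_count <= MAX_URLS,
    }


if __name__ == "__main__":
    sample = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.com/a</loc></url>"
        b"</urlset>"
    )
    print(check_sitemap_limits(gzip.compress(sample)))
```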
Common misconceptions
- "You only need a sitemap index for sites with millions of URLs." Sites with 60,000 URLs need one too — single sitemaps are capped at 50,000. Even at 100,000 URLs, the index pattern is the right architecture.
- "Sitemap indexes hurt crawl prioritization." They don't. Google uses the sitemap structure as a hint for organization, not a ranking input. A well-organized sitemap index actually helps Google distribute crawl across content types.
- "Submit each child sitemap individually for redundancy." It clutters Search Console reports with duplicates and adds no value. Submit the index URL only; let GSC discover the children.
- "Lastmod is optional, so skip it." It's optional in the spec but functionally important. Without accurate lastmod values, Google has no signal that a child sitemap has changed and recrawl frequency suffers.