Sitemap.xml in Sitecore Headless without SXA

Byron Calisto

Feb 27, 2023

Overview

An important tool for SEO (search engine optimization) is the sitemap.xml file, a file placed in the root directory of your site to make it easier for search engines to crawl and index all of the site’s essential pages. In a traditional “headful” Sitecore .NET MVC site, this was achieved with ready-to-use modules or with SXA’s built-in sitemap functionality. Modern Headless/JSS sites can also leverage SXA for sitemaps, because it offers advanced tools to generate them. But if your headless site was developed without SXA, this guide explores how to implement a dynamic sitemap.xml with Next.js.

What do you need?

A Sitecore headless project using JSS (tested with version 19).
GraphQL enabled in your Sitecore JSS site.
A head (JS app) developed using Next.js.

Using the GraphQLSitemapService

The Sitecore Next.js toolset already includes a sitemap service that is used in the getStaticPaths function of an SSG+ISR (Static Site Generation + Incremental Static Regeneration) project. It is used to retrieve all of the Sitecore site’s pages when building the static pages in Next.js SSG. Internally, this function decides if it needs to use the sitemap service for disconnected mode, or the sitemap service that uses GraphQL to retrieve the actual Sitecore pages in connected or production mode. Our sitemap.xml will ONLY work in connected/production mode, so we can ignore the disconnected mode. Place the sitemap.xml.tsx code file under your project’s src/pages folder:

src/pages/sitemap.xml.tsx

import { GraphQLSitemapService, StaticPath } from '@sitecore-jss/sitecore-jss-nextjs';
import { GetServerSideProps } from 'next';
import config from 'temp/config';
 
const Sitemap: React.FC = () => null;
 
export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // Retrieve configuration values
  const endpoint = config.graphQLEndpoint;
  const apiKey: string = process.env.SITECORE_API_KEY ?? '';
  let publicUrl: string = process.env.PUBLIC_URL ?? '';

  // If required configuration values are not found, return an error and close the response
  if (!apiKey || !publicUrl) {
    res.statusCode = 500; // Signal the failure so crawlers do not treat this as a valid sitemap
    res.setHeader('Content-Type', 'text/xml');
    res.write('<error>No Sitecore API Key and/or public URL configured for the site</error>');
    res.end();
    return { props: {} };
  }

  // Remove the trailing slash if the PUBLIC_URL has one
  if (publicUrl.endsWith('/')) {
    publicUrl = publicUrl.substring(0, publicUrl.length - 1);
  }
 
  // Create the GraphQLSitemapService object with required parameters
  const graphQLSitemapService = new GraphQLSitemapService({
    endpoint: endpoint,
    apiKey: apiKey,
    siteName: config.jssAppName,
  });
 
  let sitemapResult = '';

  // Fetch the sitemap using the GraphQLSitemapService object. This example only fetches pages in the 'en' language
  const pageList: StaticPath[] = await graphQLSitemapService.fetchSSGSitemap(['en']);

  // Build the <url> entries of the sitemap.xml
  pageList.forEach((sp) => {
    let pagePath = sp.params.path.join('/');

    // Make sure every path starts with a slash before appending it to the public URL
    if (!pagePath.startsWith('/')) {
      pagePath = '/' + pagePath;
    }
 
    sitemapResult += `<url><loc>${publicUrl}${pagePath}</loc></url>`;
  });
 
  // Return the sitemap.xml as proper XML
  if (res) {
    res.setHeader('Content-Type', 'text/xml');
    res.setHeader('Cache-Control', 'max-age=86400, public'); // Cache for one day to avoid regenerating the page too frequently
    res.write(`<?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        ${sitemapResult}
        </urlset>`);
    res.end();
  }
 
  return {
    props: {},
  };
};
 
export default Sitemap;

As you can see, this is a Server-Side Rendered (SSR) page, so it can retrieve the sitemap content dynamically through the GraphQLSitemapService and render the result as proper XML. The example above only works for a single language (specifically, English ‘en’), but you can enhance the code to retrieve the current language from the URL (or query string) and feed it to the graphQLSitemapService.fetchSSGSitemap() function call.
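The language enhancement mentioned above could start from a small helper like the one below. This is a minimal sketch, not part of JSS: resolveLanguages, SUPPORTED_LANGUAGES, DEFAULT_LANGUAGE and the lang query string parameter are all hypothetical names chosen for illustration, and the language list is a placeholder for whatever your site actually publishes.

```typescript
// All names below are hypothetical; adjust the list to the languages your site publishes.
const SUPPORTED_LANGUAGES: string[] = ['en', 'es', 'fr-CA'];
const DEFAULT_LANGUAGE = 'en';

// A Next.js query string value is a string, a string array, or undefined.
type QueryValue = string | string[] | undefined;

// Turn a "?lang=en,es" style value into the list of languages to request.
// Unsupported values are dropped; if nothing valid remains, fall back to the default.
function resolveLanguages(langParam: QueryValue): string[] {
  const raw = Array.isArray(langParam) ? langParam.join(',') : langParam ?? '';
  const requested = raw
    .split(',')
    .map((lang) => lang.trim())
    .filter((lang) => SUPPORTED_LANGUAGES.indexOf(lang) !== -1);

  // Remove duplicates while keeping the requested order
  const unique = requested.filter((lang, index) => requested.indexOf(lang) === index);

  return unique.length > 0 ? unique : [DEFAULT_LANGUAGE];
}
```

In getServerSideProps this would be wired in by destructuring query next to res and calling graphQLSitemapService.fetchSSGSitemap(resolveLanguages(query.lang)). With more than one language, the generated URLs would also need to carry the language so that entries do not collide; how that looks depends on your site’s routing.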

Using a GraphQL Query

The previous solution uses the built-in GraphQLSitemapService class provided by Sitecore, and it is good enough if your site does not have too many pages. Internally it retrieves only 10 pages per request, and it has to repeat the request until it has all the pages. That does not perform well on larger sites, because many small requests are slower than fewer requests that each return more pages. To make fewer requests with up to 100 pages each, we use a custom GraphQL query. For this to work, first install the graphql-request library in your Next.js project by running npm install graphql-request.

src/pages/sitemap.xml.tsx

import { GraphQLClient, gql } from 'graphql-request';
import { GetServerSideProps } from 'next';
import config from 'temp/config';
 
// Custom type for each sitemap entry in the GraphQL response
type SitemapEntry = {
  url: {
    path: string;
  };
};
 
// Custom type for handling the GraphQL response
type SitemapResult = {
  page: {
    total: number;
    pageInfo: {
      endCursor: string;
      hasNext: boolean;
    };
    results: SitemapEntry[];
  };
};
 
const Sitemap: React.FC = () => null;
 
export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // Retrieve configuration values
  const endpoint = config.graphQLEndpoint;
  const apiKey: string = process.env.SITECORE_API_KEY ?? '';
  const rootId: string = process.env.SC_SITEMAP_ROOT_ID ?? '';
  let publicUrl: string = process.env.PUBLIC_URL ?? '';

  // If required configuration values are not found, return an error and close the response
  if (!apiKey || !publicUrl || !rootId) {
    res.statusCode = 500; // Signal the failure so crawlers do not treat this as a valid sitemap
    res.setHeader('Content-Type', 'text/xml');
    res.write('<error>No Sitecore API Key, Sitemap Root ID and/or public URL configured for the site</error>');
    res.end();
    return { props: {} };
  }

  // Remove the trailing slash if the PUBLIC_URL has one
  if (publicUrl.endsWith('/')) {
    publicUrl = publicUrl.substring(0, publicUrl.length - 1);
  }
 
  // Function to handle the rendering of each URL in the sitemap
  const renderUrl = (results: SitemapEntry[]): string => {
    let urlList = '';
    for (const entry of results) {
      // Skip entries that come back without a URL
      if (entry && entry.url) {
        urlList += `<url><loc>${publicUrl}${entry.url.path}</loc></url>`;
      }
    }

    return urlList;
  };
 
  // Initialize the GraphQL client
  const graphQLClient = new GraphQLClient(endpoint);
  graphQLClient.setHeader('sc_apikey', apiKey);

  // Set the response headers
  res.setHeader('Content-Type', 'text/xml');
  res.setHeader('Cache-Control', 'max-age=86400, public'); // Cache for one day to avoid regenerating the page too frequently
  res.write(`<?xml version="1.0" encoding="UTF-8"?>
        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`);
 
  // Define the GraphQL query to retrieve the sitemap from Sitecore
  const sitemapQuery = gql`
    query Sitemap($rootId: String!, $endCursor: String!) {
      page: search(
        where: {
          AND: [
            { name: "_path", value: $rootId, operator: CONTAINS }
            { name: "_language", value: "en" }
            { name: "_hasLayout", value: "true" }
          ]
        }
        first: 100
        after: $endCursor
      ) {
        total
        pageInfo {
          endCursor
          hasNext
        }
        results {
          url {
            path
          }
        }
      }
    }
  `;
 
  let nextSet = true;
  let endCursor = '';

  // Send requests to Sitecore in a loop until all page batches have been retrieved
  while (nextSet) {
    // Each request retrieves a single batch of pages (up to 100)
    const result = await graphQLClient.request<SitemapResult>(sitemapQuery, {
      rootId: rootId,
      endCursor: endCursor,
    });
 
    // Write the retrieved pages in the sitemap.xml response
    res.write(renderUrl(result.page.results));
 
    // Determine from GraphQL response if there is another batch of pages
    nextSet = result.page.pageInfo.hasNext;
 
    // Set the cursor to the next batch of pages
    endCursor = result.page.pageInfo.endCursor;
  }
 
  // Finish the sitemap.xml and flush the response
  res.write('</urlset>');
  res.end();
 
  return {
    props: {},
  };
};
 
export default Sitemap;

The sitemapQuery variable contains the actual GraphQL query we use to retrieve the sitemap. The query takes two parameters: rootId and endCursor. The first comes from configuration and should be the ID of your site’s root item. Copy and paste the ID (without curly braces and dashes) as the value of the SC_SITEMAP_ROOT_ID variable in your .env file (or in Environment Variables in Vercel). The second is the cursor returned by the GraphQL service. To retrieve the first batch, set endCursor to an empty string; for each subsequent request, set it to the pageInfo.endCursor value from the previous response. The response also includes a pageInfo.hasNext value, which is true if there is another batch to retrieve and false if this is the final batch.
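The cursor handshake described above can be isolated from the GraphQL client, which makes it easier to reason about and to test. The following is a minimal sketch under stated assumptions: fetchAllPages, the Page type and the maxRequests safety cap are hypothetical names introduced here, and fetchPage stands in for the graphQLClient.request() call from the listing.

```typescript
// One batch as returned by the search query: the items plus the paging info.
type Page<T> = {
  results: T[];
  pageInfo: { endCursor: string; hasNext: boolean };
};

// Hypothetical helper: `fetchPage` stands in for the GraphQL request and
// receives the cursor to continue from ('' for the first batch).
async function fetchAllPages<T>(
  fetchPage: (endCursor: string) => Promise<Page<T>>,
  maxRequests = 1000 // safety cap (an assumption) so a bad cursor cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  let endCursor = '';
  let hasNext = true;

  for (let i = 0; hasNext && i < maxRequests; i++) {
    const page = await fetchPage(endCursor);
    all.push(...page.results);

    // The response tells us whether to continue and where to continue from
    hasNext = page.pageInfo.hasNext;
    endCursor = page.pageInfo.endCursor;
  }

  return all;
}
```

Unlike the listing, this variant collects every entry in memory instead of writing each batch to the response as it arrives, so it trades a little memory for a function that can be exercised with a fake fetchPage.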

What makes this solution perform better is the query limit first: 100 (found in the GraphQL query after the where clause). Each request retrieves up to 100 results (pages) instead of the default 10, which reduces the number of requests sent to the Sitecore service. You can adjust this value to your needs (but don’t make it too big, otherwise Sitecore will return an error).
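To put numbers on that difference, the round trips needed are the page count divided by the batch size, rounded up. The snippet below is only illustrative arithmetic: requestCount is a hypothetical helper and the 2,500-page site is a made-up figure.

```typescript
// Number of round trips the paging loop makes for `totalPages` results when each
// request returns at most `pageSize` of them. The loop always sends at least one
// request, even when there is nothing to return.
function requestCount(totalPages: number, pageSize: number): number {
  if (pageSize <= 0) {
    throw new Error('pageSize must be positive');
  }
  return Math.max(1, Math.ceil(totalPages / pageSize));
}

// A hypothetical site with 2,500 pages:
const withTenPerRequest = requestCount(2500, 10); // 250 requests
const withHundredPerRequest = requestCount(2500, 100); // 25 requests
```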

Enhancements

The solutions provided can be enhanced further for code readability/correctness or extra functionality. For both solutions, you can move the sitemap retrieval logic into its own function to reduce the amount of code inside getServerSideProps. For the GraphQL query solution, you can move the query into its own .graphql file and read it with the readFileSync function (similar to what you would do for integrated GraphQL query definitions for templates in JSS code-first). Functionality-wise, you can also add a checkbox field or similar mechanism to your page templates in Sitecore, and add that condition to the GraphQL query to control which pages show up in the sitemap.xml file.
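One correctness enhancement worth sketching: both listings write page paths into the loc element as-is, while the sitemap protocol requires URLs to be entity-escaped (an ampersand in a path would otherwise produce invalid XML). The helper below is a possible starting point rather than a definitive implementation; escapeXml and renderUrls are hypothetical names, and it assumes you already have the site-relative paths as plain strings.

```typescript
// Escape the five characters that must be entity-escaped inside sitemap XML.
// The ampersand goes first so the entities added afterwards are not escaped again.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: turn a list of site-relative paths into <url> entries.
function renderUrls(paths: string[], publicUrl: string): string {
  const base = publicUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return paths
    .map((path) => (path.charAt(0) === '/' ? path : '/' + path))
    .map((path) => `<url><loc>${escapeXml(base + path)}</loc></url>`)
    .join('');
}
```

Keeping the rendering in a pure function like this also serves the readability goal, since getServerSideProps is left with configuration, fetching and writing the response.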

Conclusion

These solutions should help when your JSS/Headless site is not using SXA and you need quick-and-simple dynamic sitemap.xml functionality. But if you are creating a new JSS/Headless site from scratch, create it inside a Headless SXA Tenant even if you are not going to use the Headless SXA components in your site. At least you’ll have built-in redirects and much better sitemap.xml functionality out-of-the-box.