Feb 27, 2023
Overview
An important tool for SEO (search engine optimization) is the sitemap.xml file, a file placed in the root directory of your site to make it easier for search engines to crawl and index all of the site’s essential pages. In a traditional “headful” Sitecore .NET MVC site, this was achieved by implementing ready-to-use modules or by using the built-in SXA sitemap functionality. Modern Headless/JSS sites can also leverage SXA for sitemaps, because it offers advanced tools to generate them. But if your headless site was developed without SXA, the following guide explores how to implement a dynamic sitemap.xml with Next.js.
What do you need?
A Sitecore headless project using JSS (tested with version 19).
GraphQL enabled in your Sitecore JSS site.
A head (JS app) developed using Next.js.
Using the GraphQLSitemap Service
The Sitecore Next.js toolset already includes a sitemap service that is used in the getStaticPaths function of an SSG+ISR (Static Site Generation + Incremental Static Regeneration) project. It retrieves all of the Sitecore site’s pages when building the static pages in Next.js SSG. Internally, this function decides whether to use the sitemap service for disconnected mode or the one that uses GraphQL to retrieve the actual Sitecore pages in connected or production mode. Our sitemap.xml will ONLY work in connected/production mode, so we can ignore the disconnected mode. Place the sitemap.xml.tsx code file under your project’s src/pages folder:
src/pages/sitemap.xml.tsx
import { GraphQLSitemapService, StaticPath } from '@sitecore-jss/sitecore-jss-nextjs';
import { GetServerSideProps } from 'next';
import config from 'temp/config';

// The page component renders nothing; the XML is written directly to the response
const Sitemap: React.FC = () => null;

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // Retrieve configuration values
  const endpoint = config.graphQLEndpoint;
  const apiKey: string = process.env.SITECORE_API_KEY ?? '';
  let publicUrl: string = process.env.PUBLIC_URL ?? '';

  // If required configuration values are not found, return error
  if (!apiKey || !publicUrl) {
    res.setHeader('Content-Type', 'text/xml');
    res.write('<error>No Sitecore API Key and/or public URL configured for the site</error>');
    res.end(); // End the response so Next.js does not try to render the page
    return { props: {} };
  }

  // Remove trailing slash if the PUBLIC_URL has it
  if (publicUrl.endsWith('/')) {
    publicUrl = publicUrl.substring(0, publicUrl.length - 1);
  }

  // Create the GraphQLSitemapService object with required parameters
  const graphQLSitemapService = new GraphQLSitemapService({
    endpoint: endpoint,
    apiKey: apiKey,
    siteName: config.jssAppName,
  });

  let sitemapResult = '';

  // Fetch the sitemap using GraphQLSitemapService object. This example only fetches pages in 'en' language
  const pageList: StaticPath[] = await graphQLSitemapService.fetchSSGSitemap(['en']);

  // Build the sitemap.xml internal XML code
  pageList.forEach((sp) => {
    let pagePath = sp.params.path.join('/');
    if (!pagePath.startsWith('/')) {
      pagePath = '/' + pagePath;
    }
    sitemapResult += `<url><loc>${publicUrl}${pagePath}</loc></url>`;
  });

  // Return the sitemap.xml as proper XML
  if (res) {
    res.setHeader('Content-Type', 'text/xml');
    res.setHeader('Cache-Control', 'max-age=86400, public'); // Set this cache to avoid regenerating the page too frequently
    res.write(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${sitemapResult}
</urlset>`);
    res.end();
  }

  return {
    props: {},
  };
};

export default Sitemap;
As you can see, this is a Server-Side Rendered (SSR) page, so it can dynamically retrieve the sitemap content using the GraphQLSitemapService functionality and render the page as proper XML. The example shown above only works for a single language (specifically, English ‘en’), but you can enhance the code to retrieve the current language from the URL (or query string) and feed it to the graphQLSitemapService.fetchSSGSitemap() function call.
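One way to pick the language is sketched below. This is a minimal sketch rather than part of the JSS toolset: the resolveSitemapLanguage helper and the lang query-string parameter are hypothetical names chosen for illustration, and the locale and defaultLocale values are assumed to come from the Next.js getServerSideProps context when i18n routing is configured.

```typescript
// Hypothetical helper (not part of JSS): decides which language to request.
// The `lang` query-string parameter name is an assumption for illustration.
type LanguageSource = {
  query: Record<string, string | string[] | undefined>;
  locale?: string; // provided by Next.js when i18n routing is configured
  defaultLocale?: string;
};

function resolveSitemapLanguage(source: LanguageSource, fallback = 'en'): string {
  const fromQuery = source.query['lang'];
  const candidate = Array.isArray(fromQuery) ? fromQuery[0] : fromQuery;

  // Accept only simple language tags such as "en" or "es-MX"
  if (candidate && /^[a-z]{2}(-[A-Za-z]{2})?$/.test(candidate)) {
    return candidate;
  }

  // Otherwise fall back to the routing locale, then to the default
  return source.locale ?? source.defaultLocale ?? fallback;
}
```

Inside getServerSideProps you could then destructure query, locale and defaultLocale from the context alongside res, and pass [resolveSitemapLanguage({ query, locale, defaultLocale })] to fetchSSGSitemap() in place of the hard-coded ['en'].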
Using a GraphQL Query
The previous solution uses the built-in GraphQLSitemapService class provided by Sitecore, and it is good enough if your site does not have too many pages. Internally it only retrieves 10 pages per request, and it needs to repeat requests until it gets all the pages. For larger sites this does not perform very well, because many small requests can be slower than fewer requests that each return more pages. To make fewer requests with up to 100 pages each, we use a custom GraphQL query. For this to work, first install the graphql-request library in your Next.js project by executing npm install graphql-request.
src/pages/sitemap.xml.tsx
import { GraphQLClient, gql } from 'graphql-request';
import { GetServerSideProps } from 'next';
import config from 'temp/config';

// Custom type for each sitemap entry in the GraphQL response
type SitemapEntry = {
  url: {
    path: string;
  };
};

// Custom type for handling the GraphQL response
type SitemapResult = {
  page: {
    total: number;
    pageInfo: {
      endCursor: string;
      hasNext: boolean;
    };
    results: SitemapEntry[];
  };
};

// The page component renders nothing; the XML is written directly to the response
const Sitemap: React.FC = () => null;

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // Retrieve configuration values
  const endpoint = config.graphQLEndpoint;
  const apiKey: string = process.env.SITECORE_API_KEY ?? '';
  const rootId: string = process.env.SC_SITEMAP_ROOT_ID ?? '';
  let publicUrl: string = process.env.PUBLIC_URL ?? '';

  // If required configuration values are not found, return error
  if (!apiKey || !publicUrl || !rootId) {
    res.setHeader('Content-Type', 'text/xml');
    res.write('<error>No Sitecore API Key, Sitemap Root ID and/or public URL configured for the site</error>');
    res.end(); // End the response so Next.js does not try to render the page
    return { props: {} };
  }

  // Remove trailing slash if the PUBLIC_URL has it
  if (publicUrl.endsWith('/')) {
    publicUrl = publicUrl.substring(0, publicUrl.length - 1);
  }

  // Function to handle the rendering of each URL in the sitemap
  const renderUrl = (results: SitemapEntry[]): string => {
    let urlList = '';
    for (const entry of results) {
      if (entry && entry.url) {
        urlList += `<url><loc>${publicUrl}${entry.url.path}</loc></url>`;
      }
    }
    return urlList;
  };

  // Initialize the GraphQL client
  const graphQLClient = new GraphQLClient(endpoint);
  graphQLClient.setHeader('sc_apikey', apiKey);

  // Set the response headers
  res.setHeader('Content-Type', 'text/xml');
  res.setHeader('Cache-Control', 'max-age=86400, public'); // Set this cache to avoid regenerating the page too frequently
  res.write(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`);

  // Define the GraphQL query to retrieve the sitemap from Sitecore
  const sitemapQuery = gql`
    query Sitemap($rootId: String!, $endCursor: String!) {
      page: search(
        where: {
          AND: [
            { name: "_path", value: $rootId, operator: CONTAINS }
            { name: "_language", value: "en" }
            { name: "_hasLayout", value: "true" }
          ]
        }
        first: 100
        after: $endCursor
      ) {
        total
        pageInfo {
          endCursor
          hasNext
        }
        results {
          url {
            path
          }
        }
      }
    }
  `;

  let nextSet = true;
  let endCursor = '';

  // Send requests to Sitecore in a loop until all page batches have been retrieved
  while (nextSet) {
    // Each request retrieves a single batch of pages (up to 100)
    const result = await graphQLClient.request<SitemapResult>(sitemapQuery, {
      rootId: rootId,
      endCursor: endCursor,
    });

    // Write the retrieved pages in the sitemap.xml response
    res.write(renderUrl(result.page.results));

    // Determine from GraphQL response if there is another batch of pages
    nextSet = result.page.pageInfo.hasNext;

    // Set the cursor to the next batch of pages
    endCursor = result.page.pageInfo.endCursor;
  }

  // Finish the sitemap.xml and flush the response
  res.write('</urlset>');
  res.end();

  return {
    props: {},
  };
};

export default Sitemap;
The sitemapQuery variable contains the actual GraphQL query we are using to retrieve the sitemap. This GraphQL query requires two parameters: rootId and endCursor. The first parameter comes from configuration and should be your site root item’s ID. Copy and paste the ID (without curly braces and dashes) into the assignment for the SC_SITEMAP_ROOT_ID variable in your .env file (or in Environment Variables in Vercel). The second parameter is the cursor returned by the GraphQL service after calling the query. To retrieve the first batch, set the endCursor parameter to an empty string. For each subsequent request, set it to the pageInfo.endCursor value returned by the previous request. The response also includes a pageInfo.hasNext value, which is true if there is another batch to retrieve and false if this is the final batch.
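If you would rather not strip the braces and dashes by hand, a small helper can do it when the configuration is read. The sketch below is illustrative only: normalizeItemId is a hypothetical name, and the 32-hexadecimal-character check is an assumption about what a Sitecore item ID looks like once the braces and dashes are removed.

```typescript
// Hypothetical helper: converts "{110D559F-DEA5-...}" style IDs into the
// brace-less, dash-less form expected in SC_SITEMAP_ROOT_ID.
function normalizeItemId(rawId: string): string {
  // Remove curly braces, dashes and surrounding whitespace
  const id = rawId.trim().replace(/[{}-]/g, '');

  // Assumption: a normalized item ID is exactly 32 hexadecimal characters
  if (!/^[0-9A-Fa-f]{32}$/.test(id)) {
    throw new Error(`"${rawId}" does not look like a Sitecore item ID`);
  }
  return id;
}
```

With something like this in place, the rootId assignment could become normalizeItemId(process.env.SC_SITEMAP_ROOT_ID ?? ''), wrapped in a try/catch that returns the same error response used for missing configuration.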
What makes this solution perform better is the query limit first: 100 (found in the GraphQL query after the where clause). This means that each request retrieves up to 100 results (pages) instead of the default 10, which reduces the number of requests to the Sitecore service. You can adjust this value to your needs (but don’t make it too big, otherwise Sitecore will return an error).
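To make the effect of the batch size concrete, here is a small self-contained sketch of the same cursor loop run against an in-memory stand-in for the Sitecore endpoint. The fetchAll and makeFakeEndpoint names are hypothetical, the fake endpoint uses a numeric offset as its cursor purely for illustration (real cursors should be treated as opaque strings), and the maxRequests guard is an added safety measure that is not in the code above.

```typescript
type Page<T> = {
  results: T[];
  pageInfo: { endCursor: string; hasNext: boolean };
};
type FetchPage<T> = (after: string, first: number) => Promise<Page<T>>;

// Follows endCursor/hasNext until the last batch, counting the requests made
async function fetchAll<T>(
  fetchPage: FetchPage<T>,
  first: number,
  maxRequests = 1000
): Promise<{ items: T[]; requests: number }> {
  const items: T[] = [];
  let after = ''; // an empty cursor requests the first batch
  let hasNext = true;
  let requests = 0;

  while (hasNext) {
    if (requests >= maxRequests) {
      throw new Error('Too many requests; aborting pagination');
    }
    const page = await fetchPage(after, first);
    requests++;
    items.push(...page.results);
    hasNext = page.pageInfo.hasNext;
    after = page.pageInfo.endCursor;
  }
  return { items, requests };
}

// In-memory stand-in for the GraphQL endpoint (illustration only)
function makeFakeEndpoint(total: number): FetchPage<string> {
  const all = Array.from({ length: total }, (_, i) => `/page-${i + 1}`);
  return async (after, first) => {
    const start = after === '' ? 0 : Number(after);
    const end = Math.min(start + first, all.length);
    return {
      results: all.slice(start, end),
      pageInfo: { endCursor: String(end), hasNext: end < all.length },
    };
  };
}
```

For a site with 250 pages, fetchAll(makeFakeEndpoint(250), 10) makes 25 requests, while fetchAll(makeFakeEndpoint(250), 100) makes 3. In general the number of requests is the page count divided by the batch size, rounded up.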
Enhancements
The solutions provided can be enhanced further for code readability/correctness or extra functionality. For both solutions, you can move the sitemap retrieval logic to its own function to reduce the amount of code inside the getServerSideProps function. For the GraphQL query solution, you can move the query to its own .graphql file and read it using the readFileSync function (similar to what you would do for integrated GraphQL query definitions for templates in JSS code-first). Functionality-wise, you can also add a checkbox field or similar mechanism to your page templates in Sitecore and add that condition to the GraphQL query to control which pages show up in the sitemap.xml file.
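As a rough sketch of the first suggestion, the XML-building step can live in a small pure function that is easy to test on its own. The buildSitemapXml and escapeXml names are hypothetical. Escaping special characters in each URL is an addition not present in the code above, included because a path containing a character such as & would otherwise produce invalid XML.

```typescript
// Escapes the five characters that are special in XML text and attributes
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds the full sitemap.xml body from a list of paths
function buildSitemapXml(publicUrl: string, paths: string[]): string {
  // Drop any trailing slashes so the URL and path join cleanly
  const base = publicUrl.replace(/\/+$/, '');

  const urls = paths
    .map((p) => {
      const path = p.startsWith('/') ? p : `/${p}`;
      return `<url><loc>${escapeXml(base + path)}</loc></url>`;
    })
    .join('');

  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}</urlset>`
  );
}
```

getServerSideProps would then only need to collect the paths, call buildSitemapXml(publicUrl, paths), and write the result to the response.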
Conclusion
These solutions should help when your JSS/Headless site is not using SXA and you need quick-and-simple dynamic sitemap.xml functionality. But if you are creating a new JSS/Headless site from scratch, create it inside a Headless SXA Tenant even if you are not going to use the Headless SXA components in your site. At least you’ll have built-in redirects and much better sitemap.xml functionality out-of-the-box.