Aug 23, 2024
The Dispatcher is an AEM Reverse Proxy module installed in the Web Server used by the application infrastructure (usually Apache or IIS).
It’s called Dispatcher because it was originally designed to dispatch and balance requests to different publish servers.
With its traditional functionality, it provides:
- Caching
- Load Balancing
- Filtering (allowing/denying) incoming client requests to AEM publish instances
- Serving Static Content (to use Sling Dynamic Include, you need to configure the Dispatcher to support nocache selectors and enable TTL support).
This blog will focus on the basic Dispatcher knowledge required to configure caching in AEM.
Caching
Website content (HTML pages, CSS files, PDF files, images, etc.) is cached as regular files in the web server's filesystem.
The URL of the requested resource determines the location and name of the cached file. For example, if the requested page path is SITE_URL/a/b/c.html, its cached file will be [cache_docroot]/a/b/c.html.
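The mapping from URL to cached file can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Dispatcher code: the function name cache_file_for is hypothetical, and the default docroot is borrowed from the example configuration later in this post.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def cache_file_for(url: str, docroot: str = "/opt/dispatcher/cache") -> str:
    """Return the path under docroot that mirrors the URL path (hypothetical helper)."""
    path = urlsplit(url).path  # drop scheme, host and query string
    return str(PurePosixPath(docroot) / path.lstrip("/"))


print(cache_file_for("https://mydomain.com/a/b/c.html"))
# /opt/dispatcher/cache/a/b/c.html
```

Because the cache mirrors the URL path on disk, every path segment becomes a folder name, which is why URL length and suffix structure matter for caching.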
Notes on URL structures
- Avoid very long content URLs, as the path of the cached file might exceed the filesystem's name length limit.
- If possible, avoid using Sling suffixes in your URLs; they make pages more complex to cache. If suffixes are required, you will need to:
  - Add an extension to both the main page URL and the URL of the page that includes the suffix.
  - Use different extensions for the main page URL without the suffix and with the suffix.
  For example: if https://mydomain.com/index.html is the page path and you want to add a suffix, the URL of the page with the suffix needs to be something like http://mydomain.com/index.dir/suffix.html (notice we added an extension to the suffix but also changed the extension of the index page). For more details, see the documentation on Conflicting Suffix URLs.
Process flow to serve content from AEM Dispatcher
When content is new
- The client sends a request (usually from a browser).
- The Dispatcher checks its local cache for the requested content.
- If it is NOT found, the Dispatcher requests the content from the publisher.
- The publisher renders the content and sends it to the Dispatcher.
- The Dispatcher adds the content to the cache and replies to the client.
When content is already cached
- The browser sends a request.
- The Dispatcher checks its local cache for the requested content.
- If it is found, the Dispatcher returns the cached copy to the client.
When is a content request Non-Cacheable?
- When the URL:
  - has no extension
  - has query parameters (*)
  - has a suffix without an extension
  - is denied by the Dispatcher configuration (by path, pattern or MIME type)
- When the response status code is NOT 200
- When the response contains at least one of the following headers:
  - “Dispatcher: no-cache”
  - “Cache-Control: no-cache|private”
  - “Pragma: no-cache”
- When it is a POST request
Note: (*) There’s a way to ignore (allow) some or all query parameters so that, even when a request URL contains parameters, the page is cached if those parameters are ignored. For more details, see Caching pages with Query Params/Arguments and Ignoring URL Parameters.
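The checklist above can be expressed as a small Python function. This is a rough sketch of the rules as listed in this post, not the Dispatcher's actual implementation: the function name, its parameters and the reason strings are all hypothetical, and the "denied by configuration" rule is left out because it depends on your own /rules section.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qsl, urlsplit

# Header values that prevent caching, as listed above (names compared case-insensitively).
NO_CACHE_HEADERS = {
    "dispatcher": {"no-cache"},
    "cache-control": {"no-cache", "private"},
    "pragma": {"no-cache"},
}


def is_cacheable(method, url, status, response_headers, ignored_params=frozenset()):
    """Return (cacheable, reason) following the checklist above (illustrative only)."""
    parts = urlsplit(url)
    if method.upper() == "POST":
        return False, "POST request"
    # The last path segment must carry an extension; this covers both
    # "no extension" and "suffix without an extension".
    if not PurePosixPath(parts.path).suffix:
        return False, "no extension"
    params = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if params - set(ignored_params):
        return False, "query parameters"
    if status != 200:
        return False, "status is not 200"
    for name, value in response_headers.items():
        blocked = NO_CACHE_HEADERS.get(name.lower(), set())
        directives = {d.strip().lower() for d in value.split(",")}
        if blocked & directives:
            return False, f"{name} header"
    return True, "cacheable"


print(is_cacheable("GET", "https://mydomain.com/a/b/c.html", 200, {}))
# (True, 'cacheable')
print(is_cacheable("GET", "https://mydomain.com/a/b/c.html?q=1", 200, {}))
# (False, 'query parameters')
```

Passing ignored_params={"q"} to the second call makes the page cacheable again, which mirrors the note about ignoring URL parameters.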
Example Dispatcher Cache
/cache
{
    /docroot "/opt/dispatcher/cache"
    /statfileslevel "3"
    /rules
    {
        # List of cache rules (or included files containing cache rules) that determine which files are cached
    }
    /invalidate
    {
        # List of file extensions that are auto-invalidated
    }
}
/docroot section
Identifies the directory where cached files are stored and must be the exact same path as the document root of the web server so that Dispatcher and the web server handle the same files.
Default AEM Dispatcher Cache directories
- Author
  - /mnt/var/www/author
- Publisher
  - /mnt/var/www/html
How to invalidate (flush) content
A. Single resource Invalidation
By making a direct GET request to invalidate path:
GET /invalidate
invalidate-path: /path/to/content.extension
<no body>
Note that the path value used in invalidate-path is the content path as AEM knows it, not the path of the cached file.
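The request above is schematic. To my knowledge, a stock Dispatcher expects the flush at /dispatcher/invalidate.cache, with the content path in a CQ-Handle header and the action in a CQ-Action header; treat those names as assumptions to verify against your Dispatcher version. The sketch below only builds such a request in Python without sending it. The function name and the host are hypothetical.

```python
from urllib.request import Request


def build_flush_request(dispatcher_base, content_path, resource_only=False):
    """Build (but do not send) a single-resource flush request (illustrative sketch)."""
    headers = {
        "CQ-Action": "Activate",    # assumed header name for the flush action
        "CQ-Handle": content_path,  # the path as AEM knows it, not the cache file path
        "Content-Length": "0",      # the request has no body
    }
    if resource_only:
        # Flush only this resource without touching .stat files.
        headers["CQ-Action-Scope"] = "ResourceOnly"
    url = dispatcher_base.rstrip("/") + "/dispatcher/invalidate.cache"
    return Request(url, headers=headers, method="POST")


req = build_flush_request("http://localhost:8080", "/content/site/us/en.html")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen from a host that the Dispatcher allows to flush.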
B. Auto Invalidation
By publishing a new version of the content
Auto Invalidation process flow
- An AEM Author user activates content, which triggers replication of the content to the Publisher.
- The Publisher receives the content, and the Dispatcher Flush Agent sends the flush request to the Dispatcher.
- The Dispatcher invalidates the changed content.
- The next request for that content requires a fresh copy from the publisher. The old cached files are deleted, and the new content is cached.
For auto-invalidation to work you need to:
- Have a Replication Agent on Author configured to point to the publish instance.
- Have a Replication Agent on the Publish system(s) that sends the invalidation request to the Dispatcher (the usual setup). A custom replication agent for invalidation can be developed, but this is not recommended.
Auto Invalidation related properties
i. The /invalidate cache configuration
It allows you to configure file extensions for auto-invalidation. By default, the Dispatcher auto-invalidates files that end with “.html,” and all files belonging to an invalidated resource are physically deleted when the next request occurs.
/invalidate
{
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "*.html" /type "allow" }
}
If a site offers automatically generated PDF and ZIP files for download, it must automatically invalidate those files.
/invalidate
{
    /denyall { /glob "*" /type "deny" }
    /allowhtml { /glob "*.html" /type "allow" }
    /allowzip { /glob "*.zip" /type "allow" }
    /allowpdf { /glob "*.pdf" /type "allow" }
}
Note: you can use both numeric and string IDs for the rules; they only need to be unique.
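To see how such a rule list decides whether a file is auto-invalidated, here is a small Python sketch. It assumes the rules are evaluated in order and that the last matching rule wins, and it approximates the Dispatcher globs with Python's fnmatch; both are assumptions for illustration, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# The second /invalidate example above, as (id, glob, type) tuples.
INVALIDATE_RULES = [
    ("denyall", "*", "deny"),
    ("allowhtml", "*.html", "allow"),
    ("allowzip", "*.zip", "allow"),
    ("allowpdf", "*.pdf", "allow"),
]


def is_auto_invalidated(filename, rules=INVALIDATE_RULES):
    """Return True if the last matching rule allows auto-invalidation (assumed semantics)."""
    decision = "deny"
    for _rule_id, glob, rule_type in rules:
        if fnmatchcase(filename, glob):
            decision = rule_type  # later matching rules override earlier ones
    return decision == "allow"


print(is_auto_invalidated("/content/site/en.html"))     # True
print(is_auto_invalidated("/content/dam/logo.png"))     # False
```

Starting with a deny-all rule and then allowing specific extensions keeps the list easy to audit.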
ii. The /statfile property
Defines the file on the web server that the Dispatcher uses to record the time of the most recent content update. The file has no content; the Dispatcher blocks access to it and only updates its timestamp. The default statfile is /docroot/.stat.
This approach is usually not recommended: a single change to any resource in the AEM instance marks all resources as invalid, so all of them are re-requested from the AEM back end, which is not the most efficient behavior for big websites or multi-site AEM instances.
iii. The /statfileslevel property and .stat files
- /statfileslevel defines the level in the filesystem up to which subtrees are considered “independent” for caching.
- When the /statfileslevel property is used, the /statfile property is ignored.
The .stat files
- A .stat file is an empty file with a creation timestamp.
- Dispatcher creates .stat files in each folder from the docroot folder to the level specified by /statfileslevel.
- The docroot folder is level 0.
Let’s imagine an AEM instance with two sites:
- /content
  - sports-site
  - tech-site
/statfileslevel "0": A .stat file is created in the docroot only. If any resource is updated, the invalidation spans the whole AEM instance, including both sites.
/statfileslevel "1": A .stat file is created in the docroot (level 0) and another at /content (level 1). Because both sites live under /content, updating a resource in either site still invalidates both.
/statfileslevel "2": .stat files are also created at /content/sports-site and /content/tech-site (level 2). If page /content/sports-site/cycling.html is updated and published, all resources under /content/sports-site are invalidated, but resources under /content/tech-site are not affected.
Nowadays, it is common (and recommended) to structure sites by brand, country and language. Assuming a /content/brand/country/language structure, /statfileslevel "2" would create invalidation per brand, /statfileslevel "3" per country and /statfileslevel "4" per language.
How does invalidation work?
- When a file is invalidated, all .stat files from the docroot down to the level of the invalidated file or the configured /statfileslevel (whichever is smaller) are touched (their timestamp is updated).
- All other files in the Dispatcher cache (or up to a particular level, depending on the /statfileslevel value) are invalidated by touching the .stat file.
- When a request for a resource arrives, the last modification date of the .stat file is compared to the last modification date of the cached document.
- All cached files with a creation date older than the .stat file were created before the last activation (and invalidation) and thus are considered “invalid.” They are still physically present in the filesystem, but the Dispatcher ignores them; they are in a “stale” state.
- When a request is made for a stale resource, the Dispatcher re-fetches the document from AEM. The new version of the resource is stored in the cache filesystem with a new creation date.
Note: Invalidation by touching .stat files can be prevented by sending the header CQ-Action-Scope: ResourceOnly. This way, a particular resource can be flushed without invalidating other parts of the cache, for example, a dynamically created JSON file that needs regular flushing independently of the rest of the cache.
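The .stat mechanism can be simulated in Python to get a feel for it. The sketch below counts the docroot as level 0 and treats a file's level as the depth of its folder; the function names and the integer timestamps are hypothetical, and real Dispatcher behavior may differ in details.

```python
from pathlib import PurePosixPath


def touched_stat_dirs(invalidated_path, statfileslevel):
    """Folders (relative to the docroot, "/") whose .stat file is touched."""
    names = [p for p in PurePosixPath(invalidated_path).parent.parts if p != "/"]
    depth = min(len(names), statfileslevel)  # "whichever is smaller"
    return ["/" + "/".join(names[:level]) for level in range(depth + 1)]


def is_stale(cached_path, cached_mtime, stat_mtimes, statfileslevel):
    """A cached file is stale if the .stat file governing it is newer than the file."""
    governing_dir = touched_stat_dirs(cached_path, statfileslevel)[-1]
    stat_time = stat_mtimes.get(governing_dir)
    return stat_time is not None and cached_mtime < stat_time


# Publishing cycling.html at time 100 with /statfileslevel "2":
stat_mtimes = {d: 100 for d in touched_stat_dirs("/content/sports-site/cycling.html", 2)}
print(sorted(stat_mtimes))
# ['/', '/content', '/content/sports-site']
print(is_stale("/content/sports-site/running.html", 50, stat_mtimes, 2))  # True
print(is_stale("/content/tech-site/laptops.html", 50, stat_mtimes, 2))    # False
```

Repeating the experiment with a level of 1 marks the tech-site page as stale too, because both sites then share the .stat file in /content.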
iv. The /gracePeriod property
Defines the number of seconds a stale, auto-invalidated resource may still be served from the cache after the last activation. This property is helpful when, depending on the Dispatcher configuration, consecutive or batch activations would otherwise repeatedly invalidate the entire cache, which can significantly impact sites with lots of traffic. The recommended value is 2 seconds.
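As a rough Python illustration of the grace period, the check below serves a stale file as long as the .stat file was touched no more than grace_period seconds ago. The function name and arguments are hypothetical, and the exact boundary handling is an assumption.

```python
def can_serve_from_cache(now, cached_mtime, stat_mtime, grace_period=0):
    """Decide whether a cached file may be served (all times in seconds)."""
    if cached_mtime >= stat_mtime:
        return True  # the file is newer than the last invalidation: fresh
    # The file is stale: serve it only while the grace period is running.
    return now - stat_mtime <= grace_period


# A file cached at t=50, invalidated at t=100, requested at t=101:
print(can_serve_from_cache(101, 50, 100, grace_period=2))  # True
print(can_serve_from_cache(101, 50, 100, grace_period=0))  # False
```

With a burst of activations a fraction of a second apart, a small grace period lets the cache keep answering instead of sending every request to the publisher.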
v. The /enableTTL property
Allows standard HTTP expiration headers (Cache-Control max-age or an Expires date) to define a time-based cache invalidation. If /enableTTL is set to 1 and any of the HTTP expiration headers are present in a response from the back end, an auxiliary, empty file is created alongside the cached file, with its modification time equal to the expiry date.
For new requests, the file expiration is checked.
- If the file has expired according to the set TTL, no other checks are performed and the cached file is re-requested from the backend.
- If the file has not expired, then the standard cache invalidation rules defined by /invalidate and /statfileslevel are applied, allowing the Dispatcher to invalidate files for which the TTL has not expired.
Note: Before Dispatcher 4.3.5, the TTL invalidation logic was based only on the configured TTL value.
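The order of these two checks can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name, the "serve"/"refetch" return values and the numeric timestamps are hypothetical, and ttl_expiry stands for the modification time of the auxiliary file.

```python
def ttl_decision(now, cached_mtime, ttl_expiry=None, stat_mtime=None):
    """Return "refetch" or "serve" for a cached file (illustrative model)."""
    if ttl_expiry is not None and now >= ttl_expiry:
        return "refetch"  # TTL expired: no other checks are performed
    if stat_mtime is not None and cached_mtime < stat_mtime:
        return "refetch"  # TTL still valid, but a .stat touch invalidated the file
    return "serve"


print(ttl_decision(now=500, cached_mtime=100, ttl_expiry=400))                  # refetch
print(ttl_decision(now=300, cached_mtime=100, ttl_expiry=400, stat_mtime=200))  # refetch
print(ttl_decision(now=300, cached_mtime=100, ttl_expiry=400))                  # serve
```

The second call is the case the post highlights: the TTL has not expired, yet the regular invalidation rules still apply.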
Important Considerations
- AEM projects created using the AEM Project Maven Archetype deploy a set of default Dispatcher configuration files. It’s suggested that this file structure be used and extended to maintain consistency between AEM projects, which helps with troubleshooting.
- The Dispatcher does not automatically update cached composed pages.
  The Dispatcher does not know which resources go into a rendered .html file. It only stores a static version of a rendered page (which can contain content and markup drawn from other resources); the rendering itself is performed by the Publish system. So, if page B contains content from page A, and page A is modified and published (then invalidated and updated in the cache), the cached version of page B is not automatically updated with the new content of page A. Page B must be invalidated manually in the cache to reflect the updated content of page A.
- The ACS AEM Commons project provides a Dispatcher Flush Rules feature that helps create a “smart” flush scheme that listens for resource replications and invalidates ONLY the pages that use those resources.
Debugging
Use the request header X-Dispatcher-Info to get debug information about responses cached by the Dispatcher. When you add the header X-Dispatcher-Info to a request, the Dispatcher answers with the response header X-Cache-Info, which states in readable form whether the target was cached, returned from cache, or not cacheable at all and why.
For the response header X-Cache-Info to be included, the farm must contain the following entry:
/info "1"
For example:
/farm
{
    /mysite
    {
        /info "1"
    }
}
If you use curl to test, you need to send a value with the header, such as:
curl -v -H "X-Dispatcher-Info: true" https://localhost:port/content/site/us/en.html
Wrapping Up
AEM Dispatcher is a vital tool for caching and load balancing. For developers working with AEM, it’s important to know how the Dispatcher handles caching and how to configure it. This knowledge helps them get the most out of a tool that is part of any AEM system but is usually delegated to DevOps teams, and it helps developers and AEM users in general understand what the Dispatcher cache can do to improve the website experience for end users.