The HTML5 cache manifest is the caching mechanism defined by the W3C’s specification for offline web applications. Files cached by the manifest are not deleted when a user clears the standard browser cache, which makes the manifest a more persistent method of caching static assets such as CSS, JavaScript, and images, even for web applications that are meant to run online.

Introduction

The W3C has created a working draft that defines a more robust method for caching files, so that a web application can still function, at least in a limited capacity, when the user is offline. This new resource is referred to as the “application cache”. The application cache is similar to the standard browser cache in that subsequent requests for cached files are served from the local cache instead of making a network call to retrieve the requested file. The primary differences between the application cache and the browser cache are the level of persistence and the ability to explicitly list the files to cache without depending on Expires headers and Last-Modified dates. Once a file is in the application cache, it is not removed from disk when the browser cache is cleared or when the browser purges items from a full cache using its LRU (Least Recently Used) algorithm. The files continue to be served from the application cache until the cache is deleted or the manifest is updated.

Cache Manifest

In order for a browser to create an application cache, it needs a set of directives. These directives are contained in the cache manifest, a plain text file that is referenced by the manifest attribute of the opening html tag of an HTML5 document and downloaded by the browser.

<html manifest="manifest.appcache"> 

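Below are the contents of an example cache manifest file, with comments describing the purpose of each section. The minimum requirement for a valid cache manifest is the line ‘CACHE MANIFEST’ followed by a line break and a list of files to be cached, one per line; files listed before any section header are cached just as if they appeared under ‘CACHE:’. Comments must be on a line of their own and start with ‘#’. In particular, a section header followed by a comment, such as ‘CACHE: #optional’, no longer ends with a colon and is not recognized as a header.

CACHE MANIFEST
# Required: the first line must start with "CACHE MANIFEST".

# Optional: files to be cached for offline usage.
CACHE:
/images/foo.png
/js/bar.js
/css/baz.css

# Optional: files that should not be cached; requests for them bypass the cache.
NETWORK:
/foo/bar.html

# Optional: files to be used in place of 'online' files when the web application is offline.
# Each online URL and its offline fallback go on the same line.
FALLBACK:
/foo.php /offline.html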
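To make the manifest rules concrete, here is a simplified JavaScript sketch of how a manifest's sections are read. It is an illustration under stated assumptions, not the specification's full parsing algorithm: URL resolution, same-origin checks, and the later SETTINGS section are left out, and the function name `parseManifest` and its result shape are hypothetical.

```javascript
// Simplified sketch of reading a cache manifest. NOT the full parsing
// algorithm from the specification: URL resolution, origin checks, and
// the SETTINGS section are deliberately left out.
function parseManifest(text) {
  // Drop an optional UTF-8 byte order mark, then split on LF, CR, or CRLF.
  const lines = text.replace(/^\uFEFF/, "").split(/\r\n|\r|\n/);

  // The first line must start with "CACHE MANIFEST", followed by a space,
  // a tab, or the end of the line; anything after that is ignored.
  if (!/^CACHE MANIFEST([ \t]|$)/.test(lines[0])) {
    throw new Error("Not a cache manifest: missing CACHE MANIFEST signature");
  }

  const result = { cache: [], network: [], fallback: [] };
  // Entries that appear before any section header belong to CACHE.
  let section = "CACHE";

  for (const rawLine of lines.slice(1)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines (first non-blank character is "#").
    if (line === "" || line.startsWith("#")) continue;

    // A line ending in ":" is a section header; unknown names select a
    // section whose entries are ignored.
    if (line.endsWith(":")) {
      section = line.slice(0, -1);
      continue;
    }

    const tokens = line.split(/[ \t]+/);
    if (section === "CACHE") {
      result.cache.push(tokens[0]);
    } else if (section === "NETWORK") {
      result.network.push(tokens[0]);
    } else if (section === "FALLBACK" && tokens.length >= 2) {
      // First token: online URL prefix; second token: offline replacement.
      result.fallback.push({ online: tokens[0], offline: tokens[1] });
    }
  }
  return result;
}

// Usage:
const manifest = parseManifest(
  "CACHE MANIFEST\n# v1\nCACHE:\n/images/foo.png\n/js/bar.js\n\n" +
  "NETWORK:\n/foo/bar.html\n\nFALLBACK:\n/foo.php /offline.html\n"
);
// manifest.cache    -> ["/images/foo.png", "/js/bar.js"]
// manifest.network  -> ["/foo/bar.html"]
// manifest.fallback -> [{ online: "/foo.php", offline: "/offline.html" }]
```

The sketch also shows why comments belong on their own line: a header written as `CACHE: #optional` does not end in a colon, so it is read as an ordinary entry and the token `CACHE:` ends up in the list of files to cache.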

Web Server Configuration

The server needs to be configured to serve the manifest file with the correct MIME type, “text/cache-manifest”. For example, in Apache this can be accomplished by adding the following directive to httpd.conf or the appropriate .htaccess file, here for the .appcache extension used above:

AddType text/cache-manifest .appcache
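For readers not running Apache, the same requirement can be sketched as a small Node.js static file server. This is an assumption beyond the original setup: the extension map, the function names `contentTypeFor` and `createStaticServer`, and the choice to send `Cache-Control: no-cache` with the manifest (so the ordinary HTTP cache does not delay manifest updates) are illustrative, not a prescribed configuration.

```javascript
// Sketch of a Node.js static file server that sends the manifest with the
// text/cache-manifest MIME type. Names and the extension map are hypothetical.
const http = require("http");
const fs = require("fs");
const path = require("path");

const MIME_TYPES = {
  ".appcache": "text/cache-manifest", // the entry that matters here
  ".html": "text/html",
  ".js": "application/javascript",
  ".css": "text/css",
  ".png": "image/png",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer(function (req, res) {
    let urlPath;
    try {
      urlPath = decodeURIComponent(req.url.split("?")[0]); // drop query string
    } catch (err) {
      res.writeHead(400);
      res.end();
      return;
    }
    // Normalizing before joining keeps requests such as "/../x" inside root.
    const filePath = path.join(root, path.normalize(urlPath));
    if (!filePath.startsWith(root)) {
      res.writeHead(403);
      res.end();
      return;
    }
    fs.readFile(filePath, function (err, data) {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      const headers = { "Content-Type": contentTypeFor(filePath) };
      // The manifest request itself goes through the ordinary HTTP cache,
      // so ask the browser to revalidate it on every check.
      if (path.extname(filePath).toLowerCase() === ".appcache") {
        headers["Cache-Control"] = "no-cache";
      }
      res.writeHead(200, headers);
      res.end(data);
    });
  });
}

// Usage: createStaticServer("./public").listen(8080);
```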

Real World Usage

I have never created a web application that was intended to operate offline, even in a limited capacity. I can see the value of offline operation for a small application targeted at mobile or tablet devices that does not have much dynamic content: it reduces expensive network calls, and updates could even be queued on the client side in Web Storage or IndexedDB while the user is out of a service area. However, in the vast majority of cases web applications are not designed or intended to function offline, because the gains of implementing the more complicated architecture are not worth the cost for such niche cases. At least, that is my assumption; if anyone knows of other cases, please let me know. So why use the HTML5 offline web application specification at all?

  1. It is an excellent cache primer. Over the years, web applications have evolved into highly complex pieces of software requiring numerous supporting JavaScript, CSS, and image files. Priming the cache improves performance because files are served from the cache immediately when needed, instead of being fetched over the network just in time. Cache priming is typically done via XHR; the cache manifest provides a simple alternative with a very nice API. XHR can still be used as a fallback to handle cache priming in browsers that do not support application caches.
  2. It is a highly persistent cache. Browser caches are fragile. A study conducted by Yahoo! showed that 80% of page views were done with a primed cache (this statistic should be taken with a grain of salt), but 40 to 60% of unique visitors viewed pages with an empty cache. The suspected reasons for the empty caches are as follows: the user specifically cleared their cache or bypassed it (SHIFT + F5/REFRESH); the browser was configured to clear the cache, e.g., when it closes; or the files were evicted from the cache by the browser’s LRU algorithm. This last case could be occurring more frequently than one would think if the default cache size is small (see Steve Souders’ blog entry). The application cache addresses all of these issues because it is not cleared when the browser cache is cleared and its files are not subject to the LRU algorithm (disk space is allotted per domain).
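The XHR fallback mentioned in the first point can be sketched as follows. This is a minimal illustration under assumptions: the helper `primeCache` and its optional `win` parameter are hypothetical (the parameter exists only so the function can be exercised outside a browser), and the responses only land in the ordinary HTTP cache if the server sends cacheable headers for them, so this primes the fragile browser cache rather than providing the persistence of the application cache.

```javascript
// Hypothetical helper: warm the ordinary HTTP cache with XHR in browsers that
// lack window.applicationCache. `win` defaults to the browser's window.
function primeCache(urls, win) {
  win = win || (typeof window !== "undefined" ? window : undefined);
  if (!win || win.applicationCache) {
    // Either there is no browser environment, or the manifest already
    // primes the application cache, so there is nothing to do.
    return 0;
  }
  let requested = 0;
  urls.forEach(function (url) {
    const xhr = new win.XMLHttpRequest();
    // Asynchronous GET: the response body is discarded, but if the server
    // sends cacheable headers the browser keeps it in its HTTP cache.
    xhr.open("GET", url, true);
    xhr.send();
    requested += 1;
  });
  return requested;
}

// Usage in a page, after the critical content has loaded:
// window.addEventListener('load', function () {
//   primeCache(['/js/bar.js', '/css/baz.css', '/images/foo.png']);
// }, false);
```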

Considerations

The file that references the cache manifest (the master) is implicitly cached; if you know of a clever workaround, please let me know. If the master contains dynamic content assembled by the server, this will be a problem. If your master fetches data via AJAX and constructs the contents of the page in the front end, it is not a problem; in other words, build out all dynamic sections of the DOM in the front end using JavaScript. This may sound like a limitation or unnecessary complexity, but it is actually a good architecture: it lessens the workload on your servers by distributing some of the work to the clients and serving local copies of static resources, which improves page load times.
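A minimal sketch of that pattern: the cached master is only a static shell, and the dynamic section is requested as JSON and rendered on the client. The JSON shape (`[{ "name": ... }]`), the URL, the element id, and the function names are hypothetical. One detail worth keeping in mind: a page served from the application cache can only request resources that the manifest allows, so the data URL has to be covered by the NETWORK section (for example with `*`).

```javascript
// Sketch: render a dynamic section client-side from JSON. Names, URL, and
// JSON shape are hypothetical. The data URL must be allowed by the NETWORK
// section of the manifest so a cached page may request it.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, function (ch) {
    return { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[ch];
  });
}

function renderItems(items) {
  return "<ul>" + items.map(function (item) {
    return "<li>" + escapeHtml(item.name) + "</li>";
  }).join("") + "</ul>";
}

// `XHR` defaults to the browser's XMLHttpRequest; it is a parameter only so
// the function can be tested outside a browser.
function loadDynamicSection(url, container, XHR) {
  const Request = XHR || XMLHttpRequest;
  const xhr = new Request();
  xhr.open("GET", url, true);
  xhr.onload = function () {
    if (xhr.status === 200) {
      container.innerHTML = renderItems(JSON.parse(xhr.responseText));
    }
  };
  xhr.send();
}

// Usage in the master page:
// loadDynamicSection("/api/items.json", document.getElementById("items"));
```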

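Another commonly overlooked issue is that, when your files are served from the application cache, updates to the files listed in the cache manifest are not reflected in the master until the page is reloaded. Note also that the browser only re-downloads the cached files when the manifest file itself has changed, so an edited JavaScript or CSS file is not picked up unless the manifest changes too (even a single byte, such as a version comment). A simple solution to the reload problem is to listen for the ‘updateready’ event in the master and reload the page: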

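window.addEventListener('load', function (e) {
    if (!window.applicationCache) return; // no application cache support
    window.applicationCache.addEventListener('updateready', function (e) {
        // A newer cache has been downloaded; reload so the page uses it.
        if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
            window.location.reload();
        }
    }, false);
}, false);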
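Because the browser only re-downloads cached files when the manifest bytes change, one way to avoid forgetting to touch the manifest is to regenerate it in a build step with a hash of the listed files in a comment line. The sketch below assumes a Node.js build script; `buildManifest`, its arguments, and the default `*` NETWORK entry (allow every other URL) are hypothetical choices, not part of the original setup.

```javascript
// Hypothetical build step: embed a hash of the cached files in a comment, so
// any change to a listed file changes the manifest and triggers an update.
const crypto = require("crypto");

// `files` maps each URL to cache to that file's current contents (string or
// Buffer). `networkEntries` defaults to "*", i.e. allow all other URLs.
function buildManifest(files, networkEntries) {
  const urls = Object.keys(files).sort(); // stable order gives a stable hash
  const hash = crypto.createHash("sha1");
  urls.forEach(function (url) {
    hash.update(url + "\n");
    hash.update(files[url]);
    hash.update("\n");
  });
  const lines = ["CACHE MANIFEST", "# content hash: " + hash.digest("hex"), "", "CACHE:"]
    .concat(urls, ["", "NETWORK:"], networkEntries || ["*"]);
  return lines.join("\n") + "\n";
}

// Usage (in a build script):
// fs.writeFileSync("manifest.appcache", buildManifest({ "/js/bar.js": barSource }));
```

Comment lines are ignored when the manifest is parsed, but they still count as a byte change, which is all the update check needs.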

Tools of the Trade

Developing web applications that rely on a cache manifest can be somewhat cumbersome because of the persistence of the cached files. Application caches can be viewed and deleted in Chrome via chrome://appcache-internals/. In Firefox, the CacheViewer add-on can be used to view application cache items (device = offline), and application caches can be deleted under ‘Options’ -> ‘Advanced’ -> ‘Network’.
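Besides the browser tools, logging the application cache's events while developing shows what the browser is doing on each page load. This sketch relies on the standard ApplicationCache event names and status constants (UNCACHED = 0 through OBSOLETE = 5); the helper name `logAppCacheEvents` and its optional logger parameter are hypothetical.

```javascript
// Names of the ApplicationCache status constants, indexed by numeric value.
const APPCACHE_STATUS = ["UNCACHED", "IDLE", "CHECKING", "DOWNLOADING", "UPDATEREADY", "OBSOLETE"];

// Events an ApplicationCache object can fire while checking for updates.
const APPCACHE_EVENTS = ["checking", "noupdate", "downloading", "progress",
  "cached", "updateready", "obsolete", "error"];

// Hypothetical debugging helper: log each event with the current status.
function logAppCacheEvents(appCache, log) {
  const write = log || console.log;
  APPCACHE_EVENTS.forEach(function (type) {
    appCache.addEventListener(type, function () {
      const status = APPCACHE_STATUS[appCache.status] || "unknown";
      write("appcache " + type + " (status: " + status + ")");
    }, false);
  });
}

// Usage in the master page during development:
// if (window.applicationCache) logAppCacheEvents(window.applicationCache);
```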

Conclusion

The application cache need not be used only for applications designed to function offline. It can improve the performance of online-only web applications by priming the cache and aggressively caching static resources. It also encourages good front-end architecture choices that optimize page loads, at least in my opinion. However, a nice feature for more design flexibility would be the ability to fetch the master from the network when online. Perhaps, if demand for this feature grows, the W3C will add it to the working specification and browser vendors will implement it.

Discussion
climboid 1 comment Joined 11/11
10 Nov 2011

I've worked with the cache manifest before for an iPad demo I had to make. I would recommend including the maximum amount of storage you can have per device. I believe for Safari on the iPad it's close to 5 MB (in my case that forced me to change my PNGs to GIFs, among many other things). Not sure if there is a workaround for this or not, but it would be nice to let people know so that they consider it as well.

JasonStrimpel 5 comments Joined 09/11
17 Nov 2011

Good point. From what I can find it looks like 5MB is the standard limit across all browsers.

MarlonSamules 1 comment Joined 12/11
03 Dec 2011

It looks very interesting. I haven't worked in this area because I was not aware of it. Thanks to you, I'll look into it.


Air2 1 comment Joined 12/11
13 Dec 2011

Hmm, the HTML itself is not served from the cache when the URL changes, for example by adding a parameter: file.html?dummy=20111213105001

JasonStrimpel 5 comments Joined 09/11
13 Dec 2011

@Air2 Interesting. It is like cache busting for the AppCache without actually busting the cache. It is a "new" URL, so the browser treats it as such. The unique URL would be added as another master for the cache manifest, so it would still benefit from the AppCache. This technique would work well for child pages. However, it would not work well for single-page web applications or "home" pages because of links to the site. Very useful observation.

DorisGreys 1 comment Joined 07/12
13 Jul 2012

I worked on a project and found a lot of information on the Internet. Windows is a very smart and fast operating system (smile). One of its main objectives is to reduce the number of operations to be performed, since on a modern computer disk operations consume the most time.
The easiest and most reliable way to reduce the number of disk operations is to cache data in memory: once something has been read from disk, it can be loaded into memory at that moment and kept there. Every subsequent reference to that file is then instant, because it is already in memory and does not need to be loaded from disk. Ideally, if there is more memory than files on disk, you can load all the files into memory and everything will work very quickly.

JasonStrimpel 5 comments Joined 09/11
13 Jul 2012

@DorisGreys Good points - besides the one about Windows. :) I also dump non-sensitive data into local storage. All of this data is then used to populate client-side models, which are in-memory structures. I then use conditional fetches to make network calls only if the data does not exist in local storage. Local storage is either cleared or updated on a model update. All templates are compiled and saved in memory as part of the client-side view. Backbone is my favorite library to date for structuring all this logic. Dependency management is handled by require.js.

PhilipRoger 1 comment Joined 08/12
13 Aug 2012

Jason, to me that seems like a pretty complicated way to go about it. How do you get into memory structures?

RobLance 1 comment Joined 08/12
14 Aug 2012

I had no idea that this could be done... It sure seems complicated, but very interesting at the same time. I will look into it later today.

nickshanks 1 comment Joined 09/12
25 Sep 2012

"The file that references the cache manifest (master) is implicitly cached – if you know of a clever work around please let me know."

The obvious answer is to load the manifest from an empty HTML file loaded into an iframe.

KredytKielce 1 comment Joined 10/12
27 Oct 2012

You are absolutely right.

remi-grumeau 1 comment Joined 07/14
2 months ago

> I have never created a web application that was intended to operate offline even in a limited capacity.
> So then why use the HTML5 offline web application specification?
For this exact reason :) I've been working on a component-based ABAP toolkit for creating web apps and hybrid applications within SAP. Most of our customers create inbox or HR apps with it, so yes, of course, the manifest and offline capabilities/databases are a VERY important part of the process!
It is called ApplicationCache for a reason: it's there to help developers create offline applications in HTML5. It's not webCache or makethewebofflineCache. So basically, when you start using it on a website, you should be prepared to run into trouble...
