This is a snapshot of an early working draft and has therefore been superseded by the HTML standard.

This document will not be further updated.

HTML 5

Call For Comments — 27 October 2007

4.10. Client-side session and persistent storage of name/value pairs

4.10.1. Introduction

This section is non-normative.

This specification introduces two related mechanisms, similar to HTTP session cookies [RFC2965], for storing structured data on the client side.

The first is designed for scenarios where the user is carrying out a single transaction, but could be carrying out multiple transactions in different windows at the same time.

Cookies don't really handle this case well. For example, a user could be buying plane tickets in two different windows, using the same site. If the site used cookies to keep track of which ticket the user was buying, then as the user clicked from page to page in both windows, the ticket currently being purchased would "leak" from one window to the other, potentially causing the user to buy two tickets for the same flight without really noticing.

To address this, this specification introduces the sessionStorage DOM attribute. Sites can add data to the session storage, and it will be accessible to any page from that domain opened in that window.

For example, a page could have a checkbox that the user ticks to indicate that he wants insurance:

<label>
 <input type="checkbox" onchange="sessionStorage.insurance = checked">
 I want insurance on this trip.
</label>

A later page could then check, from script, whether the user had checked the checkbox or not:

if (sessionStorage.insurance) { ... }

If the user had multiple windows opened on the site, each one would have its own individual copy of the session storage object.

The second storage mechanism is designed for storage that spans multiple windows, and lasts beyond the current session. In particular, Web applications may wish to store megabytes of user data, such as entire user-authored documents or a user's mailbox, on the client side for performance reasons.

Again, cookies do not handle this case well, because they are transmitted with every request.

The globalStorage DOM attribute is used to access the global storage areas.

The site at example.com can display a count of how many times the user has loaded its page by putting the following at the bottom of its page:

<p>
  You have viewed this page
  <span id="count">an untold number of</span>
  time(s).
</p>
<script>
  var storage = globalStorage['example.com'];
  if (!storage.pageLoadCount)
    storage.pageLoadCount = 0;
  storage.pageLoadCount = parseInt(storage.pageLoadCount, 10) + 1;
  document.getElementById('count').textContent = storage.pageLoadCount;
</script>

Each domain and each subdomain has its own separate storage area. Subdomains can access the storage areas of parent domains, and domains can access the storage areas of subdomains.

Storage areas (both session storage and global storage) store strings. To store structured data in a storage area, you must first convert it to a string.
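As a minimal sketch of the conversion step, assuming JSON as the string encoding (this draft does not mandate any particular encoding) and using a plain Map as a hypothetical stand-in for a storage area, so that the example runs outside a browser:

```javascript
// Hypothetical stand-in for a storage area: it only ever holds strings.
const area = new Map();

// Store a structured value by serialising it to a JSON string first.
function saveStructured(key, value) {
  area.set(key, JSON.stringify(value));
}

// Read a value back by parsing the stored string; returns null if absent.
function loadStructured(key) {
  return area.has(key) ? JSON.parse(area.get(key)) : null;
}

saveStructured('trip', { insurance: true, seats: ['12A', '12B'] });
const trip = loadStructured('trip');
```

The helper names saveStructured and loadStructured are illustrative only; any reversible string encoding would serve the same purpose.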

4.10.2. The Storage interface

interface Storage {
  readonly attribute unsigned long length;
  DOMString key(in unsigned long index);
  StorageItem getItem(in DOMString key);
  void setItem(in DOMString key, in DOMString data);
  void removeItem(in DOMString key);
};

Each Storage object provides access to a list of key/value pairs, which are sometimes called items. Keys are strings, and any string (including the empty string) is a valid key. Values are strings with associated metadata, represented by StorageItem objects.

Each Storage object is associated with a list of key/value pairs when it is created, as defined in the sections on the sessionStorage and globalStorage attributes. Multiple separate objects implementing the Storage interface can all be associated with the same list of key/value pairs simultaneously.

Key/value pairs have associated metadata. In particular, a key/value pair can be marked as either "safe only for secure content", or as "safe for both secure and insecure content".

A key/value pair is accessible if either it is marked as "safe for both secure and insecure content", or it is marked as "safe only for secure content" and the script in question is running in a secure browsing context.

The length attribute must return the number of key/value pairs currently present and accessible in the list associated with the object.

The key(n) method must return the name of the nth accessible key in the list. The order of keys is user-agent defined, but must be consistent within an object between changes to the number of keys. (Thus, adding or removing a key may change the order of the keys, but merely changing the value of an existing key must not.) If n is less than zero or greater than or equal to the number of key/value pairs in the object, then this method must raise an INDEX_SIZE_ERR exception.

The getItem(key) method must return the StorageItem object representing the key/value pair with the given key. If the given key does not exist in the list associated with the object, or is not accessible, then this method must return null. Subsequent calls to this method with the same key from scripts running in the same security context must return the same instance of the StorageItem interface. (Such instances must not be shared across security contexts, though.)

The setItem(key, value) method must first check if a key/value pair with the given key already exists in the list associated with the object.

If it does not, then a new key/value pair must be added to the list, with the given key and value, such that any current or future StorageItem objects referring to this key/value pair will return the value given in the value argument. If the script setting the value is running in a secure browsing context, then the key/value pair must be marked as "safe only for secure content", otherwise it must be marked as "safe for both secure and insecure content".

If the given key does exist in the list, then, if the key/value pair with the given key is accessible, it must have its value updated so that any current or future StorageItem objects referring to this key/value pair will return the value given in the value argument. If it is not accessible, the method must raise a security exception.

When the setItem() method is successfully invoked (i.e. when it doesn't raise an exception), events are fired on other HTMLDocument objects that can access the newly stored data, as defined in the sections on the sessionStorage and globalStorage attributes.

The removeItem(key) method must cause the key/value pair with the given key to be removed from the list associated with the object, if it exists and is accessible. If no item with that key exists, the method must do nothing. If an item with that key exists but is not accessible, the method must raise a security exception.

The setItem() and removeItem() methods must be atomic with respect to failure. That is, changes to the data storage area must either be successful, or the data storage area must not be changed at all.
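The accessibility rules above can be sketched with an in-memory model. This is an illustration under stated assumptions, not an implementation of the interface: the class name StorageModel, the explicit secureContext argument, the use of plain strings instead of StorageItem objects, and the 'SECURITY_ERR' error text are all hypothetical.

```javascript
// Hypothetical in-memory model of one list of key/value pairs.
// `secureContext` says whether the calling script runs in a secure browsing context.
class StorageModel {
  constructor() {
    this.pairs = new Map(); // key -> { value, secureOnly }
  }
  isAccessible(pair, secureContext) {
    return !pair.secureOnly || secureContext;
  }
  // Counts only the pairs the caller is allowed to see.
  length(secureContext) {
    let n = 0;
    for (const pair of this.pairs.values()) {
      if (this.isAccessible(pair, secureContext)) n++;
    }
    return n;
  }
  // Missing and inaccessible keys both yield null.
  getItem(key, secureContext) {
    const pair = this.pairs.get(key);
    return pair && this.isAccessible(pair, secureContext) ? pair.value : null;
  }
  setItem(key, value, secureContext) {
    const pair = this.pairs.get(key);
    if (!pair) {
      // New pairs take their marking from the context that created them.
      this.pairs.set(key, { value: String(value), secureOnly: secureContext });
    } else if (this.isAccessible(pair, secureContext)) {
      pair.value = String(value);
    } else {
      throw new Error('SECURITY_ERR'); // stand-in for the security exception
    }
  }
  removeItem(key, secureContext) {
    const pair = this.pairs.get(key);
    if (!pair) return; // no such item: do nothing
    if (!this.isAccessible(pair, secureContext)) throw new Error('SECURITY_ERR');
    this.pairs.delete(key);
  }
}
```

Each mutating method checks accessibility before touching the map, so a call that raises leaves the list unchanged.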

In the ECMAScript DOM binding, enumerating a Storage object must enumerate through the currently stored and accessible keys in the list the object is associated with. (It must not enumerate the values or the actual members of the interface). In the ECMAScript DOM binding, Storage objects must support dereferencing such that getting a property that is not a member of the object (i.e. is neither a member of the Storage interface nor of Object) must invoke the getItem() method with the property's name as the argument, and setting such a property must invoke the setItem() method with the property's name as the first argument and the given value as the second argument.
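The dereferencing and enumeration behaviour can be approximated in present-day ECMAScript with a Proxy, a language feature that postdates this draft. The sketch below is only an emulation: makeStorage is a hypothetical factory, the backing store is a Map, and values are returned as plain strings rather than StorageItem objects.

```javascript
// Hypothetical factory returning an object that behaves like the binding describes.
function makeStorage() {
  const pairs = new Map();
  const target = {
    get length() { return pairs.size; },
    key(n) { return Array.from(pairs.keys())[n]; },
    getItem(k) { return pairs.has(k) ? pairs.get(k) : null; },
    setItem(k, v) { pairs.set(k, String(v)); },
    removeItem(k) { pairs.delete(k); },
  };
  return new Proxy(target, {
    // Members of the interface or of Object win; other names go to getItem().
    get(t, prop) {
      if (typeof prop === 'symbol' || prop in t) return Reflect.get(t, prop);
      return t.getItem(prop);
    },
    // Setting a non-member name goes to setItem().
    set(t, prop, value) {
      if (typeof prop === 'symbol' || prop in t) return Reflect.set(t, prop, value);
      t.setItem(prop, value);
      return true;
    },
    // Enumeration lists the stored keys only, never the interface members.
    ownKeys() { return Array.from(pairs.keys()); },
    getOwnPropertyDescriptor(t, prop) {
      if (typeof prop === 'string' && pairs.has(prop)) {
        return { value: pairs.get(prop), writable: true, enumerable: true, configurable: true };
      }
      return undefined;
    },
  });
}
```

With this emulation, assigning storage.pageLoadCount = 3 stores the string "3" under the key "pageLoadCount", while storage.getItem remains the method itself.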

4.10.3. The StorageItem interface

Items in Storage objects are represented by objects implementing the StorageItem interface.

interface StorageItem {
           attribute boolean secure;
           attribute DOMString value;
};

In the ECMAScript DOM binding, StorageItem objects must stringify to their value attribute's value.

The value attribute must return the current value of the key/value pair represented by the object. When the attribute is set, the user agent must invoke the setItem() method of the Storage object that the StorageItem object is associated with, with the key that the StorageItem object is associated with as the first argument, and the new given value of the attribute as the second argument.

StorageItem objects must be live, meaning that as the underlying Storage object has its key/value pairs updated, the StorageItem objects must always return the actual value of the key/value pair they represent.

If the key/value pair has been deleted, the StorageItem object must act as if its value was the empty string. On setting, the key/value pair will be recreated.
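The liveness and deletion rules above can be sketched as a thin view over the underlying pairs. The class name StorageItemModel and the Map used as the backing list are assumptions made for illustration; the secure attribute is omitted.

```javascript
// Hypothetical live view over one key in a Map-backed list of pairs.
class StorageItemModel {
  constructor(pairs, key) {
    this.pairs = pairs;
    this.key = key;
  }
  // Always read through to the underlying pair; a deleted pair reads as "".
  get value() {
    return this.pairs.has(this.key) ? this.pairs.get(this.key) : '';
  }
  // Writing recreates the pair if it had been deleted.
  set value(v) {
    this.pairs.set(this.key, String(v));
  }
  // Stringifies to the current value.
  toString() {
    return this.value;
  }
}
```

Because the object holds only a key and a reference to the list, it never caches a value and so cannot go stale.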

The secure attribute must raise an INVALID_ACCESS_ERR exception when accessed or set from a script whose browsing context is not considered secure. (Basically, if the page is not an SSL page.)

If the browsing context is secure, then the secure attribute must return true if the key/value pair is considered "safe only for secure content", and false if it is considered "safe for both secure and insecure content". If it is set to true, then the key/value pair must be flagged as "safe only for secure content". If it is set to false, then the key/value pair must be flagged as "safe for both secure and insecure content".

If a StorageItem object is obtained by a script that is not running in a secure browsing context, and the item is then marked with the "safe only for secure content" flag by a script that is running in a secure context, the StorageItem object must continue to be available to the first script, which will be able to read the value of the object. However, any attempt to set the value would then start raising exceptions as described in the previous section, and the key/value pair would no longer appear in the appropriate Storage object.

4.10.4. The sessionStorage attribute

The sessionStorage attribute represents the storage area specific to the current top-level browsing context.

Each top-level browsing context has a unique set of session storage areas, one for each domain.

User agents should not expire data from a browsing context's session storage areas, but may do so when the user requests that such data be deleted, or when the UA detects that it has limited storage space, or for security reasons. User agents should always avoid deleting data while a script that could access that data is running. When a top-level browsing context is destroyed (and therefore permanently inaccessible to the user) the data stored in its session storage areas can be discarded with it, as the API described in this specification provides no way for that data to ever be subsequently retrieved.

The lifetime of a browsing context can be unrelated to the lifetime of the actual user agent process itself, as the user agent may support resuming sessions after a restart.

When a new HTMLDocument is created, the user agent must check to see if the document's top-level browsing context has allocated a session storage area for that document's domain. If it has not, a new storage area for that document's domain must be created.

The Storage object for the document's associated Window object's sessionStorage attribute must then be associated with the domain's session storage area.

When a new top-level browsing context is created by cloning an existing browsing context, the new browsing context must start with the same session storage areas as the original, but the two sets must from that point on be considered separate, not affecting each other in any way.

When a new top-level browsing context is created by a script in an existing browsing context, or by the user following a link in an existing browsing context, or in some other way related to a specific HTMLDocument, then, if the new context's first HTMLDocument has the same domain as the HTMLDocument from which the new context was created, the new browsing context must start with a single session storage area. That storage area must be a copy of that domain's session storage area in the original browsing context, which from that point on must be considered separate, with the two storage areas not affecting each other in any way.
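The two copying rules above can be sketched as follows, modelling a browsing context's session storage as a Map from domain to a Map of pairs. Both function names are hypothetical, and openFrom assumes the same-domain condition has already been checked by the caller.

```javascript
// Cloning a browsing context: every domain's area is copied, then independent.
function cloneContext(original) {
  const copy = new Map();
  for (const [domain, area] of original) copy.set(domain, new Map(area));
  return copy;
}

// A context opened from a document: only that document's domain area is copied.
// Assumes the new context's first document has the same domain as the opener.
function openFrom(original, domain) {
  const fresh = new Map();
  if (original.has(domain)) fresh.set(domain, new Map(original.get(domain)));
  return fresh;
}
```

In both cases the copies are made with new Map objects, so later writes in one context are invisible to the other.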

When the setItem() method is called on a Storage object x that is associated with a session storage area, then, if the method does not raise a security exception, in every HTMLDocument object whose Window object's sessionStorage attribute's Storage object is associated with the same storage area, other than x, a storage event must be fired, as described below.

4.10.5. The globalStorage attribute

interface StorageList {
  Storage namedItem(in DOMString domain);
};

The globalStorage object provides a Storage object for each domain.

In the ECMAScript DOM binding, StorageList objects must support dereferencing such that getting a property that is not a member of the object (i.e. is neither a member of the StorageList interface nor of Object) must invoke the namedItem() method with the property's name as the argument.

User agents must have a set of global storage areas, one for each domain.

User agents should only expire data from the global storage areas for security reasons or when requested to do so by the user. User agents should always avoid deleting data while a script that could access that data is running. Data stored in global storage areas should be considered potentially user-critical. It is expected that Web applications will use the global storage areas for storing user-written documents.

The namedItem(domain) method tries to return a Storage object associated with the given domain, according to the rules that follow.

The domain must first be split into an array of strings, by splitting the string at "." characters (U+002E FULL STOP). If the domain argument is the empty string, then the array is empty as well. If the domain argument is not empty but has no dots, then the array has one item, which is equal to the domain argument. If the domain argument contains consecutive dots, there will be empty strings in the array (e.g. the string "hello..world" becomes split into the three strings "hello", "", and "world", with the middle one being the empty string).

Each component of the array must then have the IDNA ToASCII algorithm applied to it, with both the AllowUnassigned and UseSTD3ASCIIRules flags set. [RFC3490] If ToASCII fails to convert one of the components of the string, e.g. because it is too long or because it contains invalid characters, then the user agent must raise a SYNTAX_ERR exception. [DOM3CORE] The components after this step consist of only US-ASCII characters.

The components of the array must then be converted to lowercase. Since only US-ASCII is involved at this step, this only requires converting characters in the range A-Z to the corresponding characters in the range a-z.

The resulting array is used in a comparison with another array, as described below. In addition, its components are concatenated together, each part separated by a dot (U+002E), to form the normalised requested domain.

If the original domain was "Åsgård.Example.Com", then the resulting array would have the three items "xn--sgrd-poac", "example", and "com", and the normalised requested domain would be "xn--sgrd-poac.example.com".
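The splitting, conversion, lowercasing, and joining steps can be sketched as below. The IDNA ToASCII step is deliberately left as a pluggable parameter: the default asciiOnlyToAscii is a placeholder that accepts only ASCII labels and is not a ToASCII implementation, so internationalised names such as the one in the example above require a real RFC 3490 routine to be supplied. All function names are hypothetical.

```javascript
// Placeholder for IDNA ToASCII: accepts ASCII-only labels and rejects the rest.
// A real implementation would apply RFC 3490 ToASCII here (assumption).
function asciiOnlyToAscii(label) {
  if (!/^[\x00-\x7F]*$/.test(label)) throw new SyntaxError('SYNTAX_ERR');
  return label;
}

function normaliseDomain(domain, toAscii = asciiOnlyToAscii) {
  // The empty string yields an empty array, not an array holding "".
  const parts = domain === '' ? [] : domain.split('.');
  const components = parts
    .map((label) => toAscii(label))
    // Only A-Z needs folding, because the labels are US-ASCII at this point.
    .map((label) => label.replace(/[A-Z]/g, (c) => c.toLowerCase()));
  return { components, normalised: components.join('.') };
}
```

Calling normaliseDomain('WWW.Example.Com') yields the components "www", "example", and "com", and the normalised domain "www.example.com".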

Next, the domain part of the tuple forming the calling script's origin is processed to find if it is allowed to access the requested domain.

If the script's origin has no domain part, e.g. if only the server's IP address is known, and the normalised requested domain is not the empty string, then the user agent must raise a security exception.

If the normalised requested domain is the empty string, then the rest of this algorithm can be skipped. This is because in that situation, the comparison of the two arrays below will always find them to be the same — the first array in such a situation is also empty and so permission to access that storage area will always be given.

If the domain part of the script's origin contains no dots (U+002E) then the string ".localdomain" must be appended to the script's domain.

Then, the domain part of the script's origin must be turned into an array, being split, converted to ASCII, and lowercased as described for the domain argument above.

Of the two arrays, the longer one must then be shortened to the length of the shorter one, by dropping items from the start of the array.

If the domain argument is "www.example.com" and the script origin's domain part is "example.com" then the first array will be a three item array ("www", "example", "com"), and the second will be a two item array ("example", "com"). The first array is therefore shortened, dropping the leading parts, making both into the same array ("example", "com").

If the two arrays are not component-for-component identical in literal string comparisons, then the user agent must raise a security exception.

Otherwise, the user agent must check to see if it has allocated a global storage area for the normalised requested domain. If it has not, a new storage area for that domain must be created.

The user agent must then create a Storage object associated with that domain's global storage area, and return it.
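The permission check in the algorithm above can be sketched as a predicate. This is a simplification under stated assumptions: domains are taken to be ASCII already, so the ToASCII step is skipped; a script origin with no domain part is represented by null; and the function returns false where the algorithm would raise a security exception. The names toComponents and mayAccess are hypothetical.

```javascript
// Split and lowercase a domain; ASCII input is assumed, so ToASCII is skipped.
function toComponents(domain) {
  return domain === '' ? [] : domain.toLowerCase().split('.');
}

// Returns true when a script whose origin has domain `scriptDomain`
// may obtain the storage area for `requestedDomain`.
function mayAccess(scriptDomain, requestedDomain) {
  const requested = toComponents(requestedDomain);
  if (scriptDomain === null) return requested.length === 0; // no domain part
  if (requested.length === 0) return true; // the empty request always matches
  const own = toComponents(
    scriptDomain.includes('.') ? scriptDomain : scriptDomain + '.localdomain'
  );
  // Trim the longer array from the start so both have the same length.
  const n = Math.min(own.length, requested.length);
  const a = own.slice(own.length - n);
  const b = requested.slice(requested.length - n);
  return a.every((part, i) => part === b[i]);
}
```

Under this sketch a script on "example.com" may access "www.example.com" and "com", but a local host named "com" may not access "com", because its domain is first extended to "com.localdomain".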

When the requested domain is a top level domain, or the empty string, or a country-specific sub-domain like "co.uk" or "ca.us", the associated global storage area is known as a public storage area.

The setItem() method might be called on a Storage object that is associated with a global storage area for a domain d, created by a StorageList object associated with a Window object x. Whenever this occurs, if the method didn't raise an exception, a storage event must be fired, as described below, in every HTMLDocument object that matches the following conditions:

In other words, every other document that has access to that domain's global storage area is notified of the change.

4.10.6. The storage event

The storage event is fired in an HTMLDocument when a storage area changes, as described in the previous two sections (for session storage, for global storage).

When this happens, a storage event in no namespace, which bubbles, is not cancelable, has no default action, and which uses the StorageEvent interface described below, must be fired on the body element.

However, it is possible (indeed, for session storage areas, likely) that the target HTMLDocument object is not active at that time. For example, it might not be the current entry in the session history; user agents typically stop scripts from running in pages that are in the history. In such cases, the user agent must instead delay the firing of the event until such time as the HTMLDocument object in question becomes active again.

When there are multiple delayed storage events for the same HTMLDocument object, user agents should coalesce events with the same domain value (dropping duplicates).

If the DOM of a page that has delayed storage events queued up is discarded, then the delayed events are dropped as well.
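The delay-and-coalesce behaviour described above can be sketched with a per-document queue. The class DelayedStorageEvents and its method names are hypothetical, and the returned objects are plain records rather than real StorageEvent instances.

```javascript
// Hypothetical per-document queue of delayed storage events.
class DelayedStorageEvents {
  constructor() {
    this.domains = new Set(); // one pending entry per distinct domain value
  }
  // Called when a storage area changes while the document is not active.
  delay(domain) {
    this.domains.add(domain); // duplicates are dropped by the Set
  }
  // Called when the document becomes active again; returns the events to fire.
  flush() {
    const events = Array.from(this.domains, (domain) => ({ type: 'storage', domain }));
    this.domains.clear();
    return events;
  }
}
```

Discarding the document corresponds to dropping the queue object without calling flush().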

interface StorageEvent : Event {
  readonly attribute DOMString domain;
  void initStorageEvent(in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString domainArg);
  void initStorageEventNS(in DOMString namespaceURIArg, in DOMString typeArg, in boolean canBubbleArg, in boolean cancelableArg, in DOMString domainArg);
};

The initStorageEvent() and initStorageEventNS() methods must initialise the event in a manner analogous to the similarly-named methods in the DOM3 Events interfaces. [DOM3EVENTS]

The domain attribute of the StorageEvent event object must be set to the name of the domain associated with the storage area that changed if that storage area is a global storage area, or the string "#session" if it was a session storage area.

4.10.7. Miscellaneous implementation requirements for storage areas

4.10.7.1. Disk space

User agents should limit the total amount of space allowed for a domain based on the domain of the page setting the value.

User agents should not limit the total amount of space allowed on a per-storage-area basis, otherwise a site could just store data in any number of subdomains, e.g. storing up to the limit in a1.example.com, a2.example.com, a3.example.com, etc, circumventing per-domain limits.

User agents should consider additional quota mechanisms (for example limiting the amount of space provided to a domain's subdomains as a group) so that hostile authors can't run scripts from multiple subdomains all adding data to the global storage area in an attempted denial-of-service attack.

User agents may prompt the user when per-domain space quotas are reached, allowing the user to grant a site more space. This enables sites to store many user-created documents on the user's computer, for instance.

User agents should allow users to see how much space each domain is using.

If the storage area space limit is reached during a setItem() call, the user agent should raise an exception.

A mostly arbitrary limit of five megabytes per domain is recommended. Implementation feedback is welcome and will be used to update this suggestion in future.
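One way to account for space along these lines is sketched below. It is a rough illustration only: the class QuotaTracker, the 'QUOTA_EXCEEDED' error text, and the cost estimate of two bytes per UTF-16 code unit are assumptions, and the sketch ignores overwrites and removals, which a real accounting scheme would need to credit back.

```javascript
const LIMIT_BYTES = 5 * 1024 * 1024; // the suggested per-domain limit

// Hypothetical accounting: usage is charged to the domain of the page
// calling setItem(), not to the storage area being written.
class QuotaTracker {
  constructor(limit = LIMIT_BYTES) {
    this.limit = limit;
    this.used = new Map(); // setting domain -> bytes used
  }
  // Throws before recording anything, so a refused write changes no state.
  charge(settingDomain, key, value) {
    const cost = 2 * (key.length + value.length); // assume 2 bytes per UTF-16 unit
    const current = this.used.get(settingDomain) || 0;
    if (current + cost > this.limit) throw new Error('QUOTA_EXCEEDED');
    this.used.set(settingDomain, current + cost);
  }
}
```

Charging the setting page's domain rather than the target storage area is what prevents a site from spreading data across a1.example.com, a2.example.com, and so on to evade the limit.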

4.10.7.2. Threads

Multiple browsing contexts must be able to access the global storage areas simultaneously in a predictable manner. Scripts must not be able to detect any concurrent script execution.

This is required to guarantee that the length attribute of a Storage object never changes while a script is executing, other than in a way that is predictable by the script itself.

There are various ways of implementing this requirement. One is that if a script running in one browsing context accesses a global storage area, the UA blocks scripts in other browsing contexts when they try to access any global storage area until the first script has executed to completion. (Similarly, when a script in one browsing context accesses its session storage area, any scripts that have the same top level browsing context and the same domain would block when accessing their session storage area until the first script has executed to completion.) Another (potentially more efficient but probably more complex) implementation strategy is to use optimistic transactional script execution. This specification does not require any particular implementation strategy, so long as the requirement above is met.

4.10.8. Security and privacy

4.10.8.1. User tracking

A third-party advertiser (or any entity capable of getting content distributed to multiple sites) could use a unique identifier stored in its domain's global storage area to track a user across multiple sessions, building a profile of the user's interests to allow for highly targeted advertising. In conjunction with a site that is aware of the user's real identity (for example an e-commerce site that requires authenticated credentials), this could allow oppressive groups to target individuals with greater accuracy than in a world with purely anonymous Web usage.

The globalStorage object also introduces a way for sites to cooperate to track users over multiple domains, by storing identifying data in a "public" top-level domain storage area, accessible by any domain.

There are a number of techniques that can be used to mitigate the risk of user tracking:

While these suggestions prevent trivial use of this API for user tracking, they do not block it altogether. Within a single domain, a site can continue to track the user across multiple sessions, and can then pass all this information to the third party along with any identifying information (names, credit card numbers, addresses) obtained by the site. If a third party cooperates with multiple sites to obtain such information, a profile can still be created.

However, user tracking is to some extent possible even with no cooperation from the user agent whatsoever, for instance by using session identifiers in URIs, a technique already commonly used for innocuous purposes but easily repurposed for user tracking (even retroactively). This information can then be shared with other sites, using visitors' IP addresses and other user-specific data (e.g. user-agent headers and configuration settings) to combine separate sessions into coherent user profiles.

If the user interface for persistent storage presents data in the persistent storage feature separately from data in HTTP session cookies, then users are likely to delete data in one and not the other. This would allow sites to use the two features as redundant backup for each other, defeating a user's attempts to protect his privacy.

4.10.8.3. Integrity of "public" storage areas

Since the "public" global storage areas are accessible by content from many different parties, it is possible for third-party sites to delete or change information stored in those areas in ways that the originating sites may not expect.

Authors must not use the "public" global storage areas for storing sensitive data. Authors must not trust information stored in "public" global storage areas.

4.10.8.4. Cross-protocol and cross-port attacks

This API makes no distinction between content served over HTTP, FTP, or other host-based protocols, and does not distinguish between content served from different ports at the same host.

Thus, for example, data stored in the global persistent storage for domain "www.example.com" by a page served from HTTP port 80 will be available to a page served in http://example.com:18080/, even if the latter is an experimental server under the control of a different user.

Since the data is not sent over the wire by the user agent, this is not a security risk in its own right. However, authors must take proper steps to ensure that all hosts that have fully qualified host names that are subsets of hosts dealing with sensitive information are as secure as the originating hosts themselves.

Similarly, authors must ensure that all Web servers on a host, regardless of the port, are equally trusted if any of them are to use persistent storage. For instance, if a Web server runs a production service that makes use of the persistent storage feature, then other users that have access to that machine and that can run a Web server on another port will be able to access the persistent storage added by the production service (assuming they can trick a user into visiting their page).

However, if one is able to trick users into visiting a Web server with the same host name but on a different port as a production service used by these users, then one could just as easily fake the look of the site and thus trick users into authenticating with the fake site directly, forwarding the request to the real site and stealing the credentials in the process. Thus, the persistent storage feature is considered to only minimally increase the risk involved.

What about if someone is able to get a server up on a port, and can then send people to that URI? They could steal all the data with no further interaction. How about putting the port number at the end of the string being compared? (Implicitly.)

4.10.8.5. DNS spoofing attacks

Because of the potential for DNS spoofing attacks, one cannot guarantee that a host claiming to be in a certain domain really is from that domain. The secure attribute is provided to mark certain key/value pairs as only being accessible to pages that have been authenticated using secure certificates (or similar mechanisms).

Authors must ensure that they do not mark sensitive items as "safe for both secure and insecure content". (To prevent the risk of a race condition, data stored by scripts in secure contexts default to being marked as "safe only for secure content".)

4.10.8.6. Cross-directory attacks

Different authors sharing one host name, for example users hosting content on geocities.com, all share one persistent storage object. There is no feature to restrict the access by pathname. Authors on shared hosts are therefore recommended to avoid using the persistent storage feature, as it would be trivial for other authors to read from and write to the same storage area.

Even if a path-restriction feature was made available, the usual DOM scripting security model would make it trivial to bypass this protection and access the data from any path.

4.10.8.7. Public storage areas corresponding to hosts

If a "public" global storage area corresponds to a host, as it typically does for private domains with third-party subdomains such as dyndns.org or uk.com, the host corresponding to the "public" domain has access to all the storage areas of its third-party subdomains. In general, authors are discouraged from using the globalStorage API for sensitive data unless the operators of all the domains involved are trusted.

User agents may mitigate this problem by preventing hosts corresponding to "public" global storage areas from accessing any storage areas other than their own.

4.10.8.8. Storage areas in the face of untrusted higher-level domains that do not correspond to public storage areas

Authors should not store sensitive data using the global storage APIs if there are hosts with fully-qualified domain names that are subsets of their own which they do not trust. For example, an author at finance.members.example.net should not store sensitive financial user data in the finance.members.example.net storage area if he does not trust the host that runs example.net.

4.10.8.9. Storage areas in the face of untrusted subdomains

If an author publishing content on one host, e.g. example.com, wishes to use the globalStorage API but does not wish any content on the host's subdomains to access the data, the author should use an otherwise non-existent subdomain name, e.g., private.example.com, to store the data. This will be accessible only to that host (and its parent domains), and not to any of the real subdomains (e.g. upload.example.com).

4.10.8.10. Implementation risks

The two primary risks when implementing this persistent storage feature are letting hostile sites read information from other domains, and letting hostile sites write information that is then read from other domains.

Letting third-party sites read data that is not supposed to be read from their domain causes information leakage. For example, a user's shopping wishlist on one domain could be used by another domain for targeted advertising; or a user's work-in-progress confidential documents stored by a word-processing site could be examined by the site of a competing company.

Letting third-party sites write data to the storage areas of other domains can result in information spoofing, which is equally dangerous. For example, a hostile site could add items to a user's wishlist; or a hostile site could set a user's session identifier to a known ID that the hostile site can then use to track the user's actions on the victim site.

A risk is also presented by servers on local domains having host names matching top-level domain names, for instance having a host called "com" or "net". Such hosts might, if implementations fail to correctly implement the .localdomain suffixing, have full access to all the data stored in a UA's persistent storage for that top level domain.

Thus, strictly following the model described in this specification is important for user security.

In addition, a number of optional restrictions related to the "public" global storage areas are suggested in the previous sections. The design of this API is intended to be such that not supporting these restrictions, or supporting them less than perfectly, does not result in critical security problems. However, implementations are still encouraged to create and maintain a list of "public" domains, and apply the restrictions described above.