3 Semantics, structure, and APIs of HTML documents

3.1 Documents

Every XML and HTML document in an HTML user agent is represented by a Document object. [DOM]

The document's address is the URL associated with a Document (as defined in the DOM standard). It is initially set when the Document is created, but it can change during the lifetime of the Document; for example, it changes when the user navigates to a fragment identifier on the page and when the pushState() method is called with a new URL. [DOM]

Interactive user agents typically expose the document's address in their user interface. This is the primary mechanism by which a user can tell if a site is attempting to impersonate another.

When a Document is created by a script using the createDocument() or createHTMLDocument() APIs, the document's address is the same as the document's address of the responsible document specified by the script's settings object, and the Document is both ready for post-load tasks and completely loaded immediately.

The document's referrer is an absolute URL that can be set when the Document is created. If it is not explicitly set, then its value is the empty string.

Each Document object has a reload override flag that is originally unset. The flag is set by the document.open() and document.write() methods in certain situations. When the flag is set, the Document also has a reload override buffer which is a Unicode string that is used as the source of the document when it is reloaded.

When the user agent is to perform an overridden reload, given a source browsing context, it must act as follows:

  1. Let source be the value of the browsing context's active document's reload override buffer.

  2. Let address be the browsing context's active document's address.

  3. Navigate the browsing context to a resource whose source is source, with replacement enabled and exceptions enabled. The source browsing context is that given to the overridden reload algorithm. When the navigate algorithm creates a Document object for this purpose, set that Document's reload override flag and set its reload override buffer to source.

    When it comes time to set the document's address in the navigation algorithm, use address as the override URL.

3.1.1 The Document object

The DOM specification defines a Document interface, which this specification extends significantly:

enum DocumentReadyState { "loading", "interactive", "complete" };

[OverrideBuiltins]
partial /*sealed*/ interface Document {
  // resource metadata management
  [PutForwards=href, Unforgeable] readonly attribute Location? location;
           attribute DOMString domain;
  readonly attribute DOMString referrer;
           attribute DOMString cookie;
  readonly attribute DOMString lastModified;
  readonly attribute DocumentReadyState readyState;

  // DOM tree accessors
  getter object (DOMString name);
           attribute DOMString title;
           attribute DOMString dir;
           attribute HTMLElement? body;
  readonly attribute HTMLHeadElement? head;
  readonly attribute HTMLCollection images;
  readonly attribute HTMLCollection embeds;
  readonly attribute HTMLCollection plugins;
  readonly attribute HTMLCollection links;
  readonly attribute HTMLCollection forms;
  readonly attribute HTMLCollection scripts;
  NodeList getElementsByName(DOMString elementName);
  NodeList getItems(optional DOMString typeNames = ""); // microdata
  readonly attribute DOMElementMap cssElementMap;
  readonly attribute HTMLScriptElement? currentScript;

  // dynamic markup insertion
  Document open(optional DOMString type = "text/html", optional DOMString replace = "");
  WindowProxy open(DOMString url, DOMString name, DOMString features, optional boolean replace = false);
  void close();
  void write(DOMString... text);
  void writeln(DOMString... text);

  // user interaction
  readonly attribute WindowProxy? defaultView;
  readonly attribute Element? activeElement;
  boolean hasFocus();
           attribute DOMString designMode;
  boolean execCommand(DOMString commandId, optional boolean showUI = false, optional DOMString value = "");
  boolean queryCommandEnabled(DOMString commandId);
  boolean queryCommandIndeterm(DOMString commandId);
  boolean queryCommandState(DOMString commandId);
  boolean queryCommandSupported(DOMString commandId);
  DOMString queryCommandValue(DOMString commandId);
  readonly attribute HTMLCollection commands;

  // special event handler IDL attributes that only apply to Document objects
  [LenientThis] attribute EventHandler onreadystatechange;

  // also has obsolete members
};
Document implements GlobalEventHandlers;

3.1.2 Resource metadata management

document . referrer

Returns the address of the Document from which the user navigated to this one, unless it was blocked or there was no such document, in which case it returns the empty string.

The noreferrer link type can be used to block the referrer.

The referrer attribute must return the document's referrer.

In the case of HTTP, the referrer IDL attribute will match the Referer (sic) header that was sent when fetching the current page.

Typically user agents are configured to not report referrers in the case where the referrer uses an encrypted protocol and the current page does not (e.g. when navigating from an https: page to an http: page).


document . cookie [ = value ]

Returns the HTTP cookies that apply to the Document. If there are no cookies or cookies can't be applied to this resource, the empty string will be returned.

Can be set, to add a new cookie to the Document's set of HTTP cookies.

If the contents are sandboxed into a unique origin (e.g. in an iframe with the sandbox attribute), a SecurityError exception will be thrown on getting and setting.

The cookie attribute represents the cookies of the resource identified by the document's address.

A Document object is a cookie-averse Document object if it has no browsing context or if its address does not use a server-based naming authority.

On getting, if the document is a cookie-averse Document object, then the user agent must return the empty string. Otherwise, if the Document's origin is not a scheme/host/port tuple, the user agent must throw a SecurityError exception. Otherwise, the user agent must first obtain the storage mutex and then return the cookie-string for the document's address for a "non-HTTP" API, decoded using the UTF-8 decoder. [COOKIES] (This is a fingerprinting vector.)

On setting, if the document is a cookie-averse Document object, then the user agent must do nothing. Otherwise, if the Document's origin is not a scheme/host/port tuple, the user agent must throw a SecurityError exception. Otherwise, the user agent must obtain the storage mutex and then act as it would when receiving a set-cookie-string for the document's address via a "non-HTTP" API, consisting of the new value encoded as UTF-8. [COOKIES] [ENCODING]

Since the cookie attribute is accessible across frames, the path restrictions on cookies are only a tool to help manage which cookies are sent to which parts of the site, and are not in any way a security feature.
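
As a non-normative illustration, the getter and setter rules above can be sketched as a small model. This is a simplification under stated assumptions: `cookieAverse`, `originIsTuple`, and `jar` are hypothetical fields standing in for user-agent state, `domError` is a hypothetical helper, and the storage mutex and full [COOKIES] parsing (attributes, expiry, path and domain matching) are omitted.

```javascript
// Hypothetical helper: an Error whose name mimics a DOM exception name.
function domError(name, message) {
  const err = new Error(message);
  err.name = name;
  return err;
}

// `doc` is a hypothetical plain object:
//   cookieAverse  - true if the Document is cookie-averse
//   originIsTuple - true if its origin is a scheme/host/port tuple
//   jar           - Map of cookie name -> value (stand-in for the cookie store)
function getCookie(doc) {
  if (doc.cookieAverse) return "";
  if (!doc.originIsTuple) {
    throw domError("SecurityError", "origin is not a scheme/host/port tuple");
  }
  return [...doc.jar].map(([name, value]) => `${name}=${value}`).join("; ");
}

function setCookie(doc, value) {
  if (doc.cookieAverse) return; // setting does nothing
  if (!doc.originIsTuple) {
    throw domError("SecurityError", "origin is not a scheme/host/port tuple");
  }
  // Simplification: keep only the leading name=value pair and drop attributes.
  const pair = value.split(";")[0];
  const eq = pair.indexOf("=");
  if (eq === -1) return; // no "=": this sketch ignores the string entirely
  const name = pair.slice(0, eq).trim();
  if (name === "") return;
  doc.jar.set(name, pair.slice(eq + 1).trim());
}
```

In this model, a sandboxed document would be represented with `originIsTuple: false`, so both functions throw, while a cookie-averse document reads as the empty string and silently ignores writes.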


document . lastModified

Returns the date of the last modification to the document, as reported by the server, in the form "MM/DD/YYYY hh:mm:ss", in the user's local time zone.

If the last modification date is not known, the current time is returned instead.

The lastModified attribute, on getting, must return the date and time of the Document's source file's last modification, in the user's local time zone, in the following format:

  1. The month component of the date.
  2. A U+002F SOLIDUS character (/).
  3. The day component of the date.
  4. A U+002F SOLIDUS character (/).
  5. The year component of the date.
  6. A U+0020 SPACE character.
  7. The hours component of the time.
  8. A U+003A COLON character (:).
  9. The minutes component of the time.
  10. A U+003A COLON character (:).
  11. The seconds component of the time.

All the numeric components above, other than the year, must be given as two ASCII digits representing the number in base ten, zero-padded if necessary. The year must be given as the shortest possible string of four or more ASCII digits representing the number in base ten, zero-padded if necessary.

The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files. If the last modification date and time are not known, the attribute must return the current date and time in the above format.
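
As a non-normative illustration, the format above can be produced from a JavaScript Date as follows. The function name `formatLastModified` is hypothetical; the sketch assumes a non-negative year and uses the Date object's local-time getters, matching the requirement to report the user's local time zone.

```javascript
// Formats a Date as "MM/DD/YYYY hh:mm:ss" in the local time zone.
function formatLastModified(date) {
  // Two ASCII digits, zero-padded if necessary.
  const two = (n) => String(n).padStart(2, "0");
  // Four or more digits, zero-padded if necessary.
  const year = String(date.getFullYear()).padStart(4, "0");
  return (
    `${two(date.getMonth() + 1)}/${two(date.getDate())}/${year} ` +
    `${two(date.getHours())}:${two(date.getMinutes())}:${two(date.getSeconds())}`
  );
}
```

For example, a Date constructed from the local-time components 5 January 2024, 03:07:09 formats as "01/05/2024 03:07:09".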


document . readyState

Returns "loading" while the Document is loading, "interactive" once it is finished parsing but still loading sub-resources, and "complete" once it has loaded.

The readystatechange event fires on the Document object when this value changes.

Each document has a current document readiness. When a Document object is created, it must have its current document readiness set to the string "loading" if the document is associated with an HTML parser, an XML parser, or an XSLT processor, and to the string "complete" otherwise. Various algorithms during page loading affect this value. When the value is set, the user agent must fire a simple event named readystatechange at the Document object.

A Document is said to have an active parser if it is associated with an HTML parser or an XML parser that has not yet been stopped or aborted.

The readyState IDL attribute must, on getting, return the current document readiness.
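
As a non-normative illustration, a script can wait for the "complete" state by combining the readyState attribute with the readystatechange event. The helper name `whenComplete` is hypothetical; `doc` is assumed to be a Document (or any object exposing `readyState`, `addEventListener()`, and `removeEventListener()`).

```javascript
// Runs `callback` once `doc.readyState` is "complete".
function whenComplete(doc, callback) {
  if (doc.readyState === "complete") {
    callback(); // already loaded: no event will fire for this state again
    return;
  }
  function onChange() {
    // The event also fires for "interactive", so check the value each time.
    if (doc.readyState === "complete") {
      doc.removeEventListener("readystatechange", onChange);
      callback();
    }
  }
  doc.addEventListener("readystatechange", onChange);
}
```

Checking the current value first matters because a script that runs late would otherwise register a listener for a transition that has already happened.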

3.1.3 DOM tree accessors

The html element of a document is the document's root element, if there is one and it's an html element, or null otherwise.


document . head

Returns the head element.

The head element of a document is the first head element that is a child of the html element, if there is one, or null otherwise.

The head attribute, on getting, must return the head element of the document (a head element or null).


document . title [ = value ]

Returns the document's title, as given by the title element for HTML and as given by the SVG title element for SVG.

Can be set, to update the document's title. If there is no appropriate element to update, the new value is ignored.

The title element of a document is the first title element in the document (in tree order), if there is one, or null otherwise.

The title attribute must, on getting, run the following algorithm:

  1. If the root element is an svg element in the SVG namespace, then let value be a concatenation of the data of all the child Text nodes of the first title element in the SVG namespace that is a child of the root element. [SVG]

  2. Otherwise, let value be a concatenation of the data of all the child Text nodes of the title element, in tree order, or the empty string if the title element is null.

  3. Strip and collapse whitespace in value.

  4. Return value.
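
As a non-normative illustration, steps 2 to 4 of the getter (the non-SVG case) can be sketched over plain objects. The names `stripAndCollapseWhitespace` and `titleValue` are hypothetical, and `titleElement` is assumed to be either null or an object with a `childNodes` array whose entries carry `nodeType` and `data`.

```javascript
// Collapses runs of space characters (space, tab, LF, FF, CR) to a single
// space, then removes a leading and a trailing space if present.
function stripAndCollapseWhitespace(value) {
  return value.replace(/[ \t\n\f\r]+/g, " ").replace(/^ | $/g, "");
}

// Concatenates the data of the child Text nodes, in order, then normalizes.
function titleValue(titleElement) {
  if (titleElement === null) return "";
  const text = titleElement.childNodes
    .filter((node) => node.nodeType === 3) // 3 is Node.TEXT_NODE
    .map((node) => node.data)
    .join("");
  return stripAndCollapseWhitespace(text);
}
```

Only child Text nodes contribute; comments and descendant elements of the title element are skipped in this sketch, as in the steps above.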

On setting, the steps corresponding to the first matching condition in the following list must be run:

If the root element is an svg element in the SVG namespace [SVG]
  1. Let element be the first title element in the SVG namespace that is a child of the root element, if any. If there isn't one, create a title element in the SVG namespace, append it to the root element, and let element be that element. [SVG]

  2. Act as if the textContent IDL attribute of element was set to the new value being assigned.

If the root element is in the HTML namespace
  1. If the title element is null and the head element is null, then abort these steps.

  2. If the title element is null, then create a new title element and append it to the head element, and let element be the newly created element; otherwise, let element be the title element.

  3. Act as if the textContent IDL attribute of element was set to the new value being assigned.

Otherwise
  Do nothing.


document . body [ = value ]

Returns the body element.

Can be set, to replace the body element.

If the new value is not a body or frameset element, this will throw a HierarchyRequestError exception.

The body element of a document is the first child of the html element that is either a body element or a frameset element. If there is no such element, it is null.

The body attribute, on getting, must return the body element of the document (either a body element, a frameset element, or null). On setting, the following algorithm must be run:

  1. If the new value is not a body or frameset element, then throw a HierarchyRequestError exception and abort these steps.
  2. Otherwise, if the new value is the same as the body element, do nothing. Abort these steps.
  3. Otherwise, if the body element is not null, then replace that element with the new value in the DOM, as if the root element's replaceChild() method had been called with the new value and the incumbent body element as its two arguments respectively, then abort these steps.
  4. Otherwise, if there is no root element, throw a HierarchyRequestError exception and abort these steps.
  5. Otherwise, the body element is null, but there's a root element. Append the new value to the root element.
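
As a non-normative illustration, the setter steps above can be modelled on plain objects. Everything here is a sketch under assumptions: nodes are hypothetical objects with `localName` and a `childNodes` array, `doc.documentElement` stands in for the root element, `domError` is a hypothetical helper, and namespaces and mutation records are ignored.

```javascript
// Hypothetical helper: an Error whose name mimics a DOM exception name.
function domError(name, message) {
  const err = new Error(message);
  err.name = name;
  return err;
}

function isBodyOrFrameset(node) {
  return node != null && (node.localName === "body" || node.localName === "frameset");
}

// The body element: first body or frameset child of the html element, or null.
function bodyElement(doc) {
  const root = doc.documentElement;
  if (!root || root.localName !== "html") return null;
  return root.childNodes.find(isBodyOrFrameset) || null;
}

function setBody(doc, newValue) {
  // Step 1: only body and frameset elements are accepted.
  if (!isBodyOrFrameset(newValue)) {
    throw domError("HierarchyRequestError", "new value is not a body or frameset element");
  }
  const current = bodyElement(doc);
  // Step 2: assigning the current body element is a no-op.
  if (newValue === current) return;
  const root = doc.documentElement;
  // Step 3: replace the existing body element in place.
  if (current !== null) {
    root.childNodes.splice(root.childNodes.indexOf(current), 1, newValue);
    return;
  }
  // Step 4: there is nothing to append to.
  if (!root) {
    throw domError("HierarchyRequestError", "document has no root element");
  }
  // Step 5: no body element yet, so append to the root element.
  root.childNodes.push(newValue);
}
```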

document . images

Returns an HTMLCollection of the img elements in the Document.

document . embeds
document . plugins

Return an HTMLCollection of the embed elements in the Document.

document . links

Returns an HTMLCollection of the a and area elements in the Document that have href attributes.

document . forms

Returns an HTMLCollection of the form elements in the Document.

document . scripts

Returns an HTMLCollection of the script elements in the Document.

The images attribute must return an HTMLCollection rooted at the Document node, whose filter matches only img elements.

The embeds attribute must return an HTMLCollection rooted at the Document node, whose filter matches only embed elements.

The plugins attribute must return the same object as that returned by the embeds attribute.

The links attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with href attributes and area elements with href attributes.

The forms attribute must return an HTMLCollection rooted at the Document node, whose filter matches only form elements.

The scripts attribute must return an HTMLCollection rooted at the Document node, whose filter matches only script elements.
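
As a non-normative illustration, the filters above can be expressed as predicates applied to a tree-order walk. The sketch uses hypothetical plain objects (`localName`, an `attributes` object, and a `children` array) and returns a static array, whereas a real HTMLCollection is live and updates as the tree changes.

```javascript
// Yields the element descendants of `node` in tree order (depth-first, pre-order).
function* descendants(node) {
  for (const child of node.children) {
    yield child;
    yield* descendants(child);
  }
}

// One predicate per collection, mirroring the filters described above.
const filters = {
  images: (el) => el.localName === "img",
  embeds: (el) => el.localName === "embed",
  links: (el) =>
    (el.localName === "a" || el.localName === "area") &&
    el.attributes.href !== undefined,
  forms: (el) => el.localName === "form",
  scripts: (el) => el.localName === "script",
};

// Returns a snapshot of the elements under `root` matching the named filter.
function collect(root, filterName) {
  return [...descendants(root)].filter(filters[filterName]);
}
```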


collection = document . getElementsByName(name)

Returns a NodeList of elements in the Document that have a name attribute with the value name.

The getElementsByName(name) method takes a string name, and must return a live NodeList containing all the HTML elements in that document that have a name attribute whose value is equal to the name argument (in a case-sensitive manner), in tree order. When the method is invoked on a Document object again with the same argument, the user agent may return the same object as the one returned by the earlier call. In other cases, a new NodeList object must be returned.


document . cssElementMap

Returns a DOMElementMap object for the Document representing the current CSS element reference identifiers.

The cssElementMap IDL attribute allows authors to define CSS element reference identifiers, which are used in certain CSS features to override the normal ID-based mapping. [CSSIMAGES]

When a Document is created, it must be associated with an initially-empty CSS ID overrides list, which consists of a list of mappings each of which consists of a string name mapped to an Element node.

Each entry in the CSS ID overrides list, while it is in the list and is either in the Document or is an img, video, or canvas element, defines a CSS element reference identifier mapping the given name to the given Element. [CSSIMAGES]

On getting, the cssElementMap IDL attribute must return a DOMElementMap object, associated with the following algorithms, which expose the current mappings:

The algorithm for getting the list of name-element mappings

Return the Document's CSS ID overrides list, maintaining the order in which the entries were originally added to the list.

The algorithm for mapping a name to a certain element

Let name be the name passed to the algorithm and element be the Element passed to the algorithm.

If element is null, run the algorithm for deleting mappings by name, passing it name.

Otherwise, if there is an entry in the Document's CSS ID overrides list whose name is name, replace its current value with element.

Otherwise, add a mapping to the Document's CSS ID overrides list whose name is name and whose element is element.

The algorithm for deleting mappings by name

If there is an entry in the Document's CSS ID overrides list whose name is the name passed to this algorithm, remove it. This also undefines the CSS element reference identifier for that name. [CSSIMAGES]

The same object must be returned each time.
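
As a non-normative illustration, the three algorithms above can be modelled with a JavaScript Map, which iterates in insertion order and keeps an entry's position when its value is replaced. The class name `CssIdOverrides` and its method names are hypothetical and are not part of the DOMElementMap interface.

```javascript
// Model of a Document's CSS ID overrides list.
class CssIdOverrides {
  constructor() {
    this.map = new Map(); // name -> element, in the order names were first added
  }

  // Getting the list of name-element mappings, in original insertion order.
  entries() {
    return [...this.map.entries()];
  }

  // Mapping a name to an element; a null element deletes the mapping.
  set(name, element) {
    if (element === null) {
      this.delete(name);
      return;
    }
    this.map.set(name, element); // replaces in place, or appends a new entry
  }

  // Deleting mappings by name.
  delete(name) {
    this.map.delete(name);
  }
}
```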


document . currentScript

Returns the script element that is currently executing. In the case of reentrant script execution, returns the one that most recently started executing amongst those that have not yet finished executing.

Returns null if the Document is not currently executing a script element (e.g. because the running script is an event handler, or a timeout).

The currentScript attribute, on getting, must return the value to which it was most recently initialized. When the Document is created, the currentScript must be initialized to null.


The Document interface supports named properties. The supported property names at any moment consist of the values of the name content attributes of all the applet, exposed embed, form, iframe, img, and exposed object elements in the Document that have non-empty name content attributes, and the values of the id content attributes of all the applet and exposed object elements in the Document that have non-empty id content attributes, and the values of the id content attributes of all the img elements in the Document that have both non-empty name content attributes and non-empty id content attributes. The supported property names must be in tree order, ignoring later duplicates, with values from id attributes coming before values from name attributes when the same element contributes both.

To determine the value of a named property name when the Document object is indexed for property retrieval, the user agent must return the value obtained using the following steps:

  1. Let elements be the list of named elements with the name name in the Document.

    There will be at least one such element, by definition.

  2. If elements has only one element, and that element is an iframe element, then return the WindowProxy object of the nested browsing context represented by that iframe element, and abort these steps.

  3. Otherwise, if elements has only one element, return that element and abort these steps.

  4. Otherwise return an HTMLCollection rooted at the Document node, whose filter matches only named elements with the name name.
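
As a non-normative illustration, the steps above can be sketched as a function over an already computed list of named elements. The function name `resolveNamedProperty` is hypothetical; elements are assumed to be plain objects with a `localName`, iframe objects carry a `contentWindow` standing in for the WindowProxy of the nested browsing context, and a copied array stands in for the HTMLCollection.

```javascript
// `elements`: non-empty array of the named elements with the requested name,
// in tree order.
function resolveNamedProperty(elements) {
  if (elements.length === 1) {
    const only = elements[0];
    // Step 2: a lone iframe yields its nested browsing context's window.
    if (only.localName === "iframe") return only.contentWindow;
    // Step 3: any other lone element is returned directly.
    return only;
  }
  // Step 4: several matches yield a collection (here a static snapshot).
  return elements.slice();
}
```

One consequence worth noting is that the type of the result depends on how many elements currently share the name, so scripts relying on named properties cannot assume a single return type.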

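Named elements with the name name, for the purposes of the above algorithm, are those that are either: applet, exposed embed, form, iframe, img, or exposed object elements that have a name content attribute whose value is name; applet or exposed object elements that have an id content attribute whose value is name; or img elements that have an id content attribute whose value is name and that also have a non-empty name content attribute.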

An embed or object element is said to be exposed if it has no exposed object ancestor, and, for object elements, is additionally either not showing its fallback content or has no object or embed descendants.


The dir attribute on the Document interface is defined along with the dir content attribute.

3.1.4 Loading XML documents

partial interface XMLDocument {
  boolean load(DOMString url);
};

The load(url) method must run the following steps:

  1. Let document be the XMLDocument object on which the method was invoked.

  2. Resolve the method's first argument, relative to the API base URL specified by the entry settings object. If this is not successful, throw a SyntaxError exception and abort these steps. Otherwise, let url be the resulting absolute URL.

  3. If the origin of url is not the same as the origin of document, throw a SecurityError exception and abort these steps.

  4. Remove all child nodes of document, without firing any mutation events.

  5. Set the current document readiness of document to "loading".

  6. Run the remainder of these steps asynchronously, and return true from the method.

  7. Let result be a Document object.

  8. Let success be false.

  9. Fetch url from the origin of document, using the API referrer source specified by the entry settings object, with the synchronous flag set and the force same-origin flag set.

  10. If the fetch attempt was successful, and the resource's Content-Type metadata is an XML MIME type, then run these substeps:

    1. Create a new XML parser associated with the result document.

    2. Pass this parser the fetched document.

    3. If there is an XML well-formedness or XML namespace well-formedness error, then remove all child nodes from result. Otherwise let success be true.

  11. Queue a task to run the following steps.

    1. Set the current document readiness of document to "complete".

    2. Replace all the children of document by the children of result (even if it has no children), firing mutation events as if a DocumentFragment containing the new children had been inserted.

    3. Fire a simple event named load at document.
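
As a non-normative illustration, steps 2 and 3 of the load() algorithm (resolving the argument and checking its origin) can be sketched with the URL API. The function name `resolveForLoad`, its parameters, and the `domError` helper are hypothetical; `baseUrl` stands in for the API base URL and `documentOrigin` for the serialized origin of the document. The sketch compares serialized origins, which is an approximation of the origin comparison the steps describe.

```javascript
// Hypothetical helper: an Error whose name mimics a DOM exception name.
function domError(name, message) {
  const err = new Error(message);
  err.name = name;
  return err;
}

// Resolves `input` against `baseUrl` and enforces the same-origin check.
function resolveForLoad(input, baseUrl, documentOrigin) {
  let url;
  try {
    url = new URL(input, baseUrl); // throws if resolution fails
  } catch (e) {
    throw domError("SyntaxError", "could not resolve URL");
  }
  if (url.origin !== documentOrigin) {
    throw domError("SecurityError", "URL is not same-origin with the document");
  }
  return url.href; // the resulting absolute URL
}
```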