This is a snapshot of an early working draft and has therefore been superseded by the HTML standard.

This document will not be further updated.

HTML 5

Call For Comments — 27 October 2007

8.4. Serialising HTML fragments

The following steps form the HTML fragment serialisation algorithm. The algorithm takes as input a DOM Element or Document, referred to as the node, and either returns a string or raises an exception.

This algorithm serialises the children of the node being serialised, not the node itself.

  1. Let s be a string, and initialise it to the empty string.

  2. For each child node child of the node, in tree order, append the appropriate string from the following list to s:

    If the child node is an Element

    Append a U+003C LESS-THAN SIGN (<) character, followed by the element's tag name. (For nodes created by the HTML parser, Document.createElement(), or Document.renameNode(), the tag name will be lowercase.)

    For each attribute that the element has, append a U+0020 SPACE character, the attribute's name (which, for attributes set by the HTML parser or by Element.setAttributeNode() or Element.setAttribute(), will be lowercase), a U+003D EQUALS SIGN (=) character, a U+0022 QUOTATION MARK (") character, the attribute's value, escaped as described below, and a second U+0022 QUOTATION MARK (") character.

    While the exact order of attributes is UA-defined, and may depend on factors such as the order in which the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialise an element's attributes in the same order.

    Append a U+003E GREATER-THAN SIGN (>) character.

    If the child node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, link, meta, param, spacer, or wbr element, then continue on to the next child node at this point.

    If the child node is a pre or textarea element, append a U+000A LINE FEED (LF) character.

    Append the value of running the HTML fragment serialisation algorithm on the child element (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN (<) character, a U+002F SOLIDUS (/) character, the element's tag name again, and finally a U+003E GREATER-THAN SIGN (>) character.

    If the child node is a Text or CDATASection node

    If one of the ancestors of the child node is a style, script, xmp, iframe, noembed, noframes, noscript, or plaintext element, then append the value of the child node's data DOM attribute literally.

    Otherwise, append the value of the child node's data DOM attribute, escaped as described below.

    If the child node is a Comment

    Append the literal string <!-- (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of the child node's data DOM attribute, followed by the literal string --> (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).

    If the child node is a DocumentType

    Append the literal string <!DOCTYPE (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE), followed by the value of the child node's name DOM attribute, followed by the literal string > (U+003E GREATER-THAN SIGN).

    Other node types (e.g. Attr) cannot occur as children of elements. If they do, this algorithm must raise an INVALID_STATE_ERR exception.

  3. The result of the algorithm is the string s.
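The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a normative implementation: the node classes (Element, Text, Comment, DocumentType), the serialise and escape functions, and the use of ValueError in place of INVALID_STATE_ERR are all hypothetical names introduced here. The sketch only tracks raw-text ancestors from the input node downwards, so ancestors above the node passed in are not consulted, and a Document is modelled as an ordinary Element.

```python
from dataclasses import dataclass, field

# Elements whose serialisation stops after the start tag (step 2, Element case).
VOID = {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
        "hr", "img", "input", "link", "meta", "param", "spacer", "wbr"}
# Elements whose descendant text is appended literally, without escaping.
RAW_TEXT = {"style", "script", "xmp", "iframe", "noembed", "noframes",
            "noscript", "plaintext"}

@dataclass
class Text:
    data: str

@dataclass
class Comment:
    data: str

@dataclass
class DocumentType:
    name: str

@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)   # dicts keep insertion order, so order is stable
    children: list = field(default_factory=list)

def escape(s: str) -> str:
    # "&" is replaced first so the entities produced below are not re-escaped.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))

def serialise(node: Element, raw: bool = False) -> str:
    """Serialise the children of node, not node itself."""
    raw = raw or node.tag in RAW_TEXT            # node is an ancestor of every child
    s = ""                                       # step 1
    for child in node.children:                  # step 2, in tree order
        if isinstance(child, Element):
            s += "<" + child.tag
            for name, value in child.attrs.items():
                s += " " + name + '="' + escape(value) + '"'
            s += ">"
            if child.tag in VOID:
                continue                         # no content and no end tag
            if child.tag in ("pre", "textarea"):
                s += "\n"
            s += serialise(child, raw) + "</" + child.tag + ">"
        elif isinstance(child, Text):
            s += child.data if raw else escape(child.data)
        elif isinstance(child, Comment):
            s += "<!--" + child.data + "-->"
        elif isinstance(child, DocumentType):
            s += "<!DOCTYPE " + child.name + ">"
        else:
            # Stand-in for raising INVALID_STATE_ERR.
            raise ValueError("node type cannot occur as a child")
    return s                                     # step 3

root = Element("div", {}, [
    Element("p", {"title": 'a"b'}, [Text("1 < 2")]),
    Element("br"),
    Comment("c"),
    Element("script", {}, [Text("x<y")]),
    Element("pre", {}, [Text("t")]),
])
result = serialise(root)
```

Note that the div itself does not appear in result, because the algorithm serialises only the children of the node it is given.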

Escaping a string (for the purposes of the algorithm above) consists of replacing any occurrences of the "&" character by the string "&amp;", any occurrences of the "<" character by the string "&lt;", any occurrences of the ">" character by the string "&gt;", and any occurrences of the """ character by the string "&quot;".
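As an illustration, the escaping rule might be written in Python as below. The function name escape_html is an assumption made for this sketch. The text above does not prescribe an order for the replacements; the sketch replaces "&" first so that the ampersands introduced by the later replacements are not themselves escaped.

```python
def escape_html(text: str) -> str:
    """Apply the four replacements described above to text."""
    return (
        text.replace("&", "&amp;")   # must come first, see the note above
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
    )

escaped = escape_html('a & b < "c" > d')
```

Input that already looks like an entity is escaped again rather than passed through, so escape_html("&lt;") yields the string "&amp;lt;".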

Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.

The output of this algorithm, if parsed with an HTML parser, will not necessarily reproduce the original tree structure. For instance, if a textarea element to which a Comment node has been appended is serialised and the output is then reparsed, the comment will end up being displayed in the text field. Similarly, if, as a result of DOM manipulation, an element contains a comment that contains the literal string "-->", then when the result of serialising the element is parsed, the comment will be truncated at that point and the rest of the comment will be interpreted as markup. Other examples include a script element that contains a text node with the text string "</script>", and a p element that contains a ul element (as the ul element's start tag would imply the end tag for the p).
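The comment case can be observed with a small Python sketch. It uses the standard library's html.parser module, which is a tokeniser-level parser and not the HTML parser defined by this specification, so it only illustrates where the comment ends on reparsing. The class name EventLog, the helper reparse, and the sample comment data are assumptions made for this example.

```python
from html.parser import HTMLParser

class EventLog(HTMLParser):
    """Record the comments, start tags and text that the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

def reparse(markup: str) -> list:
    parser = EventLog()
    parser.feed(markup)
    parser.close()          # flush any text still buffered by the parser
    return parser.events

# A Comment node whose data contains "-->", as could result from DOM manipulation.
comment_data = "a --> <b>x</b>"
# The Comment case of the algorithm appends the data without escaping.
serialised = "<!--" + comment_data + "-->"
events = reparse(serialised)
```

On reparsing, the comment ends at the first "-->", and the remainder of the original comment data is reported as markup and text rather than as part of the comment.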