This is a snapshot of an early working draft and has therefore been superseded by the HTML standard.

This document will not be further updated.

HTML 5

Call For Comments — 27 October 2007

8.5. Parsing HTML fragments

The following steps form the HTML fragment parsing algorithm. The algorithm takes two inputs: a DOM Element, referred to as context, which gives the context for the parser, and input, a string to parse. It returns a list of zero or more nodes.

Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm. The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.

  1. Create a new Document node, and mark it as being an HTML document.

  2. Create a new HTML parser, and associate it with the just created Document node.

  3. Set the HTML parser's tokenisation stage's content model flag according to the context element, as follows:

    If it is a title or textarea element
        Set the content model flag to RCDATA.
    If it is a style, script, xmp, iframe, noembed, or noframes element
        Set the content model flag to CDATA.
    If it is a noscript element
        If scripting is enabled, set the content model flag to CDATA. Otherwise, set the content model flag to PCDATA.
    If it is a plaintext element
        Set the content model flag to PLAINTEXT.
    Otherwise
        Set the content model flag to PCDATA.

  4. Switch the HTML parser's tree construction stage to the main phase.

  5. Let root be a new html element with no attributes.

  6. Append the element root to the Document node created above.

  7. Set up the parser's stack of open elements so that it contains just the single element root.

  8. Reset the parser's insertion mode appropriately.

    The parser will reference the context node as part of that algorithm.

  9. Set the parser's form element pointer to the nearest node to the context that is a form element (going straight up the ancestor chain, and including the element itself, if it is a form element), or, if there is no such form element, to null.

  10. Place the input into the input stream of the HTML parser just created.

  11. Start the parser and let it run until it has consumed all the characters just inserted into the input stream.

  12. Return all the child nodes of root, preserving the document order.
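The overall shape of steps 1 to 12 can be sketched as follows. This is a skeleton under stated assumptions, not an implementation of the parser: Node, FragmentParser, parse_fragment and the run callback are hypothetical names, and run stands in for the tokeniser and tree construction stage. Steps 3, 4, 8 and 9 are left out so that the sketch shows only the synthetic html root, the single-entry stack of open elements, and the fact that the result is the root's children rather than the root itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a name and an ordered child list."""
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class FragmentParser:
    """Hypothetical parser state touched by the fragment algorithm."""
    document: Node
    context: Node  # a real parser consults this when resetting the insertion mode (step 8)
    open_elements: List[Node] = field(default_factory=list)
    input_stream: str = ""


def parse_fragment(context: Node, text: str,
                   run: Callable[[FragmentParser], None]) -> List[Node]:
    # Steps 1-2: a fresh Document node with its own parser.
    document = Node("#document")
    parser = FragmentParser(document=document, context=context)
    # Steps 3, 4, 8 and 9 (content model flag, main phase, insertion mode,
    # form element pointer) are deliberately omitted from this sketch.
    # Steps 5-7: a bare html root is the Document's only child and the
    # only entry on the stack of open elements.
    root = Node("html")
    document.children.append(root)
    parser.open_elements = [root]
    # Steps 10-11: feed the input; the caller-supplied run() is assumed to
    # consume all of it and build nodes under the stack of open elements.
    parser.input_stream = text
    run(parser)
    # Step 12: return root's children in document order, not root itself.
    return list(root.children)
```

With a toy run that appends one text-like node to the current node, the returned list holds just that node; the html element created in step 5 never appears in the result.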
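The choice of content model flag in step 3 depends only on the context element's name and, for noscript, on whether scripting is enabled. A minimal sketch of that table follows; ContentModel and initial_content_model are hypothetical names, and the function assumes the context element's local name is passed in lowercase.

```python
from enum import Enum


class ContentModel(Enum):
    """Tokeniser content model flag values named in step 3."""
    PCDATA = "PCDATA"
    RCDATA = "RCDATA"
    CDATA = "CDATA"
    PLAINTEXT = "PLAINTEXT"


# Element names grouped exactly as in step 3.
RCDATA_ELEMENTS = frozenset({"title", "textarea"})
CDATA_ELEMENTS = frozenset({"style", "script", "xmp", "iframe", "noembed", "noframes"})


def initial_content_model(context_name: str, scripting_enabled: bool) -> ContentModel:
    """Pick the content model flag for the context element (step 3).

    Assumption: context_name is the element's local name in lowercase.
    """
    if context_name in RCDATA_ELEMENTS:
        return ContentModel.RCDATA
    if context_name in CDATA_ELEMENTS:
        return ContentModel.CDATA
    if context_name == "noscript":
        # noscript is the only case that depends on whether scripting is enabled.
        return ContentModel.CDATA if scripting_enabled else ContentModel.PCDATA
    if context_name == "plaintext":
        return ContentModel.PLAINTEXT
    # Every other context element falls through to PCDATA.
    return ContentModel.PCDATA
```

For example, a textarea context yields RCDATA, and a noscript context yields CDATA only when scripting is enabled.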
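Step 9 is a plain walk up the ancestor chain that starts at the context element itself. A minimal sketch, assuming a hypothetical Element type that carries only a lowercase local name and a parent link; nearest_form is likewise an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal element: a lowercase local name and a parent link."""
    name: str
    parent: Optional["Element"] = None


def nearest_form(context: Element) -> Optional[Element]:
    """Step 9: return the nearest form element, checking context itself
    first and then each ancestor in turn; None if there is no form."""
    node: Optional[Element] = context
    while node is not None:
        if node.name == "form":
            return node
        node = node.parent
    return None
```

Because the walk begins at the context element, a context that is itself a form becomes the form element pointer, and a context with no form ancestor leaves the pointer null (None here).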