Elements, attributes, and attribute values in HTML are defined (by this
specification) to have certain meanings (semantics). For example, the
ol element represents an ordered list, and
the lang attribute represents the language of the
content.
Authors must only use elements, attributes, and attribute values for their appropriate semantic purposes.
For example, the following document is non-conforming, despite being syntactically correct:
<!DOCTYPE html>
<html lang="en-GB">
<head> <title> Demonstration </title> </head>
<body>
<table>
<tr> <td> My favourite animal is the cat. </td> </tr>
<tr>
<td>
—<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
in an essay from 1992
</td>
</tr>
</table>
</body>
</html>
...because the data placed in the cells is clearly not tabular data. A corrected version of this document might be:
<!DOCTYPE html>
<html lang="en-GB">
<head> <title> Demonstration </title> </head>
<body>
<blockquote>
<p> My favourite animal is the cat. </p>
</blockquote>
<p>
—<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
in an essay from 1992
</p>
</body>
</html>
This next document fragment, intended to represent the heading of a corporate site, is similarly non-conforming because the second line is not intended to be a heading of a subsection, but merely a subheading or subtitle (a subordinate heading for the same section).
<body>
<h1>ABC Company</h1>
<h2>Leading the way in widget design since 1432</h2>
...
The header element should be used in
these kinds of situations:
<body>
<header>
<h1>ABC Company</h1>
<h2>Leading the way in widget design since 1432</h2>
</header>
...
Through scripting and using other mechanisms, the values of attributes, text, and indeed the entire structure of the document may change dynamically while a user agent is processing it. The semantics of a document at an instant in time are those represented by the state of the document at that instant in time, and the semantics of a document can therefore change over time. User agents must update their presentation of the document as this occurs.
HTML has a progress
element that describes a progress bar. If its "value" attribute is
dynamically updated by a script, the UA would update the rendering to show
the progress changing.
All the elements in this specification have a defined content model, which describes what nodes are allowed inside the elements, and thus what the structure of an HTML document or fragment must look like. Authors must only put elements inside an element if that element allows them to be there according to its content model.
As noted in the conformance and terminology sections, for the
purposes of determining if an element matches its content model or not, CDATASection nodes in the
DOM are treated as equivalent to Text nodes, and entity reference nodes are treated as if they
were expanded in place.
The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as text nodes in the DOM. Empty text nodes and text nodes consisting of just sequences of those characters are considered inter-element whitespace.
Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element matches its content model or not, and must be ignored when following algorithms that define document and element semantics.
An element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or text nodes (other than inter-element whitespace) between them.
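The whitespace and adjacency rules above can be sketched in Python. This is only an illustrative model, not a normative algorithm: the Node class, the helper names, and the exact set of space characters are assumptions made for the example (this section does not enumerate the space characters).

```python
from dataclasses import dataclass
from typing import List

# Assumption: the "space characters" are space, tab, LF, FF, and CR.
# This section does not list them, so treat this set as a placeholder.
SPACE_CHARACTERS = frozenset(" \t\n\f\r")


@dataclass
class Node:
    """Hypothetical minimal DOM node used only for this sketch."""
    kind: str        # "element", "text", "comment", or "pi"
    value: str = ""  # tag name for elements, character data otherwise


def is_inter_element_whitespace(node: Node) -> bool:
    """True for empty text nodes and text nodes made only of space characters."""
    return node.kind == "text" and all(ch in SPACE_CHARACTERS for ch in node.value)


def relevant_children(children: List[Node]) -> List[Node]:
    """Drop the nodes that are ignored when matching a content model."""
    return [
        node for node in children
        if node.kind not in ("comment", "pi")
        and not is_inter_element_whitespace(node)
    ]


def is_followed_by(children: List[Node], a: Node, b: Node) -> bool:
    """True if b comes directly after a once ignorable siblings are dropped."""
    kept = relevant_children(children)
    return any(first is a and second is b for first, second in zip(kept, kept[1:]))
```

Under these assumptions, an ol element separated from a ul element only by a line break and a comment counts as followed by the ul, whereas a text node such as "Foo" between them breaks the relationship.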
Authors must only use elements in the HTML namespace in the contexts where they are allowed, as defined for each element. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.
The SVG specification defines the SVG foreignObject
element as allowing foreign namespaces to be included, thus allowing
compound documents to be created by inserting subdocument content under
that element. This specification defines the XHTML html element as being allowed where subdocument
fragments are allowed in a compound document. Together, these two
definitions mean that placing an XHTML html element as a child of an SVG
foreignObject element is conforming.
Each element in HTML falls into zero or more categories that group elements with similar characteristics together. This specification uses the following categories:
Some elements have unique requirements and do not fit into any particular category.
In addition, some elements represent various common concepts; for example, some elements represent paragraphs.
Block-level elements are used for structural grouping of page content.
There are several kinds of block-level elements:
blockquote, section, article, header.
p, h1-h6, address.
nav, aside, footer, div.
ul, ol, dl, table, script.
There are also elements that seem to be block-level but aren't, such as
body, li,
dt, dd, and
td. These elements are allowed only in
specific places, not simply anywhere that block-level elements are
allowed.
Some block-level elements play multiple roles. For instance, the
script element is allowed inside
head elements and can also be used as inline-level content. Similarly, the ul, ol, dl, table, and
blockquote elements play dual roles
as both block-level and inline-level elements.
Inline-level content consists of text and various elements to annotate the text, as well as some embedded content (such as images or sound clips).
Inline-level content comes in various types:
a, meter,
img. Elements used in contexts allowing
only strictly inline-level content must not have any descendants that are
anything other than strictly inline-level content.
ol, blockquote, table.
Some elements are defined to have as a content model significant inline content. This means that at least one descendant of the element must be significant text or embedded content.
Unless an element's content model explicitly states that it must contain significant inline content, simply having no text nodes and no elements satisfies an element whose content model is some kind of inline content.
Significant text, for the purposes of determining the presence of significant inline content, consists of any character other than those falling in the Unicode categories Zs, Zl, Zp, Cc, and Cf. [UNICODE]
The following three paragraphs are non-conforming because their content model is not satisfied (they all count as empty).
<p></p>
<p><em> </em></p>
<p> <ol> <li></li> </ol> </p>
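The definition of significant text can be checked mechanically. The following Python sketch uses the standard unicodedata module to test whether a string contains any character outside the Zs, Zl, Zp, Cc, and Cf categories. The function name is hypothetical, and the sketch covers only text: embedded content, which also counts as significant inline content, is not modelled.

```python
import unicodedata

# Unicode general categories that do not count as significant text.
INSIGNIFICANT_CATEGORIES = frozenset({"Zs", "Zl", "Zp", "Cc", "Cf"})


def has_significant_text(text: str) -> bool:
    """True if any character falls outside the insignificant categories."""
    return any(
        unicodedata.category(ch) not in INSIGNIFICANT_CATEGORIES
        for ch in text
    )
```

Applied to the text content of the three paragraphs above (an empty string, a single space, and a run of spaces), the function returns False in each case, which is why all three count as empty.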
Embedded content consists of elements that
introduce content from other resources into the document, for example
img. Embedded content elements can have
fallback content: content that is to be used when
the external resource cannot be used (e.g. because it is of an unsupported
format). The element definitions state what the fallback is, if any.
Some elements are described as transparent; they have "transparent" as their content model. Some elements are described as semi-transparent; this means that part of their content model is "transparent" but that is not the only part of the content model that must be satisfied.
When a content model includes a part that is "transparent", those parts must only contain content that would still be conformant if all transparent and semi-transparent elements in the tree were replaced, in their parent element, by the children in the "transparent" part of their content model, retaining order.
When a transparent or semi-transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as zero or more block-level elements, or inline-level content (but not both).
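The replacement step described for "transparent" content can be sketched as a tree rewrite. The Python below is a simplified illustration under stated assumptions: the Element class and the hoist_transparent function are hypothetical, the TRANSPARENT set is a placeholder (the actual transparent elements are given by each element's definition), and semi-transparent elements are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumption: placeholder set of transparent elements for this sketch only.
TRANSPARENT = frozenset({"ins", "del"})


@dataclass
class Element:
    """Hypothetical minimal element: a name plus child elements or text."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def hoist_transparent(children):
    """Replace each transparent element by its own children, retaining order."""
    result = []
    for child in children:
        if isinstance(child, Element) and child.name in TRANSPARENT:
            # The transparent element disappears; its children take its place.
            result.extend(hoist_transparent(child.children))
        elif isinstance(child, Element):
            result.append(Element(child.name, hoist_transparent(child.children)))
        else:
            result.append(child)
    return result
```

A conformance checker could then validate the rewritten tree against the ordinary content models: if the tree is still conformant after hoisting, the content placed in the transparent parts was acceptable.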
Some elements are defined to have content models that allow either block-level elements or inline-level content, but not both; the aside and li elements are examples.
To establish whether such an element is being used as a block-level container or as an inline-level container, for example in order to determine if a document conforms to these requirements, user agents must look at the element's child nodes. If any of the child nodes are not allowed in block-level contexts, then the element is being used for inline-level content. If all the child nodes are allowed in a block-level context, then the element is being used for block-level elements.
Whenever this search would examine a transparent element, the element's own child nodes must be examined instead, potentially recursing further if any of those are themselves transparent.
For instance, in the following (non-conforming) XML fragment, the
li element is being used as an
inline-level element container, because the meta element is not allowed in a block-level
context. (It doesn't matter, for the purposes of determining whether it
is an inline-level or block-level context, that the meta element is not allowed in inline-level
contexts either.)
<ol>
<li>
<p> Hello World </p>
<meta title="this is an invalid example"/>
</li>
</ol>
In the following fragment, the aside
element is being used as a block-level container, because even though all
the elements it contains could be considered inline-level elements, there
are no nodes that can only be considered inline-level.
<aside>
<ol> <li> ... </li> </ol>
<ul> <li> ... </li> </ul>
</aside>
On the other hand, in the following similar fragment, the aside element is an inline-level container,
because the text ("Foo") can only be considered inline-level.
<aside>
<ol> <li> ... </li> </ol>
Foo
</aside>
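The container test described above, including the recursion into transparent elements, might be sketched as follows. This is an illustration rather than a normative algorithm. The Element class and container_kind function are hypothetical; BLOCK_LEVEL is taken from the block-level lists earlier in this section; the TRANSPARENT set and the space characters are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Elements allowed in block-level contexts, per the lists in this section.
BLOCK_LEVEL = frozenset({
    "blockquote", "section", "article", "header",
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "address",
    "nav", "aside", "footer", "div",
    "ul", "ol", "dl", "table", "script",
})
# Assumptions: placeholder transparent elements and space characters.
TRANSPARENT = frozenset({"ins", "del"})
SPACE_CHARACTERS = " \t\n\f\r"


@dataclass
class Element:
    """Hypothetical minimal element: a name plus child elements or text."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def container_kind(children) -> str:
    """Return "inline" if any child is not allowed in a block-level context."""
    for child in children:
        if isinstance(child, str):
            # Text is inline-only, unless it is inter-element whitespace.
            if child.strip(SPACE_CHARACTERS):
                return "inline"
        elif child.name in TRANSPARENT:
            # Examine the transparent element's own children instead.
            if container_kind(child.children) == "inline":
                return "inline"
        elif child.name not in BLOCK_LEVEL:
            return "inline"
    return "block"
```

Run against the three fragments above, this sketch classifies the li element as an inline-level container (because of the meta child), the first aside as a block-level container, and the second aside as an inline-level container (because of the text "Foo").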
Parts of this section should eventually be moved to DOM3 Events.
Certain elements in HTML can be activated, for instance a elements, button elements, or
input elements when their type attribute is set
to radio. Activation of those elements can happen in various
(UA-defined) ways, for instance via the mouse or keyboard.
When activation is performed via some method other than clicking the
pointing device, the default action of the event that triggers the
activation must, instead of activating the element directly, be to
fire a click event on the same
element.
The default action of this click event,
or of the real click event if the element
was activated by clicking a pointing device, must be to fire a further DOMActivate event at the same
element, whose own default action is to go through all the elements the
DOMActivate event bubbled through
(starting at the target node and going towards the Document
node), looking for an element with an activation
behavior; the first element, in reverse tree order, to have one, must
have its activation behavior executed.
The above doesn't happen for arbitrary synthetic events
dispatched by author script. However, the click() method can be used to make it happen
programmatically.
For certain form controls, this process is complicated further by changes that must happen around the click event. [WF2]
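The search for an activation behavior described above can be illustrated with a small Python model. The Element class and the run_activation_behavior function are hypothetical names for this sketch, and it assumes the DOMActivate event bubbled through every ancestor of the target; it does not model event dispatch, cancellation, or the form-control complications.

```python
from typing import Callable, Optional


class Element:
    """Hypothetical element with a parent link and optional activation behavior."""

    def __init__(
        self,
        name: str,
        parent: Optional["Element"] = None,
        activation_behavior: Optional[Callable[["Element"], None]] = None,
    ) -> None:
        self.name = name
        self.parent = parent
        self.activation_behavior = activation_behavior


def run_activation_behavior(target: Element) -> Optional[Element]:
    """Default action of DOMActivate: walk from the target towards the root.

    The first element found with an activation behavior has it executed,
    and that element is returned. Returns None if no element has one.
    """
    node: Optional[Element] = target
    while node is not None:
        if node.activation_behavior is not None:
            node.activation_behavior(node)
            return node
        node = node.parent
    return None
```

For example, if an em element with no activation behavior sits inside an a element, activating the em element causes the a element's behavior to run, and no ancestor beyond the a element is consulted.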
Most interactive elements have content models that disallow nesting interactive elements.
A paragraph is typically a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.
Paragraphs can be represented by several elements. The address element always represents a paragraph
of contact information for its section; the aside, nav,
footer, li, and dd elements
represent paragraphs with various specific semantics when they are used as inline-level content
containers; the figure element
represents a paragraph in the form of embedded
content; and the p element represents
all the other kinds of paragraphs, for which there are no dedicated
elements.