This is a snapshot of an early working draft and has therefore been superseded by the HTML standard.

This document will not be further updated.

HTML 5

Call For Comments — 27 October 2007

3.2. Common microsyntaxes

There are various places in HTML that accept particular data types, such as dates or numbers. This section describes what the conformance criteria for content in those formats are, and how to parse them.

Need to go through the whole spec and make sure all the attribute values are clearly defined either in terms of microsyntaxes or in terms of other specs, or as "Text" or some such.

3.2.1. Common parser idioms

The space characters, for the purposes of this specification, are U+0020 SPACE, U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).

Some of the micro-parsers described below follow the pattern of having an input variable that holds the string being parsed, and having a position variable pointing at the next character to parse in input.

For parsers based on this pattern, a step that requires the user agent to collect a sequence of characters means that the following algorithm must be run, with characters being the set of characters that can be collected:

  1. Let input and position be the same variables as those of the same name in the algorithm that invoked these steps.

  2. Let result be the empty string.

  3. While position doesn't point past the end of input and the character at position is one of the characters, append that character to the end of result and advance position to the next character in input.

  4. Return result.

The step skip whitespace means that the user agent must collect a sequence of characters that are space characters. The step skip Zs characters means that the user agent must collect a sequence of characters that are in the Unicode character class Zs. In both cases, the collected characters are not used. [UNICODE]
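The idioms above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and because Python has no shared position variable, each helper is assumed to return the updated position for the caller to store.

```python
import unicodedata

# The space characters as listed in this section.
SPACE_CHARACTERS = frozenset("\u0020\u0009\u000A\u000B\u000C\u000D")


def collect_sequence(input_, position, accepts):
    """Collect characters starting at position while accepts(ch) is true.

    Returns (result, new_position); the caller keeps new_position as its
    own position variable.
    """
    result = []
    while position < len(input_) and accepts(input_[position]):
        result.append(input_[position])
        position += 1
    return "".join(result), position


def skip_whitespace(input_, position):
    """Skip space characters; the collected characters are not used."""
    _, position = collect_sequence(
        input_, position, lambda ch: ch in SPACE_CHARACTERS
    )
    return position


def skip_zs_characters(input_, position):
    """Skip characters in the Unicode character class Zs."""
    _, position = collect_sequence(
        input_, position, lambda ch: unicodedata.category(ch) == "Zs"
    )
    return position
```

Note that the two skipping steps accept different sets: a tab is a space character but is not in class Zs, while U+00A0 NO-BREAK SPACE is in class Zs but is not a space character.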

3.2.2. Boolean attributes

A number of attributes in HTML5 are boolean attributes. The presence of a boolean attribute on an element represents the true value, and the absence of the attribute represents the false value.

If the attribute is present, its value must either be the empty string or the attribute's canonical name, exactly, with no leading or trailing whitespace, and in lowercase.
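As a rough illustration of these two rules, the following Python sketch separates the value an attribute represents from the validity of what was written. Both function names, and the use of a dictionary to hold an element's attributes, are assumptions made for the example.

```python
def is_valid_boolean_attribute_value(canonical_name, value):
    """Check the value of a boolean attribute that is present.

    The value must be the empty string or the canonical name, exactly,
    in lowercase and without surrounding whitespace.
    """
    return value == "" or value == canonical_name.lower()


def boolean_attribute_is_true(attributes, name):
    """Presence represents true and absence represents false.

    The attribute's value is not consulted at all.
    """
    return name in attributes
```

Under this reading, an attribute written as disabled="false" is invalid, yet it still represents the true value because the attribute is present.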

3.2.3. Numbers

3.2.3.1. Unsigned integers

A string is a valid non-negative integer if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

The rules for parsing non-negative integers are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return zero, a positive integer, or an error. Leading spaces are ignored. Trailing spaces and indeed any trailing garbage characters are ignored.

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let value have the value 0.

  4. Skip whitespace.

  5. If position is past the end of input, return an error.

  6. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.

  7. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):

    1. Multiply value by ten.
    2. Add the value of the current character (0..9) to value.
    3. Advance position to the next character.
    4. If position is not past the end of input, return to the top of step 7 in the overall algorithm (that's the step within which these substeps find themselves).
  8. Return value.
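
The steps above can be sketched in Python as follows. The function name is illustrative, and returning None to signal an error is an assumption, since the text does not say how an error is represented.

```python
SPACE_CHARACTERS = frozenset(" \t\n\x0b\x0c\r")


def is_digit(ch):
    # Only U+0030..U+0039 count; str.isdigit() would accept other scripts.
    return "0" <= ch <= "9"


def parse_non_negative_integer(input_):
    """Return an int, or None where the algorithm returns an error."""
    position = 0
    value = 0
    # Step 4: skip whitespace.
    while position < len(input_) and input_[position] in SPACE_CHARACTERS:
        position += 1
    # Steps 5 and 6: there must be at least one digit.
    if position >= len(input_) or not is_digit(input_[position]):
        return None
    # Step 7: accumulate digits; anything after them is ignored.
    while position < len(input_) and is_digit(input_[position]):
        value = value * 10 + (ord(input_[position]) - ord("0"))
        position += 1
    return value
```

For example, parse_non_negative_integer("  42abc") gives 42, while "-5" and the empty string both give None.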

3.2.3.2. Signed integers

A string is a valid integer if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally prefixed with a U+002D HYPHEN-MINUS ("-") character.

The rules for parsing integers are similar to the rules for non-negative integers, and are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return an integer or an error. Leading spaces are ignored. Trailing spaces and trailing garbage characters are ignored.

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let value have the value 0.

  4. Let sign have the value "positive".

  5. Skip whitespace.

  6. If position is past the end of input, return an error.

  7. If the character indicated by position (the first character) is a U+002D HYPHEN-MINUS ("-") character:

    1. Let sign be "negative".
    2. Advance position to the next character.
    3. If position is past the end of input, return an error.
  8. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.

  9. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):

    1. Multiply value by ten.
    2. Add the value of the current character (0..9) to value.
    3. Advance position to the next character.
    4. If position is not past the end of input, return to the top of step 9 in the overall algorithm (that's the step within which these substeps find themselves).
  10. If sign is "positive", return value, otherwise return 0-value.
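
A Python sketch of the signed variant follows. As before, the function name is hypothetical and None stands in for the error result, which is an assumption of this example.

```python
SPACE_CHARACTERS = frozenset(" \t\n\x0b\x0c\r")


def is_digit(ch):
    # Only U+0030..U+0039 count as digits here.
    return "0" <= ch <= "9"


def parse_integer(input_):
    """Return an int, or None where the algorithm returns an error."""
    position = 0
    value = 0
    negative = False
    # Step 5: skip whitespace.
    while position < len(input_) and input_[position] in SPACE_CHARACTERS:
        position += 1
    # Step 6: nothing left to parse.
    if position >= len(input_):
        return None
    # Step 7: an optional leading hyphen-minus.
    if input_[position] == "-":
        negative = True
        position += 1
        if position >= len(input_):
            return None
    # Step 8: at least one digit is required.
    if not is_digit(input_[position]):
        return None
    # Step 9: accumulate digits; trailing characters are ignored.
    while position < len(input_) and is_digit(input_[position]):
        value = value * 10 + (ord(input_[position]) - ord("0"))
        position += 1
    # Step 10.
    return 0 - value if negative else value
```

A leading plus sign is not accepted by these steps, so parse_integer("+5") gives None.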

3.2.3.3. Real numbers

A string is a valid floating point number if it consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally with a single U+002E FULL STOP (".") character somewhere (either before these numbers, in between two numbers, or after the numbers), all optionally prefixed with a U+002D HYPHEN-MINUS ("-") character.

The rules for parsing floating point number values are as given in the following algorithm. As with the previous algorithms, when this one is invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return a number or an error. Leading spaces are ignored. Trailing spaces and garbage characters are ignored.

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let value have the value 0.

  4. Let sign have the value "positive".

  5. Skip whitespace.

  6. If position is past the end of input, return an error.

  7. If the character indicated by position (the first character) is a U+002D HYPHEN-MINUS ("-") character:

    1. Let sign be "negative".
    2. Advance position to the next character.
    3. If position is past the end of input, return an error.
  8. If the next character is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9) or U+002E FULL STOP ("."), then return an error.

  9. If the next character is U+002E FULL STOP ("."), but either that is the last character or the character after that one is not one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9), then return an error.

  10. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):

    1. Multiply value by ten.
    2. Add the value of the current character (0..9) to value.
    3. Advance position to the next character.
    4. If position is past the end of input, then if sign is "positive", return value, otherwise return 0-value.
    5. Otherwise return to the top of step 10 in the overall algorithm (that's the step within which these substeps find themselves).
  11. Otherwise, if the next character is not a U+002E FULL STOP ("."), then if sign is "positive", return value, otherwise return 0-value.

  12. The next character is a U+002E FULL STOP ("."). Advance position to the character after that.

  13. Let divisor be 1.

  14. If the next character is one of U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9):

    1. Multiply divisor by ten.
    2. Add the value of the current character (0..9) divided by divisor, to value.
    3. Advance position to the next character.
    4. If position is past the end of input, then if sign is "positive", return value, otherwise return 0-value.
    5. Otherwise return to the top of step 14 in the overall algorithm (that's the step within which these substeps find themselves).
  15. Otherwise, if sign is "positive", return value, otherwise return 0-value.
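
The following Python sketch follows the steps above. It is not a definitive implementation: the function name is hypothetical, None stands in for the error result, and the value is accumulated as an exact fraction (an implementation choice of this example) and converted to a float only at the end, to avoid compounding rounding error.

```python
from fractions import Fraction

SPACE_CHARACTERS = frozenset(" \t\n\x0b\x0c\r")


def is_digit(ch):
    # Only U+0030..U+0039 count as digits here.
    return "0" <= ch <= "9"


def parse_float(input_):
    """Return a float, or None where the algorithm returns an error."""
    position = 0
    end = len(input_)
    # Step 5: skip whitespace.
    while position < end and input_[position] in SPACE_CHARACTERS:
        position += 1
    if position >= end:
        return None
    # Step 7: optional sign.
    negative = False
    if input_[position] == "-":
        negative = True
        position += 1
        if position >= end:
            return None
    # Steps 8 and 9: must start with a digit, or with "." followed by a digit.
    ch = input_[position]
    if not (is_digit(ch) or ch == "."):
        return None
    if ch == "." and (position + 1 >= end or not is_digit(input_[position + 1])):
        return None
    # Step 10: integer part.
    value = Fraction(0)
    while position < end and is_digit(input_[position]):
        value = value * 10 + int(input_[position])
        position += 1
    # Steps 11 to 14: optional fractional part.
    if position < end and input_[position] == ".":
        position += 1
        divisor = 1
        while position < end and is_digit(input_[position]):
            divisor *= 10
            value += Fraction(int(input_[position]), divisor)
            position += 1
    result = float(value)
    return 0 - result if negative else result
```

Because the check in step 9 only applies at the start of the number, an input such as "5." parses as 5 even though "." on its own is an error.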

3.2.3.4. Ratios

The algorithms described in this section are used by the progress and meter elements.

A valid denominator punctuation character is one of the characters from the table below. There is a value associated with each denominator punctuation character, as shown in the table below.

Denominator Punctuation Character        Value
U+0025 PERCENT SIGN              %       100
U+066A ARABIC PERCENT SIGN       ٪       100
U+FE6A SMALL PERCENT SIGN        ﹪       100
U+FF05 FULLWIDTH PERCENT SIGN    ％      100
U+2030 PER MILLE SIGN            ‰       1000
U+2031 PER TEN THOUSAND SIGN     ‱       10000

The steps for finding one or two numbers of a ratio in a string are as follows:

  1. If the string is empty, then return nothing and abort these steps.
  2. Find a number in the string according to the algorithm below, starting at the start of the string.
  3. If the sub-algorithm in step 2 returned nothing or returned an error condition, return nothing and abort these steps.
  4. Set number1 to the number returned by the sub-algorithm in step 2.
  5. Starting with the character immediately after the last one examined by the sub-algorithm in step 2, skip any characters in the string that are in the Unicode character class Zs (this might match zero characters). [UNICODE]
  6. If there are still further characters in the string, and the next character in the string is a valid denominator punctuation character, set denominator to that character.
  7. If the string contains any other characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, but denominator was given a value in step 6, return nothing and abort these steps.
  8. Otherwise, if denominator was given a value in step 6, return number1 and denominator and abort these steps.
  9. Find a number in the string again, starting immediately after the last character that was examined by the sub-algorithm in step 2.
  10. If the sub-algorithm in step 9 returned nothing or an error condition, return nothing and abort these steps.
  11. Set number2 to the number returned by the sub-algorithm in step 9.
  12. If there are still further characters in the string, and the next character in the string is a valid denominator punctuation character, return nothing and abort these steps.
  13. If the string contains any other characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, return nothing and abort these steps.
  14. Otherwise, return number1 and number2.

The algorithm to find a number is as follows. It is given a string and a starting position, and returns either nothing, a number, or an error condition.

  1. Starting at the given starting position, ignore all characters in the given string until the first character that is either a U+002E FULL STOP or one of the ten characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE.
  2. If there are no such characters, return nothing and abort these steps.
  3. Starting with the character matched in step 1, collect all the consecutive characters that are either a U+002E FULL STOP or one of the ten characters in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, and assign this string of one or more characters to string.
  4. If string contains more than one U+002E FULL STOP character then return an error condition and abort these steps.
  5. Parse string according to the rules for parsing floating point number values, to obtain number. This step cannot fail (string is guaranteed to be a valid floating point number).
  6. Return number.
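
The two algorithms above can be sketched in Python as follows. Several points are assumptions of this example rather than statements from the text: the function names and the three-element return shape are hypothetical; "the last character examined" is taken to be the end of the collected run of digits and full stops; "any other characters" is taken to mean digits after that run; Python's float() stands in for the rules for parsing floating point number values; and a run consisting of a lone "." is treated as an error condition.

```python
import unicodedata

# Values associated with each denominator punctuation character.
DENOMINATOR_VALUES = {
    "\u0025": 100, "\u066A": 100, "\uFE6A": 100, "\uFF05": 100,
    "\u2030": 1000, "\u2031": 10000,
}
DIGITS = frozenset("0123456789")
NUMBER_CHARS = DIGITS | {"."}
ERROR = object()  # sentinel for "an error condition"


def find_number(string, start):
    """Return (number, end): number is None (nothing), ERROR, or a float."""
    pos = start
    while pos < len(string) and string[pos] not in NUMBER_CHARS:
        pos += 1
    if pos >= len(string):
        return None, pos
    end = pos
    while end < len(string) and string[end] in NUMBER_CHARS:
        end += 1
    run = string[pos:end]
    if run.count(".") > 1 or run == ".":
        return ERROR, end
    return float(run), end


def find_ratio_numbers(string):
    """Return (number1, number2, denominator) with None for the unused slot,
    or None where the steps return nothing."""
    if not string:
        return None
    number1, end1 = find_number(string, 0)
    if number1 is None or number1 is ERROR:
        return None
    pos = end1
    while pos < len(string) and unicodedata.category(string[pos]) == "Zs":
        pos += 1
    if pos < len(string) and string[pos] in DENOMINATOR_VALUES:
        # Steps 6 to 8: a denominator excludes any further digits.
        if any(ch in DIGITS for ch in string[end1:]):
            return None
        return number1, None, string[pos]
    number2, end2 = find_number(string, end1)
    if number2 is None or number2 is ERROR:
        return None
    if end2 < len(string) and string[end2] in DENOMINATOR_VALUES:
        return None
    if any(ch in DIGITS for ch in string[end2:]):
        return None
    return number1, number2, None
```

With this sketch, "50%" yields 50 with the denominator "%" (value 100), "3 of 4" yields the numbers 3 and 4, and "75" alone yields nothing because no second number or denominator is found.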

3.2.3.5. Percentages and dimensions

valid positive non-zero integers; rules for parsing dimension values (only used by height/width on img, embed, object: lengths in CSS pixels or percentages)

3.2.3.6. Lists of integers

A valid list of integers is a number of valid integers separated by U+002C COMMA characters, with no other characters (e.g. no space characters). In addition, there might be restrictions on the number of integers that can be given, or on the range of values allowed.
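
As an illustration of this validity rule only (not of the parsing rules), a Python check might look like the sketch below. The function name is hypothetical, and treating the empty string as invalid is an assumption, since the text does not say whether a list may contain zero integers. Any additional restrictions on count or range would need separate checks.

```python
import re

# One valid integer: optional hyphen-minus, then one or more ASCII digits.
# [0-9] is used rather than \d, which would also match non-ASCII digits.
_LIST_OF_INTEGERS = re.compile(r"-?[0-9]+(?:,-?[0-9]+)*")


def is_valid_list_of_integers(value):
    """True if value is valid integers separated by single commas only."""
    return _LIST_OF_INTEGERS.fullmatch(value) is not None
```

For instance, "1,2,3" and "-1,20" pass, while "1, 2" (contains a space) and "1,,2" (empty item) do not.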

The rules for parsing a list of integers are as follows:

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let numbers be an initially empty list of integers. This list will be the result of this algorithm.

  4. If there is a character in the string input at position position, and it is either a U+002C COMMA character or a U+0020 SPACE character, then advance position to the next character in input, or to beyond the end of the string if there are no more characters.

  5. If position points to beyond the end of input, return numbers and abort.

  6. If the character in the string input at position position is a U+002C COMMA character or a U+0020 SPACE character, return to step 4.

  7. Let negated be false.

  8. Let value be 0.

  9. Let multiple be 1.

  10. Let started be false.

  11. Let finished be false.

  12. Let bogus be false.

  13. Parser: If the character in the string input at position position is:

    A U+002D HYPHEN-MINUS character

      Follow these substeps:

      1. If finished is true, skip to the next step in the overall set of steps.
      2. If started is true or if bogus is true, let negated be false.
      3. Otherwise, if started is false and if bogus is false, let negated be true.
      4. Let started be true.

    A character in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE

      Follow these substeps:

      1. If finished is true, skip to the next step in the overall set of steps.
      2. Let n be the value of the digit, interpreted in base ten, multiplied by multiple.
      3. Add n to value.
      4. If value is greater than zero, multiply multiple by ten.
      5. Let started be true.

    A U+002C COMMA character
    A U+0020 SPACE character

      Follow these substeps:

      1. If started is false, return the numbers list and abort.
      2. If negated is true, then negate value.
      3. Append value to the numbers list.
      4. Jump to step 4 in the overall set of steps.

    A U+002E FULL STOP character

      Follow these substeps:

      1. Let finished be true.

    Any other character

      Follow these substeps:

      1. If finished is true, skip to the next step in the overall set of steps.
      2. Let negated be false.
      3. Let bogus be true.
      4. If started is true, then return the numbers list, and abort. (The value in value is not appended to the list first; it is dropped.)
  14. Advance position to the next character in input, or to beyond the end of the string if there are no more characters.

  15. If position points to a character (and not to beyond the end of input), jump to the big Parser step above.

  16. If negated is true, then negate value.

  17. If started is true, then append value to the numbers list, return that list, and abort.

  18. Return the numbers list and abort.

3.2.4. Dates and times

In the algorithms below, the number of days in month month of year year is: 31 if month is 1, 3, 5, 7, 8, 10, or 12; 30 if month is 4, 6, 9, or 11; 29 if month is 2 and year is a number divisible by 400, or if year is a number divisible by 4 but not by 100; and 28 otherwise. This takes into account leap years in the Gregorian calendar. [GREGORIAN]
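
The rule above translates directly into a small Python function. The name days_in_month is illustrative, and the function follows the wording literally, so any month value not covered by the first three cases yields 28.

```python
def days_in_month(month, year):
    """Number of days in month `month` of year `year` (Gregorian calendar)."""
    if month in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if month in (4, 6, 9, 11):
        return 30
    # February of a leap year: divisible by 400, or by 4 but not by 100.
    if month == 2 and (year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)):
        return 29
    return 28
```

For example, February has 29 days in 2000 and 2004 but 28 days in 1900 and 2007.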

3.2.4.1. Specific moments in time

A string is a valid datetime if it has four digits (representing the year), a literal hyphen, two digits (representing the month), a literal hyphen, two digits (representing the day), optionally some spaces, either a literal T or a space, optionally some more spaces, two digits (for the hour), a colon, two digits (the minutes), optionally the seconds (which, if included, must consist of another colon, two digits (the integer part of the seconds), and optionally a decimal point followed by one or more digits (for the fractional part of the seconds)), optionally some spaces, and finally either a literal Z (indicating the time zone is UTC), or, a plus sign or a minus sign followed by two digits, a colon, and two digits (for the sign, the hours and minutes of the timezone offset respectively); with the month-day combination being a valid date in the given year according to the Gregorian calendar, the hour values (h) being in the range 0 ≤ h ≤ 23, the minute values (m) in the range 0 ≤ m ≤ 59, and the second value (s) being in the range 0 ≤ s < 60. [GREGORIAN]

The digits must be characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), the hyphens must be U+002D HYPHEN-MINUS characters, the T must be a U+0054 LATIN CAPITAL LETTER T, the colons must be U+003A COLON characters, the decimal point must be a U+002E FULL STOP, the Z must be a U+005A LATIN CAPITAL LETTER Z, the plus sign must be a U+002B PLUS SIGN, and the minus sign must be a U+002D HYPHEN-MINUS (the same character as the hyphen).

The following are some examples of dates written as valid datetimes.

"0037-12-13 00:00 Z"
Midnight UTC on the birthday of Nero (the Roman Emperor).
"1979-10-14T12:00:00.001-04:00"
One millisecond after noon on October 14th 1979, in the time zone in use on the east coast of North America during daylight saving time.
"8592-01-01 T 02:09 +02:09"
Midnight UTC on the 1st of January, 8592. The time zone associated with that time is two hours and nine minutes ahead of UTC.

Several things are notable about these dates:

Conformance checkers can use the algorithm below to determine if a datetime is a valid datetime or not.

To parse a string as a datetime value, a user agent must apply the following algorithm to the string. This will either return a time in UTC, with associated timezone information for round tripping or display purposes, or nothing, indicating the value is not a valid datetime. If at any point the algorithm says that it "fails", this means that it returns nothing.

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly four characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the year.

  4. If position is beyond the end of input or if the character at position is not a U+002D HYPHEN-MINUS character, then fail. Otherwise, move position forwards one character.

  5. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the month.

  6. If month is not a number in the range 1 ≤ month ≤ 12, then fail.
  7. Let maxday be the number of days in month month of year year.

  8. If position is beyond the end of input or if the character at position is not a U+002D HYPHEN-MINUS character, then fail. Otherwise, move position forwards one character.

  9. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the day.

  10. If day is not a number in the range 1 ≤ day ≤ maxday, then fail.

  11. Collect a sequence of characters that are either U+0054 LATIN CAPITAL LETTER T characters or space characters. If the collected sequence is zero characters long, or if it contains more than one U+0054 LATIN CAPITAL LETTER T character, then fail.

  12. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the hour.

  13. If hour is not a number in the range 0 ≤ hour ≤ 23, then fail.
  14. If position is beyond the end of input or if the character at position is not a U+003A COLON character, then fail. Otherwise, move position forwards one character.

  15. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the minute.

  16. If minute is not a number in the range 0 ≤ minute ≤ 59, then fail.
  17. Let second be a string with the value "0".

  18. If position is beyond the end of input, then fail.

  19. If the character at position is a U+003A COLON, then:

    1. Advance position to the next character in input.

    2. If position is beyond the end of input, or at the last character in input, or if the next two characters in input starting at position are not two characters both in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then fail.

    3. Collect a sequence of characters that are either characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) or U+002E FULL STOP characters. If the collected sequence has more than one U+002E FULL STOP character, or if the last character in the sequence is a U+002E FULL STOP character, then fail. Otherwise, let the collected string be second instead of its previous value.

  20. Interpret second as a base ten number (possibly with a fractional part). Let that number be second instead of the string version.

  21. If second is not a number in the range 0 ≤ second < 60, then fail. (The values 60 and 61 are not allowed: leap seconds cannot be represented by datetime values.)
  22. If position is beyond the end of input, then fail.

  23. Skip whitespace.

  24. If the character at position is a U+005A LATIN CAPITAL LETTER Z, then:

    1. Let timezonehours be 0.

    2. Let timezoneminutes be 0.

    3. Advance position to the next character in input.

  25. Otherwise, if the character at position is either a U+002B PLUS SIGN ("+") or a U+002D HYPHEN-MINUS ("-"), then:

    1. If the character at position is a U+002B PLUS SIGN ("+"), let sign be "positive". Otherwise, it's a U+002D HYPHEN-MINUS ("-"); let sign be "negative".

    2. Advance position to the next character in input.

    3. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the timezonehours.

    4. If timezonehours is not a number in the range 0 ≤ timezonehours ≤ 23, then fail.
    5. If sign is "negative", then negate timezonehours.
    6. If position is beyond the end of input or if the character at position is not a U+003A COLON character, then fail. Otherwise, move position forwards one character.

    7. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then fail. Otherwise, interpret the resulting sequence as a base ten integer. Let that number be the timezoneminutes.

    8. If timezoneminutes is not a number in the range 0 ≤ timezoneminutes ≤ 59, then fail.
    9. If sign is "negative", then negate timezoneminutes.
  26. If position is not beyond the end of input, then fail.

  27. Let time be the moment in time at year year, month month, day day, hours hour, minute minute, second second, subtracting timezonehours hours and timezoneminutes minutes. That moment in time is a moment in the UTC timezone.

  28. Let timezone be timezonehours hours and timezoneminutes minutes from UTC.

  29. Return time and timezone.
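
The algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the names are hypothetical; "fails" is represented by returning None; the result is a naive datetime holding the UTC moment plus a timedelta for the offset; and an input that ends after the whitespace skipped in step 23 is treated as a failure, a case the steps leave open. Python's datetime only covers years 1 to 9999, so a year of 0000, or an offset that pushes the moment outside that range, raises an exception here.

```python
from datetime import datetime, timedelta

SPACE_CHARACTERS = frozenset(" \t\n\x0b\x0c\r")
DIGITS = frozenset("0123456789")


def days_in_month(month, year):
    if month in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if month in (4, 6, 9, 11):
        return 30
    leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    return 29 if leap else 28


def parse_datetime(input_):
    """Return (utc_time, utc_offset), or None where the algorithm fails."""
    position = 0
    end = len(input_)

    def collect(chars):
        # Collect a sequence of characters drawn from `chars`.
        nonlocal position
        start = position
        while position < end and input_[position] in chars:
            position += 1
        return input_[start:position]

    def expect(ch):
        # Consume one literal character; False if it is not there.
        nonlocal position
        if position < end and input_[position] == ch:
            position += 1
            return True
        return False

    def two_digits(low, high):
        # Exactly two digits forming a number in [low, high], else None.
        digits = collect(DIGITS)
        if len(digits) != 2 or not low <= int(digits) <= high:
            return None
        return int(digits)

    year_digits = collect(DIGITS)  # step 3
    if len(year_digits) != 4 or not expect("-"):  # steps 3 and 4
        return None
    year = int(year_digits)
    month = two_digits(1, 12)  # steps 5 and 6
    if month is None or not expect("-"):  # step 8
        return None
    day = two_digits(1, days_in_month(month, year))  # steps 7, 9 and 10
    if day is None:
        return None
    separator = collect(SPACE_CHARACTERS | {"T"})  # step 11
    if separator == "" or separator.count("T") > 1:
        return None
    hour = two_digits(0, 23)  # steps 12 and 13
    if hour is None or not expect(":"):  # step 14
        return None
    minute = two_digits(0, 59)  # steps 15 and 16
    if minute is None or position >= end:  # step 18
        return None
    second = "0"  # step 17
    if input_[position] == ":":  # step 19
        position += 1
        if not (position + 1 < end and input_[position] in DIGITS
                and input_[position + 1] in DIGITS):
            return None
        second = collect(DIGITS | {"."})
        if second.count(".") > 1 or second.endswith("."):
            return None
    second = float(second)  # step 20
    if not 0 <= second < 60 or position >= end:  # steps 21 and 22
        return None
    collect(SPACE_CHARACTERS)  # step 23
    if position >= end:
        return None  # assumption: no time zone after the whitespace
    if input_[position] == "Z":  # step 24
        tz_hours = tz_minutes = 0
        position += 1
    elif input_[position] in "+-":  # step 25
        sign = 1 if input_[position] == "+" else -1
        position += 1
        tz_hours = two_digits(0, 23)
        if tz_hours is None or not expect(":"):
            return None
        tz_minutes = two_digits(0, 59)
        if tz_minutes is None:
            return None
        tz_hours, tz_minutes = sign * tz_hours, sign * tz_minutes
    else:
        return None  # neither Z nor a sign, so step 26 fails
    if position < end:  # step 26
        return None
    offset = timedelta(hours=tz_hours, minutes=tz_minutes)  # step 28
    local = datetime(year, month, day, hour, minute) + timedelta(seconds=second)
    return local - offset, offset  # steps 27 and 29
```

Because step 18 fails when the input ends right after the minutes, a string without a time zone, such as "2007-10-27T12:00", is rejected by this algorithm.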

3.2.4.2. Vaguer moments in time

This section defines date or time strings. There are two kinds: date or time strings in content, and date or time strings in attributes. The only difference is in the handling of whitespace characters.

To parse a date or time string, user agents must use the following algorithm. A date or time string is a valid date or time string if the following algorithm, when run on the string, doesn't say the string is invalid.

The algorithm may return nothing (in which case the string will be invalid), or it may return a date, a time, a date and a time, or a date and a time and a timezone. Even if the algorithm returns one or more values, the string can still be invalid.

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let results be the collection of results that are to be returned (one or more of a date, a time, and a timezone), initially empty. If the algorithm aborts at any point, then whatever is currently in results must be returned as the result of the algorithm.

  4. For the "in content" variant: skip Zs characters; for the "in attributes" variant: skip whitespace.

  5. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is empty, then the string is invalid; abort these steps.

  6. Let the sequence of characters collected in the last step be s.

  7. If position is past the end of input, the string is invalid; abort these steps.

  8. If the character at position is not a U+003A COLON character, then:

    1. If the character at position is not a U+002D HYPHEN-MINUS ("-") character either, then the string is invalid, abort these steps.

    2. If the sequence s is not exactly four digits long, then the string is invalid. (This does not stop the algorithm, however.)

    3. Interpret the sequence of characters collected in step 5 as a base ten integer, and let that number be year.

    4. Advance position past the U+002D HYPHEN-MINUS ("-") character.

    5. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is empty, then the string is invalid; abort these steps.

    6. If the sequence collected in the last step is not exactly two digits long, then the string is invalid.

    7. Interpret the sequence of characters collected two steps ago as a base ten integer, and let that number be month.

    8. If month is not a number in the range 1 ≤ month ≤ 12, then the string is invalid, abort these steps.
    9. Let maxday be the number of days in month month of year year.

    10. If position is past the end of input, or if the character at position is not a U+002D HYPHEN-MINUS ("-") character, then the string is invalid, abort these steps. Otherwise, advance position to the next character.

    11. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is empty, then the string is invalid; abort these steps.

    12. If the sequence collected in the last step is not exactly two digits long, then the string is invalid.

    13. Interpret the sequence of characters collected two steps ago as a base ten integer, and let that number be day.

    14. If day is not a number in the range 1 ≤ day ≤ maxday, then the string is invalid, abort these steps.

    15. Add the date represented by year, month, and day to the results.

    16. For the "in content" variant: skip Zs characters; for the "in attributes" variant: skip whitespace.

    17. If the character at position is a U+0054 LATIN CAPITAL LETTER T, then move position forwards one character.

    18. For the "in content" variant: skip Zs characters; for the "in attributes" variant: skip whitespace.

    19. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is empty, then the string is invalid; abort these steps.

    20. Let s be the sequence of characters collected in the last step.

  9. If s is not exactly two digits long, then the string is invalid.

  10. Interpret the sequence of characters collected two steps ago as a base ten integer, and let that number be hour.

  11. If hour is not a number in the range 0 ≤ hour ≤ 23, then the string is invalid, abort these steps.

  12. If position is past the end of input, or if the character at position is not a U+003A COLON character, then the string is invalid, abort these steps. Otherwise, advance position to the next character.

  13. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is empty, then the string is invalid; abort these steps.

  14. If the sequence collected in the last step is not exactly two digits long, then the string is invalid.

  15. Interpret the sequence of characters collected two steps ago as a base ten integer, and let that number be minute.

  16. If minute is not a number in the range 0 ≤ minute ≤ 59, then the string is invalid, abort these steps.

  17. Let second be 0. It may be changed to another value in the next step.

  18. If position is not past the end of input and the character at position is a U+003A COLON character, then:

    1. Collect a sequence of characters that are either characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9) or are U+002E FULL STOP. If the collected sequence is empty, or contains more than one U+002E FULL STOP character, then the string is invalid; abort these steps.

    2. If the first character in the sequence collected in the last step is not in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then the string is invalid.

    3. Interpret the sequence of characters collected two steps ago as a base ten number (possibly with a fractional part), and let that number be second.

    4. If second is not a number in the range 0 ≤ second < 60, then the string is invalid, abort these steps.

  19. Add the time represented by hour, minute, and second to the results.

  20. If results has both a date and a time, then:

    1. For the "in content" variant: skip Zs characters; for the "in attributes" variant: skip whitespace.

    2. If position is past the end of input, then skip to the next step in the overall set of steps.

    3. Otherwise, if the character at position is a U+005A LATIN CAPITAL LETTER Z, then:

      1. Add the timezone corresponding to UTC (zero offset) to the results.

      2. Advance position to the next character in input.

      3. Skip to the next step in the overall set of steps.

    4. Otherwise, if the character at position is either a U+002B PLUS SIGN ("+") or a U+002D HYPHEN-MINUS ("-"), then:

      1. If the character at position is a U+002B PLUS SIGN ("+"), let sign be "positive". Otherwise, it's a U+002D HYPHEN-MINUS ("-"); let sign be "negative".

      2. Advance position to the next character in input.

      3. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then the string is invalid.

      4. Interpret the sequence collected in the last step as a base ten number, and let that number be timezonehours.

      5. If timezonehours is not a number in the range 0 ≤ timezonehours ≤ 23, then the string is invalid; abort these steps.
      6. If sign is "negative", then negate timezonehours.
      7. If position is beyond the end of input or if the character at position is not a U+003A COLON character, then the string is invalid; abort these steps. Otherwise, move position forwards one character.

      8. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). If the collected sequence is not exactly two characters long, then the string is invalid.

      9. Interpret the sequence collected in the last step as a base ten number, and let that number be timezoneminutes.

      10. If timezoneminutes is not a number in the range 0 ≤ timezoneminutes ≤ 59, then the string is invalid; abort these steps.

      11. Add the timezone corresponding to an offset of timezonehours hours and timezoneminutes minutes to the results.

      12. Skip to the next step in the overall set of steps.

    5. Otherwise, the string is invalid; abort these steps.

  21. For the "in content" variant: skip Zs characters; for the "in attributes" variant: skip whitespace.

  22. If position is not past the end of input, then the string is invalid.

  23. Abort these steps (the string is parsed).
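As an illustration only, steps 20 to 22 might be sketched in Python as follows for the "in attributes" variant. The names parse_timezone_tail, collect and two_digits are hypothetical and do not appear in this specification. The sketch assumes that results already holds both a date and a time, returns the offset as a (timezonehours, timezoneminutes) pair (or None when no timezone is present), and signals an invalid string by raising ValueError.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"  # the space characters defined in 3.2.1
DIGITS = "0123456789"


def collect(input, position, characters):
    """Collect a sequence of characters; return (result, new position)."""
    start = position
    while position < len(input) and input[position] in characters:
        position += 1
    return input[start:position], position


def two_digits(input, position):
    """Collect digits and require exactly two of them (hypothetical helper)."""
    digits, position = collect(input, position, DIGITS)
    if len(digits) != 2:
        raise ValueError("expected exactly two digits")
    return int(digits), position


def parse_timezone_tail(input, position):
    """Return (timezonehours, timezoneminutes), or None if no timezone is given."""
    timezone = None
    _, position = collect(input, position, SPACE_CHARACTERS)  # step 20.1
    if position < len(input):                                 # step 20.2
        ch = input[position]
        if ch == "Z":                                         # step 20.3
            timezone = (0, 0)
            position += 1
        elif ch in "+-":                                      # step 20.4
            sign = "negative" if ch == "-" else "positive"
            position += 1
            hours, position = two_digits(input, position)
            if not 0 <= hours <= 23:
                raise ValueError("timezone hours out of range")
            if sign == "negative":
                hours = -hours  # as in the steps, only the hours are negated
            if position >= len(input) or input[position] != ":":
                raise ValueError("expected a colon")
            position += 1
            minutes, position = two_digits(input, position)
            if not 0 <= minutes <= 59:
                raise ValueError("timezone minutes out of range")
            timezone = (hours, minutes)
        else:                                                 # step 20.5
            raise ValueError("unexpected character after the time")
    _, position = collect(input, position, SPACE_CHARACTERS)  # step 21
    if position < len(input):                                 # step 22
        raise ValueError("trailing characters")
    return timezone
```

For example, parse_timezone_tail("  +05:30 ", 0) returns (5, 30) and parse_timezone_tail("Z", 0) returns (0, 0), while "+5:30" is rejected because the hours are not exactly two digits long.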

3.2.5. Time offsets

valid time offset, rules for parsing time offsets, time offset serialisation rules; in the format "5d4h3m2s1ms" or "3m 9.2s" or "00:00:00.00" or similar.

3.2.6. Tokens

A set of space-separated tokens is a set of zero or more words separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.

A string containing a set of space-separated tokens may have leading or trailing space characters.

An unordered set of space-separated tokens is a set of space-separated tokens where none of the words are duplicated.

An ordered set of unique space-separated tokens is a set of space-separated tokens where none of the words are duplicated but where the order of the tokens is meaningful.

When a user agent has to split a string on spaces, it must use the following algorithm:

  1. Let input be the string being parsed.

  2. Let position be a pointer into input, initially pointing at the start of the string.

  3. Let tokens be a list of tokens, initially empty.

  4. Skip whitespace.

  5. While position is not past the end of input:

    1. Collect a sequence of characters that are not space characters.

    2. Add the string collected in the previous step to tokens.

    3. Skip whitespace.

  6. Return tokens.
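The steps above can be sketched in Python roughly as follows. The function name split_on_spaces is hypothetical; the sketch assumes only the six space characters defined in 3.2.1.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"  # the space characters defined in 3.2.1


def split_on_spaces(input):
    """Split a string on spaces, following the numbered steps above."""
    position = 0   # step 2
    tokens = []    # step 3

    # Step 4: skip whitespace.
    while position < len(input) and input[position] in SPACE_CHARACTERS:
        position += 1

    # Step 5: loop until position is past the end of input.
    while position < len(input):
        # Step 5.1: collect characters that are not space characters.
        start = position
        while position < len(input) and input[position] not in SPACE_CHARACTERS:
            position += 1
        # Step 5.2: add the collected string to tokens.
        tokens.append(input[start:position])
        # Step 5.3: skip whitespace.
        while position < len(input) and input[position] in SPACE_CHARACTERS:
            position += 1

    return tokens  # step 6
```

For example, split_on_spaces("  a b\t\tc\n") returns ["a", "b", "c"]. Python's built-in str.split() is not an exact substitute, because with no arguments it also splits on other Unicode whitespace such as U+00A0 NO-BREAK SPACE, which is not a space character for the purposes of this specification.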

When a user agent has to remove a token from a string, it must use the following algorithm:

  1. Let input be the string being modified.

  2. Let token be the token being removed. It will not contain any space characters.

  3. Let output be the output string, initially empty.

  4. Let position be a pointer into input, initially pointing at the start of the string.

  5. If position is beyond the end of input, set the string being modified to output, and abort these steps.

  6. If the character at position is a space character:

    1. Append the character at position to the end of output.

    2. Increment position so it points at the next character in input.

    3. Return to step 5 in the overall set of steps.

  7. Otherwise, the character at position is the first character of a token. Collect a sequence of characters that are not space characters, and let that be s.

  8. If s is exactly equal to token, then:

    1. Skip whitespace (in input).

    2. Remove any space characters currently at the end of output.

    3. If position is not past the end of input, and output is not the empty string, append a single U+0020 SPACE character at the end of output.

  9. Otherwise, append s to the end of output.

  10. Return to step 5 in the overall set of steps.

This causes any occurrences of the token to be removed from the string, and any spaces that were surrounding the token to be collapsed to a single space, except at the start and end of the string, where such spaces are removed.
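A minimal Python sketch of this algorithm is shown below. The name remove_token is hypothetical, and the sketch returns the new string rather than modifying a string in place. It assumes that the end-of-input check is repeated after every token, so that the loop ends once the last token has been processed.

```python
SPACE_CHARACTERS = " \t\n\x0b\x0c\r"  # the space characters defined in 3.2.1


def remove_token(input, token):
    """Return input with every occurrence of token removed."""
    output = ""    # step 3
    position = 0   # step 4
    while position < len(input):                     # step 5
        if input[position] in SPACE_CHARACTERS:      # step 6
            output += input[position]
            position += 1
            continue
        # Step 7: collect the token that starts at position.
        start = position
        while position < len(input) and input[position] not in SPACE_CHARACTERS:
            position += 1
        s = input[start:position]
        if s == token:                               # step 8
            # Step 8.1: skip whitespace in input.
            while position < len(input) and input[position] in SPACE_CHARACTERS:
                position += 1
            # Step 8.2: remove space characters at the end of output.
            output = output.rstrip(SPACE_CHARACTERS)
            # Step 8.3: rejoin the neighbours with a single space.
            if position < len(input) and output != "":
                output += " "
        else:                                        # step 9
            output += s
    return output
```

For example, remove_token("a  b  c", "b") returns "a c", and remove_token("  b  a", "b") returns "a". Whitespace that is not adjacent to a removed token is left as it was, so remove_token("a   c", "b") returns "a   c".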

3.2.7. Keywords and enumerated attributes

Some attributes are defined as taking one of a finite set of keywords. Such attributes are called enumerated attributes. The keywords are each defined to map to a particular state (several keywords might map to the same state, in which case some of the keywords are synonyms of each other; additionally, some of the keywords can be said to be non-conforming, and are only in the specification for historical reasons). In addition, two default states can be given. The first is the invalid value default, the second is the missing value default.

If an enumerated attribute is specified, the attribute's value must be one of the given keywords that are not said to be non-conforming, with no leading or trailing whitespace. The keyword may use any mix of uppercase and lowercase letters.

When the attribute is specified, if its value case-insensitively matches one of the given keywords then that keyword's state is the state that the attribute represents. If the attribute value matches none of the given keywords, but the attribute has an invalid value default, then the attribute represents that state. Otherwise, if the attribute value matches none of the keywords but there is a missing value default state defined, then that is the state represented by the attribute. Otherwise, there is no default, and invalid values must simply be ignored.

When the attribute is not specified, if there is a missing value default state defined, then that is the state represented by the (missing) attribute. Otherwise, the absence of the attribute means that there is no state represented.

The empty string can be one of the keywords in some cases. For example, the contenteditable attribute has two states: true, matching the true keyword and the empty string; and false, matching the false keyword and all other values (it is the invalid value default). It could further be thought of as having a third state, inherit, which would be the default when the attribute is not specified at all (the missing value default), but for various reasons that isn't the way this specification actually defines it.
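The way an enumerated attribute's state is determined can be sketched in Python as follows. The function enumerated_state and its parameters are hypothetical, an unspecified attribute is modelled as None, and "no state" is also modelled as None. The text above does not say which kind of case folding is meant, so the use of str.lower() here is an assumption.

```python
def enumerated_state(value, keywords, invalid_default=None, missing_default=None):
    """Return the state an enumerated attribute represents, or None for no state.

    value is the attribute's value, or None if the attribute is not specified.
    keywords maps each keyword to its state.
    """
    if value is None:
        # Attribute not specified: use the missing value default, if any.
        return missing_default
    lowered = value.lower()  # assumption: simple case folding via str.lower()
    for keyword, state in keywords.items():
        if keyword.lower() == lowered:
            return state
    if invalid_default is not None:
        # No keyword matched, but there is an invalid value default.
        return invalid_default
    # Otherwise fall back to the missing value default; None means the
    # invalid value is simply ignored.
    return missing_default


# The contenteditable example from the text, expressed with this sketch.
CONTENTEDITABLE = {"true": "true", "": "true", "false": "false"}
```

With these definitions, enumerated_state("TRUE", CONTENTEDITABLE, invalid_default="false") and enumerated_state("", CONTENTEDITABLE, invalid_default="false") both return "true", an unrecognised value such as "banana" yields "false", and an unspecified attribute (None) yields no state.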

3.2.8. References

A valid hashed ID reference to an element of type type is a string consisting of a U+0023 NUMBER SIGN (#) character followed by a string which exactly matches the value of the id attribute of an element in the document with type type.

The rules for parsing a hashed ID reference to an element of type type are as follows:

  1. If the string being parsed does not contain a U+0023 NUMBER SIGN character, or if the first such character in the string is the last character in the string, then return null and abort these steps.

  2. Let s be the string from the character immediately after the first U+0023 NUMBER SIGN character in the string being parsed up to the end of that string.

  3. Return the first element of type type that has an id or name attribute whose value case-insensitively matches s.
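As a rough sketch, these rules might look like the following in Python. The Element class is a hypothetical stand-in for a real document model, elements is assumed to list the document's elements in document order, and returning None when no element matches is an assumption, since step 3 does not say what happens in that case. Case-insensitive matching is approximated with str.lower().

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical minimal element: a type plus a dictionary of attributes."""
    type: str
    attributes: dict = field(default_factory=dict)


def parse_hashed_id_reference(string, elements, type):
    """Return the referenced element of the given type, or None."""
    # Step 1: there must be a "#" that is not the last character.
    index = string.find("#")
    if index == -1 or index == len(string) - 1:
        return None
    # Step 2: s is everything after the first "#".
    s = string[index + 1:]
    # Step 3: first element of the right type whose id or name matches s.
    for element in elements:
        if element.type != type:
            continue
        for name in ("id", "name"):
            value = element.attributes.get(name)
            if value is not None and value.lower() == s.lower():
                return element
    return None  # assumption: no match yields null
```

Note that anything before the first U+0023 NUMBER SIGN character is ignored, so in this sketch "page.html#map" and "#map" resolve to the same element.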