RK* - rikkertkoppes.com

[SGML] the NET shorttag

Now that we're all validating our documents, please tell me why this document (view source) validates and this document (view source) does not (hint: you might want to take a look at the parse tree).

And why do people say that XHTML served as text/html should show ">" characters all over the place?

The answer is the SGML NET (null end tag) shorttag. In SGML it is legal to write an element as <em/bla/ instead of <em>bla</em>. So when you use XHTML constructs like <br /> in SGML (read: XHTML documents served as text/html), the tag should close when the parser encounters the "/", and the ">" remains. That ">" is then treated as character data and displayed. Most browsers do not do this, however, which is a plus for the unaware.

Now you know what is going on with my two documents. The a start tag in the first document closes as soon as the parser encounters the first "/" in the URL; the parser then reads empty element content and closes the element at the second "/". Everything after that is treated as character data, including the bogus attribute. So everything is correct here.
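The reading described above can be mimicked with a toy function. This is only a rough sketch and not a real SGML parser: the name split_net, the example URL, and the bogus attribute are made up for illustration, and it looks at just one start tag at the beginning of the string.

```python
def split_net(markup):
    """Toy model of an SGML parser with the NET shorttag enabled.

    Reads the single start tag at the beginning of `markup` and returns
    (tag, content, rest). `content` is None when the tag was closed by an
    ordinary ">" rather than by a NET "/".
    """
    if not markup.startswith("<"):
        raise ValueError("expected markup to start with '<'")
    quote = None  # quote character we are currently inside, if any
    for i, ch in enumerate(markup[1:], start=1):
        if quote:
            # Inside a quoted attribute value: "/" and ">" are just data.
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            # Ordinary start tag.
            return markup[1:i], None, markup[i + 1:]
        elif ch == "/":
            # NET-enabling start tag: the element ends at the next "/".
            end = markup.find("/", i + 1)
            if end == -1:
                end = len(markup)
            return markup[1:i], markup[i + 1:end], markup[end + 1:]
    return markup[1:], None, ""


print(split_net("<em/bla/"))
# -> ('em', 'bla', '')
print(split_net("<a href=http://example.com/ bogus=x>text"))
# -> ('a href=http:', '', 'example.com/ bogus=x>text')
print(split_net('<a href="http://example.com/">text</a>'))
# -> ('a href="http://example.com/"', None, 'text</a>')
```

In the unquoted case the element is already finished after "http://", and the rest of the URL plus the bogus attribute falls out as plain text. With quotes, the slashes are part of the attribute value and the tag closes at ">" as expected.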

The second document does not validate for the same reason: the a element is already closed when the parser reaches </a>, so an error occurs.

This can be avoided by adding quotes to attribute values. In fact, you are only allowed to omit those quotes if the attribute value contains nothing but letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58); see W3C: attributes in HTML 4.01 on this.
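The character rule above is easy to check mechanically. A minimal sketch, assuming a helper named needs_quotes (a made-up name, not part of any library) that takes the raw attribute value:

```python
import re

# Characters that may appear in an unquoted attribute value per the rule
# above: letters, digits, hyphens, periods, underscores, and colons.
UNQUOTED_SAFE = re.compile(r"[A-Za-z0-9._:-]+")


def needs_quotes(value):
    """Return True if `value` must be quoted when used as an attribute value."""
    # An empty value matches nothing, so it is reported as needing quotes too.
    return UNQUOTED_SAFE.fullmatch(value) is None


print(needs_quotes("text-css_1.0"))         # -> False
print(needs_quotes("http://example.com/"))  # -> True
```

Any URL containing a "/" fails the check, which is exactly the case that triggers the NET shorttag.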

Note that this is only a validation issue: commonly used browsers do not support the NET shorttag (Lynx does). See this example.
