[HTML / XHTML] common problems
2005-03-10 modified: 2005-06-24
When making the step to XHTML, authors may run into various difficulties, which all result from misunderstandings. I will address them in this article.
When serving XHTML as text/html
When switching to XHTML, make sure your document is served as application/xhtml+xml, otherwise your document will be treated as HTML.
When you do serve your document as XHTML, you instantly run into the first problem: Internet Explorer does not recognize the MIME type and tries to download the document. Note that this is not true for Internet Explorer with the MathPlayer plugin (see this comment).
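One way to deal with this is to choose the media type per request, based on the Accept header the browser sends. The following is only a minimal sketch under stated assumptions: `pickContentType` is a hypothetical helper name, it assumes that browsers which cannot handle XHTML do not list application/xhtml+xml in their Accept header, and it does not rank q-values against each other.

```javascript
// Hypothetical helper: choose the media type for an XHTML document
// from the request's Accept header. Simplified: it only looks for an
// explicit application/xhtml+xml entry and does not rank q-values.
function pickContentType(acceptHeader) {
  const XHTML = "application/xhtml+xml";
  const entries = String(acceptHeader || "").split(",");
  for (const entry of entries) {
    const [type, ...params] = entry.trim().split(";").map((s) => s.trim());
    if (type.toLowerCase() !== XHTML) continue;
    // An entry with q=0 means "not acceptable".
    const q = params.find((p) => p.toLowerCase().startsWith("q="));
    if (q !== undefined && parseFloat(q.slice(2)) === 0) return "text/html";
    return XHTML;
  }
  // Assumption: a browser that does not list the type cannot handle it.
  return "text/html";
}
```

Whatever the server does with the result, a document that ends up being sent as text/html is subject to all the HTML caveats described in this section.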
If you do not serve your document as XHTML, it will be treated as HTML, which can cause some other problems. XML tag minimization (as in <br />) is not allowed in HTML. For most elements, such as <br /> and <hr />, this does not lead to much trouble; it only causes trouble for elements whose closing tag is mandatory, like <script />.
This is because what looks like XML minimization is, in HTML (SGML), in fact a NET (null end tag) construct, which most browsers ignore. As a result the element is never closed, and in the case of <script /> the rest of the page is swallowed as script content, which leads to a blank document.
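To find out whether a page that is still sent as text/html contains the problematic form, a rough source-level check can help. The sketch below is only an illustration: `findSelfClosedScripts` is a hypothetical name, and the regular expression is a heuristic that will be confused by a > inside an attribute value.

```javascript
// Hypothetical lint helper: find <script ... /> tags in markup text.
// Heuristic only; an attribute value containing ">" will confuse it.
function findSelfClosedScripts(markup) {
  const selfClosed = /<script\b[^>]*\/\s*>/gi;
  // match() with the g flag returns null when nothing is found.
  return markup.match(selfClosed) || [];
}
```

Every hit should be rewritten with an explicit closing tag, for example <script type="text/javascript" src="a.js"></script>, which works both as HTML and as XHTML.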
Things like document.write, innerHTML and ill-formed markup are allowed. This does not cause any trouble at the moment, but it will when you really switch to XHTML (serving as application/xhtml+xml).
Keep in mind that the document object model you get is an HTML DOM, not an XHTML DOM, which means it is not namespace-aware and not case-sensitive.
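One visible consequence of this difference is tagName: in a document parsed as HTML it is reported in upper case (for example "P"), while in a document parsed as XML it keeps the case used in the source (for example "p"). A script that should survive the switch can avoid depending on either. The helper below is a hypothetical sketch, not an established API, and it deliberately ignores namespaces.

```javascript
// Hypothetical helper: compare an element's name without depending on
// whether the page was parsed as HTML (tagName is upper case, e.g. "P")
// or as XML (tagName keeps the source case, e.g. "p").
// Note: this ignores namespaces; in a mixed-namespace XML document you
// would also want to check element.namespaceURI.
function hasTagName(element, name) {
  return element.tagName.toLowerCase() === name.toLowerCase();
}
```

Lowercasing is safe here because XHTML element names are all lower case; a comparison such as element.tagName == "P" only works in the HTML DOM.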
When serving (bad) XHTML as application/xhtml+xml
You will probably encounter lots of XML parsing errors. These all arise because certain things that are commonly used in HTML documents are simply not allowed in XHTML.
In JavaScript, for instance, document.write and setting innerHTML will raise errors, since they cannot guarantee that the document stays well-formed, which an XML document has to be. This will change, though.
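The alternative that works in both cases is to build content with DOM methods, which keep the tree well-formed by construction. The following is a minimal sketch: `appendParagraph` and its parameters are hypothetical names chosen for illustration, while createElementNS, createTextNode and appendChild are the standard DOM calls.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: append <p>text</p> to `parent` using DOM methods
// only, instead of document.write or innerHTML.
// `doc` is the Document to create nodes in (normally `document`).
function appendParagraph(doc, parent, text) {
  const p = doc.createElementNS(XHTML_NS, "p");
  // createTextNode stores the text as data, so "<" and "&" need no escaping.
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}
```

In a page this could be called as appendParagraph(document, document.getElementsByTagNameNS(XHTML_NS, "body")[0], "1 < 2 & 3 > 2"). Passing the document in as a parameter is only a design choice that makes the function easy to try outside a browser.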
You will get parsing errors in style and script elements if they contain characters like < and & (a bare > is only a problem as part of the sequence ]]>). Therefore you have to escape the contents of these elements in a ridiculous way:
<script type="text/javascript">
<!--//--><![CDATA[//><!--
/* the script goes here */
//--><!]]>
</script>
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
/* style definitions go here */
/*]]>*/-->
</style>
OK, this is a little overdone; in modern browsers a plain <![CDATA[...]]> block is sufficient. If you want this to work when sent as text/html, you have to escape the lines (like //<