Discuss: Semantics in HTML 5
by John Allsopp
102 Misunderstanding Tag Soup
@Aaron Miller
bq. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.
Tag soup gets used in two different ways, which you’re conflating in this comment. 1) Tag soup sometimes refers to the serialized source of a document in which tags may be misnested, content models are invented out of thin air, and attribute values that require quotation marks are left unquoted: in general, served content that conforms to no specification anywhere. 2) Tag soup parsing refers to a parser that is capable of parsing such tag soup (a Herculean task).
When you say that XHTML parsed as text/html will be tag soup, you’re conflating these two definitions. The XHTML content is certainly not tag soup: it adheres to the XHTML syntax, and sometimes even to additional syntactic requirements on top of that (such as XHTML 1.0 Appendix C). However, if such an XHTML document is served as text/html, it will be handled by the UA’s tag soup parser just like any other conforming or non-conforming HTML 2.0 through 4.01. So in this second sense, both not using XHTML at all and not serving XHTML as application/xhtml+xml mean that the content is processed by the tag soup parser (just like any other HTML).
It’s important to keep these two meanings of tag soup separate to understand the conversation.
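A minimal Python sketch of the second meaning, assuming the standard library's `html.parser` as a stand-in for a browser's tag soup parser (it tokenizes anything without complaint but, unlike a browser, does not build a tree or repair misnesting) and `xml.etree.ElementTree` as a stand-in for an XML parser. The helper `parse_as_served` is hypothetical. What it models is the point above: the Content-Type a document is served with, not how clean its markup is, decides which parser runs.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parse_as_served(markup: str, content_type: str) -> str:
    """Hypothetical helper: the Content-Type, not the markup, picks the parser.

    Returns "xml" or "html" for the path taken. An XML parse error propagates
    as ET.ParseError, roughly mirroring how a browser shows an error instead
    of rendering a malformed application/xhtml+xml document.
    """
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/xhtml+xml":
        ET.fromstring(markup)  # strict: any well-formedness error is fatal
        return "xml"
    if mime == "text/html":
        parser = HTMLParser()  # forgiving: tokenizes anything, never raises
        parser.feed(markup)
        parser.close()
        return "html"
    raise ValueError(f"unsupported content type: {content_type}")


# Well-formed XHTML: quoted attribute, proper nesting, empty element closed.
xhtml = '<p class="note">Hello <em>world</em><br /></p>'
# Tag soup content: misnested <b>/<i>, plus an unquoted attribute value,
# which XML never allows.
soup = "<p class=note>Hello <b><i>world</b></i>"

print(parse_as_served(xhtml, "text/html"))              # html
print(parse_as_served(xhtml, "application/xhtml+xml"))  # xml
print(parse_as_served(soup, "text/html"))               # html
```

Serving `soup` as `application/xhtml+xml` raises `ET.ParseError`, while the very same XHTML string takes the lenient path as soon as it is labelled `text/html`.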
posted at 05:52 am on January 24, 2009 by Rob Burns
103 Same distinction
@Rob, it looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact, on the content side, it describes over 95% of the web, and on the browser (parser) side, it’s standard operating procedure. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.
posted at 03:07 pm on January 29, 2009 by Aaron Miller
104 Language in Language?
This is not directly related to John’s article, but it reminded me of a question I’ve been turning over in my head for some time: how is it that the syntax of HTML, or of any other machine-readable grammar, is constructed using English? More specifically, US English? Has anyone ever tried to construct a language, even one with a very light grammar, that allows multiple human languages to describe headers, footers, loops, lists, etc.? I appreciate that many of these machine languages were first composed in the US, and thus US English has become the lingua franca of programming, but this is the 21st century, and not everyone on the planet who wishes to write code knows how to speak or write English, let alone a specific sub-grammar of it.
posted at 06:33 pm on February 24, 2009 by Russ Michell
105 Another Russian translation
It seems the first one wasn’t perfect.
Here it is: http://interpretor.ru/html5semantics
posted at 12:46 pm on June 27, 2009 by Montmorency
106 German translation
posted at 05:52 pm on July 13, 2009 by Tobias Otte
107 HTML5 or, the jumpgate for an upgrade
I agree with all the principal points in the article, and I share your opinion that attributes are preferable; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ’10). If they break the cordon, everybody will be happier. And in fact, the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
In the aftermath, we will all have a common base for discussion.
After all, I think some new tags would come in handy.
For example, I think there should be a tag for something as ubiquitous as a calendar. It would end the debate over which solution is more semantic (table vs. list, a choice that oddly depends on the kind of visualization we want to give it), it would save a lot of code, and it would give more artistic freedom to designers, who could target a parameter or class with simple JavaScript to radically change the visualization.
posted at 01:31 pm on November 9, 2009 by blackdog
108 I think that extensibility is a very core problem,
http://wiki.whatwg.org/wiki/FAQ#HTML5_should_support_a_way_for_anyone_to_invent_new_elements.21
This WHATWG FAQ entry contains some of their responses to the extensibility problem.
posted at 12:43 am on January 3, 2010 by Tchalvakspam
101 Tag Soup For the Soul
I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is, however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever neglect reading the specs and learning the proper way to code either HTML or XML, but when it comes to parsing the Web, the recent Opera MAMA study and the Google study that Ian Hickson did make it clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.
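A minimal sketch of the one-way street, assuming Python's standard-library `html.parser` as a lenient stand-in for a browser's HTML parser and `xml.etree.ElementTree` as a strict XML parser. The snippets and helper names are illustrative only. The HTML fragment omits optional `</p>` end tags, which is conforming HTML 4.01, yet an XML parser rejects it; the XHTML fragment is accepted by both.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Conforming HTML 4.01: </p> end tags are optional and <br> never takes one,
# so this is valid HTML but not well-formed XML.
HTML401 = "<div><p>One<p>Two<br></div>"
# The same content written as XHTML: every element is explicitly closed.
XHTML = "<div><p>One</p><p>Two<br /></p></div>"


def xml_accepts(markup: str) -> bool:
    """Strict path: any well-formedness error is fatal."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class StartTagCounter(HTMLParser):
    """Lenient path: counts start tags and never rejects input."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # <br /> also lands here: the default handle_startendtag calls
        # handle_starttag followed by handle_endtag.
        self.count += 1


def html_start_tags(markup: str) -> int:
    counter = StartTagCounter()
    counter.feed(markup)
    counter.close()
    return counter.count


for name, markup in (("HTML 4.01", HTML401), ("XHTML", XHTML)):
    print(f"{name}: XML parser accepts={xml_accepts(markup)}, "
          f"HTML tokenizer start tags={html_start_tags(markup)}")
```

A browser's HTML parser goes further than this tokenizer, building a DOM and closing the implied `</p>` elements, but the asymmetry is the same: XHTML survives the HTML path, while valid HTML does not survive the XML path.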
posted at 04:16 am on January 24, 2009 by Aaron Miller