Unwebbable

by Joe Clark, July 21, 2009

Published in HTML

It’s time we came to grips with the fact that not every “document” can be a “web page.” Some forms of writing just cannot be expressed in HTML—or they need to be bent and distorted to do so. But for once, XML might actually help.

The creation myth of the web tells us that Tim Berners-Lee invented HTML as a means of publishing physics research papers. True? It doesn’t matter; it’s a founding legend of the web whose legacy continues to this day. You can gin up as many web applications as you want, but the web is mostly still a place to publish documents.

The web is replete with projects to “digitize legacy content”—patent applications, books, photographs, everything. While photographs might survive well as JPEGs or TIFFs (disregarding accessibility issues for a moment), the bulk of this legacy content requires semantic markup for computers to understand it. A sheet of paper provides complete authorial freedom, but that freedom can translate poorly to the coarse semantics of HTML. The digitization craze—that’s what it is—crashes headlong into HTML semantics.

Some documents cannot be published using HTML. In many cases, we shouldn’t even bother trying. In other cases, we have to radically change the appearance and structure of the document. Ideally, we’ll start using custom XML document types—which, finally and at long last, might actually work.

The screenplay problem

An example of the conundrum of transferring print documents to the web, one that has become legendary in some circles, is the film screenplay.

A lot of people want to write a screenplay. The outcomes for most of these writers are the same: Nobody films and releases their movie. And they all go through the same phase—learning the generations-old “style” of screenplay formatting.

Typewritten screenplay from Die Hard 2.

Originating in the typewriter age, screenplay layouts are custom-engineered so that one printed page (in what we now call U.S. letter size) equals almost exactly one minute of onscreen time. Since most commercial movies run about two hours in length, typical Hollywood movie scripts are 118 to 122 pages long.

Typography is lousy; old typewriter fonts of yesteryear were errantly mapped onto today’s spindly Courier type. But as an example of document engineering, scripts are brilliant.

There’s an entire science involved in text indention. Text is rarely, if ever, “centered”; everything lines up at a tab stop, a concept that CSS expunges from the collective memory. (You could set left margins using the ch unit in CSS3, but nobody does.)
With careful alignments like these, it’s easy to scan down a screenplay page. Semantic use of ALL CAPITALS aids scanning, and clearly does not live up to the purely mechanical name CSS gives it, “text-transform.”

And now people want to transfer the format—intact—to the web. It’s not going to work.

Web “pages” may be called that, but the term is metaphorical. It has nothing to do with sheets of paper that equate to screen time. (Right away that means a shooting script’s many headers and footers would disappear, since we’re dealing with only one “page.”)
Nobody seriously intends screenplays on the web to have the same function they do in real life—getting read, getting optioned or bought, and getting shot. All of that happens on paper, not on Firefox.
HTML (per se) is not extensible. Extensible HTML (XHTML) has really not been extended. Hence the following truism is not going to change: HTML does not have enough tags for the semantics of screenplays, where nearly everything needs its own tag.
- Dialogue seems to be no problem, but dialogue is intermingled with screen and actor instructions, and in HTML both of those would just be placed in paragraph elements—even though the function, and expected appearance, differ drastically.
- What about the myriad headings, including the names of people speaking and notations for the time of day and the manner of speech (often called slugs or sluglines)? We have “a lot” of heading tags in HTML—six of them—but they are arranged hierarchically, not according to function. Would class names really suffice here—that is, H2 class=“slugline” versus H2 class=“charactername”? Really, the answer is no. Script headings and HTML headings are two different things.
The real movie industry doesn’t need HTML in the first place; it already has viable electronic exchange formats for scripts.
1. One is the proprietary format of Final Draft, the software that dominates the screenplay market the way MS Word dominates in offices. Open-source fanatics may look at this as one more delicious chance to inveigh against a proprietary format, but screenwriters have better things to worry about than open source. Anyway, Final Draft 8’s default document format is now XML.
2. The other is PDF. The movie business doesn’t have to care about accessibility, so even PDFs to which no accessibility features have been added suffice for script exchange. You don’t need tagged PDF, which also doesn’t have enough semantics for screenplays. (You could, in theory, write your own PDF tags, since they’re just XML.)
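
To see what "nearly everything needs its own tag" means in practice, here is a minimal sketch in Python. The vocabulary is invented for illustration: `screenplay`, `scene`, `slugline`, `action`, `speech`, `character`, and `line` are assumed names, not Final Draft's XML and not any published DTD. The sample text is the fragment a commenter quotes in the discussion of this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every element name below is an assumption made
# for illustration, not Final Draft's XML or any existing DTD.
SCREENPLAY_XML = """\
<screenplay>
  <scene>
    <slugline>Reverse angle - over their shoulders</slugline>
    <action>Slowly, without any fuss, and without a pattern of sorts,
that would be pretty if the impact wasn't so frightening... slowly,
all the runway lights are going out.</action>
    <speech>
      <character>McClane</character>
      <line>Jesus...</line>
    </speech>
  </scene>
</screenplay>
"""

root = ET.fromstring(SCREENPLAY_XML)

# Each function of the text has its own element, so software can select
# by function instead of guessing from heading level plus class name.
sluglines = [el.text for el in root.iter("slugline")]
speakers = [el.text for el in root.iter("character")]

print(sluglines)
print(speakers)
```

With one element per function, a stylesheet or a script-breakdown tool could pick out sluglines or speakers directly, where H2 plus class names would rely on convention alone.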

The quest to adapt scripts to the web recalls other “category errors,” to use Martin Amis’s phrase. Electronic commerce, we eventually figured out, does not take the form of “shopping malls” you “walk” through. “Magazines” and “catalogues” do not have discrete pages you flip (complete with sound effects) and dog-ear. “Web sites” do not look like magazine layouts, complete with multicolumn text and callouts.

Tellingly, this quest recalls early television, which, conventional wisdom holds, behaved more like filmed stageplays. Bringing scripts to the web is noticeably worse than filming a stageplay.

Now, people have tried to make web pages look exactly like typewritten screenplays. The star of this show is screenwriter and inveterate blogger John August. Scrippets, August’s plug-in for WordPress, Blogger, and other systems, does everything it can to spin straw into gold. Among other things, one of August’s use cases is perfect “screenplay” formatting when viewed in an RSS reader, and the only way to make that happen is through presentational HTML and inline styles. These are, of course, outmoded development methods.

August pitches his project thus (emphasis added): “With Scrippets, you can add boxes of nicely-formatted script to your blog.” That’s actually a restatement of the problem—failed reliance on a page metaphor, failed efforts to duplicate typewriter typography, and failed attempts to replicate one-page-per-minute layout. Script formatting is “nice” for print, but it’s wrong for the web—even for “little boxes” of script content.

Worse, Scrippets ignores whatever small contribution HTML semantics can offer in marking up a screenplay. Pretty much everything gets marked up as paragraphs, but not everything is a paragraph. This is a worse sin than loading up H2s with class names in an uphill battle to notate screenplay semantics.

The screenplay solution

The way to adapt scripts for the web is through cosmetic surgery. And we have a precedent for it. There’s a healthy market for screenplays published in book form. In fact, “the shooting script” is an actual U.S. trademark (from Newmarket Press) for one series of book versions of movie screenplays.

Some books just reprint typewritten screenplays at reduced size. This may make you feel like a pro, but what you should feel is cheated: You’re paying good money to read an author’s typewritten manuscript. Spindly Courier looks even worse in reduced size.
Other books completely redesign typewritten screenplays into a design native to book publishing. In a typical layout, speaker names are run inline with dialogue, normal book margins are used, and there’s a huge compaction of vertical whitespace. Typewritten screenplays read quite well in their intended context—but so do screenplay books in their context. (Retypeset scripts have also been used as language-learning aids.)

Hence to adapt this existing printed form to the web, you have to abandon all hope of duplicating original typescript formatting. You have to design something native to the web, with its relatively weak semantics and pageless or single-page architecture.

- You could use HTML definition lists to mark up dialogue, a use explicitly permitted in (W3C-brand) HTML and explicitly banned by Ian Hickson under HTML5. (There, use DIALOG instead, even though the descendants of that tag, DT and DD, are the same descendants DL has.)
- You can use PRE to fake indention and line breaks (but you can’t fake the division of a script into pages).
- You can disregard text indention and just use CENTERed text.
- You could, without too much of a stretch, mark up a script as a table.
- You could just not bother too much with semantics, run character names (in bold or STRONG) inline with dialogue, and use HTML headings where feasible.
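
The last option above, the least ambitious one, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the tuple format and the function name `render_inline` are made up for the example, and the choice of H2 for sluglines is arbitrary.

```python
from html import escape


def render_inline(script):
    """Render (kind, ...) tuples as plain HTML with inline speaker names.

    The tuple format is hypothetical: ("slugline", text),
    ("action", text) or ("speech", speaker, line).
    """
    parts = []
    for item in script:
        kind = item[0]
        if kind == "slugline":
            parts.append(f"<h2>{escape(item[1], quote=False)}</h2>")
        elif kind == "action":
            parts.append(f"<p>{escape(item[1], quote=False)}</p>")
        elif kind == "speech":
            speaker = escape(item[1], quote=False)
            line = escape(item[2], quote=False)
            # Speaker name runs inline with the dialogue, book-style.
            parts.append(f"<p><strong>{speaker}:</strong> {line}</p>")
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
    return "\n".join(parts)


script = [
    ("slugline", "Reverse angle - over their shoulders"),
    ("action", "Slowly, all the runway lights are going out."),
    ("speech", "McClane", "Jesus..."),
]
html_text = render_inline(script)
print(html_text)
```

The result reads like a retypeset screenplay book rather than a typescript, and that trade is the one being proposed: the page-per-minute layout is gone, and what remains is ordinary HTML that any browser or feed reader can display.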

Other print formats that need transformation

- Mastheads: The list of who does what at a magazine or newspaper is actually semantically complex, because each person’s title or the department they work in seems to be a heading. But a masthead marked up with H1 through H6 essentially pollutes the tag stream of the surrounding web page.
- Callouts and sidebars: These structures, familiar from magazines, newspapers, and nonfiction books, cause serious confusion in creating a functioning document tree. (At what exact point in the tag stream are you expected to read the callout or sidebar?)
- Footnotes: There isn’t a structure for footnotes in HTML (though there is in tagged PDF). Developers have tried all sorts of hacks, including JavaScript show/hide widgets and various rats’ nests of links and reverse links. For literature fans, HTML’s lack of footnotes makes the work of the late David Foster Wallace functionally impossible to render on the web (especially his footnotes within footnotes).
- Charticles: With origins commonly attributed to Spy, a charticle is an illustrated featurette with a lot more accompanying text than what a bare illustration has. By way of comparison, a Flickr photo festooned with notes is functionally identical to a charticle, but HTML has no semantics for it.
- Math and science: Yes, that old chestnut. Before you exclaim “MathML!” the way a pensioner might yell out “Bingo!,” understand that barely anybody uses MathML on real web pages due to serious authoring difficulty; physicist Jacques Distler remains among the very few who do.
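
The "links and reverse links" hack for footnotes looks roughly like this. It is a minimal Python sketch, assuming a made-up `[^n]` marker syntax in the source text; the function name and the id scheme (`ref1`, `note1`) are likewise hypothetical.

```python
from html import escape


def render_footnotes(paragraph, notes):
    """Turn [^n] markers into links and build a note list with backlinks.

    The [^n] marker syntax and the ref/note id scheme are assumptions
    made for this sketch, not part of HTML.
    """
    body = escape(paragraph, quote=False)
    items = []
    for number, text in enumerate(notes, start=1):
        marker = f"[^{number}]"
        # Forward link: from the reference in the text down to the note.
        ref = f'<sup><a id="ref{number}" href="#note{number}">{number}</a></sup>'
        body = body.replace(marker, ref)
        # Reverse link: from the note back up to the reference.
        items.append(
            f'<li id="note{number}">{escape(text, quote=False)} '
            f'<a href="#ref{number}">back</a></li>'
        )
    return f"<p>{body}</p>\n<ol>\n" + "\n".join(items) + "\n</ol>"


out = render_footnotes("A sentence with a note.[^1]", ["The note text."])
print(out)
```

Note what the sketch does not handle: a marker inside a note is left as literal text, so a footnote within a footnote would need a second numbering scheme and a second set of backlinks.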

How do we solve the problem?

Armed with this knowledge, what are we going to do? Prediction: nothing. People will continue to fake the appearance of scripts and use John August-caliber presentational code. But we do have an alternative.

The case typified by screenplays is merely a new variation of the difficulty of encoding literature in XML. People have tried it time and time again over the years, but barely any DTD has gotten traction. People just want to mark up everything in HTML (which has staying power). Ill-trained authors mark up everything as a paragraph or a DIV.

People seem to have taken the catchphrase “HTML is the lingua franca of the web” a bit too literally. HTML derives from SGML; XHTML is XML in a new pair of shoes. That’s four kinds of markup right there, but everybody acts as though there is only one kind, HTML. (Most of the time, browsers act like XHTML is HTML with trailing slashes.) Even electronic books are marked up as HTML, as the ePub file format is essentially XHTML 1.1 inside a container file, but that makes ePub files simultaneously HTML and XML. If we can spit those out, why can’t we spit out other kinds of XML?

We are well past the stage where browsers could not be expected to display valid, well-formed XML. Browsers can now do exactly that. Variant literary document types could actually work now. But because they languished on the vine for so long, now it seems nobody wants to make them work. After all, isn’t our new future wrapped up in HTML5? Just as our old future was wrapped up in XHTML2?
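
One way a browser can be asked to display a custom document type is the `xml-stylesheet` processing instruction, which attaches a CSS file to an XML document. The Python sketch below builds such a document with the standard library. The element names, the file name `screenplay.css`, and the indent values in the sample rules are all placeholders, not measured screenplay tab stops.

```python
from xml.dom import minidom

# A tiny document in a hypothetical screenplay vocabulary.
doc = minidom.parseString(
    "<screenplay><speech><character>McClane</character>"
    "<line>Jesus...</line></speech></screenplay>"
)

# Attach a CSS stylesheet to the XML. The file name is an assumption.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/css" href="screenplay.css"'
)
doc.insertBefore(pi, doc.documentElement)

xml_text = doc.toxml()
print(xml_text)

# Rules that screenplay.css might contain. Generic XML elements carry no
# default rendering, so even block display has to be stated explicitly.
# The ch values are placeholders chosen for the example.
CSS_RULES = """\
speech, character, line { display: block; }
character { margin-left: 22ch; text-transform: uppercase; }
line { margin-left: 10ch; }
"""
print(CSS_RULES)
```

The stylesheet supplies appearance only. What `character` or `line` means to a screen reader or an outline tool is still undefined, which is the objection several commenters raise.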

Conclusion

The web is, of course, a wondrous thing, but its underlying language lacks the vocabulary to express even the things that humans have already expressed elsewhere. We ought to accept that some documents have to be reformatted for the web, at least if the goal is using plain HTML. To give web documents the rich semantics of print documents, XML is finally a viable option.

42 Reader Comments

matthewbuchanan says:

July 21, 2009 at 10:40 am

Joe — minor correction for accuracy, uppercase uses _text-transform_, not _text-decoration_.
zeptimius says:

July 21, 2009 at 11:14 am

Another good example of ‘unwebbable’ text is poetry, especially the modern kind that plays with page margins, text orientation (another inherent HTML weakness), font faces and font sizes, and even page size sometimes. In fact, this is text that _deliberately_ defies conventions and structure.
Alexis Deveria says:

July 21, 2009 at 12:06 pm

Great article, raises some good questions.

Personally I would use a definition list if I were to mark up a masthead. Would that not be most appropriate? Also, I believe the HTML5 aside element is intended to cover callouts, sidebars and footnotes (to some degree).
DouglasT says:

July 21, 2009 at 12:22 pm

I love to see people coming up with new ways of solving problems. I’d be curious to see how the HTML 5 Aside would work, but it’s the xml idea that got my attention. Seems like it would be the most inherently flexible method to me.
mattrossidesigns says:

July 21, 2009 at 1:13 pm

So, I suppose I am in the boat with the people that think anything can be “webbable”. I think that what it comes down to is, how accustomed and comfortable people have become with a particular format/routine.

I do agree that XML would be a viable solution to consider here. Although, I had wished you discussed more of a solution with XML rather than just the term itself.

Bottom line. A message needs to be conveyed from the writers, to the actors, and whom ever else. How you get there is the question. Another question would be why go to the web with the scripts?
Tim-Wright says:

July 21, 2009 at 1:24 pm

I didn’t notice if anyone mentioned it, but HTML5 actually has a new <dialog> element
AfroNinja says:

July 21, 2009 at 1:25 pm

In reading this article, I was reminded of Plotbot, which makes (at least) a small attempt to break the paradigm of poor formatting by providing an XML download of one’s screenplay. The problem, of course, is that it’s not in a format that’s usable to any other application.

However, I see this as a good first step.
erikvorhes says:

July 21, 2009 at 1:50 pm

For digitizing manuscripts, etc., I’ve found the DTDs from the “Text Encoding Initiative”:http://www.tei-c.org/ to be really useful. Joe, et al., the TEI might be worth checking out, if you haven’t already.
Michael Newton says:

July 21, 2009 at 2:48 pm

bq. Before you exclaim “MathML!” the way a pensioner might yell out “Bingo!,”
I don’t think I’ve ever laughed out loud at an ALA article. Say what they will about Mr. Clark, nobody can deny he’s got a way with words.

“@AfroNinja”:http://www.alistapart.com/comments/unwebbable//#7 if the document is XML, then it has to be usable by other applications, doesn’t it?
jetweedy says:

July 21, 2009 at 3:41 pm

I thought the article was interesting in that it shows how custom XML comes in handy over simple XHTML as a means of storage… but then again, was (X)HTML even intended for general information storage? XML marks up information, and (X)HTML just happens to mark up information specifically about web page content.

The semantic web is, for a great part, about automation, and therefore storing a screenplay as XML, reading that in using a script (say, PHP) and then passing it to the browser as XHTML seems to me to not only be very possible, but also very sensible… so I don’t understand what about a screenplay is ‘unwebbable’, any more than, say, a blog with comments. It would be just as bad to store a, but blogs are quite obviously alive and well on the web. You store it in one format (database) and display it in another (X/HTML).
jetweedy says:

July 21, 2009 at 3:44 pm

Oops! I didn’t finish my sentence! Meant to say…

“It would be just as bad to store a weblog as a static HTML page.”

Sorry. 🙂
iMasque says:

July 21, 2009 at 3:48 pm

I think the main problem is that you need a bit of software to actually _understand_ a given XML document, rather than just validate it.

DTDs (or XSDs) make it easy for a program, say a web browser, to check that a document is valid but they don’t describe what the content actually _means_ or how it should be displayed. This makes accessibility difficult; you can’t apply any semantic meaning to the content and it will just be presented as a mess of text.

The only reason that browsers can understand (X)HTML, for example, is that their developers have read the extensive (human readable) documentation made available by the W3C. Therefore, they know that the H1 element is a heading and can offer tools (e.g. a document outline) and default styling that makes use of that knowledge.

Getting a single piece of software that can do that with any type of XML document is never going to be feasible.
F1LT3R says:

July 21, 2009 at 4:41 pm

“Bringing scripts to the web is noticeably worse than filming a stageplay.” – oh so damn spot on!

“You could, without too much of a stretch, mark up a script as a table.” – Yep, that’s exactly what I thought.

“But a masthead marked up with H1 through H6 essentially pollutes the tag stream of the surrounding web page.” – Even with HTML5 «footer» «aside» etc tags? Perhaps not.

“Armed with this knowledge, what are we going to do? Prediction: nothing.” – Agreed, and nothing is exactly what should be done too.

“After all, isn’t our new future wrapped up in HTML5? Just as our old future was wrapped up in XHTML2? ” – NO!!!! HTML5 is wrapped up in fun, XML is wrapped up in the invisible extendability of usefulness. Fun is easier to deliver on than usefulness. When you add “fun” to “ding” you have another reason HTML5 will grow quicker than XML… faster ROI.

“The web is, of course, a wondrous thing, but its underlying language lacks the vocabulary to express even the things that humans have already expressed elsewhere.” – Good! The idea that HTML could express everything humans have expressed elsewhere is insane. The law of “least resistance” dictates that a super-language with a rigid syntax would never accommodate a race whose very expression is ‘to break the mold’. How many commands can a mind hold? So much easier to navigate a small set of fluid meta controls than a galaxy full of rocks.
Matt Eppelsheimer says:

July 21, 2009 at 7:48 pm

bq. To give web documents the rich semantics of print documents, XML is finally a viable option.

I use xhtml because the experts who have guided my growth as a web designer seem to believe in it. I believe it *think* it has potential to do all sorts of amazing things… but I don’t know how to use it for anything beyond typical semantic web page markup that validates. I have no concept of its potential. This concluding sentence seems to indicate I’m not far off the mark. So my understanding is that if I become proficient in xml, I could in theory create elegant ways to extend my xhtml markup to convey things like screenplays.

The truth is I find this all very confusing, and I’m not even sure what question to ask. Is my understanding correct? I think I need more “how” to better understand the “why.” Where to next? Where do I go for an xml 101? (Please don’t say W3C!).

Thank you.
Greg Reimer says:

July 21, 2009 at 9:14 pm

“The creation myth of the web tells us that Tim Berners-Lee invented HTML as a means of publishing physics research papers. True? It doesn’t matter; it’s a founding legend of the web whose legacy continues to this day.”

Okay I’ll bite. Someone care to set the record straight?

“We are well past the stage where browsers could not be expected to display valid, well-formed XML.”

XML (and related technologies) is hard for beginners. I think every HTML author who first hears about XML is like “Cool! HTML where you can invent your own tags!” So they dive in and about the time they encounter namespaces something dies inside them.

It’s not so much that HTML is the lingua franca of the web, but that tag-soup is the lingua franca of the web.
simonrjones says:

July 22, 2009 at 5:26 am

interesting and thorough article Joe. People, not unreasonably, want to use common and widely understood conventions from print. As you rightly point out, this can rarely be done without adaptation to the media.

Are you suggesting one solution for screenplays is to use something like the Final Draft XML format and get browser manufacturers to create default styles to parse it? Or are we talking XSLT here?..

Of your other suggestions, the humble PRE tag seems the best/most pragmatic (though has no semantic meaning behind it).
valerauko says:

July 22, 2009 at 5:55 am

You say that XML is a solution, but the web is not only about having the data there, i would want to display it, that’s why i put it on my public server and not on my bookshelf. I don’t see what’s the problem with the MathML-kind of stuff. Those are solutions for this kind of problem. Somebody took the trouble to make a DTD for a pretty troublesome field. I wonder if there’s any XML markup for say music sheets…
But the main question: how am i supposed to format my own xml? No way. Am i wrong?
ihatemornings says:

July 22, 2009 at 8:06 am

Marking up a film script is complicated, and when it gets complicated nobody does it.

@valerauko wonders ‘if there’s any XML markup for say music sheets’. Check out “XML and Music”:http://xml.coverpages.org/xmlMusic.html for one of many abandoned resources dedicated to the online storage, transfer and display of musical notation. Clever and passionate people work on wonderful DTDs for their thesis. Then they leave uni and never update (or use) the markup they created.

It’s too complicated. Like Joe said about screenplays, it’s easier to use PDFs or “proprietary formats”:http://www.sibelius.com/products/scorch/index.html for display and “super-simple markup”:http://chordie.com for everyday use.
davidramos says:

July 22, 2009 at 12:39 pm

Not all meaning can be captured in hierarchical trees; not all meaning can be rendered machine-readable. Sometimes the essential part of a designed object is space and the way that small pieces form a whole, and I don’t think computers can understand that yet.
Richard Fink says:

July 22, 2009 at 5:35 pm

Back when I was a tot, before anyone even dreamed of things like icons, drop-down menus, and windows, “computer literacy” was a big concern. Meaning there was a fear that the current crop of young people would fall behind in their communication skills because having them meant knowing how to program a computer. In hindsight, expecting people to climb a learning curve like that to publish a simple document is a laugh.
Right now, I’m typing into a little box and after I click “Submit” the whole world will be able to read it. Big learning curve involved with that one, eh?
Are we all going to learn XML now? Or is the idea to build it into products that ordinary people can use?
My wife writes a lot of papers in the APA style. So do others – millions of them are generated every year. Ever see what MS Word spits out when you save one of them as HTML? Smart stuff in (hopefully), garbage out.
What we need right now are well thought out, ad hoc, interim solutions. Using class names may be hacky but at a later date, at least the document can be machine-parsed and transformed into something better.
Since I do a lot of thinking about stuff like this, that’s just my off-the-top take. Terrific article. Did a post on “Readable Web”:http://readableweb.com/moving-from-print-to-screen-a-case-study-from-joe-clark/ recommending it. Ciao.
vai says:

July 22, 2009 at 8:24 pm

… for courses. The web is not a paginated format. Text-indentation is not the way to indicate separations in content. A script’s semantic structure is something you infer from its appearance, a document’s is implicitly defined.

Your use case is something I’m dubious of – though if the intent is to completely reproduce the appearance of a script in the classic style, whilst creating / editing in a digital medium a document that retains semantic structure, you’d be better off using XML and looking at something like XSL-FO for presentation.
Richard Fink says:

July 22, 2009 at 9:43 pm

@vai who wrote:
“Text-indentation is not the way to indicate separations in content.”
If you don’t count the typographic convention of the last 500 years or so that uses indentation to separate textual content, you’re absolutely right.
Or am I misunderstanding you?
vai says:

July 22, 2009 at 10:15 pm

‘in a document’ should be implicit in that sentence, and by which I mean ‘from the document’s point of view’.

Case in point – being typewritten, the script is bound to the medium of paper. A document can end up on multiple media, in various presentations. How do you represent text-indent in a vocal presentation? You don’t – it has no ‘meaning’ beyond what you infer about the content from it. Model your domain (db / xml schema / what-have-you), populate, take care of presentation issues afterward. This applies to software as a whole, and even the humble document.

Yes, you’re misunderstanding me.
Joe Clark says:

July 24, 2009 at 5:20 pm

Vai, text formatting like screenplay indention would not translate to “vocal” format because one experiences spoken texts in sequence (one word at a time, let’s say), whereas the sighted reader can scan an entire page in seconds. You see alignments and indents out of the corner of your eye and they guide your reading of the page. The same isn’t true when, to use an example, the page is enunciated by a screen reader. Then it’s just one bit of content after another.
Diane Vigil says:

July 25, 2009 at 12:55 pm

You’re right that some texts don’t particularly lend themselves well to the flow of web formats. For now, it appears that we must use various hacks (or old-school tables), neither of which are a particularly good way to go (but could at least be visually readable).

I’ve used Final Draft, which is not the most user-friendly program (it feels stiff and clunky). That said, I’d be hard put to believe that studios or other owners of movie scripts actually want them out there on the Web.

I’d be at least as interested in the ability to use more than the few fonts that Macs, PCs and Linux machines share in common.
Divya Manian says:

July 27, 2009 at 4:16 am

With reference to Joe’s latest comment, it seems to me that the idea of all the indentation seems to make the screenplays easier to read. If that is the case, I don’t see why XML or any other language is going to make any difference.

On the other hand, MathML or SVG makes a lot of sense in why XML is a good language to use to markup. But this usage for screenplay seems to be thin ice.
Joe Clark says:

July 27, 2009 at 5:42 pm

Divya, we could fake the appearance of screenplay pages, except of course for the fact that we don’t have pages and the entire purpose of a printed screenplay is at odds with whatever purpose, if any, an online screenplay has.

But at root HTML doesn’t give us rich enough semantics to mark up the actual content. XML might.
Gonzo says:

July 27, 2009 at 10:13 pm

I would argue that footnotes are not webable. Aren’t hyperlinks the World Wide Web’s alternative to footnotes? Instead of providing bibliography after the document and marking the references to the titles in it with [1], [2], [3] we use hyperlinks to the original articles. Instead of explaining a word in a footnote, we just create a hyperlink to Wikipedia or other source of knowledge somewhere in the web.

I can imagine situations where the traditional footnotes can’t be replaced with traditional hyperlinks, but a little creativity would always come handy. After all web is not print, is it? So web documents are not meant to be the same as print pages, are they?
patrick_l says:

July 28, 2009 at 2:00 pm

I have to admit that until now I used paragraphs or divs styled with CSS scripts to format difficult parts of web sites. Having read this article I realize that it should be much easier to create structures like footnotes or scientific formula. Nowadays it appears contradictory that we can buy things and watch TV with a browser, but we cannot display simple equations like x=1/2 in a well-looking way (and in my eyes png graphics cannot be a solution for this issue). But finally I am optimistic—even though HTML 5 solves not all problems, it shows that the web languages are undergoing progress.
Stephen Down says:

July 29, 2009 at 10:47 am

bq. I would argue that footnotes are not webable.

I wouldn’t, although I find it easier to use endnotes than footnotes (ie, when you print them, they all appear at the end of the document rather than on the relevant page).

In the text, use an 1 element for the footnote reference. You can number it manually, or if you want to be clever (and not worry about legacy browsers), you could probably get it to display the number with a counter. This will give the contents of the footnote as a tooltip on :hover. Then add a

list at the end of the document defining each number with the appropriate footnote as the definition.

Other options would be to display the content of the footnotes in a frame/iframe at the bottom of the page or as a lightbox if you want it to look fancy.
Stephen Down says:

July 29, 2009 at 11:00 am

What’s so hard about semantically marking up a film script? No, you won’t get the page break every 60 seconds, but I’d be surprised if that works out at a hard-and-fast rule anyway, and maybe that’s the one thing that has to give.

Reverse angle – over their shoulders

Slowly, without any fuss, and without a pattern of sorts, that would be pretty if the impact wasn’t so frightening… slowly, all the runway lights are going out.

McClane

Jesus…

Int. Virginia Church – same time

As Stuart’s tech throws some more switches –

and so on.

Use a counter on the
element (I’ve assumed that a couple of higher levels will be needed but use whatever level is appropriate). Or use an
Stephen Down says:

July 29, 2009 at 11:03 am

OK, so the Textile preview bears no relation to what is actually output…

I used normal angle brackets around *h3*, which it has left alone, *strong*, which it has converted to square brackets, and *p*, which it has stripped altogether.

The screenplay should have read:
[p class="direction"]Slowly, without any fuss, and without a pattern of sorts, that would be pretty if the impact wasn’t so frightening… slowly, [strong]all the runway lights are going out[/strong].[/p]
[p class="speaker"]McClane[/p]
[p class="speech"]Jesus…[/p]
RobShaver says:

July 31, 2009 at 4:33 pm

In the author bio it has the phrase, “His ongoing missions”. This smacks of bringing religion to the ignorant natives to me. Here’s a piece of dogma about using “presentational HTML and inline styles” delivered without benefit of any technical justification, “These are, of course, outmoded development methods.” Just what is your objection to John August’s desire to have his RSS feeds look the way his readers, screenwriters, expect it to look? And what is it that makes them outmoded if the politically correct methods don’t work in RSS?

So I guess I missed the sermon on why everything on the Internet should be marked up systematically. What I’ve never seen explained, in detail, is what the benefit is to this and why it is promoted with such zeal. Give me some examples of where this has been done and what benefit was derived.

The first step in justifying any technology is to clearly explain the goal of that technology. So let’s talk about the screenplay on the Internet. First, who is it there for and what are they doing with it? In the case of John August’s blog, they are there to be read by the people that read his blog and, I think, no other reason. They are there for education … to illustrate the point he’s making in his post.

Now I know that a script that is going to be produced does serve some additional purposes in the pre-production phase of film making. It is called the script breakdown. This is usually a manual step where the various film making departments comb through the script to find all the characters, props, locations, sounds and visuals that will be needed during production. There are script software systems which allow the author (or someone else) to “markup” the script to identify these elements and automate the breakdown. These are WYSIWYG tools and not HTML-like at all.

So let’s talk about marking up a script in HTML:
“Would class names really suffice here—that is, H2 class=”slugline” versus H2 class=”charactername”? Really, the answer is no. Script headings and HTML headings are two different things.” They are? Who says? Why? Again, no technical justification … just dogma. You might as well say, “Well, everyone knows that’s just wrong.”

Each of these IS a heading that introduces and pertains to what follows until another heading is reached. That, to me, is the essence of what a heading is. Do class names not add to the systematic metadata about that entry? Here’s my shot at marking it up:

… stuff left out here …

THE CAB

32

CONTINUE –

But Barns doesn’t reply … just tries – and fails – to point out the window. Everybody turns.

REVERSE ANGLE – OVER THEIR SHOULDERS

Slowly without any fuss, and with a pattern of sorts that would be pretty if the impact weren’t so frightening … slowly ALL THE RUNWAY LIGHTS GO OUT.

MCCLANE

Jesus …

… stuff left out

(I hope this comment preview is accurate.)
I see no reason this markup, which is systematic to me, can’t be styled using CSS and I know that I could write a parser to automate the breakdown of a script formatted this way.
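To make the parser claim concrete, here is a minimal Python sketch of such a breakdown. The markup in this comment did not survive the comment system, so the sample HTML below is an assumption: it borrows the class names `slugline` and `charactername` quoted from the article, and `breakdown` and `BreakdownParser` are hypothetical names. It also assumes the tracked elements contain only text, with no nested tags.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the class names are assumptions taken from
# the ones quoted in this thread, not from any published schema.
SAMPLE = """
<h2 class="slugline">REVERSE ANGLE - OVER THEIR SHOULDERS</h2>
<p class="direction">Slowly ALL THE RUNWAY LIGHTS GO OUT.</p>
<h2 class="charactername">MCCLANE</h2>
<p class="speech">Jesus ...</p>
<h2 class="slugline">INT. VIRGINIA CHURCH - SAME TIME</h2>
<h2 class="charactername">MCCLANE</h2>
"""


class BreakdownParser(HTMLParser):
    """Collect the text of elements whose class is in `wanted`.

    Assumes tracked elements hold plain text only (no nested tags).
    """

    def __init__(self, wanted):
        super().__init__()
        self.found = {name: [] for name in wanted}
        self._current = None  # class name of the element being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.found:
                self._current = name
                self._buffer = []
                break

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.found[self._current].append("".join(self._buffer).strip())
            self._current = None


def breakdown(html, wanted=("slugline", "charactername")):
    """Return each wanted class mapped to its unique texts, in first-seen order."""
    parser = BreakdownParser(wanted)
    parser.feed(html)
    parser.close()
    return {name: list(dict.fromkeys(items)) for name, items in parser.found.items()}
```

Calling `breakdown(SAMPLE)` gives one list of locations and one list of characters, which is roughly what a manual script breakdown collects. Props, sounds and visuals would need class names of their own.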

So I guess I’ve missed your whole point, being one of the ignorant natives.

“but what you should feel is cheated” and don’t tell me how I should feel either. I don’t feel cheated when I buy those screenplay books. If you feel cheated then own that.

After writing this I guess my objection is to the tone of the article. I guess you may not like the tone of my reply either. Well the Internet is a big place.

Peace,

Rob:-]
Dave Baldwin says:

August 1, 2009 at 9:21 pm

I agree with most of what RobShaver said. While I think I understand Joe Clark’s point, I agree that functions like movie script writing and formatting need a “Movie Script Editor” program. If there is adequate demand, it would probably provide a web formatted output, maybe even like Joe suggests.

As far as “outmoded development methods”… there are so many responses to that statement that I can’t even begin to enumerate them here. Try using modern methods with HTML email. There is also the implication that people who still use “outmoded development methods” are Wrong. I know of people who built their sites with Front Page 4 and are still maintaining them with it. Would I recommend Front Page for anything? Hell, no… but I refuse to say these people are Wrong for doing something that works for them. Fortunately, the web browser manufacturers believe they need to support the existing web including those “outdated methods”.

Speaking of browsers, I think the browser writers are going to trump all of the supposed standards and keep trying to make software that works. HTML5? Yeh, we’ll support that… and HTML 1, 2, 3.2, 4, XHTML, javascript, ECMAscript, DOM, whatever it takes to make it work. ‘deprecated’? Not in this browser!

If W3C can’t make things that Improve the situation, maybe Adobe will. The web page of the future:

<html>
<head>
<title>Untitled</title>
</head>
<body>

</body>
</html>
Richard Fink says:

August 2, 2009 at 12:08 am

Thanks to Rob Shaver for going to the trouble of creating a demo for what I had in mind when I said:
“Using class names may be hacky but at a later date, at least the document can be machine-parsed and transformed into something better.”
There might be a smart way to get rel, rev, and title attributes in on the act, too.
BTW – is this not the basic technique used in Microformats?
If Rob’s example were to be codified – spec’d to exactly when and where these combinations of tags and classnames are to be applied, it certainly would provide a “schema” of sorts.
It’s one way to approach the problem, surely.

Surely we could accomplish any of this with a simple tag? Thus giving the author pretty much the same freedom of the typewriter. A little clever javascript to deal with interface differences like tabbing (or any other typewriter oddities) shouldn’t be too hard either. The argument for semantics is voided by the counter-argument that the printed version is automatically semantic by reader interpretation – the same would apply to pre-formatted text… My pencil doesn’t need to know whether it’s writing a paragraph or title, and nor would an online screenplay. It’s fair to say that typical web semantics don’t apply, and that the screenplay format could *very* easily be replicated.

Also the notion that an HTML document is automatically “Pageless” is daft, if you want to represent “1 minute” intervals what’s wrong with denoting every 50th line or so?
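Marking every 50th line is simple to sketch. The Python below is a rough illustration only: the figure of 50 lines per minute is the guess made above, not a measured value, and `mark_intervals` and its marker format are hypothetical.

```python
def mark_intervals(text, every=50, label="[~{n} min]"):
    """Append a marker to every `every`-th line of preformatted text.

    `every=50` is an assumed approximation of one minute of screen time.
    """
    out = []
    for number, line in enumerate(text.splitlines(), start=1):
        if number % every == 0:
            # number // every counts how many intervals have elapsed so far
            line = f"{line}  {label.format(n=number // every)}"
        out.append(line)
    return "\n".join(out)
```

The output keeps the original line breaks, so it can still be dropped into preformatted text unchanged apart from the appended markers.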

PS – apologies for my rudeness, I think that your technical argument is overshadowed by your obvious sentimentality for the typewritten form. If you’d presented your article with more romanticism and less “fact” I would have enjoyed and appreciated the article as I imagine you meant!

David King says:

August 2, 2009 at 2:47 pm

That should have read < PRE >, I would’ve given an example if it wasn’t for the terror that is Textile formatting!!
Joe Clark says:

August 12, 2009 at 11:05 pm

Rob Shaver, for “missions” read “goals.” At least I have some.
jax says:

August 16, 2009 at 8:22 am

Michael Newton (http://www.alistapart.com/comments/unwebbable//#9): “if the document is XML, it has to be usable by other applications, doesn’t it?”

No it will not. Any XML application is opaque to any other XML application unless you have prior knowledge. The claim that an XML processor somehow magically knows what an XML application is about is a myth based on mismarketing (http://my.opera.com/jax/blog/html5-xml-stealth) XML, and probably part of the reason for the perceived backlash against XML. XML solves problems, just not the problems it often is claimed to solve.

Back to the article, there is no reason why the internal format shouldn’t be a task-specific XML format (or any other format for that matter), and XML has the advantage that it can be transformed into HTML fairly easily. However I don’t think the particular example of film scripts was that well-chosen, as they can be encoded in HTML with no loss of information. The example in the comments with musical annotations might be a better one, as there is no adequate support for that in HTML.
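As a rough sketch of that transformation, the Python below turns a task-specific XML screenplay into HTML with class names. The element names (`screenplay`, `slugline`, `direction`, `speaker`, `speech`) and the `TAG_MAP` table are assumptions made up for illustration, not an existing vocabulary. The table is also the "prior knowledge" in question: an element it does not list is skipped, because the code has no way to know what it means.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical task-specific vocabulary; these element names are assumptions.
SCRIPT_XML = """
<screenplay>
  <slugline>INT. VIRGINIA CHURCH - SAME TIME</slugline>
  <direction>As Stuart's tech throws some more switches</direction>
  <speaker>MCCLANE</speaker>
  <speech>Jesus...</speech>
</screenplay>
"""

# Prior knowledge of the vocabulary: which HTML tag each XML element maps to.
# The XML element name is reused as the HTML class name.
TAG_MAP = {"slugline": "h3", "direction": "p", "speaker": "p", "speech": "p"}


def to_html(xml_text):
    """Convert the assumed screenplay XML into class-annotated HTML lines."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for child in root:
        tag = TAG_MAP.get(child.tag)
        if tag is None:
            continue  # unknown element: nothing tells us how to render it
        text = escape((child.text or "").strip(), quote=False)
        lines.append(f'<{tag} class="{child.tag}">{text}</{tag}>')
    return "\n".join(lines)
```

XSLT would be the more conventional tool for this job. The sketch uses the standard library only to keep it self-contained.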
Roguebfl says:

December 24, 2009 at 6:14 pm

“Another question would be why go to the web with the scripts?”

@mattrossidesigns (Post #5)

To share them with community theatre and school drama groups, is my first thought.
louBurnard says:

April 16, 2010 at 12:21 pm

I’m surprised to see no reference to the Text Encoding Initiative (http://www.tei-c.org), which has been saying more or less the same thing as this article since the mid 90s. And which is now more or less the de facto xml vocabulary of choice for marking up the meaning structure of texts rather than their accidental appearance.
