Robert Hahn

inspired by integration

I'm always interested in infrastructure that brings people together and facilitates communication. I'm currently exploring social software, markup & scripting languages, and abstract games.

Home | In This Site … | Google Thread
noted on Thu, 15 Jan 2004

Justification for using XHTML

File sizes

As I mentioned before, I had always assumed that I was going to present the solution in RSS format. I knew that one of the drawbacks to using RSS was that you couldn’t describe the entire website in one file — not if you wanted to preserve the notion of sections to search in. But I figured it didn’t matter. I can simply link to other wsmd.rss files, where each file described a section.

When I actually implemented the format, I discovered that it was a real pain to edit and ensure that everything was set up and linked properly. I also found that the file sizes were absolutely tiny — and while I’m not an expert in the HTTP protocol, I was wondering if perhaps they were so small that the HTTP overhead, combined with I/O bottlenecks for fetching the files, made the scheme a little inefficient.

With that experience, I tweaked the requirements a bit, as you saw here, and came up with a way to describe a site in one file, and it wasn’t that bad in terms of size. For instance, in my WSMD file, I describe 93 assets in a 14K file. Not bad, really. If a large site contained, say, 5000 assets worth putting into a WSMD file, it would occupy a file roughly 750K in size. I suppose that in the web world, that’s huge, but I don’t really see it as being a problem. For one thing, there’s nothing keeping you from breaking the WSMD file into smaller files, with the top-level file linking to other WSMD files. But that might not be necessary if the ones most likely to use the file are the ones hosting the site (and provide you as the user with a richer means of utilizing the site).

Ease of markup

I don’t think anyone could argue intelligently that between RSS, Atom, and XHTML, more people would know XHTML more than the others. This means that the learning curve for implementing WSMD files is nearly flat — you need only learn the model, and familiarize yourself with the two tags absolutely required to make this work.

Viewing a WSMD File.

As you no doubt have already tried to do, trying to view a WSMD file is an interesting experience. Instead of getting markup, you actually get a chance to see the entire website in one page. For those of you on slower connections, that was probably a painful experience, and I’m sorry about that. But I don’t really see it as a liability, because for testing purposes, what better way to check to see if you got all your links working right? Besides, the WSMD file obviously isn’t meant to be viewed in a browser. It’s meant to be mined for useful data.

XHTML offers exactly the right semantics for the job.

Let’s look again at the anatomy of a web site. here’s the breakdown:

Discussion threads, blogs, and wikis can all easily map to either a section (a <div /> tag) containing more granular assets, or simply be noted as an asset (an <object /> tag) which would have as it’s URI the starting point for that service. Browser-level, plug-in, and downloadable assets are definitely objects. Home pages and applications are objects too. The only thing left are sections, and as those are structural hints, not URI-deference-able, they map naturally to <div /> tags.

If a property of a web site maps to a <div /> tag, then the only information we really need is a human-readable title. On the other hand, if it’s an object, then all we really need are the location of the object, the type of object it is, and a description of the contents.

I don’t think I’ve abused the ontology of XHTML to describe a site, whereas with RSS and Atom, I had to shoehorn a required value into at least one of the fields in an unintended manner.

Other Notes on creating WSMD files

As described, none of the possible formats, including this one, force the notion of a home page; it would be up to the author of the file to mark which pages are home pages. If the author of a WSMD file wanted to ensure, however, that the file would be as useful as possible, then he is encouraged to adopt the generally accepted terminology wherever it is relevant.

If it’s not obvious at this point, the value of a WSMD file is directly proportional to what you put in it. You don’t have to put every single resource in it if you don’t want to — at the risk of diminishing the value of WSMD. But there are cases where this is exactly the thing to do. For example, it might not make sense to put any URI’s that come from the middle or end of an application flow, because you want to ensure that people always start at the beginning of a task. You might not want to put some of the graphics in the WSMD, because they describe the look and feel of the site, and would serve no value to anyone else.

Another notion peculiar to the XHTML implementation of WSMD is that while you must describe your site using <div /> and <object /> tags, by no means are you limited to the kind of markup you could put within an object tag. You can make citations, link to other resources, both within or without your site, or otherwise add structure to your data.

All this sounds like great theory, but what’s the utility? Is there a killer app for this? I believe so, and I’ll start the discussion here.

tall ship