Robert Hahn

inspired by integration

I'm always interested in infrastructure that brings people together and facilitates communication. I'm currently exploring social software, markup & scripting languages, and abstract games.

noted on Tue, 07 Oct 2003

Crufty Permalinks.

On the Blosxom users mailing list (found at Yahoo), there was a thread started by Dave Walker, who wanted to know if there was a way to make Blosxom URLs less crufty.

Out of this thread came two links of note. The first was Dave’s inspiration to pose the question. The second is an old essay written by Tim Berners-Lee.

Then a guy who goes by the name Ben wrote a reply, and in it he said the coolest thing:

“In this light, the HTML is cruft if the person is only trying to access the story and does not care about the format. If, however, the person is specifically trying to access the HTML version of the story, the extension is necessary and worthwhile.”

Given that, I think that if you were to type something like this: http://www.tenletters.com/rhahn/category/entry, then what you ought to get back is a list of the possible representations of that entry to choose from.

Let me illustrate. Suppose you have a Blosxom blog with 4 formats: .txt, .html, .print, and .rss. If I punch in http://www.tenletters.com/rhahn/category/entry, then I should see something like this:

plain text: http://www.tenletters.com/rhahn/category/entry.txt

HTML: http://www.tenletters.com/rhahn/category/entry.html

Printable: http://www.tenletters.com/rhahn/category/entry.print

RSS 0.91: http://www.tenletters.com/rhahn/category/entry.rss
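To make the mapping concrete, here is a minimal sketch in Python (Blosxom itself is Perl, so this is only an illustration). The FLAVOURS table and the representations function are names I made up for the example; the labels and extensions are the four from the list above.

```python
# Hypothetical flavour table: file extension -> human-readable label.
FLAVOURS = {
    "txt": "plain text",
    "html": "HTML",
    "print": "Printable",
    "rss": "RSS 0.91",
}


def representations(permalink, flavours=FLAVOURS):
    """Return (label, url) pairs for an extensionless permalink."""
    base = permalink.rstrip("/")
    return [(label, f"{base}.{ext}") for ext, label in flavours.items()]


if __name__ == "__main__":
    entry = "http://www.tenletters.com/rhahn/category/entry"
    for label, url in representations(entry):
        print(f"{label}: {url}")
```

Adding a fifth format would then only mean adding one row to the table.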

Seems like a good idea, but we’ll need to answer some tough questions.

What format should this representation be in? My suggestion would be to emit unflavored XHTML, and use only tags that also exist in HTML 3.2 and up. That means using these tags: <html>, <head>, <title>, <body>, <p>, <h1> through <h6>, and <a>, and no others. My rationale is that by keeping it this simple, a text browser that requests the menu can still display the information, and the document is easy to parse. If a web service requests it, the service can parse the document as well-formed XML even if it doesn’t ‘know’ what the semantics of the markup imply. And if it’s a web browser making the request, which will happen almost all the time, then it’ll still display properly.
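Here is a rough sketch of what emitting such a menu could look like, again in Python for illustration only. The render_menu function is hypothetical, and I'm assuming that "unflavored" means no DOCTYPE and no namespace declaration. The final parse with the standard library's XML parser is there to show that a web service could read the result as well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render_menu(title, links):
    """Build a minimal XHTML menu from (label, url) pairs.

    Only <html>, <head>, <title>, <body>, <h1>, <p> and <a> are emitted.
    """
    items = "\n".join(
        # quoteattr() adds the surrounding quotes and escapes & and <.
        f"    <p>{escape(label)}: <a href={quoteattr(url)}>{escape(url)}</a></p>"
        for label, url in links
    )
    return (
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{items}\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    base = "http://www.tenletters.com/rhahn/category/entry"
    page = render_menu(
        "Crufty Permalinks",
        [("plain text", base + ".txt"), ("HTML", base + ".html")],
    )
    ET.fromstring(page)  # raises ParseError if the page is not well formed
    print(page)
```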

Won’t this mean an extra click to get to the stuff you want? Yes, but here’s a possible workaround: we can use a cookie to remember the ‘preferred representation’ and serve that instead of the menu from then on. Should the menu need to be requested again, either the cookie gets expired, or a .menu extension is added to the permalink.
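The cookie workaround boils down to a small decision rule, sketched below in Python. Everything named here is an assumption for the sake of the example: the cookie name preferred_flavour, the resolve function, and the choice to serve any explicitly requested extension as-is. Only the .menu override and the four formats come from the discussion above.

```python
from http.cookies import SimpleCookie

KNOWN_FLAVOURS = {"txt", "html", "print", "rss"}
COOKIE_NAME = "preferred_flavour"  # hypothetical cookie name


def resolve(path, cookie_header=""):
    """Decide what to serve for a requested path.

    Returns ("menu", base_path) or ("entry", path_with_extension).
    """
    base, dot, ext = path.rpartition(".")
    # Only treat the dot as an extension if it is in the last path segment.
    if dot and "/" not in ext:
        if ext == "menu":
            return ("menu", base)  # explicit request for the menu
        return ("entry", path)  # explicit format always wins

    # No extension: fall back to the remembered preference, if it is valid.
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(COOKIE_NAME)
    if morsel is not None and morsel.value in KNOWN_FLAVOURS:
        return ("entry", f"{path}.{morsel.value}")
    return ("menu", path)


if __name__ == "__main__":
    print(resolve("/rhahn/category/entry"))
    print(resolve("/rhahn/category/entry", "preferred_flavour=rss"))
    print(resolve("/rhahn/category/entry.menu", "preferred_flavour=rss"))
```

Checking the cookie value against the known formats means a stale or tampered cookie just brings the menu back instead of producing a broken link.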

What impact would this have on search engines? Probably none, since you’re returning an HTML document with links the spider can follow.

What about first impressions? What would happen if a visitor goes to http://www.tenletters.com/rhahn/, assuming that is the ‘home page’ of the blog? In this case, no menu would appear, because the web server would have been configured to serve the default file type (typically an HTML file) when given either a domain or a directory name.

Won’t that be confusing? I don’t think so. What we’re talking about is the best way to represent a permalink, and as far as I know, people don’t normally browse a site through permalinks.
