Robert Hahn

inspired by integration

I'm always interested in infrastructure that brings people together and facilitates communication. I'm currently exploring social software, markup & scripting languages, and abstract games.

noted on Thu, 15 Jan 2004

A Search Engine for WSMD

I have code here to show how you could leverage the value inherent in a WSMD file. Unfortunately, since this domain is not my own, I am not free to set up the scripts as I see fit, so I encourage you to download the code, unzip it, and have a look. If you’re not a programmer, you may be more interested in possible use cases for WSMD files, should someone write the scripts to realize them.

The code is provided as a proof of concept only. I’m sure that many of you reading this are far better programmers than I, and I hope that you’ll take what inspiration you can from it and run with it.

For those of you not interested in downloading it (yet?), this post will provide the briefest of overviews of how I wrote this search engine.

I designed the search page to provide as much context as possible for the search. It offers two HTML select lists: one contains all the section titles (so you can set a scope for the search), and the other lists the file types to search within (HTML, RSS, images, CSS, and JavaScript). Finally, you can enter search terms to refine the search further.

This search page is generated by a script called wsmdfindmaker.pl, whose sole purpose is to parse the WSMD file and create the lists of sections and file types to search within. The thinking is that, as sole author of this blog, I update my WSMD file, and hence my search page, only as often as I post. Between posts, every request for the page gets exactly the same representation. Why waste CPU cycles regenerating identical output on every page refresh?
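To make the idea concrete, here is a minimal sketch of that generate-once step. It is written in Python rather than Perl, and it is not the code in the download. The WSMD layout shown (a `wsmd` root holding `section` elements with `file` children and `title`, `type`, and `href` attributes) is purely an assumption for illustration, as are the function names and form field names; a real WSMD file may be structured quite differently.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical WSMD layout, invented for this sketch; the real schema may differ.
SAMPLE_WSMD = """\
<wsmd>
  <section title="Weblog">
    <file type="html" href="/blog/index.html" title="Weblog home"/>
    <file type="rss" href="/blog/index.rss" title="Weblog feed"/>
  </section>
  <section title="Projects">
    <file type="html" href="/projects/wsmd.html" title="WSMD notes"/>
    <file type="css" href="/style/site.css" title="Site stylesheet"/>
  </section>
</wsmd>
"""


def select_list(name, values):
    """Render one HTML select list with an 'any' option followed by the values."""
    options = "\n".join(
        f'    <option value="{escape(v, quote=True)}">{escape(v)}</option>'
        for v in values
    )
    return (
        f'  <select name="{name}">\n'
        f'    <option value="">any</option>\n'
        f"{options}\n"
        f"  </select>"
    )


def build_search_page(wsmd_xml):
    """Parse the WSMD once and return the static search form as HTML."""
    root = ET.fromstring(wsmd_xml)
    # Section titles in document order.
    sections = [s.get("title") for s in root.iter("section")]
    # Distinct file types, sorted so the output is stable between runs.
    types = sorted({f.get("type") for f in root.iter("file")})
    return "\n".join([
        '<form action="wsmdfind.cgi" method="get">',
        select_list("section", sections),
        select_list("type", types),
        '  <input type="text" name="q">',
        '  <input type="submit" value="Search">',
        "</form>",
    ])


if __name__ == "__main__":
    # Run this whenever the WSMD changes and save the output as a static page.
    print(build_search_page(SAMPLE_WSMD))
```

Because the form only changes when the WSMD does, a script like this can run at publishing time and write a plain HTML file, leaving nothing to compute when a visitor loads the page.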

The search itself is covered by another script, called wsmdfind.cgi, which dynamically constructs the XPath for the search, pulls matching nodes, and attempts to match the search text (if present) to the result set, printing anything that appears to be a candidate result.
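The search step can be sketched in the same spirit. Again, this is a Python stand-in and not the contents of wsmdfind.cgi: the WSMD layout, the attribute names, and the `build_xpath` and `search` helpers are all hypothetical, and Python's ElementTree supports only a subset of XPath, so the expression is kept deliberately simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical WSMD layout, invented for this sketch; the real schema may differ.
SAMPLE_WSMD = """\
<wsmd>
  <section title="Weblog">
    <file type="html" href="/blog/index.html" title="Weblog home"/>
    <file type="rss" href="/blog/index.rss" title="Weblog feed"/>
  </section>
  <section title="Projects">
    <file type="html" href="/projects/wsmd.html" title="WSMD notes"/>
    <file type="css" href="/style/site.css" title="Site stylesheet"/>
  </section>
</wsmd>
"""


def build_xpath(section=None, file_type=None):
    """Assemble an XPath expression from whichever scopes the user picked."""
    for value in (section, file_type):
        # Values are spliced into the expression, so refuse anything with quotes.
        if value and ("'" in value or '"' in value):
            raise ValueError("quotes are not allowed in search scopes")
    path = ".//section"
    if section:
        path += f"[@title='{section}']"
    path += "/file"
    if file_type:
        path += f"[@type='{file_type}']"
    return path


def search(wsmd_xml, section=None, file_type=None, terms=""):
    """Return the href of every file node in scope that mentions all the terms."""
    root = ET.fromstring(wsmd_xml)
    words = terms.lower().split()
    hits = []
    for node in root.findall(build_xpath(section, file_type)):
        # Match the search text against all attribute values of the node.
        haystack = " ".join(node.attrib.values()).lower()
        if all(word in haystack for word in words):
            hits.append(node.get("href"))
    return hits


if __name__ == "__main__":
    print(build_xpath("Weblog", "html"))
    print(search(SAMPLE_WSMD, file_type="html", terms="wsmd"))
```

With no search terms, every node the XPath selects is treated as a candidate, which mirrors the "if present" behaviour described above.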

I encourage you to download the code and take it out for a spin. You will need XML::XPath (and its prerequisites) installed. I also make use of the Perl CGI module (which seems to be part of the standard Perl distribution these days). I’m running the page successfully on a Mac OS X machine, and despite the tedium of installing all the prerequisites, I have not encountered any actual problems with the install.

This isn’t the only thing you could do with a WSMD. I’ve discussed some other applications in this post that may inspire you.
