Robert Hahn

inspired by integration

I'm always interested in infrastructure that brings people together and facilitates communication. I'm currently exploring social software, markup & scripting languages, and abstract games.

noted on Wed, 30 Jul 2003

Make your website smaller and faster.

People on Unix-like machines who render their blosxom blogs as static files and host them on Apache (I’m not sure whether it works in IIS) might be interested in this trick. Using it, I’ve been able to shrink my statically rendered files to 25% of their original size, with a corresponding reduction in bandwidth demands. That means not only do I save on bandwidth costs, but my visitors get their pages faster too.
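If you want to see what kind of savings to expect before committing to anything, here is a rough sketch of a helper that reports how big a file would be after gzipping, as a percentage of its original size. The function name gzip_ratio is my own invention, and the numbers you get will depend entirely on your content.

```shell
# Print the gzipped size of a file as a whole-number percentage of its
# original size. gzip -c writes to standard output, so the file on disk
# is left untouched. (gzip_ratio is a made-up name for illustration.)
gzip_ratio() {
    # The $(( )) wrapper strips the padding some wc versions add.
    orig=$(( $(wc -c < "$1") ))
    comp=$(( $(gzip -c "$1" | wc -c) ))

    # Avoid dividing by zero on empty files.
    if [ "$orig" -eq 0 ]; then
        echo 0
        return
    fi

    echo $(( comp * 100 / orig ))
}
```

Running gzip_ratio on one of your rendered pages prints a single number; a result of 25 would match the savings I saw.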

For many people, this might seem to be an incredible trick, guaranteed to have a catch. There is one, but it’s probably not as bad as you think. Here’s the lowdown:

One of the features the HTTP 1.1 specification recommends that browsers implement is the ability to decompress gzipped files and render the results in the window, if it’s a file type the browser can handle, of course. In fact, Internet Explorer 4.0 or newer, Netscape 4.0 or newer, and many of the other modern (version 4 compatible) browsers have implemented this feature. (OmniWeb 4.1 and Safari 1.0 are two notable exceptions, and I’m sure there are others.) This is significant because over 98% of traffic to websites these days comes from Netscape and IE browsers of at least version 4.

Even more interesting, if you’re serving files from Apache, it’s configured out of the box in such a way that if you have a file called foo.html.gz (the .gz extension implying that it’s gzipped), and a client requests foo.html, it will send the .gz file automatically. I don’t know whether IIS can do this as well. If your pages are served from IIS, please experiment and let me know.

So what we have here is a situation where .gz files can be served automatically in response to requests for .html files, and almost all browsers currently in use can decompress them and display the results. Seems to me that the time is ripe for some well-earned savings in bandwidth.

I did say there was a catch. It’s up to you to decide how serious it is, and if you’re in control of the server, you can actually work around the problem by installing an Apache module. For browsers that don’t support in-application gzip decompression, your HTML files will be downloaded to the visitor’s hard drive instead of being displayed. If your traffic logs indicate that 1 person in 1000 will see that problem, then maybe it’s OK for you. If your logs indicate that 1 person in 50 has that problem, maybe it’s not OK. It’s up to you.
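To put a number on that, you could count how many requests in your access log come from a browser you’re worried about. The sketch below assumes your log is in Apache’s combined log format, where the user agent is the last quoted field. The function name ua_share is hypothetical, and so is any log path you pass it. Note that it counts requests, not people, so treat the result as a rough guide.

```shell
# Usage: ua_share PATTERN LOGFILE
# Prints two numbers: requests whose user agent matches PATTERN, then
# total requests. Assumes the Apache combined log format.
# (ua_share is a made-up name for illustration.)
ua_share() {
    awk -F'"' -v pat="$1" '
        # Splitting on double quotes, the user agent is the sixth field:
        # host/date, request, status/size, referer, space, user agent.
        { total++; if ($6 ~ pat) hits++ }
        END { printf "%d %d\n", hits, total }
    ' "$2"
}
```

For example, ua_share OmniWeb access.log (with access.log standing in for wherever your log lives) prints the number of OmniWeb requests followed by the total, and you can work out the ratio from there.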

What about links? Do they need to be renamed with a .gz extension? No. As I said earlier, if you make a request to foo.html, and there is no foo.html, but there is a foo.html.gz, Apache will give you that instead. No configuration required.

If you’re administering your web server, then you don’t need to do what everyone else might have to do. Just look for the mod_gzip Apache module and install it. That module will detect whether the client supports in-application decompression, and automatically compress and send the requested file to it. The module can go one step further and build a cache of compressed files to reduce latency in the long term.
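In case you’re wondering how a server can tell what a client supports: browsers that can handle gzip say so in the Accept-Encoding request header. The following is only a toy illustration of that check in shell, not how mod_gzip itself is written. The function name accepts_gzip is made up, and the sketch ignores quality values, so a header such as "gzip;q=0" would wrongly count as a yes.

```shell
# Return success if an Accept-Encoding header value lists gzip.
# Simplified sketch: quality values (";q=...") are not interpreted.
# (accepts_gzip is a made-up name for illustration.)
accepts_gzip() {
    # Normalise: drop spaces and lowercase the value.
    header=$(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]')

    # Wrap in commas so gzip is matched as a whole list item, either
    # bare or followed by parameters after a semicolon.
    case ",$header," in
        *,gzip,*|*",gzip;"*|*,x-gzip,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this, accepts_gzip "gzip, deflate" succeeds, while accepts_gzip "identity" fails.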

For the rest of us who still want to take advantage of this trick, here ’tis.

In the terminal, render your blosxom files statically (if you haven’t already done so), cd into the root of your statically rendered files, then type this:

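$ find . -name "*.html" -print0 | xargs -0 gzip

What this command does is find, starting from the current directory, all files whose names end in .html and print them to standard output, separated by null characters (that’s the -print0) so that file names containing spaces survive intact. That output is piped into xargs, which reads the null-separated names (the -0) and calls gzip on them in batches. gzip compresses each file and replaces it with a .gz version, so foo.html becomes foo.html.gz.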
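Since gzip replaces your original files, it’s worth knowing how to check the results and how to back out. Here’s a small sketch of both; the function names webgzip_check and webgunzip are my own, and both work on the current directory just like the command above.

```shell
# Test the integrity of every compressed page under the current
# directory. gzip -t reads each file without writing anything and
# reports any that are corrupt.
# (webgzip_check and webgunzip are made-up names for illustration.)
webgzip_check() {
    find . -name "*.html.gz" -exec gzip -t {} +
}

# Undo the compression: every foo.html.gz becomes foo.html again.
webgunzip() {
    find . -name "*.html.gz" -exec gunzip {} +
}
```

Run webgzip_check right after compressing. If it prints nothing, the files are fine. If you change your mind about the whole thing, webgunzip puts everything back the way it was.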

If this is something you like a lot, you could even make a command for it.

If you’re running tcsh, throw this line in your .tcshrc file:

alias webgzip 'find . -name "*.html" -print0 | xargs -0 gzip'

If you’re running bash, throw this line in your .bash_profile (or equivalent):

alias webgzip='find . -name "*.html" -print0 | xargs -0 gzip'

For the command to take effect, quit and restart your terminal. If you don’t want to do that, I believe in tcsh you type:

$ source ~/.tcshrc

at a prompt, and for bash:

$ . ~/.bash_profile # or equivalent

and you’ll get it.

There’s one more thing I’d like to leave you with: If you choose to gzip your files, you might not want to gzip the index page. That way, you can make sure that people have a chance to read at least one html page that will have, I hope, a brief comment about your gzipped site. If you’re really clever with JavaScript, you can write the warning blurb only for those people who might not appreciate what you’re doing.
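Leaving the index page alone only takes a small change to the find command. This sketch assumes the page you want to keep uncompressed is index.html at the top of your rendered tree; the function name webgzip_keep_index is hypothetical.

```shell
# Gzip every .html file under the current directory except the
# top-level index.html. index.html files in subdirectories are still
# compressed, because -path matches against the whole path.
# (webgzip_keep_index is a made-up name for illustration.)
webgzip_keep_index() {
    find . -name "*.html" ! -path "./index.html" -exec gzip {} +
}
```

If you’d rather keep every index.html in every directory uncompressed, swap the -path test for ! -name "index.html".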

For more information, you can visit this site:

tall ship