My wife and I do not get the paper, ergo we do not get flyers for our local grocery stores. As we’re wanting to shop more price-consciously, the flyers would be nice pieces of information to have.
“Why don’t you check online?” I asked. “Too much bother,” she said, so we now have a routine where, once a week, we pick up flyers from the stores we want to shop at, and plan our trips accordingly.
So I got it into my head to go check these sites myself. I was thinking that maybe I could write a script that would grab the info and package it together in a nice but privately accessible page, and then my wife would have exactly what she wanted in one convenient package. More, once I had that data, I could slice and dice it with our shopping list to help her pick products that were on sale that week.
The two stores we tend to shop at are Zehrs and Food Basics, mostly because they’re the two closest to where we live. I am not linking them here because they do not deserve it. Google “Zehrs markets” and “Food Basics” if you’re interested.
Let’s talk about Food Basics first. What they did was... annoying, but I can understand where they’re coming from. Their online version of the weekly flyer is basically 7 JPEGs on 7 pages. Not exactly scrapeable information, but it would be possible to at least bookmark the first page, and the images themselves seem to have predictable URIs.
Zehrs, now, is another thing altogether. First of all, the site’s in a frameset, which, by the way, isn’t a cardinal sin in my book if it’s used properly (and it almost always isn’t), and so the URI is masked from view. Selecting their ‘online flyer’ link took me to a city/store selector, which in turn brought up the flyer. Great. Let’s view this frame. Uh-oh. The URI is completely opaque. After stripping the domain name, here’s what it looks like:
Cute, isn’t it? Basically, I can’t bookmark a single URI that would always take me to the first page of their flyer. I can infer that I’m looking at page 1 (the P001 part of the file name) and I can figure out that I’m on week 8 of the year, and I doubt that 2004 would represent anything BUT the year. I could look at it for a few weeks to infer the rest of the pattern, but I’m not done talking about why the Zehrs experience bugs me.
Their flyer, like the Food Basics one, is also a set of images... coincidentally, each image is stored in the same directory structure, with the same name, except that it starts with IMAGE instead of PAGE and ends in .jpg instead of .asp. I would have been as annoyed at Zehrs as at Food Basics, but, combined with the opaque URI, Zehrs looks relatively worse.
But get this: there’s a feature where, if you mouse over certain products on each page, you get a layer containing the flyer text for that item. That’s good, right? That’s scrapeable, right? Well, probably, but not easily. See, I view-sourced the file to see what they had, and instead of finding nice <div />’s with the copy, I found something that looks like this:
That’s right, dear readers, they hex-encoded all the characters that would make up their specials. More, they wrote this fairly impressive decoder right in the file. Heaven’s pity, but why? Why bother?
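For the curious, undoing that kind of encoding is not much work. Here is a minimal sketch in Python, assuming each character is written as a two-digit hex code, optionally prefixed with `%` or separated by whitespace. That format is my assumption for illustration, not necessarily the one Zehrs used, and `decode_hex_text` is a made-up name.

```python
import re


def decode_hex_text(encoded: str) -> str:
    """Turn a run of two-digit hex character codes back into text.

    Assumed input format (hypothetical): pairs of hex digits, optionally
    prefixed with '%' or separated by whitespace, e.g. "%4D%69%6C%6B".
    """
    # Strip the assumed separators so only the hex digits remain.
    digits = re.sub(r"[%\s]", "", encoded)
    # bytes.fromhex raises ValueError on stray characters or an odd
    # number of digits, which flags input that does not match the format.
    return bytes.fromhex(digits).decode("latin-1")


if __name__ == "__main__":
    print(decode_hex_text("%4D%69%6C%6B"))
```

The decoding step is the easy part. The tedious part of scraping would still be finding where each encoded string sits in the page and which product it belongs to.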
Both these stores had this (in my mind) fantastic means to create brand loyalty by potentially offering data transparently enough that anyone could conceivably shuffle it in with their own personal data (like, in this case, a shopping list). Both these stores could have created an API (like Amazon & Google) for their specials. If the idea took off, they could then reduce the amount they’d need to print for their offline audience.
What can I say? Guess I’ll continue to pick up flyers from the stores. I don’t have that much free time...
Holy cow. Just last night/this morning I was conceiving in my mind a web service for generating one-time passwords (OTPs) that anyone can use.
The way it would work is as follows: You need an OTP. You visit this site, which is running under HTTPS, and input a URI for what you want the password for. It generates a random number, or an MD5 of the URI combined with a random number, or something like that. Doesn’t matter. You take the generated password, go to whatever resource requires a one-time password (which may be connected to the internet, but need not be a web-based application), and input it along with the rest of your data.
That second site, upon receiving the password, contacts the OTPG (One Time Password Generator) and sends it the random password. The OTPG would respond with only an HTTP status code (204 No Content, 401 Unauthorized, 403 Forbidden, and 404 Not Found come to mind) and mark that random number as being ‘used’.
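To make the flow concrete, here is a minimal in-memory sketch in Python. It is not a finished design. The class and method names are hypothetical, the mapping of outcomes to status codes (204 for a valid first use, 403 for a password already used, 404 for one never issued) is just one plausible reading of the list above, and I am assuming the second site sends the URI along with the password. There is no HTTP layer here, only the bookkeeping behind it.

```python
import secrets


class OneTimePasswordGenerator:
    """In-memory sketch of the OTPG bookkeeping (hypothetical names)."""

    def __init__(self) -> None:
        # Maps (uri, password) -> True once the password has been used.
        self._issued: dict[tuple[str, str], bool] = {}

    def issue(self, uri: str) -> str:
        """Generate a random password tied to the given URI."""
        password = secrets.token_hex(16)
        self._issued[(uri, password)] = False
        return password

    def redeem(self, uri: str, password: str) -> int:
        """Return the HTTP status code the OTPG would answer with."""
        key = (uri, password)
        if key not in self._issued:
            return 404  # assumption: never issued for this URI
        if self._issued[key]:
            return 403  # assumption: already used once
        self._issued[key] = True  # mark as 'used'
        return 204  # valid, and now spent


if __name__ == "__main__":
    otpg = OneTimePasswordGenerator()
    otp = otpg.issue("mailto:blog@example.org")
    print(otpg.redeem("mailto:blog@example.org", otp))
    print(otpg.redeem("mailto:blog@example.org", otp))
```

A real service would need persistent storage and some way to authenticate the site doing the redeeming, neither of which this sketch attempts.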
One scenario I can see this being immediately useful for is blogging by email - once you get an OTP, you can embed it in an email containing a blog post, and the script receiving and processing the mail can ping the OTPG server to ensure the mail is valid.
All that isn’t why I said “Holy cow.”
The reason why I said “Holy cow” was because earlier today (relative to this post date), Sam Ruby started a discussion about what a nonce is - a topic particularly relevant to this application. And I’m glad he did, because he raised some concerns I hadn’t given thought to (yet). Like what happens when someone DoSes the password server.
One thing I like about my design is what happens if you’re being attacked, but aren’t at the point of collapse. Because I’m requiring a URI as part of the input, the number of possible OTPs generated per URI remains the same, so as long as it’s giving out random strings, the damage is limited on a per-URI basis, and doesn’t pollute the entire range of values. I had given some thought to expiry, but not a lot. Sam has given me some ideas. I am going to go think on this one some more.
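As a thinking aid, here is a rough Python sketch of what per-URI isolation plus expiry might look like. Everything specific in it is an assumption on my part rather than a settled design: the cap on outstanding passwords per URI, the time-to-live, and the names `PerUriPasswordPool`, `max_outstanding`, and `ttl_seconds` are all hypothetical.

```python
import secrets
import time
from typing import Callable, Optional


class PerUriPasswordPool:
    """Sketch: each URI gets its own pool, so flooding one URI cannot
    fill up or pollute the pool that belongs to another."""

    def __init__(self, max_outstanding: int = 100,
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_outstanding  # assumed cap per URI
        self._ttl = ttl_seconds      # assumed lifetime of a password
        self._clock = clock          # injectable so expiry can be tested
        self._pools: dict[str, dict[str, float]] = {}

    def issue(self, uri: str) -> Optional[str]:
        """Return a new password, or None if this URI's pool is full."""
        now = self._clock()
        pool = self._pools.setdefault(uri, {})
        # Drop expired passwords before counting what is outstanding.
        for password in [p for p, expiry in pool.items() if expiry <= now]:
            del pool[password]
        if len(pool) >= self._max:
            return None
        password = secrets.token_hex(16)
        pool[password] = now + self._ttl
        return password


if __name__ == "__main__":
    pool = PerUriPasswordPool(max_outstanding=2)
    print(pool.issue("mailto:blog@example.org") is not None)
```

With a cap like this, an attacker hammering one URI can only exhaust that URI's allowance until the passwords expire, which is roughly the damage-limiting property I was after.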
But I think this will work.