Robert Hahn

inspired by integration

I'm always interested in infrastructure that brings people together and facilitates communication. I'm currently exploring social software, markup & scripting languages, and abstract games.

noted on Thu, 18 Dec 2003

Using Google to Discover Relationships Between People

In this post I explore, as a thought experiment, what the search results mean when you put two (or more) people’s names into Google. For background, consider reading the personLink plug-in announcement, and also consider reading Jon Udell’s article discussing his reflections on using LinkedIn.
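For readers who want to try the experiment directly, here is a minimal sketch of how such a link could be built. The exact query the personLink plug-in generates isn’t given in this post, so wrapping each name in quotes (to make Google treat it as an exact phrase) is an assumption, and `person_link` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def person_link(*names):
    """Build a Google search URL for pages that mention every given name.

    Assumption: each name is wrapped in quotes so it is searched as an
    exact phrase. The real plug-in may build its query differently.
    """
    if len(names) < 2:
        raise ValueError("a personLink needs at least two names")
    query = " ".join(f'"{name}"' for name in names)
    # urlencode escapes the quotes and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(person_link("Robert Hahn", "Jon Udell"))
```

The same helper accepts three or more names, which matches the "two (or more)" case above.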

This post is an outgrowth of an email conversation between a friend of mine, Andrew Netherton, and me. Andrew isn’t as deep into this web thing as I am, so hearing his thoughts proved illuminating and helped clarify my own.

What does it all mean?

The burning question on my (and I hope your) mind is: what do the search results mean? I don’t have the answers really, but I think I can provide examples and illustrate some of the thinking behind them.

In the most general of classifications, I think what you’ll see in the search results will fall into one or two of these broad categories:

  1. Most links show that the author is writing about what the other person said.
  2. Some (or half) of the links show that the author and the other person are writing to/about each other.
  3. Some of the links show that both the author and the other person share some connection on another page (typically not hosted by them).
  4. Most links show that the other person is linking to or commenting on something the author said.
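To make the four categories concrete, here is a rough sketch that assumes someone has already read each hit and tagged it by hand. The tags ("author", "other", "third") and the cut-offs are my own invention for illustration, and the sketch picks a single category even though real results may straddle two.

```python
from collections import Counter


def describe(labels):
    """Map hand-assigned per-hit labels onto the four broad categories.

    Hypothetical labels:
      "author" - the author is writing about the other person
      "other"  - the other person is writing about the author
      "third"  - both names appear on someone else's page
    """
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return "no public connection found"
    if counts["author"] > total / 2:
        return "1: mostly the author writing about the other person"
    if counts["other"] > total / 2:
        return "4: mostly the other person writing about the author"
    if counts["third"] >= max(counts["author"], counts["other"]):
        return "3: connected mainly through third-party pages"
    return "2: the two are writing to/about each other"


print(describe(["author", "other", "author", "other"]))
```

The hard part, of course, is the tagging itself; the code only tallies what a human reader has already decided.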

Each of these things conveys an impression to the reader. Further, the kinds of content found on the other end of those links also contribute to that impression. To use an extreme example, let’s take the generic people Alice and Bob. Bob writes a glowing post about something Alice wrote, and adds a personLink to Alice. What do you think when you click the personLink and see one of the following:

  1. All results are links to praises of Alice’s posts by Bob.
  2. All results are links to flames of Alice’s posts by Bob.
  3. Half the results are flames of Bob’s posts by Alice, and the remaining results are praises of Alice’s posts by Bob.
  4. Half the results are praises of Bob’s posts by Alice, and the remaining results are praises of Alice’s posts by Bob.

How would this information affect your assessment of the value of Bob’s original post? Bob might be a suckup, or he might be writing a post in satire, or he might be harassing Alice. Or they may have a very good online, public relationship. All of these impressions contribute to how much value or credibility you place on a given post and on its author.

Obviously, to form this picture, you’d need to be motivated enough to read through enough search results to get a feel for what’s going on. Is there an easier way? Possibly. I don’t know. Let me explain:

If you click on the personLinks I’ve added in this post, you’ll notice that for Jon and me, at the time of this writing, Google lists 7 of about 14 hits where we appear on the same page (the 7 hits not listed are very similar to the 7 that are). If you check out the one with Andrew, you’ll see (also at the time of this writing) 1 hit. If you look up Jon Udell and Tim Bray together, you’ll immediately see that they have over 18,000 hits. A closer look revealed that Google eliminated all but 261 results due to similarities. (I don’t know how this could easily be discovered; I clicked on the Gooooogle line a few times ’til I got to the end.) What can we do with this information?

If you want a quick impression, you’ll probably look no further than the first page of results and the total number of hits. The total number can probably imply how developed the relationship is, and the first page probably gives you a good sense of the nature of the relationship.

What else can we say about personLinks? Obviously, the scope of the results is limited to what’s available for Google to index. That one hit shared between Andrew and me does not in any way illustrate what our friendship is really like. We’ve been friends for years, but the online, public side of that friendship is evidenced only by the one link. Still, if you bother to look into it, you can learn a thing or two: that I’m hosting a site to which he’s contributing most of the material is a vote of confidence that I have in him. Anything he puts both his name and mine to carries that vote of confidence.

As alarming as that discrepancy may seem, I’m actually not overly concerned by it. Here’s why:

  1. It is very likely that the only relevant public information of interest to a reader is that which is already online.
  2. As the cost of publishing to the net goes down, more people can (and do) post online, generating a richer, denser field of personLink possibilities.

Can a computer figure it out for me?

Another great question that Andrew asked in our conversation is this: How do you get a computer to understand the results? If it’s not clear by now, then I think it’s safe to say you can’t. The reason is that writing a program that successfully describes a relationship between people from a search result (A.I. requirements aside) means trying to "specify the unspecifiable".

If you wanted to develop possible criteria for ranking a conversation (like volatility, collaborativeness, degree of coincidence), you might be able to solve it using a Bayesian filtering approach, but I’d sure be interested in knowing how you’d train a filtering system to work in a majority of cases.
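For the curious, here is roughly what a Bayesian filter of that sort might look like: a toy naive Bayes classifier with add-one smoothing that labels a snippet as "praise" or "flame". The class name, the labels, and the four training snippets are all made up for illustration, and this is a sketch of the general technique rather than anything I’ve run against real search results.

```python
import math
from collections import Counter, defaultdict


class SnippetFilter:
    """Toy multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> snippets seen
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior for this label.
            score = math.log(self.doc_counts[label] / total_docs)
            n_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words don't zero out.
                p = (self.word_counts[label][word] + 1) / (n_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-labelled training snippets.
snippet_filter = SnippetFilter()
snippet_filter.train("great insightful post by alice", "praise")
snippet_filter.train("alice makes an excellent point", "praise")
snippet_filter.train("bob is wrong and clueless again", "flame")
snippet_filter.train("what a terrible misleading post", "flame")

print(snippet_filter.classify("excellent insightful point"))
```

The code is the easy part. Four snippets prove nothing; the open question is still where you’d get enough honestly labelled examples, and whether "praise" and "flame" are even the right buckets.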

personLink as a Status Symbol

What does adding personLinks to a post signify about the author to a reader? Perhaps it’s my vanity speaking, but I think that, by creating a personLink, I’m saying that I’m not ashamed of my past online history, and you’re welcome to see what kinds of things I’ve written to/about this other person. I can easily see that if this concept catches on with a lot of bloggers, and many people add it to their sites, then the notion of personLink as status symbol becomes even more valuable. What if someone who normally personLinks didn’t for a particular individual? Does it mean they have something to hide? What does it mean if you don’t personLink at all?

The thing is, no one can really know what the motivations of the author of a post are, but if the question piques a reader’s curiosity enough, they can always Google the answers themselves. It’s not that hard.

A Notion of Time

Here’s something to think about. In a well-developed relationship spanning a considerable amount of time, the potential exists to confuse anyone examining the results of a personLink. Why? Because over time, the relationship is changing. If Alice and Bob have both known of each other for a year, but until 3 months ago only Bob was writing about Alice, and since then Alice and Bob have been writing about each other’s posts, then it might be nice to filter out the old history in favour of the ‘current doings’. If that filtering weren’t in place, then an impression could be formed that the relationship between Bob and Alice is still largely one-sided, when that’s not true at all.

In those kinds of situations, Google can help create a narrower window. That option is found in Advanced Search. I’m not actually sure if it works - searching for my name, then comparing it with a time-limited subset yielded the same total number of hits, and the exact same output on the page. That doesn’t seem right to me, so if anyone has different observations, please share them.

What Google can’t do (that I can tell) is provide users with a way of bracketing search results from two directions (i.e., seeing only results that are 3-6 months old). What we lose there is the ability to examine people’s relationships over different periods of time, and the ability to compare relationships across multiple time frames.
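If you somehow had a date for each hit, bracketing from both directions would be simple. The sketch below assumes exactly that: `sample_hits` is made-up data, each hit carries a hypothetical `date` field, and "3-6 months" is approximated as 90 to 180 days. Nothing here talks to Google.

```python
from datetime import date, timedelta


def in_window(hits, today, newest_days, oldest_days):
    """Keep hits whose age in days lies between newest_days and oldest_days."""
    start = today - timedelta(days=oldest_days)  # earliest date allowed
    end = today - timedelta(days=newest_days)    # latest date allowed
    return [hit for hit in hits if start <= hit["date"] <= end]


# Made-up hits; the URLs and dates are placeholders.
sample_hits = [
    {"url": "a", "date": date(2003, 12, 1)},   # too recent
    {"url": "b", "date": date(2003, 8, 1)},    # inside the window
    {"url": "c", "date": date(2003, 7, 4)},    # inside the window
    {"url": "d", "date": date(2003, 1, 15)},   # too old
]
today = date(2003, 12, 18)

for hit in in_window(sample_hits, today, newest_days=90, oldest_days=180):
    print(hit["url"], hit["date"])
```

Running the same filter over several adjacent windows would give the period-by-period comparison described above.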

Conclusions, Anyone?

I encourage you to add personLinks wherever you can. I’m planning on whipping up an icon for the purpose soon, which may improve the visibility of this notion. This is the kind of idea that’s too new to really get your head around. People may find useful ways of mining this information I haven’t thought of. Google offers an API of some kind; maybe it offers features not available to a typical user that could add some significant value. If this idea takes off, I could see it totally changing the nature of Friendster/Tribes.net style sites, where, in this new approach, all you need do is input your name and a few areas of interest, and it polls Google for the rest, building a social network entirely out of related personLink queries.
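As a rough sketch of that last idea: take a list of names, run a personLink query for every pair, and keep the pairs that share at least one hit. I haven’t looked at what Google’s API actually provides, so `hit_count` below is a stand-in function you would have to supply yourself, and `fake_hit_count` just replays the approximate totals quoted earlier in this post.

```python
from itertools import combinations


def build_network(names, hit_count, min_hits=1):
    """Return {(name_a, name_b): hits} for every pair with enough shared hits.

    hit_count is a caller-supplied function (hypothetical here) that returns
    how many pages mention both names.
    """
    edges = {}
    for a, b in combinations(sorted(names), 2):
        hits = hit_count(a, b)
        if hits >= min_hits:
            edges[(a, b)] = hits
    return edges


# Approximate totals mentioned in this post; every other pair is treated as 0.
known_hits = {
    frozenset(["Robert Hahn", "Jon Udell"]): 14,
    frozenset(["Robert Hahn", "Andrew Netherton"]): 1,
    frozenset(["Jon Udell", "Tim Bray"]): 18000,
}


def fake_hit_count(a, b):
    return known_hits.get(frozenset([a, b]), 0)


names = ["Robert Hahn", "Jon Udell", "Andrew Netherton", "Tim Bray"]
for pair, hits in build_network(names, fake_hit_count).items():
    print(pair, hits)
```

Note that the number of queries grows with the square of the number of names, so a real version would need to be choosy about which pairs it asks about.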
