When is a spike not a spike?

When it’s a long tail. Maybe.

David Weinberger writes:

In a conversation with Erica George at the Berkman she pointed out that the demographics of Live Journal don’t always represent one’s experience of Live Journal — the demographics say that teenage girls are the largest users, but if you’re a 25 year old, your social group there may not look that way at all.

Which raises an issue about the way the “long tail” is pictured. Clay’s charts are accurate depictions of his data, but they have a mythic power that’s misleading: The long tail looks like, well, a long tail when in fact it’s a fractal curlicue of relationships.

This is an interesting point in itself – perhaps the blogosphere would be better viewed as a series (archipelago? galaxy?) of more or less closed, more or less interlinked ‘spheres’. I’m not sure how you’d visualise that, though – perhaps something like the Jefferson High School network diagram?

But there’s a broader point about the accuracy of those ‘long tail’ graphics. Adam Marsh made an interesting point here about a recently-discovered ‘long tail’:

Clay refers to “the characteristic long tail of people who use many fewer tags than the power taggers.” While this chart does exhibit a “long tail,” this is simply a result of the fact that the users were ordered by decreasing tag usage (also true of the following three charts) — the X axis here doesn’t represent a value, it is just a sequence of users.

The phrase “long tail” usually refers to the observation that for many distributions, the number of elements with outlying values (the “tail”) may be cumulatively significant compared to the number of elements clustered near the average.

On inspection, it turns out that this is also true of the celebrated ‘Power law and Weblogs’ graphic: there are no values on the X axis, just a list of blogs arranged in descending order of number of links. This matters, because in a graphical representation of a statistical distribution both axes carry information. Typically, values of the variable being measured run low to high on the X axis, left to right, while the count of occurrences of each value runs high to low on the Y axis, top to bottom. Clay wrote, “We are all so used to bell curve distributions that power law distributions can seem odd.” But Clay’s own graphics aren’t so much odd as misleading, and not only because he’s put high values on the left of the graph rather than the right. In effect, he’s got two axes conveying one piece of information. Andrew Sullivan’s blog and Instapundit get a high Y value (lots of links) and a high X value (because all the sites with lots of links have been sorted to the left).
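A minimal sketch of the ranking point, using invented numbers rather than any real link data: sorting any list of values in descending order and plotting it against rank gives a line that never rises, because the X position is derived from the Y value. The function name `rank_order` and the sample list are assumptions made for illustration.

```python
def rank_order(values):
    """Return (rank, value) pairs sorted by decreasing value, with rank starting at 1."""
    ordered = sorted(values, reverse=True)
    return list(enumerate(ordered, start=1))


# Invented, evenly spread values: nothing like a power law.
evenly_spread = [3, 9, 1, 7, 5, 8, 2, 6, 4, 10]

ranked = rank_order(evenly_spread)
heights = [value for _, value in ranked]

# The sort alone guarantees that the curve never goes up from left to right.
is_descending = all(a >= b for a, b in zip(heights, heights[1:]))

for rank, value in ranked:
    print(f"rank {rank:>2} | {'#' * value}")
```

The sort guarantees the downward slope whatever the data look like; how steep the drop is still depends on the values themselves, so a ranked chart is not empty of information, only short of one axis's worth.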

If you took the same numbers and plotted them on an X axis with values – if you produced a graph showing how many blogs had how many links, with zero at the origin on both scales… Well, I don’t know what would happen – but five minutes’ experimentation reminds me that, if you wanted to produce a nice clear series of vertical bars rather than a line that wanders all over the place, you’d need to put ‘number of blogs’ on the Y axis and ‘number of inbound links’ on the X axis, rather than vice versa. (There’s a simple reason for this: some values are unique by definition, others aren’t.) Which in turn means that any vertical spike would represent large numbers of blogs (say, for example, blogs with small numbers of inbound links) while any long tail would represent small numbers (say, for example, the few blogs with lots of links).
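To make that alternative plot concrete, here is a rough sketch that counts how many blogs share each inbound-link total. The link counts are made up purely for illustration, and the helper name `blogs_per_link_count` is an assumption, not anything from Clay's data or method.

```python
from collections import Counter


def blogs_per_link_count(inbound_links):
    """Map each inbound-link total to the number of blogs that have it.

    Returns (links, number_of_blogs) pairs ordered by links, low to high,
    which is the X-axis order described above.
    """
    counts = Counter(inbound_links)
    return sorted(counts.items())


# Hypothetical data: one entry per blog, giving its number of inbound links.
sample_links = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 5, 40]

distribution = blogs_per_link_count(sample_links)

# Crude text chart: one row per link count that actually occurs.
for links, blogs in distribution:
    print(f"{links:>3} links | {'#' * blogs}")
```

With these invented numbers the tallest bar sits at zero links (many blogs, few links), while the single heavily linked blog is a one-unit bar far out to the right. The text chart lists only link counts that occur, so it compresses the empty stretch between 5 and 40 that a true X axis would show.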

Caveat: I haven’t crunched any actual numbers, or even mumbled them gently. But maybe we’ve been looking at this the wrong way round, statistically speaking. Perhaps the long tail is the spike; perhaps the spike is really the long tail.

2 Comments

  1. Clare
    Posted 31 May 2005 at 11:16 | Permalink | Reply

    I never know whether the blogs I read represent a relatively closed social circle, or whether I’m just looking at a quite random subset of the whole blogosphere.

    A lot of the blogs I read link to each other, but I’m also aware that every single one of them links to people I’ve never heard of. I try not to follow those links; I don’t have time to read the ones I DO look at, never mind new ones.

    But what occasionally bugs me is the possibility that there is some other blogging social circle out there, better suited to me. For instance, most of the bloggers I read not only don’t admit to taking drugs, they actively DON’T take them. And so far I’m the only overtly bisexual blogger I’ve come across.

    But mainly I think I should just ignore the whole thing and focus on more important stuff! Ha. Fat chance.

  2. Adam
    Posted 14 June 2005 at 01:52 | Permalink | Reply

    Hi Phil,

    I thought more about what was bothering me (and apparently you) about the ranked graphs…latest thoughts are here if you’re interested.
