Great big bodies

I think the thing that really irritates me about the Long Tail is just how basic the statistical techniques underlying it are. If you’ve got all that data, why on earth wouldn’t you do something more interesting and more informative with it? It’s really not hard. (In fact it’s so easy that I can’t help feeling the Long Tail image must have some other appeal – but more on that later.)

As you may have noticed, this weblog hasn’t been updated for a while. In fact, when I compared it with the rest of my RSS feed I found it was a bit of an outlier:

[Chart: number of blogs by days since last update]

The X axis is days since the last update, with today in its own column and the rest in 10-day bands; the Y axis is ‘number of blogs’: two updated today (zero days ago), 11 in the previous 10 days, one in the 10-day period before that, and so on up to the 71-80 column. Note that each column covers a range of values, and that the columns touch; technically this is a histogram rather than a bar chart.
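Binning for a chart like this takes only a few lines. This is a minimal sketch, not the method actually used for the chart: it assumes, as the description suggests, that ‘today’ gets its own column and every other value falls into a 10-day band (1-10, 11-20 and so on). The helper names `band`, `label` and `histogram` and the `days_since_update` sample are hypothetical, made up for illustration rather than taken from the chart.

```python
from collections import Counter


def band(days, width=10):
    """Map 'days since last update' to a column index.

    Column 0 holds blogs updated today; column k >= 1 holds
    days (k-1)*width + 1 .. k*width, i.e. 1-10, 11-20, ...
    """
    return 0 if days == 0 else (days - 1) // width + 1


def label(k, width=10):
    """Human-readable label for column k, e.g. '0', '1-10', '71-80'."""
    if k == 0:
        return "0"
    return f"{(k - 1) * width + 1}-{k * width}"


def histogram(days_list, width=10):
    """Return (label, count) pairs for every column up to the last occupied one.

    Columns with no blogs in them are kept with a count of 0.
    """
    counts = Counter(band(d, width) for d in days_list)
    last = max(counts, default=0)
    return [(label(k, width), counts[k]) for k in range(last + 1)]


# Hypothetical sample: days since each blog was last updated.
days_since_update = [0, 0, 3, 5, 9, 14, 27, 75]
for column, n in histogram(days_since_update):
    print(f"{column:>6} {'#' * n}")
```

Empty bands are kept with a count of zero so that gaps in the data show up as gaps on the axis rather than silently disappearing.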

You can do something similar with ‘posts in last 100 days’:

[Chart: number of blogs by posts in the last 100 days]

This shows that the really heavy posters are in the minority in this sample; twelve out of the eighteen have 30 or fewer posts in the last 100 days.

So it looks as if I’m reading a lot of reasonably regular but fairly light bloggers, and a few frequent fliers. If you put the two series together you can see the two groups reflected in the way the sample smears out along the X and Y axes without much in the middle:

[Chart: the two series plotted against each other, one point per blog]

My question is this. If you can produce readable and informative charts like this quickly and easily (and I assure you that you can – we’re talking an hour from start to finish, and most of that went on counting the posts), what on earth would make you prefer this:

[Chart: Long Tail style, blogs ranked in descending order]

or this:

[Chart: Long Tail style, another ranked version]

I can only think of two reasons. One is that it looks kind of like a power law distribution, and that’s a cool idea. Except that it isn’t a power law distribution, or any kind of distribution – it’s a list ranked in descending order, and, er, that’s it. The same criticism applies, obviously, to the classic ‘power law’ graphic ranking weblogs in descending order of inbound links.
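To see why a ranked list always looks like a tail, note that sorting any set of numbers in descending order guarantees a curve that falls from left to right, whatever the numbers are. The sketch below builds both views of the same data: the ranked list behind a Long Tail chart, and the frequency distribution a histogram plots. The data is deliberately uniform random numbers, not real blog data, and the helper names `ranked` and `frequency` are hypothetical.

```python
import random
from collections import Counter


def ranked(values):
    """Long Tail view: (rank, value) pairs, largest value first."""
    return list(enumerate(sorted(values, reverse=True), start=1))


def frequency(values):
    """Distribution view: (value, how many items have that value), by value."""
    return sorted(Counter(values).items())


# Uniformly distributed data: every value from 1 to 10 is equally likely,
# so the frequency distribution is roughly flat...
random.seed(42)
data = [random.randint(1, 10) for _ in range(1000)]
print(frequency(data))

# ...yet the ranked list still falls from left to right, because
# sorting in descending order guarantees that shape for any data.
tail = ranked(data)
print(tail[0], tail[len(tail) // 2], tail[-1])
```

The histogram of this data has no tail at all; the ranked chart of the same data still slopes downwards, because the slope comes from the sorting.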

DIGRESSION
You can compute a distribution of inbound links across weblogs using very much the techniques I’ve used here – so many weblogs with one link, so many with two and so forth. Oddly enough, what you end up with then is a curve which falls sharply then tapers off – there are far fewer weblogs with two links than with only one, but not so much of a difference between the ’20 links’ and ’21 links’ categories. However, even that isn’t a power law distribution, for reasons explained here and here (reasons which, for the non-mathematician, can be summed up as ‘a power law distribution means something specific, and this isn’t it’).
END DIGRESSION
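The counting described in the digression (so many weblogs with one link, so many with two and so forth) is short work once you have a link count per blog. In this sketch the `inbound_links` mapping, its blog names and the helper `link_distribution` are hypothetical, invented for illustration; it turns per-blog counts into the share of blogs with exactly k inbound links. A power law would mean that share is proportional to k raised to some fixed negative power; a curve that falls sharply and then tapers off does not by itself show that.

```python
from collections import Counter
from fractions import Fraction


def link_distribution(inbound_links):
    """Map k -> share of blogs with exactly k inbound links.

    `inbound_links` maps blog name -> number of inbound links.
    """
    counts = Counter(inbound_links.values())
    total = len(inbound_links)
    return {k: Fraction(n, total) for k, n in sorted(counts.items())}


# Hypothetical data: many blogs with one or two links, a few with many.
inbound_links = {
    "blog-a": 1, "blog-b": 1, "blog-c": 1, "blog-d": 1,
    "blog-e": 2, "blog-f": 2, "blog-g": 3,
    "blog-h": 20, "blog-i": 21,
}
for k, share in link_distribution(inbound_links).items():
    print(k, share)
```

Here the link count is on the X axis and the blogs with few links sit on the left, not in a tail on the right.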

The other reason – and, I suspect, the main reason – is that the Long Tail privileges ranking: the question it suggests isn’t ‘how many of which are doing what?’ but ‘who’s first?’. A histogram might give more information, but it wouldn’t tell me who’s up there in the big head, or how far down the tail I am.

People want to be on top; failing that, they want to fantasise about being on top and identify with whoever’s up there now. Not everyone, but a lot of people. The popularity of the Long Tail image has a lot in common with the popularity of celebrity gossip magazines.


4 Comments

  1. danja
    Posted 8 February 2007 at 09:58

    “The same criticism applies, obviously, to the classic ‘power law’ graphic ranking weblogs in descending order of inbound links.” – using rank for its own sake may not have much value (it still has some – that’s a concave curve, not a straight line or a step), but number of inbound links is an independent (discrete) variable. Whether it follows a power law is another question – for that you’d need some kind of correlation measure.

  2. Phil
    Posted 9 February 2007 at 15:45

    Perhaps I should have included a link to the classic ‘power law’ graphic I had in mind. It’s here.

    As for what you get if you plot number of inbound links on the Y axis, have a look at the pages I linked to in the ‘Digression’ paragraph. But what you emphatically don’t get is a graph with low numbers of inbound links in a ‘long tail’ on the right of the graph. The only way to get that is to use rankings.

  3. Marcin
    Posted 21 March 2007 at 13:39

    Reminds me of the post linked below on different models of celebrity (being a bit more than average being very rare, versus people wanting to latch on to the same person for no real reason).

    Also reminds me of a friend who, discussing with a startup how its technology might be used, suggested making product searches cheaper and easier, only to receive the response: “You’re selling the tail, man – we’re hoping to sell this technology to Gucci!”

    http://stumblingandmumbling.typepad.com/stumbling_and_mumbling/2007/01/beckham_an_adle.html

  4. Posted 27 April 2007 at 13:28

    I’m a fellow Long Tail derider, but that is a perspective I had not thought of.

    There is a third possible reason why you would want to create those boring graphs, of course, and that’s to create the illusion of ubiquity. Sprinkle some ambiguous phrases and you have the appearance of something deep – a universal law.

    Also, as you say, plotting this graph makes it easy to skip over the fact that not everything that falls off left to right is a power law, and the fact that not all power laws have significant volume in the “tail” of the graph.

    My own take, in far too much detail, is here: http://whimsley.typepad.com/whimsley/2007/03/the_long_tail_l.html
