Category Archives: ethnoclassification

An eerie sight

David introduces a new feature at LibraryThing:

“tagmashes,” which are (in essence) searches on two or more tags. So, you could ask to see all the books tagged “france” and “wwii.” But the fact that you’re asking for that particular conjunction of tags indicates that those tags go together, at least in your mind and at least at this moment. LibraryThing turns that tagmash into a page with a persistent URL.

I like everything about this, apart from the horrible name. As somebody points out in comments, it’s not a new idea – a large part of David’s post could have been summed up in the words “LibraryThing have implemented faceted tagging”. But I think this is still something worth shouting about, for two reasons. Firstly, they have implemented it – it’s there now to be played with, even if it’s got a silly name. Secondly and more importantly, they’ve implemented ground-up faceted tagging: the facets are created by the act of searching for particular combinations of tags. At a stroke this addresses the disadvantages I identified in my post; rather than being imposed beforehand, the dimensions into which the tags are organised emerge from the ways people want to combine tags. Arguably, what LibraryThing have ended up with is something like a cross between faceted tagging and Flickr-style tag clusters (in which dimensions emerge from an aggregate of past searches).
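
To make the mechanics concrete, here’s a minimal sketch (in Python, with invented names like `TagmashIndex`) of how a tagmash might work: an intersection search over tags which, as a side-effect, records the combination itself as a persistent, addressable page. It’s an illustration of the idea, not LibraryThing’s actual implementation.

```python
# A minimal sketch of the 'tagmash' idea: intersection searches over tags,
# where each searched-for combination is remembered as a persistent page.
# All names (Item, TagmashIndex, the slug format) are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    tags: set[str]


@dataclass
class TagmashIndex:
    items: list[Item] = field(default_factory=list)
    # tagmashes that have actually been asked for, keyed by a stable slug
    mashes: dict[str, frozenset[str]] = field(default_factory=dict)

    def add(self, item: Item) -> None:
        self.items.append(item)

    def tagmash(self, *tags: str) -> tuple[str, list[Item]]:
        """Return items carrying *all* the given tags, and record the
        combination itself as a persistent, addressable page."""
        wanted = frozenset(t.lower() for t in tags)
        slug = "/tagmash/" + ",".join(sorted(wanted))   # persistent URL
        self.mashes[slug] = wanted                      # the emergent 'facet'
        hits = [i for i in self.items if wanted <= i.tags]
        return slug, hits


if __name__ == "__main__":
    ix = TagmashIndex()
    ix.add(Item("Is Paris Burning?", {"france", "wwii"}))
    ix.add(Item("A Year in Provence", {"france", "travel"}))
    slug, hits = ix.tagmash("france", "wwii")
    print(slug, [i.title for i in hits])
    # /tagmash/france,wwii ['Is Paris Burning?']
```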

What’s more, the ability to record an association between two tags addresses a question I raised way back here. If, to quote Tom Evslin, “we think in terms of associations” (rather than conceptual hierarchies); and if “the relationship between documents is actually dynamic … open tagging and hyperlinking are both ways to impose particular relationships on documents to meet the need of some subset of readers”; then it’s curious, to say the least, that it’s been so hard until now to use tagging to say this is like that (as distinct from this has frequently been applied to resources which have also been classified as that). From del.icio.us on, tagging has been a simple naming operation, hitching up things to names (stuff-for-classifying to tags), but not allowing any connection between those names. The implication is that the higher-order knowledge of what went with what would only emerge – could only emerge – from the aggregate of everyone else’s naming acts.

The ‘tagmash’ reminds us that (pace David) everything is not miscellaneous: yes, we think in associations and we apply our own labels and classifying schemes to the world, but as we do so we’re also connecting A to B and treating D as a sub-type of C. When we talk, we don’t just spray names around; we’re always adding a bit of structure to the conversational cloud, making a few more connections. It’s the connections, not the nodes, that map out the shape of a cloud of knowing.

Update Changed post title. I spent a good five minutes this afternoon thinking of an appropriate lyrical reference (eventually settling on this one); I don’t know how I missed the obvious choice.

The vagaries of science

The slightly oxymoronic Britannica Blog has recently hosted a series of posts on Web 2.0, together with responses from Clay Shirky, Andrew Keen and others. The debate’s been of very variable quality, on both the pro- and the anti- side; reading through it is a frustrating experience, not least because there’s some interesting stuff in among the strawman target practice (on both sides) and the tent-preaching (very much on both sides). As I said in response to a (related) David Weinberger post recently, it’s not always clear whether the pro-Web 2.0 camp are talking about how things are (what knowledge is like & how it works) or about how things are changing – or about how they’d like things to change. The result is that developments with the potential to be hugely valuable (like, say, Wikipedia) are written about as if they had already realised their potential, and attempts to point out flaws or collateral damage are dismissed as naysaying. On the anti- side, the danger is of an equally unthinking embrace of how things are – or how they were before all this damn change started happening.

All this is by way of background to some comments I left on danah boyd’s contribution (which is well worth reading in full), and may explain (if not excuse) the impatient tone. danah, then me:

Why are we telling our students not to use Wikipedia rather than educating them about how Wikipedia works?

Because I could give a 20-credit course on ‘how Wikipedia works’ and not get to the bottom of it. It’s complex. It’s interesting. I happen to believe it’s an almighty mess, but it’s a very complex and interesting mess. For practical purposes “Don’t cite it” is quicker.

Wikipedia is not perfect. But why do purported experts spend so much time arguing against it rather than helping make it a better resource?

This is a false opposition: two different activities with different timescales, different skillsets and different rewards. I get an idea, I write it down – generally it won’t let me go until I’ve written it down. I look at what I’ve written down, and I want to rewrite it – quite often it won’t let me go until I’ve rewritten it. All of this takes slabs of time, but they’re slabs of time spent engrossed with ideas and language, my own and other people’s – and the result is a real and substantial contribution to a conversation, by an identifiable speaker.

I look at a bad Wikipedia article [link added] and I don’t know where to start. What I’d like to do is delete the whole thing and put in the stub of a decent article that I can come back to later, but I sense that this will be regarded as uncool. What I don’t want to do is clamber through the existing structure of an entry I think shouldn’t have been written in the first place correcting an error here or there, because that’s a long-drawn-out task that’s both tedious and unrewarding. And what I particularly don’t want to do is return to the article again and again over a period of weeks because my edits are getting reverted by someone hiding behind a pseudonym.

(I think what Wikipedia anonymity has shown, incidentally, is that people really don’t like anonymity. Wikipedia has produced its own stable identities – and its own authorities, based on the reputation particular Wikipedia editors have established within the Wikipedia community.)

Is it really worth that much prestige to write an encyclopedia article instead of writing a Wikipedia entry?

Well, yes. If I get a journal article accepted or I’m commissioned to write an encyclopedia article, I’m joining an established conversation among fellow experts. What I’ve written stays written and gets cited – in other words, it contributes to the conversation, and hence to the formation of the cloud of knowledge within the discipline. And it goes on my c.v. – because it can be retrieved as part of a reviewable body of work. If I write for Wikipedia I don’t know who I’m talking to, nobody else knows who’s writing, and what I’ve written can be unwritten at any moment. And it would look ridiculous on my c.v. – because they’ve only got my word that it is part of my body of work, assuming it still exists in the form in which I wrote it.

The way things are now, knowledge lives in domain-sized academic conversations, which are maintained by gatekeepers and authorities. Traditional encyclopedias make an effort to track those conversations, at least in their most recently crystallised (serialised?) form. Wikipedia is its own conversation with its own authorities and its own gatekeepers. For the latest state of the Wikipedia conversation to coincide with the conversation within an established domain of knowledge is a lucky fluke, not a working assumption.

Update The other big difference between traditional encyclopedias and Wikipedia (as someone known only as ‘bright’ reminded me, in comments over here) is that the latter gets much more use. From my response:

Comparisons with the Britannica are interesting as far as they go – and I don’t believe they do Wikipedia any favours – but they don’t address the way that Wikipedia is used, essentially as an extension of Google. When I google for information I’m not hoping to find an encyclopedia article. Generally, Britannica articles used to appear on the first page of hits, but not right at the top; usually you’d see fan sites, hobby sites, school sites, scholarly articles and domain-specific reference works on the same page, and usually the fan sites, etc, would be just as good. (I stopped using the Britannica altogether as soon as it went paywalled.) If all that had happened was that Britannica results had been pushed down from number 8 to number 9, with their place being taken by Wikipedia, I doubt we’d be having this conversation. What’s happened is that, for topic after topic, Wikipedia is number 1; the people who would have run all those fan sites and hobby sites are either writing for Wikipedia instead or they’re not bothering, since after all Wikipedia is already there. (Or else the sites are still out there, but they’re way down the search result list because they’re not getting the traffic.) It’s a monoculture; it’s a single point of failure, in a way that encyclopedias aren’t. And it’s the last thing that should have happened on the Web. (I’ll own up to a lingering Net idealism. Internet 0.1, I think it was.)

So much that hides

Alex points to this piece by Rashmi Sinha on ‘Findability with tags’: the vexed question of using tags to find the material that you’ve tagged, rather than as an elaborate way of building a mind-map.

I should stress, parenthetically, that that last bit wasn’t meant as a putdown – it actually describes my own use of Simpy. I regularly tag pages, but almost never use tags to actually retrieve them. Sometimes – quite rarely – I do pull up all the pages I’ve tagged with a generic “write something about this” tag. Apart from that, I only ever ask Simpy two questions: one is “what was that page I tagged the other day?” (for which, obviously, meaningful tags aren’t required); the other is “what does my tag cloud look like?”.

Now, you could say that the answer to the second question isn’t strictly speaking information; it’s certainly not information I use, unless you count the time I spend grooming the cloud by splitting, merging and deleting stray tags. I like tag clouds and don’t agree with Jeffrey Zeldman’s anathema, but I do agree with Alex that they’re not the last word in retrieving information from tags. Which is where Rashmi’s article comes in.

Rashmi identifies three ways of layering additional information on top of the basic item/tag pairing, all of which hinge on partitioning the tag universe in different ways. This is most obvious in the case of faceted tagging: here, the field of information is partitioned before any tags are applied. Rashmi cites the familiar example of wine, where a ‘region’ tag would carry a different kind of information from ‘grape variety’, ‘price’ or for that matter ‘taste’. Similar distinctions can be made in other areas: a news story tagged ‘New Labour’, ‘racism’ and ‘to blog about’ is implicitly carrying information in the domains ‘subject (political philosophy)’, ‘subject (social issue)’ and ‘action to take’.

There are two related problems here. A unique tag, in this model, can only exist within one dimension: if I want separate tags for New Labour (the people) and New Labour (the philosophy), I’ll either have to make an artificial distinction between the two (New_Labour vs New_Labour_philosophy) or add a dimension layer to my tags (political_party.New_Labour vs political_philosophy.New_Labour). Both solutions are pretty horrible. More broadly, you can’t invoke a taxonomist’s standby like the wine example without setting folksonomic backs up, and with some reason: part of the appeal of tagging is precisely that you start with a blank sheet and let the domains of knowledge emerge as they may.
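
For what it’s worth, here’s a small sketch of the second, ‘dimension layer’ workaround – encoding the facet into the tag string itself – which mainly shows how much of the burden it shifts onto the tagger. The tags and the parsing convention are invented for illustration; this isn’t any particular service’s scheme.

```python
# A sketch of the 'dimension layer' workaround: facet.label tags, so the
# same label can live in more than one dimension. Purely illustrative.

from __future__ import annotations


def parse_tag(raw: str) -> tuple[str | None, str]:
    """Split 'political_party.New_Labour' into ('political_party', 'New_Labour').
    A plain tag like 'racism' has no facet."""
    facet, sep, label = raw.partition(".")
    return (facet, label) if sep else (None, raw)


tags = [
    "political_party.New_Labour",
    "political_philosophy.New_Labour",
    "racism",
    "to_blog_about",
]

for raw in tags:
    facet, label = parse_tag(raw)
    print(f"{label!r:30} facet={facet!r}")

# The same label ('New_Labour') now appears under two facets -- but only
# because the tagger agreed, in advance, to write tags in this awkward form.
```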

Clustered tagging (a new one on me) addresses both of these problems, as well as answering the much-evaded question of how those domains are supposed to emerge. A tag cluster – as seen on Flickr – consists of a group of tags which consistently appear together, suggesting an implicit ‘domain’. Crucially, a single tag can occur in multiple clusters. The clusters for the Flickr ‘election’ tag, for example, are easy to interpret:

vote, politics, kerry, bush, voting, ballot, poster, cameraphone, democrat, president

wahl, germany, deutschland, berlin, cdu, spd, bundestagswahl

canada, ndp, liberal, toronto, jacklayton, federalelection

and, rather anticlimactically,

england, uk

Clustering, I’d argue, represents a pretty good stab at building emergent domains. The downside is that it only becomes possible when there are huge numbers of tagging operations.
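
As a toy illustration of how clusters like those above might fall out of co-occurrence alone, here’s a rough sketch: count which tags keep turning up together on items carrying a given tag, then group them. This is a deliberately crude method (thresholded connected components), not Flickr’s actual algorithm, and the photo data is made up.

```python
# A toy: tag clusters for 'election' emerging from co-occurrence alone.
# Not Flickr's algorithm; the photos and threshold are invented.

from collections import Counter
from itertools import combinations

photos = [  # each set is the tags on one photo tagged 'election'
    {"election", "vote", "politics", "bush", "kerry"},
    {"election", "vote", "ballot", "politics"},
    {"election", "wahl", "germany", "berlin", "spd"},
    {"election", "wahl", "germany", "cdu", "bundestagswahl"},
    {"election", "canada", "ndp", "toronto"},
    {"election", "canada", "liberal", "ndp"},
]

# Count how often each pair of co-tags appears together.
pair_counts = Counter()
for tags in photos:
    co_tags = sorted(tags - {"election"})
    for a, b in combinations(co_tags, 2):
        pair_counts[(a, b)] += 1

# Link tags whose pairs co-occur at least twice, then take connected components.
THRESHOLD = 2
neighbours = {}
for (a, b), n in pair_counts.items():
    if n >= THRESHOLD:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

clusters, seen = [], set()
for tag in neighbours:
    if tag in seen:
        continue
    cluster, frontier = set(), [tag]
    while frontier:
        t = frontier.pop()
        if t not in cluster:
            cluster.add(t)
            frontier.extend(neighbours[t] - cluster)
    seen |= cluster
    clusters.append(sorted(cluster))

print(clusters)
# e.g. [['politics', 'vote'], ['germany', 'wahl'], ['canada', 'ndp']]
```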

The third enhancement to tagging Rashmi describes is the use of tags as pivots:

When everything (tag, username, number of people who have bookmarked an item) is a link, you can use any of those links to look around you. You can change direction at any moment.

Lurking behind this, I think, is Thomas’s original tripartite definition of ‘folksonomy’:

the three needed data points in a folksonomy tool [are]: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. Flattening the three layers in a tool in any way makes that tool far less valuable for finding information. But keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (with a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag.

This, I think, is pivoting in action: from the object and its tags, to the person tagging and the tags they use, to the person using particular tags and the objects they tag. (There’s a more concrete description here.)
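
Here’s a minimal sketch of that tripartite model, keeping person, object and tag as explicit triples so that any two can be used to pivot to the third. The store and the data are invented for illustration.

```python
# Thomas Vander Wal's three data points -- person, object, tag -- kept as
# explicit triples, so any two can be used to find the third. Toy data.

triples = [  # (person, object, tag)
    ("phil",   "http://example.org/red-brigades", "terrorism"),
    ("phil",   "http://example.org/red-brigades", "italy"),
    ("alex",   "http://example.org/red-brigades", "terrorism"),
    ("alex",   "http://example.org/findability",  "tagging"),
    ("rashmi", "http://example.org/findability",  "tagging"),
]

def who_tagged(obj: str, tag: str) -> set[str]:
    """Object + tag -> people: who else sees this page the way I do?"""
    return {p for p, o, t in triples if o == obj and t == tag}

def tags_used_by(person: str) -> set[str]:
    """Person -> tags: their vocabulary."""
    return {t for p, o, t in triples if p == person}

def objects_for(person: str, tag: str) -> set[str]:
    """Person + tag -> objects: use that person as a filter."""
    return {o for p, o, t in triples if p == person and t == tag}

# Pivot: from an object and a tag, to the people who share that vocabulary,
# and from there to everything else they have filed under the same tag.
for person in who_tagged("http://example.org/red-brigades", "terrorism"):
    print(person, objects_for(person, "terrorism"))
```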

Alex suggests that using tags as pivots could also be considered a subset of faceted browsing. I’d go further, and suggest that facets, clusters and pivots are all subsets of a larger set of solutions, which we can call domain-based tagging. If you use facets, the domains are imposed: this approach is a good fit to relatively closed domains of knowledge and finite groups of taggers. If you’ve got an epistemological blank sheet and a limitless supply of taggers, you can allow the domains to emerge: this is where clusters come into their own. And if what you’re primarily interested in is people – and, specifically, who’s saying what about what – then you don’t want multiple content-based domains but only the information which derives directly from human activity: the objects and their taggers. Or rather, you want the objects and the taggers, plus the ability to pivot into a kind of multi-dimensional space: instead of tags existing within domains, each tag is a domain in its own right, and what you can find within each tag-domain is the objects and their taggers.

What all of this suggests is that, unsurprisingly, there is no ‘one size fits all’ solution. I suggested some time ago that

If ‘cloudiness’ is a universal condition, del.icio.us and Flickr and tag clouds and so forth don’t enable us to do anything new; what they are giving us is a live demonstration of how the social mind works.

All knowledge is cloudy; all knowledge is constructed through conversation; conversation is a way of dealing with cloudiness and building usable clouds; social software lets us see knowledge clouds form in real time. I think that’s fine as far as it goes; what it doesn’t say is that, as well as having conversations about different things, we’re having different kinds of conversations and dealing with the cloud of knowing in different ways. Ontology is not, necessarily, overrated; neither is folksonomy.

I couldn’t make it any simpler

I hate to say this – I’ve always loathed VR boosters and been highly sceptical about the people they boost – but Jaron Lanier’s a bright bloke. His essay Digital Maoism doesn’t quite live up to the title, but it’s well worth reading (thanks, Thomas).

I don’t think he quite gets to the heart of the current ‘wisdom of the crowds’ myth, though. It’s not Maoism so much as Revivalism: there’s a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result – because the process incarnates the collective.

Put it this way. Say that (for example) the Wikipedia page on the Red Brigades is wildly wrong or wildly inadequate (which is just as bad); say that the tag cloud for an authoritative Red Brigades resource is dominated by misleading tags (‘kgb’, ‘ussr’, ‘mitrokhin’…). Would a wikipedian or a ‘folksonomy’ advocate see this situation as a major problem? Not being either I can’t give an authoritative answer, but I strongly suspect the answer would be No: it’s all part of the process, it’s all part of the collective self-expression of wikipedians and the growth of the folksonomy, and if the subject experts don’t like it they should just get their feet wet and start tagging and editing themselves. And if, in practice, the experts don’t join in – perhaps, in the case of Wikipedia, because they don’t have the stomach for the kind of ‘editing’ process which saw Jaron Lanier’s own corrections get reverted? Again, I don’t know for sure, but I suspect the answer would be another shrug: the wiki’s open to all – and tagspace couldn’t be more open – so who’s to blame, if you can’t make your voice heard, but you? There’s nothing inherently wrong with the process, except that you’re not helping to improve it. There’s nothing inherently wrong with the collective, except that you haven’t joined it yet.

Two quotes to clarify (hopefully) the connection between collective and process. Michael Wexler:

our understanding of things changes and so do the terms we use to describe them. How do I solve that in this open system? Do I have to go back and change all my tags? What about other people’s tags? Do I have to keep in mind all the variations on tags that reflect people’s different understanding of the topics?

The social connected model implies that the connections are the important part, so that all you need is one tag, one key, to flow from place to place and discover all you need to know. But the only people who appear to have time to do that are folks like Clay Shirky. The rest of us need to have information sorted and organized since we actually have better things to do than re-digest it.

What tagging does is attempt to recreate the flow of discovery. That’s fine… but what taxonomy does is recreate the structure of knowledge that you’ve already discovered. Sometimes, I like flowing around and stumbling on things. And sometimes, that’s a real pita. More often than not, the tag approach involves lots of stumbling around and sidetracks.

It’s like Family Feud [a.k.a. Family Fortunes - PJE]. You have to think not of what you might say to a question, you have to guess what the survey of US citizens might say in answer to a question. And that’s really a distraction if you are trying to just answer the damn question.

And our man Lanier:

there’s a demonstrative ritual often presented to incoming students at business schools. In one version of the ritual, a large jar of jellybeans is placed in the front of a classroom. Each student guesses how many beans there are. While the guesses vary widely, the average is usually accurate to an uncanny degree.

This is an example of the special kind of intelligence offered by a collective. It is that peculiar trait that has been celebrated as the “Wisdom of Crowds,”

The phenomenon is real, and immensely useful. But it is not infinitely useful. The collective can be stupid, too. Witness tulip crazes and stock bubbles. Hysteria over fictitious satanic cult child abductions. Y2K mania. The reason the collective can be valuable is precisely that its peaks of intelligence and stupidity are not the same as the ones usually displayed by individuals. Both kinds of intelligence are essential.

What makes a market work, for instance, is the marriage of collective and individual intelligence. A marketplace can’t exist only on the basis of having prices determined by competition. It also needs entrepreneurs to come up with the products that are competing in the first place. In other words, clever individuals, the heroes of the marketplace, ask the questions which are answered by collective behavior. They put the jellybeans in the jar.

To illustrate this, once more (just the once) with the Italian terrorists. There are tens of thousands of people, at a conservative estimate, who have read enough about the Red Brigades to write that Wikipedia entry: there are a lot of ill-informed or partially-informed or tendentious books about terrorism out there, and some of them sell by the bucketload. There are probably only a few hundred people who have read Gian Carlo Caselli and Donatella della Porta’s long article “The History of the Red Brigades: Organizational structures and Strategies of Action (1970-82)” – and I doubt there are twenty who know the source materials as well as the authors do. (I’m one of the first group, obviously, but certainly not the second.) Once the work’s been done anyone can discover it, but discovery isn’t knowledge: the knowledge is in the words on the pages, and ultimately in the individuals who wrote them. They put the jellybeans in the jar.

This is why (an academic writes) the academy matters, and why academic elitism is – or at least can be – both valid and useful. Jaron:

The balancing of influence between people and collectives is the heart of the design of democracies, scientific communities, and many other long-standing projects. There’s a lot of experience out there to work with. A few of these old ideas provide interesting new ways to approach the question of how to best use the hive mind.

Scientific communities … achieve quality through a cooperative process that includes checks and balances, and ultimately rests on a foundation of goodwill and “blind” elitism — blind in the sense that ideally anyone can gain entry, but only on the basis of a meritocracy. The tenure system and many other aspects of the academy are designed to support the idea that individual scholars matter, not just the process or the collective.

I’d go further, if anything. Academic conversations may present the appearance of a collective, but it’s a collective where individual contributions are preserved and celebrated (“Building on Smith’s celebrated critique of Jones, I would suggest that Smith’s own analysis is vulnerable to the criticisms advanced by Evans in another context…”). That is, academic discourse looks like a conversation – which wikis certainly can do, although Wikipedia emphatically doesn’t.

The problem isn’t the technology, in other words: both wikis and tagging could be ways of making conversation visible, which inevitably means visualising debate and disagreement. The problem is the drive to efface any possibility of conflict, effectively repressing the appearance of debate in the interest of presenting an evolving consensus. (Or, I could say, the problem is the tendency of people to bow and pray to the neon god they’ve made, but that would be a bit over the top – and besides, Simon and Garfunkel quotes are far too obvious.)

Update 13th June

I wrote (above): It’s not Maoism so much as Revivalism: there’s a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result – because the process incarnates the collective.

Here’s Cory Doctorow, responding to Lanier:

Wikipedia isn’t great because it’s like the Britannica. The Britannica is great at being authoritative, edited, expensive, and monolithic. Wikipedia is great at being free, brawling, universal, and instantaneous.

If you suffice yourself with the actual Wikipedia entries, they can be a little papery, sure. But that’s like reading a mailing-list by examining nothing but the headers. Wikipedia entries are nothing but the emergent effect of all the angry thrashing going on below the surface. No, if you want to really navigate the truth via Wikipedia, you have to dig into those “history” and “discuss” pages hanging off of every entry. That’s where the real action is, the tidily organized palimpsest of the flamewar that lurks beneath any definition of “truth.” The Britannica tells you what dead white men agreed upon, Wikipedia tells you what live Internet users are fighting over.

The Britannica truth is an illusion, anyway. There’s more than one approach to any issue, and being able to see multiple versions of them, organized with argument and counter-argument, will do a better job of equipping you to figure out which truth suits you best.

Quoting myself again, There’s nothing inherently wrong with the process, except that you’re not helping to improve it. There’s nothing inherently wrong with the collective, except that you haven’t joined it yet.

When there is no outside

Nick Carr’s hyperbolically-titled The Death of Wikipedia has received a couple of endorsements and some fairly vigorous disagreement, unsurprisingly. I think it’s as much a question of tone as anything else. When Nick reads the line

certain pages with a history of vandalism and other problems may be semi-protected on a pre-emptive, continuous basis.

it clearly sets alarm bells ringing for him, as indeed it does for me (“Ideals always expire in clotted, bureaucratic prose”, Nick comments). Several of his commenters, on the other hand, sincerely fail to see what the big deal might be: it’s only a handful of pages, it’s only semi-protection, it’s not that onerous, it’s part of the continuing development of Wikipedia editing policies, Wikipedia never claimed to be a totally open wiki, there’s no such thing as a totally open wiki anyway…

I think the reactions are as instructive as the original post. No, what Nick’s pointing to isn’t really a qualitative change, let alone the death of anything. But yes, it’s a genuine problem, and a genuine embarrassment to anyone who takes the Wikipedian rhetoric seriously. Wikipedia (“the free encyclopedia that anyone can edit”) routinely gets hailed for its openness and its authority, only not both at the same time – indeed, maximising one can always be used to justify limits on the other. As here. But there’s another level to this discussion, which is to do with Wikipedia’s resolution of the openness/authority balancing-act. What happens in practice is that the contributions of active Wikipedians take precedence over both random vandals and passing experts. In effect, both openness and authority are vested in the group.

In some areas this works well enough, but in others it’s a huge problem. I use Wikipedia myself, and occasionally drop in an edit if I see something that’s crying out for correction. Sometimes, though, I see a Wikipedia article that’s just wrong from top to bottom – or rather, an article where verifiable facts and sustainable assertions alternate with errors and misconceptions, or are set in an overall argument which is based on bad assumptions. In short, sometimes I see a Wikipedia article which doesn’t need the odd correction, it needs to be pulled and rewritten. I’m not alone in having this experience: here’s Tom Coates on ‘penis envy’ and Thomas Vander Wal (!) on ‘folksonomy’, as well as me on ‘anomie’.

It’s not just a problem with philosophical concepts, either – I had a similar reaction more recently to the Wikipedia page on the Red Brigades. On the basis of the reading I did for my doctorate, I could rewrite that page from start to finish, leaving in place only a few proper names and one or two of the dates. But writing this kind of thing is hard and time-consuming work – and I’ve got quite enough of that to do already. So it doesn’t get done.

I don’t think this is an insurmountable problem. A while ago I floated a cunning plan for fixing pages like this, using PledgeBank to mobilise external reserves of peer-pressure; it might work, and if only somebody else would actually get it rolling I might even sign up. But I do think it’s a problem, and one that’s inherent to the Wikipedia model.

To reiterate, both openness and authority are vested in the group. Openness: sure, Wikipedia is as open to me as any other registered editor d00d, but in practice the openness of Wikipedia is graduated according to the amount of time you can afford to spend on it. As for authority, I’m not one, but (like Debord) I have read several good books – better books, to be blunt, than those relied on by the author[s] of the current Red Brigades article. But what would that matter unless I was prepared to defend what I wrote against bulk edits by people who disagreed – such as, for example, the author[s] of the current article? On the other hand, if I was prepared to stick it out through the edit wars, what would it matter whether I knew my stuff or not? This isn’t just random bleating. When I first saw that Red Brigades article I couldn’t resist one edit, deleting the completely spurious assertion that the group Prima Linea was a Red Brigades offshoot. When I looked at the page again the next day, my edit had been reverted.

Ultimately Wikipedia isn’t about either openness or authority: it’s about the collective activity of editing Wikipedia and being a Wikipedian. From that, all else follows.

Update 2/6/06 (in response to David, in comments)

There are two obvious problems with the Wikipedia page on the Brigate Rosse, and one that’s larger but more diffuse. The first problem is that it’s written in the present tense; it’s extremely dubious that there’s any continuity between the historic Brigate Rosse and the gang who shot Biagi, let alone that they’re simply, unproblematically the same group. This alone calls for a major rewrite. Secondly, the article is written very much from a police/security-service/conspiracist stance, with a focus on questions like whether the BR was assisted by the Czech security services or penetrated by NATO. But this tends to reinforce an image of the BR as a weird alien force which popped up out of nowhere, rather than an extreme but consistent expression of broader social movements (all of which has been documented).

The broader problem – which relates to both of the specific points – goes back to a problem with the amateur-encyclopedia format itself: Wikipedia implicitly asks what a given topic is, which prompts contributors to think of their topic as having a core, essential meaning (I wrote about this last year). The same problem can arise in a ‘proper’ encyclopedia, but there it’s generally mitigated by expertise: somebody who’s spent several years studying the broad Italian armed struggle scene is going to be motivated to relate the BR back to that scene, rather than presenting it as an utterly separate thing. The motivation will be still greater if the expert on the BR has also been asked to contribute articles on Prima Linea, the NAP, etc. This, again, is something that happens (and works, for all concerned) in the kind of restricted conversations that characterise academia, but isn’t incentivised by the Wikipedia conversation – because the Wikipedia conversation doesn’t go anywhere else. Doing Wikipedia is all about doing Wikipedia.

Cloudbuilding (3)

By way of background to this post – and because I think it’s quite interesting in itself – here’s a short paper I gave last year at this conference (great company, shame about the catering). It was co-written with my colleagues Judith Aldridge and Karen Clarke. I don’t stand by everything in it – as I’ve got deeper into the project I’ve moved further away from Clay’s scepticism and closer towards people like Carole Goble and Keith Cole – but I think it still sets out an argument worth having.

Mind the gap: Metadata in e-social science

1. Towards the final turtle

It’s said that Bertrand Russell once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the centre of our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: “What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.”

Russell smiled and replied, “What is the tortoise standing on?”

“You’re very clever, young man, very clever,” said the old lady. “But it’s turtles all the way down.”

The Russell story is emblematic of the logical fallacy of infinite regress: proposing an explanation which is just as much in need of explanation as the original fact being explained. The solution, for philosophers (and astronomers), is to find a foundation on which the entire argument can be built: a body of known facts, or a set of acceptable assumptions, from which the argument can follow.

But what if infinite regress is a problem for people who want to build systems as well as arguments? What if we find we’re dealing with a tower of turtles, not when we’re working backwards to a foundation, but when we’re working forwards to a solution?

WSDL [Web Services Description Language] lets a provider describe a service in XML [Extensible Markup Language]. [...] to get a particular provider’s WSDL document, you must know where to find them. Enter another layer in the stack, Universal Description, Discovery, and Integration (UDDI), which is meant to aggregate WSDL documents. But UDDI does nothing more than register existing capabilities [...] there is no guarantee that an entity looking for a Web Service will be able to specify its needs clearly enough that its inquiry will match the descriptions in the UDDI database. Even the UDDI layer does not ensure that the two parties are in sync. Shared context has to come from somewhere, it can’t simply be defined into existence. [...] This attempt to define the problem at successively higher layers is doomed to fail because it’s turtles all the way up: there will always be another layer above whatever can be described, a layer which contains the ambiguity of two-party communication that can never be entirely defined away. No matter how carefully a language is described, the range of askable questions and offerable answers make it impossible to create an ontology that’s at once rich enough to express even a large subset of possible interests while also being restricted enough to ensure interoperability between any two arbitrary parties.
(Clay Shirky)

Clay Shirky is a longstanding critic of the Semantic Web project, an initiative which aims to extend Web technology to encompass machine-readable semantic content. The ultimate goal is the codification of meaning, to the point where understanding can be automated. In commercial terms, this suggests software agents capable of conducting a transaction with all the flexibility of a human being. In terms of research, it offers the prospect of a search engine which understands the searches it is asked to run and is capable of pulling in further relevant material unprompted.

This type of development is fundamental to e-social science: a set of initiatives aiming to enable social scientists to access large and widely-distributed databases using ‘grid computing’ techniques.

A Computational Grid performs the illusion of a single virtual computer, created and maintained dynamically in the absence of predetermined service agreements or centralised control. A Data Grid performs the illusion of a single virtual database. Hence, a Knowledge Grid should perform the illusion of a single virtual knowledge base to better enable computers and people to work in cooperation.
(Keith Cole et al)

Is Shirky’s final turtle a valid critique of the visions of the Semantic Web and the Knowledge Grid? Alternatively, is the final turtle really a Babel fish — an instantaneous universal translator — and hence (excuse the mixed metaphors) a straw person: is Shirky setting the bar impossibly high, posing goals which no ‘semantic’ project could ever achieve? To answer these questions, it’s worth reviewing the promise of automated semantic processing, and setting this in the broader context of programming and rule-governed behaviour.

2. Words and rules

We can identify five levels of rule-governed behaviour. In rule-driven behaviour, firstly, ‘everything that is not compulsory is forbidden’: the only actions which can be taken are those dictated by a rule. In practice, this means that instructions must be framed in precise and non-contradictory terms, with thresholds and limits explicitly laid down to cover all situations which can be anticipated. This is the type of behaviour represented by conventional task-oriented computer programming.

A higher level of autonomy is given by rule-bound behaviour: rules must be followed, but there is some latitude in how they are applied. A set of discrete and potentially contradictory rules is applied to whatever situation is encountered. Higher-order rules or instructions are used to determine the relative priority of different rules and resolve any contradiction.

Rule-modifying behaviour builds on this level of autonomy, by making it possible to ‘learn’ how and when different rules should be applied. In practice, this means that priority between different rules is decided using relative weightings rather than absolute definitions, and that these weightings can be modified over time, depending on the quality of the results obtained. Neither rule-bound nor rule-modifying behaviour poses any fundamental problems in terms of automation.
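
As a toy illustration of these middle levels – and only that – here is a sketch in which rules carry relative weights rather than absolute priorities (rule-bound behaviour), and the weights drift in response to feedback on results (rule-modifying behaviour). The rules and numbers are invented purely for illustration.

```python
# A toy: weighted rules (rule-bound) whose weights drift with feedback
# (rule-modifying). The rules and figures are invented for illustration.

rules = {                     # rule name -> (predicate, weight)
    "flag_long_text":  (lambda rec: len(rec["text"]) > 200, 1.0),
    "flag_all_caps":   (lambda rec: rec["text"].isupper(),  1.0),
    "flag_no_source":  (lambda rec: not rec["source"],      1.0),
}

def score(record: dict) -> float:
    """Rule-bound: apply every matching rule and weigh the results."""
    return sum(w for pred, w in rules.values() if pred(record))

def feedback(rule_name: str, was_useful: bool, rate: float = 0.1) -> None:
    """Rule-modifying: nudge a rule's weight up or down in the light of results."""
    pred, w = rules[rule_name]
    rules[rule_name] = (pred, max(0.0, w + rate if was_useful else w - rate))

record = {"text": "NO SOURCE GIVEN", "source": ""}
print(score(record))            # 2.0 -- two rules fire
feedback("flag_all_caps", was_useful=False)
print(score(record))            # 1.9 -- the same rules fire, weighted differently
```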

Rule-discovering behaviour, in addition, allows the existing body of rules to be extended in the light of previously unknown regularities which are encountered in practice (“it turns out that many Xs are also Y; when looking for Xs, it is appropriate to extend the search to include Ys”). This level of autonomy — combining rule observance with reflexive feedback — is fairly difficult to envisage in the context of artificial intelligence, but not impossible.

The level of autonomy assumed by human agents, however, is still higher, consisting of rule-interpreting behaviour. Rule-discovery allows us to develop an internalised body of rules which corresponds ever more closely to the shape of the data surrounding us. Rule-interpreting behaviour, however, enables us to continually and provisionally reshape that body of rules, highlighting or downgrading particular rules according to the demands of different situations. This is the type of behaviour which tells us whether a ban is worth challenging, whether a sales pitch is to be taken literally, whether a supplier is worth doing business with, whether a survey’s results are likely to be useful to us. This, in short, is the level of Shirky’s situational “shared context” — and of the final turtle.

We believe that there is a genuine semantic gap between the visions of Semantic Web advocates and the most basic applications of rule-interpreting human intelligence. Situational information is always local, experiential and contingent; consequently, the data of the social sciences require interpretation as well as measurement. Any purely technical solution to the problem of matching one body of social data to another is liable to suppress or exclude much of the information which makes it valuable.

We cannot endorse comments from e-social science advocates such as this:

variable A and variable B might both be tagged as indicating the sex of the respondent where sex of the respondent is a well defined concept in a separate classification. If Grid-hosted datasets were to be tagged according to an agreed classification of social science concepts this would make the identification of comparable resources extremely easy.
(Keith Cole et al)

Or this:

work has been undertaken to assert the meaning of Web resources in a common data model (RDF) using consensually agreed ontologies expressed in a common language [...] Efforts have concentrated on the languages and software infrastructure needed for the metadata and ontologies, and these technologies are ready to be adopted.
(Carole Goble and David de Roure; emphasis added)

Statements like these suggest that semantics are being treated as a technical or administrative matter, rather than a problem in its own right; in short, that meaning is being treated as an add-on.

3. Google with Craig

To clarify these reservations, let’s look at a ‘semantic’ success story.

The service, called “Craigslist-GoogleMaps combo site” by its creator, Paul Rademacher, marries the innovative Google Maps interface with the classifieds of Craigslist to produce what is an amazing look into the properties available for rent or purchase in a given area. [...] This is the future….this is exactly the type of thing that the Semantic Web promised
(Joshua Porter)

‘This’ is an application which calculates the location of properties advertised on the ‘Craigslist’ site and then displays them on a map generated from Google Maps. In other words, it takes two sources of public-domain information and matches them up, automatically and reliably.

That’s certainly intelligent. But it’s also highly specialised, and there are reasons to be sceptical about how far this approach can be generalised. On one hand, the geographical base of the application obviates the issue of granularity. Granularity is the question of the ‘level’ at which an observation is taken: a town, an age cohort, a household, a family, an individual? a longitudinal study, a series of observations, a single survey? These issues are less problematic in a geographical context: in geography, nobody asks what the meaning of ‘is’ is. A parliamentary constituency; a census enumeration district; a health authority area; the distribution area of a free newspaper; a parliamentary constituency (1832 boundaries) — these are different ways of defining space, but they are all reducible to a collection of identifiable physical locations. Matching one to another, as in the CONVERTGRID application (Keith Cole et al) — or mapping any one onto a uniform geographical representation — is a finite and rule-bound task. At this level, geography is a physical rather than a social science.
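
A small sketch of why the geographical case is ‘finite and rule-bound’: once observations are pinned to identifiable physical locations, converting counts from one area scheme to another is a mechanical exercise. The coordinates and boundaries below are invented.

```python
# Different area schemes are just groupings of identifiable locations, so
# converting counts between them is mechanical. Toy coordinates and boxes.

points = [(0.2, 0.3), (0.7, 0.4), (0.6, 0.9), (0.1, 0.8)]   # e.g. incident locations

# Two different partitions of the same space, as simple bounding boxes.
scheme_a = {"A-west": (0.0, 0.0, 0.5, 1.0), "A-east": (0.5, 0.0, 1.0, 1.0)}
scheme_b = {"B-south": (0.0, 0.0, 1.0, 0.5), "B-north": (0.0, 0.5, 1.0, 1.0)}

def assign(pt, scheme):
    x, y = pt
    for name, (x0, y0, x1, y1) in scheme.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

def counts(scheme):
    out = {name: 0 for name in scheme}
    for pt in points:
        out[assign(pt, scheme)] += 1
    return out

print(counts(scheme_a))   # {'A-west': 2, 'A-east': 2}
print(counts(scheme_b))   # {'B-south': 2, 'B-north': 2}
```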

The issue of trust is also potentially problematic. The Craigslist element of the Rademacher application brings the social element to bear, but does so in a way which minimises the risks of error (unintentional or intentional). There is a twofold verification mechanism at work. On one hand, advertisers — particularly content-heavy advertisers, like those who use the ‘classifieds’ and Craigslist — are motivated to provide a (reasonably) accurate description of what they are offering, and to use terms which match the terms used by would-be buyers. On the other hand, offering living space over Craigslist is not like offering video games over eBay: Craigslist users are not likely to rely on the accuracy of listings, but will subject them to in-person verification. In many disciplines, there is no possibility of this kind of ‘real-world’ verification; nor is there necessarily any motivation for a writer to use researchers’ vocabularies, or conform to their standards of accuracy.

In practice, the issues of granularity and trust both pose problems for social science researchers using multiple data sources, as concepts, classifications and units differ between datasets. This is not just an accident that could have been prevented with more careful planning; it is inherent in the nature of social science concepts, which are often inextricably contingent on social practice and cannot unproblematically be recorded as ‘facts’. The broad range covered by a concept like ‘anti-social behaviour’ means that coming up with a single definition would be highly problematic — and would ultimately be counter-productive, as in practice the concept would continue to be used to cover a broad range. On the other hand, concepts such as ‘anti-social behaviour’ cannot simply be discarded, as they are clearly produced within real — and continuing — social practices.

The meaning of a concept like this — and consequently the meaning of a fact such as the recorded incidence of anti-social behaviour — cannot be established by rule-bound or even rule-discovering behaviour. The challenge is to record both social ‘facts’ and the circumstances of their production, tracing recorded data back to its underlying topic area; to the claims and interactions which produced the data; and to the associations and exclusions which were effectively written into it.

4. Even better than the real thing

As an approach to this problem, we propose a repository of content-oriented metadata on social science datasets. The repository will encompass two distinct types of classification. Firstly, those used within the sources themselves; following Barney Glaser, we refer to these as ‘In Vivo Concepts’. Secondly, those brought to the data by researchers (including ourselves); we refer to these as ‘Organising Concepts’. The repository will include:

• relationships between Organising Concepts
‘theft from the person’ is a type of ‘theft’

• associations between In-Vivo Concepts and data sources
the classification of ‘Mugging’ appears in ‘British Crime Survey 2003’

• relationships between In-Vivo Concepts
‘Snatch theft’ is a subtype of the classification of ‘Mugging’

• relationships between Organising Concepts and In-Vivo Concepts
the classification of ‘Snatch theft’ corresponds to the concept of ‘theft from the person’

The combination of these relationships will make it possible to represent, within a database structure, a statement such as

Sources of information on Theft from the person include editions of the British Crime Survey between 1996 and the present; headings under which it is recorded in this source include Snatch theft, which is a subtype of Mugging

The structure of the proposed repository has three significant features. Firstly, while the relationships between concepts are hierarchical, they are also multiple. In English law, the crime of Robbery implies assault (if there is no physical contact, the crime is recorded as Theft). The In-Vivo Concept of Robbery would therefore correspond both to the Organising Concept of Theft from the person and that of Personal violence. Since different sources may share categories but classify them differently, multiple relationships between In-Vivo Concepts will also be supported. Secondly, relationships between concepts will be meaningful: it will be possible to record that two concepts are associated as synonyms or antonyms, for example, as well as recording one as a sub-type of the other. Thirdly, the repository will not be delivered as an immutable finished product, but as an open and extensible framework. We shall investigate ways to enable qualified users to modify both the developed hierarchy of Organising Concepts and the relationships between these and In-Vivo Concepts.
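
As a rough sketch of how the four kinds of relationship listed above might be held and queried – using the ‘Mugging’/‘Snatch theft’ example, plus an invented second source for Robbery – something like the following would do. It is an illustration of the structure, not a specification of the delivered repository.

```python
# Illustrative only: the repository's typed relationships as plain Python
# structures. 'Recorded crime statistics 2003' is an invented source name.

# Organising Concepts (researchers' terms), child -> parent (None = no parent)
organising = {"theft from the person": "theft", "personal violence": None}

# Associations between In-Vivo Concepts (sources' own headings) and sources
in_vivo_sources = [
    ("Mugging",      "British Crime Survey 2003"),
    ("Snatch theft", "British Crime Survey 2003"),
    ("Robbery",      "Recorded crime statistics 2003"),
]

# Typed relationships between In-Vivo Concepts
in_vivo_relations = [("Snatch theft", "subtype_of", "Mugging")]

# Correspondences between In-Vivo and Organising Concepts (multiple, so
# 'Robbery' can map to two Organising Concepts at once)
correspondence = [
    ("Snatch theft", "theft from the person"),
    ("Robbery",      "theft from the person"),
    ("Robbery",      "personal violence"),
]

def sources_for(organising_concept: str):
    """Which sources record data under this Organising Concept, and under
    which of their own headings?"""
    headings = {iv for iv, oc in correspondence if oc == organising_concept}
    return [(iv, src) for iv, src in in_vivo_sources if iv in headings]

print(sources_for("theft from the person"))
# [('Snatch theft', 'British Crime Survey 2003'), ('Robbery', 'Recorded crime statistics 2003')]
```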

In the context of the earlier discussion of semantic processing and rule-governed behaviour, this repository will demonstrate the ubiquity of rule-interpreting behaviour in the social world by exposing and ‘freezing’ the data which it produces. In other words, the repository will encode shifting patterns of correspondence, equivalence, negation and exclusion, demonstrating how the apparently rule-bound process of constructing meaning is continually determined by ‘shared context’.

The repository will thus expose and map the ways in which social data is structured by patterns of situational information. The extensible and modifiable structure of the repository will facilitate further work along these lines: the further development of the repository will itself be an example of rule-interpreting behaviour. The repository will not — and cannot — provide a seamless technological bridge over the semantic gap; it can and will facilitate the work of bridging the gap, but without substituting for the role of applied human intelligence.

A mean idea to call my own

Technorati’s new “Filter by Authority” feature depresses me intensely – not least because I thought they’d abandoned the word ‘authority’ some time after my last rant on the subject. There are three problems here. Firstly, as I wrote last year:

Technorati is all about in-groups and out-groups. … authority directly tracks popularity – although this is ‘popularity’ in that odd American high-school sense of the word: ‘popular’ sites aren’t the ones with the most friends (most out-bound links, most distinct participants in Comments threads or even most traffic) but the ones with the most people envying them (hence: most in-bound links).

In other words, ‘authority’ is a really lousy synonym for ‘high inbound link count’, raising completely groundless expectations of quality and reliability. McDonald’s is a popular provider of hot food; it’s not an authority on cooking. The relative popularity (or enviability) of a site may signify many things, but it doesn’t signify that the site possesses absolute qualities like veracity, completeness, beauty – or authority.

But hold on – is it absurd to call McDonald’s authoritative? You’ve got to admit, they’re good at what they do… There’s a sense in which this is a tautology – because what they do is maximise the numbers who come through the doors – but never mind. Let’s say that we can identify the McDonald’s branch with the highest number of burgers sold (or repeat customers, or stars on uniforms – the precise metric doesn’t matter). There’s a good argument for using the word ‘best’: it looks like this is the best McDonald’s branch in the world. And the best fast food joint in the world? Well, maybe. The best restaurant in the world? Um, no. Quality tracks popularity, to some extent, but only within a given domain – otherwise USA Today would be the best newspaper in the USA. (To say it’s the best national mass-market tabloid would be less controversial.) [Edited with thanks to commenters who know about this stuff.]

This is the second problem with authority-as-link-count, and one which Technorati shows no sign of recognising, much less addressing. I can live with the idea that the Huffington Post is more popular than Beppe Grillo’s blog – but more authoritative? I really don’t think so. (Any right-wingers reading this may substitute Huffington for Grillo and Kos for Huffington, and re-read. And rest.) At bottom, Technorati’s ‘authority’ ranking is based on the laughably outdated idea that there is a single Blogosphere, within which we’re all talking to pretty much the same people about pretty much the same things. Abandon that assumption and the problems with an ‘authority’ metric are staringly apparent: who am I authoritative for? who am I more authoritative than?

But if this is an error it’s not an error of Dave Sifry’s invention. As I’ve said, within any given domain of ideas, it’s not entirely meaningless to say that authority tracks popularity: among academic authors, the author who sells books and fills halls is likely to be the author who is cited, even if he or she hasn’t written anything particularly inspired since Thatcher was in power. The question is whether this is a feature or a bug: if we’re going to read one writer rather than another, should we choose the popular dullard or the unknown genius? Put it another way: if we’re choosing who to read in the context of a new publication medium with massively lowered entry costs – and with an accompanying ideology rich in levelled playing-fields, smashed barriers and dismantled hierarchies – who should we be trying to seek out: Dullard (Popular) or Genius (Unknown)?

The third and most fundamental problem with ranking by ‘authority’ is that it brings to the Web one of the very features of offline life which Web evangelists told us we were leaving behind. This kind of ‘feature’ – and the buzz-chasing worldview that promotes it – is part of the problem, not part of the solution.

I find that it often helps me to also answer the question, “Who is the most influential blogger talking about XXX this week, and what did she say?”
– Dave Sifry

We climbed and we climbed

I don’t trust Yahoo!, for reasons which have nothing to do with my dislike of misused punctuation marks (although the bang certainly doesn’t help); I don’t trust Google either. Maybe it’s because I’m old enough to remember when MicroSoft [sic] were new and exciting and a major attractor of geek goodwill; maybe it’s just because I’m an incurable pinko and don’t trust anyone who’s making a profit out of me. Anyway, I don’t trust Yahoo!, or like them particularly; I switched to Simpy when Yahoo! bought del.icio.us, and I’ve felt a bit differently about Tom – hitherto one of my favourite bloggers anywhere – since he joined Yahoo!.

Still. This (PDF) is Tom’s presentation to the Future of Web Apps conference, and it’s good stuff – both useful and beautiful, to use William Morris’s criteria. The fourth rule (precept? guideline? maxim?) spoke to me particularly clearly:

Identify your first order objects and make them addressable

Start with the data, in other words; then work out what the data is; then make sure that people (and programs) can get at it. (Rule 5: “Use readable, reliable and hackable URLs”.) It’s a simple idea, but surprisingly radical when you consider its implications – and it’s already meeting resistance, as radical ideas do (see Guy Carberry’s comments here).
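
To make rules 4 and 5 concrete, here’s a minimal sketch: decide what the first-order objects are (say, people, tags, and the bookmarks filed under them) and give each one a readable URL that can be ‘hacked’ back to another meaningful page. The route scheme is an invented example, not Yahoo!’s or anyone else’s actual scheme.

```python
# Illustrative only: first-order objects given readable, hackable URLs.

from __future__ import annotations

ROUTES = {
    "/people/{user}":            "all of one person's bookmarks",
    "/people/{user}/tags/{tag}": "one person's bookmarks under one tag",
    "/tags/{tag}":               "everyone's bookmarks under one tag",
}

def url_for(user: str | None = None, tag: str | None = None) -> str:
    """Build a hackable URL: chopping path segments off the end should
    always leave you at another meaningful page."""
    if user and tag:
        return f"/people/{user}/tags/{tag}"
    if user:
        return f"/people/{user}"
    if tag:
        return f"/tags/{tag}"
    return "/"

print(url_for(user="phil", tag="ethnoclassification"))
# /people/phil/tags/ethnoclassification  -- and /people/phil still works
```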

More or less in passing, Tom’s presentation also shows why the Shirkyan attempt to counterpose taxonomy to folksonomy is wrongheaded. If you’re going to let people play with your data (including conceptual data), then it needs to be exposed – but if you’re going to expose data in ways that people can get at, you need structure. And it doesn’t matter if it’s not the right structure, not least because there is no right structure (librarians have always known this); what matters is that it’s consistent and logical enough to give people a way in to what they want to find. To put it another way, what matters is that the structure is consistent and logical enough to represent a set of propositions about the data (or concepts). Once you’ve climbed that scaffolding, you can start slinging your own links. But ethnoclassification builds on classification: on its own, it won’t get you the stuff you’re looking for – unless what you’re looking for isn’t so much the stuff as what people are saying about stuff. (Which is why new-media journalists and researchers like tagging, of course.)

Anyway – very nice presentation by the man Coates. Check it out.

It’s just work

Suw Charman types too fast. She’s produced what looks like a fascinating record of the Future of Web Apps conference, but I can’t see myself ever reading the whole thing. But this jumped out at me (slight edits):

Joshua Schachter – The things we’ve learned
Tagging is not really about classification or organisation, it’s user interface. It’s a way to store your working state or context. Useful for recall. OK for discovery because someone might tag similarly to you. Bad for distribution.

Not all metadata is tags. People ask for automatic metadata, but that’s not the value – the value is attention, that you saw it and decided that it was important enough to tag. Auto-tagging doesn’t help you do what you’re trying to do. … because there’s a small transaction cost that adds value. But don’t make them do too much work.

the value is attention … because there’s a small transaction cost, that adds value

The value of tagging is in the meaning it encodes, and the meaning is created by people doing a bit of work. If you make things easy by automating the process of getting meaning out of data, that creativity is not called upon and what you get doesn’t have the same value.

This parallels my thoughts about the impoverishment of technology through the collapse of alternative ways of using it, often in the name of ease of use – not to mention the thoughts I put down on my other blog about how the best communication (and the best narrative) is gappy and open to multiple interpretations. One way of understanding why gappiness and plurivalence might be a positive virtue, finally, is suggested by Anne, who counterposes predictability and foretelling to potentiality and hope.

I think what all these arguments have in common is a sense of meaning as not-yet-(finally)-constructed. In this perspective the point of social software, in particular, is not to connect data but to enable people to talk about data – while preventing that talk from being entirely weightless by imposing a certain level of friction, a certain opportunity cost. (A cost which can always be raised or lowered. Thought experiment: Wikipedia makes it impossible to revert an article to a version less than a week old. What happens?) In the case of tagging systems, there has to be a reason why you would want to tag a resource, and want to tag it in ways that have meaning for you. Meaning is created through conversations that require a bit of effort, within the shared context of an open horizon: it’s work, but it’s work without a known outcome. A journey of hope, as someone wrote.

(My blogs are crossing over – I hate it when that happens…)

When the sweet turns sour

I’ve! just! exported! my! bookmarks! and! deleted! my! account!.
- Sophie, in comments at Burningbird

If this keeps up all the "Web 2.0" blog nerds will be working at Yahoo! by next month.
- Jake
Yup! I think that’s the plan!
- Tom
(comments at plasticbag.org)

delicious was not only a community. It was also an experiment. A place for us geeks to meet and discuss. A place where we were changing the Web. Yes, WE were changing the Web through our ideas. And Joshua was good in picking the best ideas. Inviting us to give more. Now do you really think this will continue under Yahoo!’s reign?
- Pietro

Some lessons to learn here:
1. Never trust a startup service to store your important data no matter how the owner seems honest to you.
2. Never trust a corporate entity to continue storing your important data.
3. Never act like a fanboy on services you don’t trust.
- Ronald Johnson, in comments at the del.icio.us blog

Companies offer web services to get free ideas, exploit free R&D, and discover promising talent. They offer the APIs so people can build clever toys, the best of which the company will grab — thank you very much — and develop further on their own. There is no business model for mashups. If Web 2.0 really is just mashups, this is going to be one short revolution.
- Greg

This enthusiasm for big business – as long as it's a cool big business – strikes me as both dangerous and weird, not to mention being the antithesis of what's made the Net fun to work with all these years. But it is a logical development of one branch of the 'Web 2.0' hype – an increasingly dominant branch, unfortunately.
- me (on Google)

I promise not to be successful if you all give me money.
- Shelley

Update: I’ve switched to Simpy. It’s great.

This is the new stuff

Thomas criticises Wikipedia’s entry on folksonomy – a term which was coined just over a year ago by, er, Thomas. As of today’s date, the many hands of Wikipedia say:

Folksonomy is a neologism for a practice of collaborative categorization using freely chosen keywords. More colloquially, this refers to a group of people cooperating spontaneously to organize information into categories, typically using categories or tags on pages, or semantic links with types that evolve without much central control. … In contrast to formal classification methods, this phenomenon typically only arises in non-hierarchical communities, such as public websites, as opposed to multi-level teams and hierarchical organization. An example is the way in which wikis organize information into lists, which tend to evolve in their inclusion and exclusion criteria informally over time.

Thomas:

Today, having seen an new academic endeavor related to folksonomy quoting the Wikipedia entry on folksonomy, I realize the definition of Folksonomy has become completely unglued from anything I recognize (yes, I did create the word to define something that was undefined prior). It is not collaborative, it is not putting things into categories, it is not related to taxonomy (more like the antithesis of a taxonomy), etc. The Wikipedia definition seems to have morphed into something that the people with Web 2.0 tagging tools can claim as something that can describe their tool

I’m resisting the temptation to send Thomas the All-Purpose Wikipedia Snark Letter (“Yeah? Well, if you don’t like the wisdom of the crowds, Mr So-Called Authority…”). In fact, I’m resisting the temptation to say anything about Wikipedia; that’s another discussion. But I do want to say something about the original conception of ‘folksonomy’, and about how it’s drifted.

Firstly, another quote from Thomas’s post from today:

Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one’s own retrival. The tagging is done in a social environment (shared and open to others). The act of tagging is done by the person consuming the information.

There is tremendous value that can be derived from this personal tagging when viewing it as a collective, when you have the three needed data points in a folksonomy tool: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. … [by] keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (if a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag. You then know an individual and a tag combination to follow.

This is admirably clear and specific; it also fits rather well with the arguments I was making in two posts earlier this year:

[perhaps] the natural state of knowledge is to be ‘cloudy’, because it’s produced within continuing interactions within groups: knowledge is an emergent property of conversation, you could say … [This suggests that] every community has its own knowledge-cloud – that the production and maintenance of a knowledge-cloud is one way that a community defines itself.

If ‘cloudiness’ is a universal condition, del.icio.us and flickr and tag clouds and so forth don’t enable us to do anything new; what they are giving us is a live demonstration of how the social mind works. Which could be interesting, to put it mildly.

Thomas’s original conception of ‘folksonomy’ is quite close to my conception of a ‘knowledge cloud’: they’re both about the emergence of knowledge within a social interaction (a conversation).
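
Thomas's three data points translate almost directly into code. Here's a minimal sketch (invented data, and not a description of how del.icio.us actually stores things): keep (person, object, tag) triples, and any two of the three will retrieve the third – which is exactly the "find somebody with the same interest and vocabulary as you" move he describes.

    # The three data points: the person tagging, the object tagged, the tag used.
    triples = [
        ("alice", "http://example.org/couscous", "recipe"),
        ("alice", "http://example.org/couscous", "morocco"),
        ("bob",   "http://example.org/couscous", "morocco"),
        ("bob",   "http://example.org/anomie",   "sociology"),
    ]

    def people_for(obj, tag):
        """Know the object and the tag: find who else paired them."""
        return {person for person, o, t in triples if o == obj and t == tag}

    def tags_for(person, obj):
        """Know the person and the object: recover their vocabulary for it."""
        return {t for p, o, t in triples if p == person and o == obj}

    print(people_for("http://example.org/couscous", "morocco"))  # alice and bob
    print(tags_for("alice", "http://example.org/couscous"))      # recipe, morocco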

The current Wikipedia version of ‘folksonomy’ is both fuzzier and more closely tied to existing technology. What’s happened seems to be a kind of vicious circle of hype and expectations management. It’s not a new phenomenon – anyone who’s been watching IT for any length of time has seen it happen at least once. (Not to worry anyone, but it happened quite a lot around 1999, as I remember…)

  1. There’s Vision: someone sees genuinely exciting new possibilities in some new technology and writes a paper on – oh, I don’t know, noetic telepresence or virtual speleology or network prosody…
  2. Then there’s Development: someone builds something that does, well, a bit of it. Quite significant steps towards supporting network prosody. More coming in the next release.
  3. Phase three is Hype. Hype, hype, hype. Mm-hmm. I just can’t get enough hype, can you?
  4. The penultimate phase is Dissemination: in which everyone’s trying to support network prosody. Or, at least, some of it. That stuff that those other people did with their tool. There we go, fully network prosody enabled – must get someone to do a writeup.
  5. Finally we’re into Hype II, also known as Marketing: ‘network prosody’ is defined less by the original vision than by the tools which have been built to support it. The twist is that it’s still being hyped in exactly the same way – tools which don’t actually do that much are being marketed as if they realised the original Vision. It’s a bit of a pain, this stage. Fortunately it doesn’t last forever. (Stage 6 is the Hangover.)

What’s to be done? As I said back here, personally I don’t use the term ‘folksonomy’; I prefer Peter Merholz’s term ‘ethnoclassification’. Two of my objections to ‘folksonomy’ were that it appears to denote an end result as well as a process, and that it’s become a term of (anti-librarian) advocacy as well as description; Thomas’s criticisms of Wikipedia seem to point in a similar direction. Where I do differ from Thomas is in the emphasis to be placed on online technologies. Ethnoclassification is – at least, as I see it – something that happens everywhere all the time: it’s an aspect of living in a human community, not an aspect of using the Web. If I’m right about where we are in the Great Cycle of Hype, this may soon be another point in its favour.

Everything playing at once

Dave:

I no longer look at the front page of the NY Times to tell me what’s important. I look at it to see what people like the editors of the NY Times think is important. I’m finding the news that matters through the Internet recommendation engine: Blogs, emails, mailing lists, my aggregator, websites that aggregate and comment on news, etc.

Brief thoughts (also appearing in comments at Dave’s): we’re back with finding out what people say about stuff. Which is, ultimately, all there is to find out. Knowledge – and, for that matter, news – has always been produced in cloud form, as an emergent property of conversations. When we counterpose knowledge to conversation, we’re really saying that certain conversations have ended – or been brought to an end – and left unchallenged conclusions behind them. What’s changed is that, until recently, the conversations which produce knowledge (and news) have taken place within small and closed groups, so that most of us have only seen the crystallised end-product of the conversation. What Wikipedia, blogging, RSS and del.icio.us give us is the rudiments of a distributed conversation platform, enabled by pervasive broadband. (Which is why the ownership of the authority to stop the conversation – and crystallise the cloud – is such a big issue.)

Know what I mean

Back here, I wrote:

Tagging, I’m suggesting, isn’t there to tell us about stuff: it’s there to tell us about what people say about stuff. As such, it performs rather poorly when you’re asking “where is X?” or “what is X?”, and it comes into its own when you’re asking “what are people saying about X?”

This relates back to my earlier argument that all knowledge is cloud-shaped, and that tagging is simply giving us a live demonstration of how the social mind works. In other words, all there is is “what people are saying about X” – but some conversations have been going on longer than others. Some conversations, in fact, have developed assumptions, artefacts, structures and systems within and around which the conversation has to take place. The conversation carried on in the medium of tagging isn’t at that stage yet, perhaps, but it will be – the interesting question is about the nature of those artefacts and structures.

Now (with thanks to Anne Galloway) over to Dan Sperber.

When say, vervet monkeys communicate among themselves, one vervet monkey might spot a leopard and emit an alarm cry that indicates to the other monkeys in his group that there's a leopard around. The other vervet monkeys are informed by this alarm cry of the presence of a leopard, but they're not particularly informed of the mental state of the communicator, and they don't give a damn about it. The signal puts them in a cognitive state of knowledge about the presence of a leopard, similar to that of the communicating monkey — here you really have a smooth coding-decoding system.

In the case of humans, when we speak we're not interested per se in the meaning of the words, we register what the word means as a way to find out what the speaker means. Speaker's meaning is what's involved. Speaker's meaning is a mental state of the speaker, an intention he or she has to share with us some content. Human communication is based on the ability we have to attribute mental state to others, to want to change the mental states of others, and to accept that others change ours.

When I communicate with you I am trying to change your mind. I am trying to act on your mental state. I’m not just putting out a kind of signal for you to decode. And I do that by providing you with evidence of a mental state in which I want to put you in and evidence of my intention to do so. The role of what is often known in cognitive science as “theory of mind,” that is the uniquely human ability to attribute complex mental states to others, is as much a basis of human communication as is language itself.

I am full of admiration for the mathematical theory of information and communication, the work of Shannon, Weaver, and others, and it does give a kind of very general conceptual framework which we might take advantage of. But if you apply it directly to human communication, what you get is a mistaken picture, because the general model of communication you find is a coding-decoding model of communication, as opposed to this more constructive and inferential form of communication which involves inferring the mental state of others, and that’s really characteristic of humans.
[...]
For Dawkins, you can take the Darwinian model of selection and apply it almost as is to culture. Why? Because the basic idea is that, just as genes are replicators, bits of culture that Dawkins called “memes” are replicators too. If you take the case of population genetics, the causal mechanisms involved split into two subsets. You have the genes, which are extremely reliable mechanisms of replication. On the other hand, you have a great variety of environmental factors — including organisms which are both expression of genes and part of their environment — environmental factors that affect the relative reproductive success of the genes. You have then on one side this extremely robust replication mechanism, and on the other side a huge variety of other factors that make these competing replication devices more or less successful. Translate this into the cultural domain, and you’ll view memes, bits of culture, as again very strong replication devices, and all the other factors, historical, ecological, and so on, as contributing to the relative success of the memes.

What I’m denying, and I’ve mentioned this before, is that there is a basis for a strong replication mechanism either in cognition or in communication. It’s much weaker than that. As I said, preservative processes are always partly constructive processes. When they don’t replicate, this does not mean that they make an error of copying. Their goal is not to copy. There are transformation in the process of transmission all the time, and also in the process of remembering and retrieving past, stored information, and these transformations are part of the efficient working of these mechanisms. In the case of cultural evolution, this yields a kind of paradox. On the one hand, of course, we have macro cultural stability — we do see the same dish being cooked, the same ideologies being adopted, the same words being used, the same song being sung. Without some relatively high degree of cultural stability — which was even exaggerated in classical anthropology — the very notion of culture wouldn’t make sense.

How then do we reconcile this relative macro stability at the cultural level, with a lack of fidelity at the micro level? … The answer, I believe, is linked precisely to the fact that in human, transmission is achieved not just by replication, but also by construction. … Although indeed when things get transmitted they tend to vary with each episode of transmission, these variations tend to gravitate around what I call “cultural attractors”, which are, if you look at the dynamics of cultural transmission, points or regions in the space of possibilities, towards which transformations tend to go. The stability of cultural phenomena is not provided by a robust mechanism of replication. It’s given in part, yes, by a mechanism of preservation which is not very robust, not very faithful (and it’s not its goal to be so). And it’s given in part by a strong tendency for the construction — in every mind at every moment — of new ideas, new uses of words, new artifacts, new behaviors, to go not in a random direction, but towards attractors. And, by the way, these cultural attractors themselves have a history.

There’s more – much more – but what I’ve quoted brings out two key points. Firstly, communication is not replication: in conversation, there is no smooth transmission of information from speaker to listener, but a continuing collaborative effort to present, construct, re-present and reconstruct shared mental models. The overlap between this and the ‘knowledge cloud’ model is evident. Secondly, construction has a context: the process of model-building (or ‘thinking’ as we scientists sometimes call it) is always creative, always innovative, and always framed by pre-existing cultural ‘attractors’. And these cultural attractors themselves have a history – you could say that people make their own mental history, but they do not do so in circumstances of their own choosing…

This is tremendously powerful stuff – from my (admittedly idiosyncratic) philosophical standpoint it suggests a bridge between Schutz, Merleau-Ponty and Bourdieu (and I’ve been looking for one of those for ages). My only reservation relates to Sperber’s stress on speaker’s meaning … a mental state of the speaker. I think it would enhance Sperber’s model, rather than marring it, to focus on mental models as they are constructed within communication rather than as they exist within the speaker’s skull – in other words, to bracket the existence of mental states external to communicative social experience. On this point Schutz converges, oddly, with Wittgenstein.

Sperber’s argument tends to underpin my intuition on tagging and knowledge clouds: if all communication is constructive – if there is no simple transmission or replication of information – then conversation really is where knowledge develops, or more precisely where knowledge resides. Sperber also helps explain the process by which some conversations become better-established than others; we can see this as a feedback process, involving the development of a domain-specific set of ‘attractors’. These would perhaps serve as a version of Rorty’s ‘final vocabulary’: a shared and unquestionable set of assumptions, a domain-specific backdrop without which the conversation would make no sense.
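
A toy illustration of the attractor idea – mine, not Sperber's, and numerical only for convenience: treat a 'bit of culture' as a number, copy it down a chain of people with noise at every step, but let each copy drift a little towards the nearest attractor. Copying fidelity is poor at every step, yet the chain usually stays in the neighbourhood of an attractor, which is the macro-stability-without-replication point.

    import random

    ATTRACTORS = [0.0, 10.0]   # two made-up 'cultural attractors'

    def transmit(value, noise=1.5, pull=0.3):
        """One episode of transmission: unfaithful copying, plus a constructive
        drift towards whichever attractor the copy has landed nearest to."""
        copied = value + random.gauss(0, noise)
        nearest = min(ATTRACTORS, key=lambda a: abs(a - copied))
        return copied + pull * (nearest - copied)

    value = 9.0
    for _ in range(20):        # twenty episodes of transmission
        value = transmit(value)
    print(round(value, 1))     # usually ends up close to 10.0, despite the noise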

One final thought from Sperber:

The idea of God isn’t a supernatural idea. If the idea of God were supernatural, then religion would be true.

Well, I liked it.

If I drew a detailed map

Several months ago, I wrote (regarding the Wikipedia page on 'anomie'):

For what I’d want to know about a concept like that, that page is pretty dreadful. It veers wildly between essentialism (there is a thing called ‘anomie’ and we know what it is, across time and space) and nominalism (different people have used this combination of letters to mean different things, who knew?). What’s not there is any sense of the history of the concept

I was reminded of this argument by Tom's recent comments on the 'penis envy' page ("I know this article on penis envy is bullshit, and it's been on my 'to do' list of things to fix for weeks, and I've got nowhere"). The problem here is that making things more complicated is a lot harder than keeping them simple. What's worse, the kind of people who are critical of other people's simplifications tend also to be critical of their own work, which means that getting the complicated version written and getting it right is a long and painstaking job. Which, in turn, means that in the absence of serious incentives it's quite likely not to get done. Wikipedia's native system of informal incentives breaks down, in other words, where the workload gets too large – and, when it comes to making things more complicated (and getting it right), the workload starts at 'large' and goes up.

I was talking about this stuff with a friend the other day (hi Chris!) when he came up with a proposal for filling the incentive gap. The idea is to mobilise peer pressure among the population of disgruntled complexifiers. What we want isn’t so much an army of subject experts as a group of people who mistrust simple explanations and are good at digging out and writing down the underlying complications, in any of a number of fields. Hacks rather than professors, essentially – but good hacks. A list of apparently oversimplified Wikipedia articles could then be drawn up, and each one could be offered to names picked from the pool. I’ll just reiterate that I’m not talking about people with expert knowledge, so much as perfectionists with inquiring minds. The Wikipedia articles I’ve mentioned left me with a stack of unanswered questions, which I’d happily devote a few evenings to answering if I was being paid to do so – or if I had any incentive to do so. A virtual tap on the shoulder from an online group of pedantic curmudgeons might just do the job.

That just leaves the task of assembling the group. Here, Chris made the brilliant suggestion of using PledgeBank. Something like this:

I will take part in a group of volunteers who will improve Wikipedia by correcting and extending inaccurate and simplistic entries on social science concepts, but only if another 99 people do so too.

I think it could work. What do you think?

A place for everything

Or: what ethnoclassification is, and what folksonomy isn’t.

When it comes to tagging, I’m facing both ways. I think it’s fascinating and powerful and new – qualitatively new, that is: it’s worth writing about not just because it’s shiny, but because there’s still work to be done on understanding it. At the same time, I think it’s been massively oversold, often on the back of rhetorical framings which only have a glancing relationship with evidence or logic. Tagging is fascinating and powerful and new, but a lot of the talk about tagging has me tearing my hair.

I’ll pick on a recent post by Dave Weinberger. (Personal to DW: sorry, Dave. I’m emphatically not (is that emphatic enough?) suggesting that you’re the worst offender in this area.)

Let’s say you type in “africa,” “agriculture” and “grains” because that’s what you’re researching. You’ll get lots of results, but you may miss pages about “couscous” because Google is searching for the word “grain” and doesn’t know that that’s what couscous is made of. Google knows the words on the pages, but doesn’t know what the pages are about. That’s much harder for computers because what something is about really depends on what you’re looking for. That same page on couscous that to you is about economics could be about healthy eating to me or about words that repeat syllables to someone else. And that’s the problem with all attempts by experts and authorities to come up with neat organizations of knowledge: What something is about depends on who’s looking.

Let's say you come across the Moroccan couscous web page and you want to remember it. So you upload its Web address to your free page at del.icio.us that lists all the pages you've saved. Then del.icio.us asks you to enter a word or two as tags so you can find the Moroccan page later. You might tag it with Morocco, recipe, couscous, and main course, and then later you can see all the pages you've tagged with any of those words.

That's a handy way to organize a large list of pages, but tagging at del.icio.us really took off because it's a social activity: Everyone can see all the pages anyone has tagged with say, Morocco or main course or agriculture. This is a great research tool because just by checking the tag "agriculture" now and then, you'll see every page everyone else at delicious has tagged that way. Some of those pages will be irrelevant to you, of course, but many won't be. It's like having the world of people who care about a topic tell you everything they've found of interest. And unlike at Google, you'll find the pages that other humans have decided are ABOUT your topic.

What strikes me about this passage is that Dave changes scenarios in mid-stream: Let’s say you come across the Moroccan couscous web page… How? Google couldn’t find it. Let’s compare like with like, and say that you’re still looking for your couscous page: what do you do then, if not go to del.icio.us and type in “africa,” “agriculture” and “grains”? Once again, assuming that whole-site searches aren’t timing out, you’ll get lots of results (particularly since del.icio.us doesn’t seem to allow ANDing of search terms) but you may miss pages about “couscous” – and checking the tag “agriculture” now and then won’t necessarily help. Google will miss the page if the term ‘couscous’ doesn’t appear in the source (which doesn’t necessarily mean ‘appear on screen’, of course); del.icio.us will miss it if the term hasn’t been used to tag it (even if it is in the source).

Google vs del.icio.us is an odd comparison, in other words, and it’s not at all clear to me that the comparison favours del.icio.us. It’s great to get classificatory(?) input from the users of a document, of course – as I said above, tagging is fascinating and powerful and new – but in terms of information retrieval it can only score over a full-text search if

1. the page has been purposefully tagged by a user
2. the page has been tagged with a term which doesn’t appear in the page source
3. a second user is searching for information which is contained in the page, using the term with which the first user tagged it

I don’t think tagging advocates think enough about what those conditions imply. For example, at present I’m the only del.icio.us user to have tagged Mr Chichimichi’s Tags are not a panacea; I tagged it with ‘tagging’, ‘search’ and ‘ethnoclassification’. Until I did so, anyone looking for it would have been out of luck. Even Google wouldn’t be much help – the word ‘ethnoclassification’ doesn’t appear anywhere in the text. No, until a couple of days ago your only way of stumbling on that post would have been to run a clumsy, counter-intuitive Google search on terms like ‘tagging’, ‘tags’, ‘folksonomies’ and ‘social software’. (Google even knows that ‘folksonomies’ is the plural of ‘folksonomy’, so searching on the singular form would work just as well. That’s just not fair.)
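
A toy comparison of the two failure modes (the pages and tags below are invented, and 'full-text' here just means looking for the word in the source): Google-style search misses the page when the query term isn't in the text; del.icio.us-style search misses it when nobody has happened to use the query term as a tag.

    # (url, words in the page source, tags users have applied)
    pages = [
        ("http://example.org/couscous",
         {"couscous", "morocco", "semolina", "recipe"},
         {"morocco", "recipe"}),                         # nobody tagged it 'grains'
        ("http://example.org/tags-not-a-panacea",
         {"tagging", "tags", "folksonomies"},
         {"tagging", "search", "ethnoclassification"}),  # tag absent from the source
    ]

    def fulltext_search(term):
        return [url for url, words, tags in pages if term in words]

    def tag_search(term):
        return [url for url, words, tags in pages if term in tags]

    print(fulltext_search("grains"))              # [] - the word isn't in the source
    print(tag_search("grains"))                   # [] - and nobody used it as a tag
    print(tag_search("ethnoclassification"))      # finds the post full text never could
    print(fulltext_search("ethnoclassification")) # [] - provided someone tagged it first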

Dave also contrasts the world of collective knowledge through distributed tagging with attempts by experts and authorities to come up with neat organizations of knowledge. Further along in the same piece, he writes:

This takes classification and about-ness out of the hands of authors and experts. Now it's up to us readers to decide what something is about.

Not only does this let us organize stuff in ways that make more sense to us, but we no longer have to act as if there's only one right way of understanding everything, or that authors and other authorities are the best judges of what things are about.

One question: who ever said that there was only one right way of understanding everything? OK, too easy. I’ll rephrase that: before tagging came along, who was saying there was one right way, etc? Who are the tagging advocates actually arguing against? (It certainly isn’t librarians (context here).)

There’s a difference between classifications which have a single pre-determined set of definitions and classifications which are user-defined and user-extensible. But that’s not the same as the difference between having an underlying ontology and not having one, or the difference between hierarchical and flat organisations of knowledge, or the difference between single and multiple sets of classifications. A closed, expert-defined, locked-down controlled vocabulary may contain multiple sets of overlapping terms; it may be a flat list of categories rather than a ‘tree’; it may even be innocent of ontology. (Thanks to Jay for pointing this out, in comments here.) If tagging is better than top-down classification, it’s better because it’s user-defined and user-extensible – not because it’s free of the vices of ontology, hierarchy and uniformity. The idea that tagging – and only tagging – stands in opposition to a classifying universe built on hierarchical uniformity is a straw man. (But the librarians get it both ways – if a top-down classifying system is shown to be flat and plural, this can be put forward as a sign of the weakness of top-down systems; the fact that bottom-up systems are more, not less, vulnerable to Chinese Encyclopedia Syndrome is passed over.)

So, tagging systems make lousy search engines, and they don’t mark a qualitative leap in the organisation of human knowledge. What they’re really good for – and what makes them fascinating and powerful – is conversation. Tagging, I’m suggesting, isn’t there to tell us about stuff: it’s there to tell us about what people say about stuff. As such, it performs rather poorly when you’re asking “where is X?” or “what is X?”, and it comes into its own when you’re asking “what are people saying about X?” (Of course, much tag-advocacy is driven by the tacit belief that there’s no fundamental difference between what people say about X and expert knowledge of X – and that an aggregate of what people say would be equivalent, if not superior, to expert knowledge. But that’s an argument for another post.)

Tagging is good for telling us what people say about stuff, anyway – and when it's good, it's very good. To see what I'm talking about, have a look at Reader2 (via Thomas). It's a book recommendation site, implemented on the basis of a del.icio.us-like user/tag system. It's powerful stuff already, and it's still being developed. Does it tell me what books are really like? No – but it tells me what people are saying about them, which is precisely what I want to know. And it couldn't do this nearly as well, it seems to me, without tags – and tag clouds in particular. This, for me, is what tagging's all about. Ethnoclassification: classification as an open-ended collective activity, as one element of the continual construction of social reality.

Who took the money?

This is a fascinating post (in Italian) by Pietro Speroni on the relationship between authority, communities and markets. This is an interesting and controversial area; the fact that Pietro also invokes the Long Tail (which, as you’ll recall, is not what it seems) makes it all the more compelling (to me at least).

I’ll translate as I go along; hopefully Pietro will correct me if I go wrong.

I don’t believe that the ruling class has vanished. I believe that it has simply been transformed – just as the world itself is being continually transformed from day to day. Decades ago, our world was simpler – more homogeneous, less diverse. If you followed a martial art, it would be judo or karate. A game? Chess. A religion? Christian, Jewish, perhaps Muslim at the outside.

on the Net, via Google (and wikipedia), you can find the specific branch of the specific religious tradition which best meets your needs. … And this is not true only of religions, but of everything: interests, political groups, passions, games, ways of life.

Now, every one of these groups has its own implicit hierarchy. … And everyone is a member of more than one group. And in every group you listen to some people, and what you say influences other people.

[In every area of my life] I have leaders: people I trust; people who I admire and learn from. But they’re not the same people as your leaders. Not only that, but there are other people who come to me to learn (worse luck for them!), in some fields more than in others. The process of diversification tends towards having as many groups as people – and every one of us, of necessity, becomes the small-scale leader of a small-scale group, scattered around the world.

This whole process mirrors what’s happening in the economy, where a market consisting of niches is growing explosively … The key phrase is Long Tail.

So I don’t believe that the ruling class is vanishing, but that we’re seeing a gradual diversification of interests, which leads to the diversification of the ruling class – accompanied by the redefinition and contraction [ridimensionamento] of the role of traditional leaders.

There’s a lot that I like about this – I think Pietro’s right to say that there’s a new kind of process of diversification under way, and to trace it back to the Internet’s basic sociality, its nature as a medium for conversation.

But… a transformation of the ruling class? Non tanto. Pietro’s larger argument is undermined by a couple of strange elisions. Firstly, it’s true that we all have multiple ‘authorities’ – the topics of folk music, statistics, Belgian beer and operaismo are all important to me, for instance, and in each case I could name an authority I’d willingly defer to. But those people aren’t the people who enforce the laws I obey, or set the level of tax I pay, or price the goods I buy, or write the newspapers I read, or appear on the news programmes I watch. The ruling class, it seems to me, is still very much in place, and whether I’m a tequila-crazed Quaker or a tea-drinking Tantric Buddhist is a matter of sublime indifference to it. Roy Bhaskar has written that historical materialists, by virtue of starting from the material facts of social existence, cannot propose absolute freedom, “a realm free of determination”; what we can envisage is moving “from unneeded, unwanted and oppressive to needed, wanted and empowering sources of determination”. The world Pietro describes is a world which is governed only by those needed, wanted and empowering sources of determination. It sounds good, but I don’t think we’re there yet.

Secondly, on the matter of niche marketing. Pietro assumes that a proliferation of niche markets will lead to a proliferation of niche suppliers, and hence the dilution of the authority of the big suppliers. I don’t see any reason to believe that this is the case. Indeed, one of Chris Anderson’s own preferred examples is based on Amazon sales rank – and there’s nothing very diffuse about Amazon, or the authority wielded by Amazon. Much of the buzz around the ‘Long Tail’ seems to derive, ultimately, from this confusion of the two meanings of ‘niche’. Clearly, mining niche markets can be profitable, if you’re a monopolistic behemoth like Amazon; but, equally clearly, it doesn’t follow that niche suppliers can make a living in the same way. Indeed, making niches visible to companies like Amazon actually threatens existing niche suppliers. (Ask your local bookshop, if you’ve still got one.)

Of course, Long Tail proponents tell a different story. Back in July, Scott Kirsner quoted George Gilder thus:

His central thesis is that Internet-connected screens in the home – whether it’s the PC in your den or the plasma screen on your living room wall – are going to change the way we consume video by offering us infinite choice.

“The film business will increasingly resemble the book business,” he says, with a few best-sellers that achieve widespread popularity, and lots of publishers making a profit selling titles that no one’s ever heard of.

Lots of who doing what? Run that past us again, could you? While you’re at it, send the good news to the novelist A.L. Kennedy, whose wonderful FAQ includes this:

SO, WHAT'S HAPPENING WITH THE WONDERFUL WORLD OF PUBLISHING?

Fewer publishing houses concentrated in conglomerate hands, trying to produce more books of less quality. No full time readers, no full time copy editors and therefore missed newcomers and pisspoor final presentation of texts on the shelves, silly covers, greedy and simple-minded bookshop chains, lunatic bidding wars designed to crush the spirit of unknown newcomers, celebrity "tighten your buns and nurture your inner pot plant" hard backs and much related insanity.

Mass markets are where the units get shifted; niche markets – like literary fiction – are where survivors linger on (until they’re bought out) and upstart competitors emerge (and hang on until they’re bought out). It’s the logic of the monopoly, which is to say that it’s the logic of the market. Some years ago a McDonald’s spokesman, asked if the fast food market had reached saturation point, responded that, as far as his company was concerned, the market would only be saturated if there were no cooked food outlets anywhere on the planet apart from McDonald’s. I don’t think Amazon, or the publishing conglomerates, or the media companies who would source Gilder’s ‘infinite choice’, think any differently.

But Pietro’s half right: there is something interesting going on, even if it doesn’t mirror what’s going on in the economy; there is a process of diffusion and diversification, even if it doesn’t affect the main sources of authority over our lives. In fact, what’s significant about the Net is that it can host conversations which escape the marketplace and evade pre-existing (‘unneeded and unwanted’) forms of authority. That said, it can also reproduce the marketplace and reinvent old forms of authority – just like other conversational media.

In short, what's good about the Web is – or can be – very good; what's bad about it is – or should be – very familiar.

So say I

Why I use ‘ethnoclassification’ rather than ‘folksonomy’.

  1. ‘Ethnoclassification’ recalls ‘ethnomethodology’, Harold Garfinkel’s coinage for the study of the collective construction of everyday life. Garfinkel took a great deal from Alfred Schutz; I think some of his work develops Schutz’s social phenomenology in the wrong direction, but to have Schutz’s work developed at all is a good thing. In this context, the term ‘ethnoclassification’ suggests a process that’s continual, provisional and embedded in practical activity: the place where it happens (to borrow a phrase from Russell Hoban) is Everywhere All The Time. I think this is a good emphasis.
  2. ‘Folksonomy’, by contrast, suggests both a process and the end result (a viable folk-taxonomy); as such it’s confusing and promotes fuzzy argument.
  3. It's also a term with a strong positive value: forward the taxonomy of the folk! à bas les bibliothécaires! It's a marketing term as well as a term of analysis, and lends itself to slippage between description and advocacy.
  4. (Last and least) It’s etymologically ghastly and obtrusively American (I don’t say ‘candy’, I don’t say ‘diaper’ and I don’t say ‘folks’).

Henceforth – starting in the previous post, to be more precise – I’ll be using ‘ethnoclassification’ to refer to the (real, universal, continuing) process and ‘folksonomy’ to refer to the (hyped, unrealised, arguably unrealisable) end result.

Not available before

Thanks to a couple of links posted by Thomas, I’ve just read Bryan Boyer’s Correspondance Romano (Corriere Romano, surely? never mind) closely followed by this post from February by Tom Evslin. Tom:

People don’t think hierarchically – at least most people don’t. We think in terms of associations. Our dreams give this away as they hyperlink through experiences of the day and memories of the distant past. A conversation meanders horizontally from one topic to the next.

Hierarchies like Lotus Notes or the Dewey Decimal System were necessary when computing power was non-existent or very expensive. As computing power has become relentlessly cheaper thanks to Moore’s law, hierarchies of information have become unnecessary. … So long as Google or its competitors can index almost everything I might ever want to find, why should any arbitrary order be imposed on information?

Once we didn’t need hierarchies to organize our approach to information, they became an impediment. It is very hard for one person to figure out which node in which folder tree another person would have put a particular piece of information. A document may be relevant to one researcher for entirely different reasons than it is relevant to another researcher.

The relationship between documents is actually dynamic depending on the needs of the reader. Not incidentally, open tagging and hyperlinking are both ways to impose particular relationships on documents to meet the need of some subset of readers.

In passing, this suggests that the contribution of tagging to the grunt work of actually finding stuff may not be all that significant. After all, “a document may be relevant to one researcher for entirely different reasons than it is relevant to another researcher”: in this respect the same strictures apply to tags as to folders, with the proviso that tagging does at least give you multiple chances to get it right. I’ve found useful and interesting stuff by browsing del.icio.us, but I’ve also found useful and interesting stuff by browsing library catalogues, running partial name searches on booksellers’ sites, googling common phrases and going to the eighth page of results, and so forth. But then, I’m a catalogue-hound and I like being surprised. If you’re looking for something specific, Tom’s argument (inadvertently?) suggests, you’re probably better off with Google.

Bryan’s post doesn’t discuss taxonomies, ontologies or search engines, largely because it’s a series of emails from 2002. But it does contain this beautiful piece of ethnoclassification:

Italy is about all of these things: cured meats, standing up to drink your coffee, stiffling heat, mid-day naps, skulls in churches, hot men in suits on scooters, Ananas, and cheap groceries.

This is very much the kind of freewheeling associational approach to knowledge that Tom describes – and very much the kind of ground-up, non-exclusive, plural, open-ended classifying process which has become known as ‘folksonomy’.

But what happens if we take that sentence and map it onto the current ‘folksonomic’ toolset? Is there an ‘Italy’ resource somewhere – a really really authoritative Web page, say – that we can tag with ‘curedmeat’, ‘coffeestandingup’, ‘stifflingheat’ and so on? (Never mind the problem of cross-matching with the tags ‘meat.cured’, ‘coffee.standing’ and ‘heat.stiffling’ – let alone ‘heat.stifling’.) Or are we going to use an ‘italy’ tag and apply it to single identifiable resources on ‘cured meat’, ‘hot men in suits on scooters’, etc? If so, did all those resources exist before we tried to tag them – and if not, are we going to have to create them?

The kind of association described by Tom – and exemplified by Bryan’s old mails – is actually a very bad fit for the Technorati/del.icio.us style of document tagging, for two reasons. One is that it’s two-way: if ‘Italy’ is associated with ‘skulls in churches’ then ‘skulls in churches’ is necessarily associated with ‘Italy’. (In the case of document-based tagging, the relationship is asymmetrical and the inverse relationship is weaker: Document 1 ‘is about’ T1, T2, T3; Topic 1 ‘has some relevant information in’ D1, D2, D3.) The other is that it’s descriptive rather than annotative: we’re not tagging stuff-about-stuff, we’re tagging… well, stuff, and tagging it with other stuff. These bi-directional relationships between concepts can be approximated by the associations between tags which emerge out of the cumulative process of document tagging, but this seems like going a very long way round. “We think in terms of associations”: should we have to say

this has been applied to resources which have also been classified as that

when what we want to say is

this is like that?

There's one glaring exception to this argument: Flickr. It's easy to imagine an 'italy' photoset including images which were also tagged with 'curedmeat', 'churchskull' and so forth. Descriptive tagging, bi-directional associations, it's all there – job done. This is deceptive, however. Flickr runs on discrete objects – individual images – and the relationships between Flickr tags really describe the images themselves, or at most the universe of Flickr images. If we didn't have any images of stifling heat in Italy, that association wouldn't exist; if we had three salami pictures and only one of a skull in a church, the 'curedmeat'/'italy' association would automatically be three times as strong as 'churchskull'/'italy'. Once again, we'd have to go to considerable lengths in order to represent the associations which Bryan effortlessly set out in 32 hastily-composed words.
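
To see how long the long way round is, here's a sketch of deriving tag-to-tag associations from co-occurrence on discrete objects (the photos are invented): with three salami pictures and one church skull, 'curedmeat'/'italy' duly comes out three times as 'strong' as 'churchskull'/'italy' – a fact about the photo collection, not about Italy.

    from collections import Counter
    from itertools import combinations

    # Flickr-style discrete objects, each carrying a handful of tags (made up).
    photos = [
        {"italy", "curedmeat"}, {"italy", "curedmeat"}, {"italy", "curedmeat"},
        {"italy", "churchskull"},
    ]

    cooccurrence = Counter()
    for tags in photos:
        for pair in combinations(sorted(tags), 2):   # symmetric by construction
            cooccurrence[pair] += 1

    print(cooccurrence[("curedmeat", "italy")])      # 3
    print(cooccurrence[("churchskull", "italy")])    # 1 - a third of the 'strength'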

Ethnoclassification: do we have the technology?

Tag tag tag

Tom Coates’ interesting post Two cultures of fauxonomies collide has been getting a lot of attention lately, mainly thanks to Dave. There’s a particularly interesting discussion running at Many-to-Many. The discussion has progressed quite rapidly, with several bright and articulate people pitching in to illustrate how Tom’s original insight can be developed. My problem is that I’m not sure what the discussion’s based on. For example, Emil Sotirov writes:

Seemingly, given the freedom of folksonomy, people tend to move from hierarchical “folder” modes of tag interpretation (one-to-many) towards more open “keyword” modes (many-to-many).

Keywords are flat, many-to-many, open; folders are hierarchical, one-to-many, closed. (In short, folders are bad, m’kay?) But what does this really mean? If I think that tags are ‘like’ keywords or that tags are ‘like’ folders, what difference does it actually make?

From Tom's original piece:

Matt's concept was quite close to the way tagging is used in del.icio.us – with an individual the only person who could tag their stuff and with an understanding that the act of tagging was kind of an act of filing. My understanding was heavily influenced by Flickr's approach – which I think is radically different – you can tag other people's photos for a start, and you're clearly challenged to tag up a photo with any words that make sense to you. It's less of a filing model than an annotative one.

Incidentally, “an individual the only person who could tag their stuff”? That’s Technorati rather than del.icio.us, surely?

But anyway – the main question is, what are you actually doing differently if you use a tag as an ‘annotative’ keyword rather than a ‘classifying’ folder? In either case, it seems to me, you’re pulling out a couple of characteristics of an object and using them to lay a trail back to it. The only real difference I can see is that you’d expect to have more ‘keywords’ than objects and fewer ‘folders’ than objects, but I can’t see how this changes the way you actually interact with the tags or the tag-holder services – or the objects, for that matter.

Perhaps I'm just not getting something – all enlightenment is welcome. But I suspect that, in practice, Flickr and del.icio.us and… er, all those other social tagging services… are converging on a model somewhere between 'keyword' and 'folder'. The tag cloud is crucial here. Flickr may start by enabling you to "tag up a photo with any words that make sense to you", but the tag cloud display "conceals the less popular [tags] and lets recurrence form emergent patterns" (as Tom notes here); it also prompts users to select from previously-used tags if possible. Conversely, the (more rudimentary) tag-cloud display in del.icio.us gives less-used tags more prominence than they had when they were left to scroll off the screen, prompting users to select more widely from previously-used tags. In effect, the tag cloud draws del.icio.us users away from big-tree-of-folders thinking, while also drawing Flickr users away from the keyword-pebbledash approach.
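
The nudging mechanism itself is easy to sketch – this is a generic tag cloud, not Flickr's or del.icio.us's actual algorithm: scale each tag's display size by how often it has been used, and hide anything below a threshold, so popular tags recur and one-offs quietly disappear.

    from collections import Counter

    def tag_cloud(tag_uses, min_count=2, sizes=(1, 2, 3, 4, 5)):
        """Map tag frequencies onto display sizes, concealing the least-used tags."""
        counts = Counter(tag_uses)
        shown = {t: c for t, c in counts.items() if c >= min_count}
        if not shown:
            return {}
        top = max(shown.values())
        return {t: sizes[min(len(sizes) - 1, (c * len(sizes) - 1) // top)]
                for t, c in shown.items()}

    uses = ["italy"] * 9 + ["curedmeat"] * 4 + ["churchskull"] * 2 + ["heat.stiffling"]
    print(tag_cloud(uses))
    # {'italy': 5, 'curedmeat': 3, 'churchskull': 2} - 'heat.stiffling' is concealed

Everything interesting is in min_count and the scaling: nudge them one way and you push users towards 'folder' thinking, nudge them the other way and you push them towards 'keyword' profusion.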

[No, that wasn't my promised post about the Long Tail. (It doesn't exist, you know.) Yes, I will get round to it, some time.]

Cloud A-Z

David Gratton (found via Thomas) argues that all communities are communities of interest. He argues – I think correctly – that what appear to be, for example, professional, demographic or geographic ‘communities’ are created and maintained through shared interests and shared activity around those interests. Where those interests and that activity are lacking, what’s left isn’t a community but a statistical abstraction masquerading as reality. What’s particularly interesting about this is that it destroys the notion of a ‘virtual community’ as something new and interesting; insofar as it’s a community, a ‘virtual community’ is a community of interest, like all the others. Technology may facilitate the creation of communities which wouldn’t have been created before, but the community itself is nothing new.

Here are a few further thoughts (initially written as a comment on David’s site).

You could take it a bit further by saying that a community (of whatever flavour) is a process rather than an object or an achieved state – community is something that people produce and reproduce by doing stuff together (including talking). Once you’ve said that community only exists as the continuing aggregate of its members’ interactions, you can start asking questions about those interactions – how frequent are they? how are they structured: does everyone talk to a single central ‘hub’, does everyone talk to everyone, are there ‘daisy-chains’? do they produce or redistribute anything identifiable – the physical necessities of life, or money, or information, or social status – or is it all about sociality and shooting the breeze? The answers to questions like those would say a lot about the shape of the community, which in turn would enable us to ask some interesting questions about the impact of ‘virtuality’, and the conditions under which ‘virtual communities’ are more or less viable than their face-to-face counterparts.
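
One way of making those questions operational (a toy sketch, with an invented interaction log): count who talks to whom and see how concentrated the traffic is – the higher one member's share of all interactions, the more the community looks like hub-and-spoke rather than everyone-talks-to-everyone.

    from collections import Counter

    # Hypothetical interaction log: (speaker, listener) pairs.
    interactions = [
        ("ann", "hub"), ("bob", "hub"), ("cal", "hub"),
        ("hub", "ann"), ("hub", "bob"), ("hub", "cal"), ("ann", "bob"),
    ]

    degree = Counter()
    for speaker, listener in interactions:
        degree[speaker] += 1
        degree[listener] += 1

    total = sum(degree.values())
    busiest, count = degree.most_common(1)[0]
    print(degree)                            # how often each member takes part
    print(busiest, round(count / total, 2))  # the busiest member's share of the traffic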

An interesting aspect of this model of community is the significance of talk: conversation is the bedrock, the basis on which everything else happens. (I’m reluctant to call it a medium: partly because of the concerns Dave highlighted back here; partly, relatedly, because that suggests that conversation is always a carrier for a signal, that there is always something else going on. This, I think, is profoundly misleading. We’re social beings: a large part of what we do, how we live, is social interaction, more or less as an end in itself.) What the technologies we associate with ‘virtual community’ do is, essentially, to make it easier to spend more of the time talking to more people, albeit in some oddly formalised ways. Some of the interesting questions about ‘virtuality’ are questions about this strange pairing of talk overload and talk formalisation.

What does this have to do with clouds? One idea I’ve been playing with is that the natural state of knowledge is to be ‘cloudy’, because it’s produced within continuing interactions within groups: knowledge is an emergent property of conversation, you could say. What the argument about ‘community’ suggests is that every community has its own knowledge-cloud – that the production and maintenance of a knowledge-cloud is one way that a community defines itself. The question then is whether existing technologies enable communities to do anything useful with their ‘clouds’, or if the services are still too attenuated – or too overloaded.

(I’ll get back to the Long Tail soon, hopefully. It doesn’t exist, you know.)
