Category Archives: cloud

An eerie sight

David introduces a new feature at Librarything:

“tagmashes,” which are (in essence) searches on two or more tags. So, you could ask to see all the books tagged “france” and “wwii.” But the fact that you’re asking for that particular conjunction of tags indicates that those tags go together, at least in your mind and at least at this moment. Librarything turns that tagmash into a page with a persistent URL.

I like everything about this, apart from the horrible name. As somebody points out in comments, it’s not a new idea – a large part of David’s post could have been summed up in the words “Librarything have implemented faceted tagging”. But I think this is still something worth shouting about, for two reasons. Firstly, they have implemented it – it’s there now to be played with, even if it’s got a silly name. Secondly and more importantly, they’ve implemented ground-up faceted tagging: the facets are created by the act of searching for particular combinations of tags. At a stroke this addresses the disadvantages I identified in my post; rather than being imposed beforehand, the dimensions into which the tags are organised emerge from the ways people want to combine tags. Arguably, what Librarything have ended up with is something like a cross between faceted tagging and Flickr-style tag clusters (in which dimensions emerge from an aggregate of past searches).
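To make the mechanism concrete, here is a minimal Python sketch of a tagmash as David describes it: a conjunctive search over two or more tags, with the combination normalised so that it can be given one stable address. Everything in it is hypothetical: the `books` catalogue, the `/tagmash/...` URL scheme and the `searches` counter are my assumptions, not Librarything's implementation. The counter is the interesting bit, since a record of which combinations people actually ask for is what would let dimensions emerge from use rather than be imposed up front.

```python
from collections import Counter

# Hypothetical catalogue: book title -> set of tags applied to it.
books = {
    "Suite Française": {"france", "wwii", "fiction"},
    "Is Paris Burning?": {"france", "wwii", "history"},
    "Les Misérables": {"france", "fiction"},
}

# Hypothetical bookkeeping: how often each tag combination has been searched for.
searches = Counter()

def tagmash_key(tags):
    """Normalise a combination so 'wwii + France' and 'france + wwii' are the same tagmash."""
    return tuple(sorted({t.strip().lower() for t in tags}))

def tagmash_url(tags):
    """A persistent address for the combination (URL scheme is an assumption)."""
    return "/tagmash/" + ",".join(tagmash_key(tags))

def tagmash(tags):
    """Return every book carrying all of the tags, and record that someone asked."""
    key = tagmash_key(tags)
    searches[key] += 1
    return sorted(title for title, tagset in books.items() if set(key) <= tagset)

print(tagmash_url(["wwii", "France"]))
print(tagmash(["wwii", "France"]))
```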

What’s more, the ability to record an association between two tags addresses a question I raised way back here. If, to quote Tom Evslin, “we think in terms of associations” (rather than conceptual hierarchies); and if “the relationship between documents is actually dynamic … open tagging and hyperlinking are both ways to impose particular relationships on documents to meet the need of some subset of readers”; then it’s curious, to say the least, that it’s been so hard until now to use tagging to say this is like that (as distinct from this has frequently been applied to resources which have also been classified as that). From del.icio.us on, tagging has been a simple naming operation, hitching up things to names (stuff-for-classifying to tags), but not allowing any connection between those names. The implication is that the higher-order knowledge of what went with what would only emerge – could only emerge – from the aggregate of everyone else’s naming acts.
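The difference between naming and associating can be put in a few lines of code. In the Python sketch below (the pages, tags and the `associations` table are invented for illustration), tagging is nothing but (item, tag) pairs, so the only relationship between two tags you can recover is how often they land on the same item. An explicit association, by contrast, records that this is like that even when the two tags have never been applied to the same resource.

```python
from collections import Counter
from itertools import combinations

# Tagging as a pure naming operation: (item, tag) pairs and nothing else.
taggings = [
    ("page1", "folksonomy"), ("page1", "tagging"),
    ("page2", "folksonomy"), ("page2", "tagging"), ("page2", "ontology"),
    ("page3", "ethnoclassification"), ("page3", "anthropology"),
]

def cooccurrence(pairs):
    """Infer tag-to-tag links the only way a naming model allows:
    by counting how often two tags are applied to the same item."""
    by_item = {}
    for item, tag in pairs:
        by_item.setdefault(item, set()).add(tag)
    counts = Counter()
    for tags in by_item.values():
        counts.update(combinations(sorted(tags), 2))
    return counts

# Hypothetical explicit associations: a user saying "this is like that".
associations = {("ethnoclassification", "folksonomy"): "is like"}

def related(a, b):
    """Compare what co-occurrence reveals with what was stated directly."""
    key = tuple(sorted((a, b)))
    return {"co-occurrences": cooccurrence(taggings)[key],
            "stated": associations.get(key)}

print(related("folksonomy", "tagging"))
print(related("folksonomy", "ethnoclassification"))
```

The second pair never shares an item, so no amount of aggregating naming acts would connect them; only the stated association does.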

The ‘tagmash’ reminds us that (pace David) everything is not miscellaneous: yes, we think in associations and we apply our own labels and classifying schemes to the world, but as we do so we’re also connecting A to B and treating D as a sub-type of C. When we talk, we don’t just spray names around; we’re always adding a bit of structure to the conversational cloud, making a few more connections. It’s the connections, not the nodes, that map out the shape of a cloud of knowing.

Update: Changed post title. I spent a good five minutes this afternoon thinking of an appropriate lyrical reference (eventually settling on this one); I don’t know how I missed the obvious choice.

The vagaries of science

The slightly oxymoronic Britannica Blog has recently hosted a series of posts on Web 2.0, together with responses from Clay Shirky, Andrew Keen and others. The debate’s been of very variable quality, on both the pro- and the anti- side; reading through it is a frustrating experience, not least because there’s some interesting stuff in among the strawman target practice (on both sides) and the tent-preaching (very much on both sides). As I said in response to a (related) David Weinberger post recently, it’s not always clear whether the pro-Web 2.0 camp are talking about how things are (what knowledge is like & how it works) or about how things are changing – or about how they’d like things to change. The result is that developments with the potential to be hugely valuable (like, say, Wikipedia) are written about as if they had already realised their potential, and attempts to point out flaws or collateral damage are dismissed as naysaying. On the anti- side, the danger is of an equally unthinking embrace of how things are – or how they were before all this damn change started happening.

All this is by way of background to some comments I left on danah boyd’s contribution (which is well worth reading in full), and may explain (if not excuse) the impatient tone. danah, then me:

Why are we telling our students not to use Wikipedia rather than educating them about how Wikipedia works?

Because I could give a 20-credit course on ‘how Wikipedia works’ and not get to the bottom of it. It’s complex. It’s interesting. I happen to believe it’s an almighty mess, but it’s a very complex and interesting mess. For practical purposes “Don’t cite it” is quicker.

Wikipedia is not perfect. But why do purported experts spend so much time arguing against it rather than helping make it a better resource?

This is a false opposition: two different activities with different timescales, different skillsets and different rewards. I get an idea, I write it down – generally it won’t let me go until I’ve written it down. I look at what I’ve written down, and I want to rewrite it – quite often it won’t let me go until I’ve rewritten it. All of this takes slabs of time, but they’re slabs of time spent engrossed with ideas and language, my own and other people’s – and the result is a real and substantial contribution to a conversation, by an identifiable speaker.

I look at a bad Wikipedia article and I don’t know where to start. What I’d like to do is delete the whole thing and put in the stub of a decent article that I can come back to later, but I sense that this will be regarded as uncool. What I don’t want to do is clamber through the existing structure of an entry I think shouldn’t have been written in the first place, correcting an error here or there, because that’s a long-drawn-out task that’s both tedious and unrewarding. And what I particularly don’t want to do is return to the article again and again over a period of weeks because my edits are getting reverted by someone hiding behind a pseudonym.

(I think what Wikipedia anonymity has shown, incidentally, is that people really don’t like anonymity. Wikipedia has produced its own stable identities – and its own authorities, based on the reputation particular Wikipedia editors have established within the Wikipedia community.)

Is it really worth that much prestige to write an encyclopedia article instead of writing a Wikipedia entry?

Well, yes. If I get a journal article accepted or I’m commissioned to write an encyclopedia article, I’m joining an established conversation among fellow experts. What I’ve written stays written and gets cited – in other words, it contributes to the conversation, and hence to the formation of the cloud of knowledge within the discipline. And it goes on my c.v. – because it can be retrieved as part of a reviewable body of work. If I write for Wikipedia I don’t know who I’m talking to, nobody else knows who’s writing, and what I’ve written can be unwritten at any moment. And it would look ridiculous on my c.v. – because they’ve only got my word that it is part of my body of work, assuming it still exists in the form in which I wrote it.

The way things are now, knowledge lives in domain-sized academic conversations, which are maintained by gatekeepers and authorities. Traditional encyclopedias make an effort to track those conversations, at least in their most recently crystallised (serialised?) form. Wikipedia is its own conversation with its own authorities and its own gatekeepers. For the latest state of the Wikipedia conversation to coincide with the conversation within an established domain of knowledge is a lucky fluke, not a working assumption.

Update: The other big difference between traditional encyclopedias and Wikipedia (as someone known only as ‘bright’ reminded me, in comments over here) is that the latter gets much more use. From my response:

Comparisons with the Britannica are interesting as far as they go – and I don’t believe they do Wikipedia any favours – but they don’t address the way that Wikipedia is used, essentially as an extension of Google. When I google for information I’m not hoping to find an encyclopedia article. Generally, Britannica articles used to appear on the first page of hits, but not right at the top; usually you’d see fan sites, hobby sites, school sites, scholarly articles and domain-specific reference works on the same page, and usually the fan sites, etc, would be just as good. (I stopped using the Britannica altogether as soon as it went paywalled.) If all that had happened was that Britannica results had been pushed down from number 8 to number 9, with their place being taken by Wikipedia, I doubt we’d be having this conversation. What’s happened is that, for topic after topic, Wikipedia is number 1; the people who would have run all those fan sites and hobby sites are either writing for Wikipedia instead or they’re not bothering, since after all Wikipedia is already there. (Or else the sites are still out there, but they’re way down the search result list because they’re not getting the traffic.) It’s a monoculture; it’s a single point of failure, in a way that encyclopedias aren’t. And it’s the last thing that should have happened on the Web. (I’ll own up to a lingering Net idealism. Internet 0.1, I think it was.)

Alright, yeah

Stephen Lewis (via Dave) has a good and troubling post about the limits of the Web as a repository of knowledge.

while the web might theoretically have the potential of providing more shelf space than all libraries combined, in reality it is quite far from being as well stocked. Indeed, only a small portion of the world’s knowledge is available online. The danger is that as people come to believe that the web is the be-all and end-all source of information, the less they will consult or be willing to pay for the off-line materials that continue to comprise the bulk of the world’s knowledge, intellectual achievement, and cultural heritage. The outcome: the active base of knowledge used by students, experts, and ordinary people will shrink as a limited volume of information, mostly culled from older secondary sources, is recycled and recombined over and again online, leading to an intellectual dark-age of sorts. In this scenario, Wikipedia entries will continue to grow uncontrolled and unverified while specialized books, scholarly journals and the world’s treasure troves of still-barely-explored primary sources will gather dust. Present-day librarians, experts in the mining of information and the guidance of researchers, will disappear. Scholarly discourse will slow to a crawl while the rest of us leave our misconceptions unquestioned and the gaps in our knowledge unfilled.

The challenge is either – or both – to get more books, periodicals, and original source materials online or to prompt people to return to libraries while at the same time ensuring that libraries remain (or become) accessible. Both tasks are dauntingly expensive and, in the end, must be paid for, whether through taxes, grants, memberships, donations, or market-level or publicly-subsidized fees.

Lewis goes on to talk about the destruction of the National and University Library in Sarajevo, among other things. Read the whole thing.

But what particularly struck me was the first comment below the post.

I think you’re undervaluing the new primary sources going up online, and you’re undervaluing the new connections that are possible which parchment can’t compete with like this post I’m making to you. I definitely agree that there is a ton of great knowledge stored up in books and other offline sources, but people solve problems with the information they have, and in many communities – especially rural third world communities, offline sources are just as unreachable, if not more, than online sources.

This is a textbook example of how enthusiasts deal with criticism. (I’m not going to name the commenter, because I’m not picking on him personally.) It’s a reaction I’ve seen a lot in debates around Wikipedia, but I’m sure it goes back a lot further. I call it the “your criticism may be valid but” approach – it starts by formally conceding the criticism, thus avoiding the need to refute or even address it. Counter-arguments can then be deployed at will, giving the rhetorical effect of debate without necessarily addressing the original point. It’s a very persuasive style of argument.

In this case there are three main strategies. The criticism may be valid…

I think you’re undervaluing the new primary sources going up online

but (#1) things are getting better all the time, and soon it won’t be valid any more! (This is a very common argument among ‘social software’ fans. Say something critical about Wikipedia on a public forum, then start your stopwatch. See also Charlie Stross’s ‘High Frontier’ megathread.)

you’re undervaluing the new connections that are possible which parchment can’t compete with like this post I’m making to you. … in many communities – especially rural third world communities, offline sources are just as unreachable, if not more, than online sources

but (#2) you’re just looking at the negatives and ignoring the positives, and that’s wrong! Look at the positives, never mind the negatives! (Also very common out on the Web 2.0 frontier.)

I definitely agree that there is a ton of great knowledge stored up in books and other offline sources, but people solve problems with the information they have

but (#3) …hey, we get by, don’t we? Does it really matter all that much?

I’m not a fan of Richard Rorty, but I believe that communities have conversations, and that knowledge lives in those conversations (even if some of them are very slow conversations that have been serialised to paper over the decades). I also believe that knowledge comes in domains, and that each domain follows the shape of the overall cloud of knowledge constituted by a conversation. But I’ve been in enough specialised communities (Unix geeks, criminologists, folk singers, journalists…) to know that there’s a wall of ignorance and indifference around each domain; there probably has to be, if we’re not to keel over from too much perspective. Your stuff, you know about and you know that you don’t know all that much; you know you’re not an expert. Their stuff, well, you know enough; you know all you need to know, and anyway how complicated can it be?

Enthusiasts are good people to have around; they hoard the knowledge and keep the conversation going, even when there’s a bit of a lull. The trouble is, they tend to keep the wall of ignorance and apathy in place while they’re doing it. The moral is, if your question is about something just outside a particular domain of knowledge, don’t ask an enthusiast – they’ll tell you there’s nothing there. (Or: there’s something there now, but it won’t be there for long. Or: there’s something there, but look at all the great stuff we’ve got here!)

So much that hides

Alex points to this piece by Rashmi Sinha on ‘Findability with tags’: the vexed question of using tags to find the material that you’ve tagged, rather than as an elaborate way of building a mind-map.

I should stress, parenthetically, that that last bit wasn’t meant as a putdown – it actually describes my own use of Simpy. I regularly tag pages, but almost never use tags to actually retrieve them. Sometimes – quite rarely – I do pull up all the pages I’ve tagged with a generic “write something about this” tag. Apart from that, I only ever ask Simpy two questions: one is “what was that page I tagged the other day?” (for which, obviously, meaningful tags aren’t required); the other is “what does my tag cloud look like?”.

Now, you could say that the answer to the second question isn’t strictly speaking information; it’s certainly not information I use, unless you count the time I spend grooming the cloud by splitting, merging and deleting stray tags. I like tag clouds and don’t agree with Jeffrey Zeldman’s anathema, but I do agree with Alex that they’re not the last word in retrieving information from tags. Which is where Rashmi’s article comes in.
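For what it’s worth, the grooming I mean amounts to three small operations on the tag data. A rough Python sketch, assuming a personal store that maps each bookmark to a set of tags (the store, the tag names and the `min_count` threshold are all hypothetical, and none of this is Simpy’s API):

```python
from collections import Counter

# Hypothetical personal store: bookmark id -> set of tags.
bookmarks = {
    "a": {"web2.0", "wikipedia"},
    "b": {"web20", "tagging"},
    "c": {"tagging", "tags"},
    "d": {"misc"},
}

def tag_cloud(store):
    """Tag -> number of bookmarks carrying it: the raw data behind a tag cloud."""
    return Counter(tag for tags in store.values() for tag in tags)

def merge(store, old, new):
    """Fold a variant tag into the preferred one."""
    for tags in store.values():
        if old in tags:
            tags.discard(old)
            tags.add(new)

def split(store, old, choose):
    """Replace an overloaded tag with whichever tag choose(bookmark_id) returns."""
    for item, tags in store.items():
        if old in tags:
            tags.discard(old)
            tags.add(choose(item))

def delete_strays(store, min_count=2):
    """Drop tags used on fewer than min_count bookmarks (threshold is an assumption)."""
    counts = tag_cloud(store)
    for tags in store.values():
        tags.difference_update({t for t in tags if counts[t] < min_count})

merge(bookmarks, "web20", "web2.0")
merge(bookmarks, "tags", "tagging")
delete_strays(bookmarks)
print(tag_cloud(bookmarks))
```

None of this retrieves anything; it just makes the cloud look tidier, which is rather the point.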

Rashmi identifies three ways of layering additional information on top of the basic item/tag pairing, all of which hinge on partitioning the tag universe in different ways. This is most obvious in the case of faceted tagging: here, the field of information is partitioned before any tags are applied. Rashmi cites the familiar example of wine, where a ‘region’ tag would carry a different kind of information from ‘grape variety’, ‘price’ or for that matter ‘taste’. Similar distinctions can be made in other areas: a news story tagged ‘New Labour’, ‘racism’ and ‘to blog about’ is implicitly carrying information in the domains ‘subject (political philosophy)’, ‘subject (social issue)’ and ‘action to take’.

There are two related problems here. A unique tag, in this model, can only exist within one dimension: if I want separate tags for New Labour (the people) and New Labour (the philosophy), I’ll either have to make an artificial distinction between the two (New_Labour vs New_Labour_philosophy) or add a dimension layer to my tags (political_party.New_Labour vs political_philosophy.New_Labour). Both solutions are pretty horrible. More broadly, you can’t invoke a taxonomist’s standby like the wine example without setting folksonomic backs up, and with some reason: part of the appeal of tagging is precisely that you start with a blank sheet and let the domains of knowledge emerge as they may.
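The second of those horrible solutions is easy enough to implement, which is part of the problem. A Python sketch, assuming dotted `facet.value` tags and a list of facets fixed before any tagging happens (the facet names come from the example above; the parsing convention is my assumption). The same value, `New_Labour`, is only unambiguous within its facet, and any tag that doesn’t fit a pre-existing facet is simply refused.

```python
from collections import defaultdict

# Dimensions fixed before any tag is applied (names from the example above).
FACETS = {"political_party", "political_philosophy", "social_issue", "action"}

def parse_faceted(tag):
    """Split a 'facet.value' tag, refusing anything outside the agreed facets."""
    facet, sep, value = tag.partition(".")
    if not sep or facet not in FACETS:
        raise ValueError(f"{tag!r} does not belong to a known facet")
    return facet, value

def by_facet(tags):
    """Group one item's tags by dimension: facet -> set of values."""
    grouped = defaultdict(set)
    for tag in tags:
        facet, value = parse_faceted(tag)
        grouped[facet].add(value)
    return dict(grouped)

story = ["political_party.New_Labour", "political_philosophy.New_Labour",
         "social_issue.racism", "action.to_blog_about"]
print(by_facet(story))
```

A bare `racism` tag raises an error here, which is exactly the blank-sheet freedom that folksonomy fans don’t want to give up.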

Clustered tagging (a new one on me) addresses both of these problems, as well as answering the much-evaded question of how those domains are supposed to emerge. A tag cluster – as seen on Flickr – consists of a group of tags which consistently appear together, suggesting an implicit ‘domain’. Crucially, a single tag can occur in multiple clusters. The clusters for the Flickr ‘election’ tag, for example, are easy to interpret:

vote, politics, kerry, bush, voting, ballot, poster, cameraphone, democrat, president

wahl, germany, deutschland, berlin, cdu, spd, bundestagswahl

canada, ndp, liberal, toronto, jacklayton, federalelection

and, rather anticlimactically,

england, uk

Clustering, I’d argue, represents a pretty good stab at building emergent domains. The downside is that it only becomes possible when there are huge numbers of tagging operations.
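I don’t know how Flickr actually computes its clusters, so treat this as a guess at the general shape rather than a description. A minimal Python sketch: take the photos carrying a seed tag, link two of their other tags when they turn up together on at least `min_together` photos, and treat each connected group as a cluster. The seed tag belongs to every cluster it produces, which is how ‘election’ can sit in several at once. The photos below are invented, and the threshold shows the downside: with only a handful of tagging operations, most tags (‘bush’, ‘kerry’, ‘cdu’) never clear it.

```python
from collections import Counter
from itertools import combinations

def clusters_for(seed, photos, min_together=2):
    """Group the tags that co-occur with `seed` into clusters.

    Two tags are linked when they appear together (alongside seed) on at
    least `min_together` photos; each connected group of linked tags is a
    cluster. The threshold and linking rule are assumptions.
    """
    tagged = [tags - {seed} for tags in photos if seed in tags]
    pair_counts = Counter()
    for tags in tagged:
        pair_counts.update(combinations(sorted(tags), 2))
    links = {}
    for (a, b), n in pair_counts.items():
        if n >= min_together:
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)
    clusters, seen = [], set()
    for start in sorted(links):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over linked tags
            tag = stack.pop()
            if tag in group:
                continue
            group.add(tag)
            stack.extend(links[tag] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters

photos = [
    {"election", "vote", "politics", "bush"},
    {"election", "vote", "politics", "kerry"},
    {"election", "wahl", "germany", "berlin"},
    {"election", "wahl", "germany", "cdu"},
    {"election", "canada", "ndp"},
    {"election", "canada", "ndp", "toronto"},
    {"cat", "kitten"},
]
print(clusters_for("election", photos))
```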

The third enhancement to tagging Rashmi describes is the use of tags as pivots:

When everything (tag, username, number of people who have bookmarked an item) is a link, you can use any of those links to look around you. You can change direction at any moment.

Lurking behind this, I think, is Thomas’s original tripartite definition of ‘folksonomy’:

the three needed data points in a folksonomy tool [are]: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. Flattening the three layers in a tool in any way makes that tool far less valuable for finding information. But keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (if a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag.

This, I think, is pivoting in action: from the object and its tags, to the person tagging and the tags they use, to the person using particular tags and the objects they tag. (There’s a more concrete description here.)
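Thomas’s three data points map naturally onto a set of (person, object, tag) triples, and pivoting is just fixing two of the three and asking for the third. A Python sketch with invented data (the names and URLs are placeholders, and a real service would index this rather than scan a list):

```python
# Hypothetical (person, object, tag) triples: the three data points kept separate.
triples = [
    ("alice", "example.org/owl", "semanticweb"),
    ("bob", "example.org/owl", "semanticweb"),
    ("bob", "example.org/rdf", "semanticweb"),
    ("carol", "example.org/owl", "ontology"),
]

def people(obj, tag):
    """Object + tag -> who applied that tag to that object."""
    return sorted({p for p, o, t in triples if o == obj and t == tag})

def objects(person, tag):
    """Person + tag -> what that person files under the tag (them as a filter)."""
    return sorted({o for p, o, t in triples if p == person and t == tag})

def tags(person, obj):
    """Person + object -> the vocabulary that person used for it."""
    return sorted({t for p, o, t in triples if p == person and o == obj})

# Pivot: from a page and a tag to fellow taggers, then on to their other pages.
for who in people("example.org/owl", "semanticweb"):
    print(who, objects(who, "semanticweb"))
```

Flatten the triples into plain (object, tag) pairs and the first and second functions become impossible to write, which is Thomas’s point.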

Alex suggests that using tags as pivots could also be considered a subset of faceted browsing. I’d go further, and suggest that facets, clusters and pivots are all subsets of a larger set of solutions, which we can call domain-based tagging. If you use facets, the domains are imposed: this approach is a good fit to relatively closed domains of knowledge and finite groups of taggers. If you’ve got an epistemological blank sheet and a limitless supply of taggers, you can allow the domains to emerge: this is where clusters come into their own. And if what you’re primarily interested in is people – and, specifically, who’s saying what about what – then you don’t want multiple content-based domains but only the information which derives directly from human activity: the objects and their taggers. Or rather, you want the objects and the taggers, plus the ability to pivot into a kind of multi-dimensional space: instead of tags existing within domains, each tag is a domain in its own right, and what you can find within each tag-domain is the objects and their taggers.

What all of this suggests is that, unsurprisingly, there is no ‘one size fits all’ solution. I suggested some time ago that

If ‘cloudiness’ is a universal condition, del.icio.us and Flickr and tag clouds and so forth don’t enable us to do anything new; what they are giving us is a live demonstration of how the social mind works.

All knowledge is cloudy; all knowledge is constructed through conversation; conversation is a way of dealing with cloudiness and building usable clouds; social software lets us see knowledge clouds form in real time. I think that’s fine as far as it goes; what it doesn’t say is that, as well as having conversations about different things, we’re having different kinds of conversations and dealing with the cloud of knowing in different ways. Ontology is not, necessarily, overrated; neither is folksonomy.

The users geeks don’t see

Nick writes, provocatively as ever, about the recent ‘community-oriented’ redesign of the netscape.com portal:

A few days ago, Netscape turned its traditional portal home page into a knockoff of the popular geek news site Digg. Like Digg, Netscape is now a “news aggregator” that allows users to vote on which stories they think are interesting or important. The votes determine the stories’ placement on the home page. Netscape’s hope, it seems, is to bring Digg’s hip Web 2.0 model of social media into the mainstream. There’s just one problem. Normal people seem to think the entire concept is ludicrous.

Nick cites a post titled Netscape Community Backlash, from which this line leapt out at me:

while a lot of us geeks and 2.0 types are addicted to our own technology (and our own voices, to be honest), it’s pretty darn obvious that A LOT of people want to stick with the status quo

This reminded me of a minor revelation I had the other day, when I was looking for the Java-based OWL reasoner ‘pellet’. I googled for pellet owl – just like that, no quotes – expecting to find a ‘pellet’ link at the bottom of forty or fifty hits related to, well, owls and their pellets. In fact, the top hit was “Pellet OWL Reasoner”. (To be fair, if you google owl pellet you do get the fifty pages of owl pellets first.)

I think it’s fair to say that the pellet OWL reasoner isn’t big news even in the Web-using software development community; I’d be surprised if everyone reading this post even knows what an OWL reasoner is (or has any reason to care). But there’s enough activity on the Web around pellet to push it, in certain circumstances, to the top of the Google rankings (see for yourself).

Hence the revelation: it’s still a geek Web. Or rather, there’s still a geek Web, and it’s still making a lot of the running. When I first started using the Internet, about ten years ago, there was a geek Web, a hobbyist Web, an academic Web (small), a corporate Web (very small) and a commercial Web (minute) – and the geek Web was by far the most active. Since then the first four sectors have grown incrementally, but the commercial Web has exploded, along with a new sixth sector – the Web-for-everyone of AOL and MSN and MySpace and LiveJournal (and blogs), whose users vastly outnumber those of the other five. But the geek Web is still where a lot of the new interesting stuff is being created, posted, discussed and judged to be interesting and new.

Add social software to the mix – starting, naturally, within the geek Web, as that’s where it came from – and what do you get? You get a myth which diverges radically from the reality. The myth is that this is where the Web-for-everyone comes into its own, where millions of users of what was built as a broadcast Web with walled-garden interactive features start talking back to the broadcasters and breaking out of their walled gardens. The reality is that the voices of the geeks are heard even more loudly – and even more disproportionately – than before. Have a look at the ‘popular’ tags on del.icio.us: as I write, six of the top ten (including all of the top five) relate directly to programmers, and only to programmers. (Number eight reads: “LinuxBIOS – aims to replace the normal BIOS found on PCs, Alphas, and other machines with a Linux kernel”. The unglossed reference to Alphas says it all.) Of the other four, one’s a political video, two are photosets and one is a full-screen animation of a cartoon cat dancing, rendered entirely in ASCII art. (Make that seven of the top ten.)

I’m not a sceptic about social software: ranking, tagging, search-term-aggregation and the other tools of what I persist in calling ethnoclassification are both new and powerful. But they’re most powerful within a delimited domain: a user coming to del.icio.us for the first time should be looking for the ‘faceted search’ option straight away (“OK, so that’s the geek cloud, how do I get it to show me the cloud for European history/ceramics/Big Brother?”). The fact that there is no ‘faceted search’ option is closely related, I’d argue, to the fact that there is no discernible tag cloud for European history or ceramics or Big Brother: we’re all in the geek Web. (Even Nick Carr.) (Photography is an interesting exception – although even there the only tags popular enough to make the del.icio.us tag cloud are ‘photography’, ‘photo’ and ‘photos’. There are 40 programming-related tags, from ajax to xml.)
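The ‘faceted search’ I’m asking for could be as crude as recomputing the tag cloud over only those bookmarks that carry a given tag. A Python sketch, assuming bookmarks are simply sets of tags (the function names and data are hypothetical; as far as I can see del.icio.us offers nothing of the kind):

```python
from collections import Counter

# Hypothetical bookmarks, each just a set of tags.
bookmarks = [
    {"ajax", "javascript"}, {"ajax", "css"}, {"ajax", "javascript", "web2.0"},
    {"ajax", "xml"},
    {"history", "europe", "medieval"}, {"history", "europe"}, {"history", "ceramics"},
]

def tag_cloud(items, top=10):
    """Site-wide cloud: the most common tags across all bookmarks."""
    return Counter(tag for tags in items for tag in tags).most_common(top)

def cloud_within(seed, items, top=10):
    """Cloud restricted to bookmarks tagged `seed`, leaving the seed itself out."""
    return Counter(tag for tags in items if seed in tags
                   for tag in tags if tag != seed).most_common(top)

print(tag_cloud(bookmarks, top=3))
print(cloud_within("history", bookmarks))
```

In the site-wide cloud the geek tags win on volume alone; scoped to ‘history’, the smaller domain gets a cloud of its own.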

Social software wasn’t built for the users of the Web-for-everyone. Reaction to the Netscape redesign tells us (or reminds us) that there’s no reason to assume they’ll embrace it.

Update: Have a look at Eszter Hargittai’s survey of Web usage among 1,300 American college students, conducted in February and March 2006. MySpace is huge, and Facebook’s even huger, but Web 2.0 as we know it? It’s not there. 1.9% use Flickr; 1.6% use Digg; 0.7% use del.icio.us. Answering a slightly different question, 1.5% have ever visited Boingboing, and 1% Technorati. By contrast, 62% have visited CNN.com and 21% bbc.co.uk. It’s still, very largely, a broadcast Web with walled-garden interactivity. Comparing results like these with the prophecies of tagging replacing hierarchy, Long Tail production and mashups all round, I feel like invoking the story of the blind men and the elephant – except that I’m not even sure we’ve all got the same elephant.
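It’s worth turning those percentages back into people. Taking the sample as the 1,300 students mentioned above (an approximation, and glossing over the fact that ‘use’ and ‘have ever visited’ are different questions), a quick Python calculation:

```python
SAMPLE = 1300  # approximate sample size, as given above

# Survey percentages, stored in tenths of a percent so the arithmetic
# stays in integers until the final division.
PERCENT_TENTHS = {
    "Flickr": 19, "Digg": 16, "del.icio.us": 7,
    "Boingboing": 15, "Technorati": 10,
    "CNN.com": 620, "bbc.co.uk": 210,
}

headcounts = {site: SAMPLE * tenths / 1000 for site, tenths in PERCENT_TENTHS.items()}

for site, n in headcounts.items():
    print(f"{site:12} ~{n:.0f} students")
```

That’s roughly nine students using del.icio.us, against around 800 who had visited CNN.com.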

I couldn’t make it any simpler

I hate to say this – I’ve always loathed VR boosters and been highly sceptical about the people they boost – but Jaron Lanier’s a bright bloke. His essay Digital Maoism doesn’t quite live up to the title, but it’s well worth reading (thanks, Thomas).

I don’t think he quite gets to the heart of the current ‘wisdom of the crowds’ myth, though. It’s not Maoism so much as Revivalism: there’s a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result – because the process incarnates the collective.

Put it this way. Say that (for example) the Wikipedia page on the Red Brigades is wildly wrong or wildly inadequate (which is just as bad); say that the tag cloud for an authoritative Red Brigades resource is dominated by misleading tags (‘kgb’, ‘ussr’, ‘mitrokhin’…). Would a wikipedian or a ‘folksonomy’ advocate see this situation as a major problem? Not being either I can’t give an authoritative answer, but I strongly suspect the answer would be No: it’s all part of the process, it’s all part of the collective self-expression of wikipedians and the growth of the folksonomy, and if the subject experts don’t like it they should just get their feet wet and start tagging and editing themselves. And if, in practice, the experts don’t join in – perhaps, in the case of Wikipedia, because they don’t have the stomach for the kind of ‘editing’ process which saw Jaron Lanier’s own corrections get reverted? Again, I don’t know for sure, but I suspect the answer would be another shrug: the wiki’s open to all – and tagspace couldn’t be more open – so who’s to blame, if you can’t make your voice heard, but you? There’s nothing inherently wrong with the process, except that you’re not helping to improve it. There’s nothing inherently wrong with the collective, except that you haven’t joined it yet.

Two quotes to clarify (hopefully) the connection between collective and process. Michael Wexler:

our understanding of things changes and so do the terms we use to describe them. How do I solve that in this open system? Do I have to go back and change all my tags? What about other people’s tags? Do I have to keep in mind all the variations on tags that reflect people’s different understanding of the topics?

The social connected model implies that the connections are the important part, so that all you need is one tag, one key, to flow from place to place and discover all you need to know. But the only people who appear to have time to do that are folks like Clay Shirky. The rest of us need to have information sorted and organized since we actually have better things to do than re-digest it.

What tagging does is attempt to recreate the flow of discovery. That’s fine… but what taxonomy does is recreate the structure of knowledge that you’ve already discovered. Sometimes, I like flowing around and stumbling on things. And sometimes, that’s a real pita. More often than not, the tag approach involves lots of stumbling around and sidetracks.

It’s like Family Feud [a.k.a. Family Fortunes - PJE]. You have to think not of what you might say to a question, you have to guess what the survey of US citizens might say in answer to a question. And that’s really a distraction if you are trying to just answer the damn question.

And our man Lanier:

there’s a demonstrative ritual often presented to incoming students at business schools. In one version of the ritual, a large jar of jellybeans is placed in the front of a classroom. Each student guesses how many beans there are. While the guesses vary widely, the average is usually accurate to an uncanny degree.

This is an example of the special kind of intelligence offered by a collective. It is that peculiar trait that has been celebrated as the “Wisdom of Crowds,”

The phenomenon is real, and immensely useful. But it is not infinitely useful. The collective can be stupid, too. Witness tulip crazes and stock bubbles. Hysteria over fictitious satanic cult child abductions. Y2K mania. The reason the collective can be valuable is precisely that its peaks of intelligence and stupidity are not the same as the ones usually displayed by individuals. Both kinds of intelligence are essential.

What makes a market work, for instance, is the marriage of collective and individual intelligence. A marketplace can’t exist only on the basis of having prices determined by competition. It also needs entrepreneurs to come up with the products that are competing in the first place. In other words, clever individuals, the heroes of the marketplace, ask the questions which are answered by collective behavior. They put the jellybeans in the jar.
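Lanier’s jellybean point can be checked with a toy simulation. The Python sketch below assumes each student’s guess is the true count plus random noise (the count, class size, spread and bias are made-up parameters, not figures from any real classroom). The error of the average is never larger than the average individual error; that holds whatever the guesses are. But add a bias everyone shares, the tulip-craze case, and averaging does nothing to remove it.

```python
import random
from statistics import mean

def crowd_guess(true_count, n_students, spread, shared_bias=0.0, seed=0):
    """Simulate guesses = truth + shared bias + independent Gaussian noise
    (all assumptions), and compare the crowd's error with individual errors."""
    rng = random.Random(seed)
    guesses = [true_count + shared_bias + rng.gauss(0, spread)
               for _ in range(n_students)]
    crowd_error = abs(mean(guesses) - true_count)
    typical_individual_error = mean(abs(g - true_count) for g in guesses)
    return crowd_error, typical_individual_error

independent = crowd_guess(true_count=850, n_students=60, spread=250)
herded = crowd_guess(true_count=850, n_students=60, spread=250, shared_bias=400)
print("independent errors: crowd %.0f vs individual %.0f" % independent)
print("shared bias:        crowd %.0f vs individual %.0f" % herded)
```

Someone still has to put the jellybeans in the jar: the simulation takes `true_count` as given.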

To illustrate this, once more (just the once) with the Italian terrorists. There are tens of thousands of people, at a conservative estimate, who have read enough about the Red Brigades to write that Wikipedia entry: there are a lot of ill-informed or partially-informed or tendentious books about terrorism out there, and some of them sell by the bucketload. There are probably only a few hundred people who have read Gian Carlo Caselli and Donatella della Porta’s long article “The History of the Red Brigades: Organizational structures and Strategies of Action (1970-82)” – and I doubt there are twenty who know the source materials as well as the authors do. (I’m one of the first group, obviously, but certainly not the second.) Once the work’s been done anyone can discover it, but discovery isn’t knowledge: the knowledge is in the words on the pages, and ultimately in the individuals who wrote them. They put the jellybeans in the jar.

This is why (an academic writes) the academy matters, and why academic elitism is – or at least can be – both valid and useful. Jaron:

The balancing of influence between people and collectives is the heart of the design of democracies, scientific communities, and many other long-standing projects. There’s a lot of experience out there to work with. A few of these old ideas provide interesting new ways to approach the question of how to best use the hive mind.

Scientific communities … achieve quality through a cooperative process that includes checks and balances, and ultimately rests on a foundation of goodwill and “blind” elitism — blind in the sense that ideally anyone can gain entry, but only on the basis of a meritocracy. The tenure system and many other aspects of the academy are designed to support the idea that individual scholars matter, not just the process or the collective.

I’d go further, if anything. Academic conversations may present the appearance of a collective, but it’s a collective where individual contributions are preserved and celebrated (“Building on Smith’s celebrated critique of Jones, I would suggest that Smith’s own analysis is vulnerable to the criticisms advanced by Evans in another context…”). That is, academic discourse looks like a conversation – which wikis certainly can do, although Wikipedia emphatically doesn’t.

The problem isn’t the technology, in other words: both wikis and tagging could be ways of making conversation visible, which inevitably means visualising debate and disagreement. The problem is the drive to efface any possibility of conflict, effectively repressing the appearance of debate in the interest of presenting an evolving consensus. (Or, I could say, the problem is the tendency of people to bow and pray to the neon god they’ve made, but that would be a bit over the top – and besides, Simon and Garfunkel quotes are far too obvious.)

Update 13th June

I wrote (above): It’s not Maoism so much as Revivalism: there’s a tight feedback loop between membership of the collective, collective activity and (crucially) celebration of the activity of the collective. Or: celebration of process rather than end-result – because the process incarnates the collective.

Here’s Cory Doctorow, responding to Lanier:

Wikipedia isn’t great because it’s like the Britannica. The Britannica is great at being authoritative, edited, expensive, and monolithic. Wikipedia is great at being free, brawling, universal, and instantaneous.

If you suffice yourself with the actual Wikipedia entries, they can be a little papery, sure. But that’s like reading a mailing-list by examining nothing but the headers. Wikipedia entries are nothing but the emergent effect of all the angry thrashing going on below the surface. No, if you want to really navigate the truth via Wikipedia, you have to dig into those “history” and “discuss” pages hanging off of every entry. That’s where the real action is, the tidily organized palimpsest of the flamewar that lurks beneath any definition of “truth.” The Britannica tells you what dead white men agreed upon, Wikipedia tells you what live Internet users are fighting over.

The Britannica truth is an illusion, anyway. There’s more than one approach to any issue, and being able to see multiple versions of them, organized with argument and counter-argument, will do a better job of equipping you to figure out which truth suits you best.

Quoting myself again: There’s nothing inherently wrong with the process, except that you’re not helping to improve it. There’s nothing inherently wrong with the collective, except that you haven’t joined it yet.

When there is no outside

Nick Carr’s hyperbolically-titled The Death of Wikipedia has received a couple of endorsements and some fairly vigorous disagreement, unsurprisingly. I think it’s as much a question of tone as anything else. When Nick reads the line

certain pages with a history of vandalism and other problems may be semi-protected on a pre-emptive, continuous basis.

it clearly sets alarm bells ringing for him, as indeed it does for me (“Ideals always expire in clotted, bureaucratic prose”, Nick comments). Several of his commenters, on the other hand, sincerely fail to see what the big deal might be: it’s only a handful of pages, it’s only semi-protection, it’s not that onerous, it’s part of the continuing development of Wikipedia editing policies, Wikipedia never claimed to be a totally open wiki, there’s no such thing as a totally open wiki anyway…

I think the reactions are as instructive as the original post. No, what Nick’s pointing to isn’t really a qualitative change, let alone the death of anything. But yes, it’s a genuine problem, and a genuine embarrassment to anyone who takes the Wikipedian rhetoric seriously. Wikipedia (“the free encyclopedia that anyone can edit”) routinely gets hailed for its openness and its authority, only not both at the same time – indeed, maximising one can always be used to justify limits on the other. As here. But there’s another level to this discussion, which is to do with Wikipedia’s resolution of the openness/authority balancing-act. What happens in practice is that the contributions of active Wikipedians take precedence over both random vandals and passing experts. In effect, both openness and authority are vested in the group.

In some areas this works well enough, but in others it’s a huge problem. I use Wikipedia myself, and occasionally drop in an edit if I see something that’s crying out for correction. Sometimes, though, I see a Wikipedia article that’s just wrong from top to bottom – or rather, an article where verifiable facts and sustainable assertions alternate with errors and misconceptions, or are set in an overall argument which is based on bad assumptions. In short, sometimes I see a Wikipedia article which doesn’t need the odd correction, it needs to be pulled and rewritten. I’m not alone in having this experience: here’s Tom Coates on ‘penis envy’ and Thomas Vander Wal (!) on ‘folksonomy’, as well as me on ‘anomie’.

It’s not just a problem with philosophical concepts, either – I had a similar reaction more recently to the Wikipedia page on the Red Brigades. On the basis of the reading I did for my doctorate, I could rewrite that page from start to finish, leaving in place only a few proper names and one or two of the dates. But writing this kind of thing is hard and time-consuming work – and I’ve got quite enough of that to do already. So it doesn’t get done.

I don’t think this is an insurmountable problem. A while ago I floated a cunning plan for fixing pages like this, using PledgeBank to mobilise external reserves of peer-pressure; it might work, and if only somebody else would actually get it rolling I might even sign up. But I do think it’s a problem, and one that’s inherent to the Wikipedia model.

To reiterate, both openness and authority are vested in the group. Openness: sure, Wikipedia is as open to me as any other registered editor d00d, but in practice the openness of Wikipedia is graduated according to the amount of time you can afford to spend on it. As for authority, I’m not one, but (like Debord) I have read several good books – better books, to be blunt, than those relied on by the author[s] of the current Red Brigades article. But what would that matter unless I was prepared to defend what I wrote against bulk edits by people who disagreed – such as, for example, the author[s] of the current article? On the other hand, if I was prepared to stick it out through the edit wars, what would it matter whether I knew my stuff or not? This isn’t just random bleating. When I first saw that Red Brigades article I couldn’t resist one edit, deleting the completely spurious assertion that the group Prima Linea was a Red Brigades offshoot. When I looked at the page again the next day, my edit had been reverted.

Ultimately Wikipedia isn’t about either openness or authority: it’s about the collective activity of editing Wikipedia and being a Wikipedian. From that, all else follows.

Update 2/6/06 (in response to David, in comments)

There are two obvious problems with the Wikipedia page on the Brigate Rosse, and one that’s larger but more diffuse. The first problem is that it’s written in the present tense; it’s extremely dubious that there’s any continuity between the historic Brigate Rosse and the gang who shot Biagi, let alone that they’re simply, unproblematically the same group. This alone calls for a major rewrite. Secondly, the article is written very much from a police/security-service/conspiracist stance, with a focus on questions like whether the BR was assisted by the Czech security services or penetrated by NATO. But this tends to reinforce an image of the BR as a weird alien force which popped up out of nowhere, rather than an extreme but consistent expression of broader social movements (all of which has been documented).

The broader problem – which relates to both of the specific points – goes back to a problem with the amateur-encyclopedia format itself: Wikipedia implicitly asks what a given topic is, which prompts contributors to think of their topic as having a core, essential meaning (I wrote about this last year). The same problem can arise in a ‘proper’ encyclopedia, but there it’s generally mitigated by expertise: somebody who’s spent several years studying the broad Italian armed struggle scene is going to be motivated to relate the BR back to that scene, rather than presenting it as an utterly separate thing. The motivation will be still greater if the expert on the BR has also been asked to contribute articles on Prima Linea, the NAP, etc. This, again, is something that happens (and works, for all concerned) in the kind of restricted conversations that characterise academia, but isn’t incentivised by the Wikipedia conversation – because the Wikipedia conversation doesn’t go anywhere else. Doing Wikipedia is all about doing Wikipedia.

Who’s there?

At Many-to-Many, Ross Mayfield reports that Clay Shirky and danah boyd have been thinking about “the lingering questions in our field”, viz. the field of social software. I was a bit surprised to see that

How can communities support veterans going off topic together and newcomers seeking topical information and connections?

still qualifies as a ‘lingering question’; I distinctly remember being involved in thrashing this one out, together with Clay, the best part of nine years ago. But this was the one that really caught my eye, if you’ll pardon the expression:

What level of visual representation of the body is necessary to trigger mirror neurons?

Uh-oh. Sherry Turkle (subscription-only link):

a woman in a nursing home outside Boston is sad. Her son has broken off his relationship with her. Her nursing home is taking part in a study I am conducting on robotics for the elderly. I am recording the woman’s reactions as she sits with the robot Paro, a seal-like creature advertised as the first ‘therapeutic robot’ for its ostensibly positive effects on the ill, the elderly and the emotionally troubled. Paro is able to make eye contact by sensing the direction a human voice is coming from; it is sensitive to touch, and has ‘states of mind’ that are affected by how it is treated – for example, it can sense whether it is being stroked gently or more aggressively. In this session with Paro, the woman, depressed because of her son’s abandonment, comes to believe that the robot is depressed as well. She turns to Paro, strokes him and says: ‘Yes, you’re sad, aren’t you. It’s tough out there. Yes, it’s hard.’ And then she pets the robot once again, attempting to provide it with comfort. And in so doing, she tries to comfort herself.

What are we to make of this transaction? When I talk to others about it, their first associations are usually with their pets and the comfort they provide. I don’t know whether a pet could feel or smell or intuit some understanding of what it might mean to be with an old woman whose son has chosen not to see her anymore. But I do know that Paro understood nothing. The woman’s sense of being understood was based on the ability of computational objects like Paro – ‘relational artefacts’, I call them – to convince their users that they are in a relationship by pushing certain ‘Darwinian’ buttons (making eye contact, for example) that cause people to respond as though they were in relationship.

Further reading: see Kathy Sierra on mirror neurons and the contagion of negativity. See also Shelley’s critique of Kathy’s argument, and of attempts to enforce ‘positive’ feelings by manipulating mood. And see the sidebar at Many-to-Many, which currently reads as follows:

Recent Comments

viagra on Sanger on Seigenthaler’s criticism of Wikipedia

hydrocodone cheap on Sanger on Seigenthaler’s criticism of Wikipedia

viagra on Sanger on Seigenthaler’s criticism of Wikipedia

alprazolam online on Sanger on Seigenthaler’s criticism of Wikipedia

Timur on Sanger on Seigenthaler’s criticism of Wikipedia

Timur on Sanger on Seigenthaler’s criticism of Wikipedia

Recent Trackbacks

roulette: roulette

jouer casino: jouer casino

casinos on line: casinos on line

roulette en ligne: roulette en ligne

jeux casino: jeux casino

casinos on line: casinos on line

Cloudbuilding (3)

By way of background to this post – and because I think it’s quite interesting in itself – here’s a short paper I gave last year at this conference (great company, shame about the catering). It was co-written with my colleagues Judith Aldridge and Karen Clarke. I don’t stand by everything in it – as I’ve got deeper into the project I’ve moved further away from Clay’s scepticism and closer towards people like Carole Goble and Keith Cole – but I think it still sets out an argument worth having.

Mind the gap: Metadata in e-social science

1. Towards the final turtle

It’s said that Bertrand Russell once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the centre of our galaxy. At the end of the lecture, a little old lady at the back of the room got up and said: “What you have told us is rubbish. The world is really a flat plate supported on the back of a giant tortoise.”

Russell smiled and replied, “What is the tortoise standing on?”

“You’re very clever, young man, very clever,” said the old lady. “But it’s turtles all the way down.”

The Russell story is emblematic of the logical fallacy of infinite regress: proposing an explanation which is just as much in need of explanation as the original fact being explained. The solution, for philosophers (and astronomers), is to find a foundation on which the entire argument can be built: a body of known facts, or a set of acceptable assumptions, from which the argument can follow.

But what if infinite regress is a problem for people who want to build systems as well as arguments? What if we find we’re dealing with a tower of turtles, not when we’re working backwards to a foundation, but when we’re working forwards to a solution?

WSDL [Web Services Description Language] lets a provider describe a service in XML [Extensible Markup Language]. [...] to get a particular provider’s WSDL document, you must know where to find them. Enter another layer in the stack, Universal Description, Discovery, and Integration (UDDI), which is meant to aggregate WSDL documents. But UDDI does nothing more than register existing capabilities [...] there is no guarantee that an entity looking for a Web Service will be able to specify its needs clearly enough that its inquiry will match the descriptions in the UDDI database. Even the UDDI layer does not ensure that the two parties are in sync. Shared context has to come from somewhere, it can’t simply be defined into existence. [...] This attempt to define the problem at successively higher layers is doomed to fail because it’s turtles all the way up: there will always be another layer above whatever can be described, a layer which contains the ambiguity of two-party communication that can never be entirely defined away. No matter how carefully a language is described, the range of askable questions and offerable answers make it impossible to create an ontology that’s at once rich enough to express even a large subset of possible interests while also being restricted enough to ensure interoperability between any two arbitrary parties.
(Clay Shirky)
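Shirky’s point about matching can be seen in miniature. What follows is a toy keyword registry in Python, invented for illustration: it is not UDDI or its API, and the endpoints are hypothetical. A requester who describes the same need in different words simply finds nothing.

```python
# A toy keyword registry, invented for illustration; it is NOT the UDDI API.
# The endpoints and descriptions are hypothetical.
REGISTRY = {
    "https://example.org/geo": "postcode lookup service",
    "https://example.org/fx": "currency conversion service",
}


def find_services(query: str) -> list[str]:
    """Return endpoints whose description contains every word of the query."""
    words = query.lower().split()
    return [
        url
        for url, description in REGISTRY.items()
        if all(word in description.split() for word in words)
    ]


# The requester happens to use the provider's wording, so the service is found.
print(find_services("currency conversion"))  # ['https://example.org/fx']
# Same need, different words: nothing matches.
print(find_services("exchange rate"))  # []
```

A synonym table would rescue this particular query, but then both parties have to agree on the table, which is the next turtle up.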

Clay Shirky is a longstanding critic of the Semantic Web project, an initiative which aims to extend Web technology to encompass machine-readable semantic content. The ultimate goal is the codification of meaning, to the point where understanding can be automated. In commercial terms, this suggests software agents capable of conducting a transaction with all the flexibility of a human being. In terms of research, it offers the prospect of a search engine which understands the searches it is asked to run and is capable of pulling in further relevant material unprompted.

This type of development is fundamental to e-social science: a set of initiatives aiming to enable social scientists to access large and widely-distributed databases using ‘grid computing’ techniques.

A Computational Grid performs the illusion of a single virtual computer, created and maintained dynamically in the absence of predetermined service agreements or centralised control. A Data Grid performs the illusion of a single virtual database. Hence, a Knowledge Grid should perform the illusion of a single virtual knowledge base to better enable computers and people to work in cooperation.
(Keith Cole et al)

Is Shirky’s final turtle a valid critique of the visions of the Semantic Web and the Knowledge Grid? Alternatively, is the final turtle really a Babel fish — an instantaneous universal translator — and hence (excuse the mixed metaphors) a straw person: is Shirky setting the bar impossibly high, posing goals which no ‘semantic’ project could ever achieve? To answer these questions, it’s worth reviewing the promise of automated semantic processing, and setting this in the broader context of programming and rule-governed behaviour.

2. Words and rules

We can identify five levels of rule-governed behaviour. In rule-driven behaviour, firstly, ‘everything that is not compulsory is forbidden’: the only actions which can be taken are those dictated by a rule. In practice, this means that instructions must be framed in precise and non-contradictory terms, with thresholds and limits explicitly laid down to cover all situations which can be anticipated. This is the type of behaviour represented by conventional task-oriented computer programming.
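As a rough illustration of this first level, here is a minimal Python sketch assuming a hypothetical thermostat; the rules and thresholds are invented. Every permitted action has an explicit threshold, and anything the rules do not dictate is refused.

```python
# A minimal sketch of rule-driven behaviour, assuming a hypothetical
# thermostat: each permitted action has an explicit threshold, and
# anything the rules do not dictate is refused.
RULES = [
    (lambda temp: temp < 18.0, "heat"),
    (lambda temp: 18.0 <= temp <= 24.0, "idle"),
    (lambda temp: temp > 24.0, "cool"),
]


def rule_driven(temp: float) -> str:
    """Return the action dictated by the first matching rule, or refuse."""
    for condition, action in RULES:
        if condition(temp):
            return action
    # No rule covers this input, so no action is permitted at all.
    raise ValueError(f"no rule covers temperature {temp!r}")


print(rule_driven(15.0))  # heat
try:
    rule_driven(float("nan"))  # NaN fails every comparison, so no rule fires
except ValueError as err:
    print(err)  # no rule covers temperature nan
```

The NaN case is the interesting one: a situation nobody anticipated is not handled badly, it is not handled at all.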

A higher level of autonomy is given by rule-bound behaviour: rules must be followed, but there is some latitude in how they are applied. A set of discrete and potentially contradictory rules is applied to whatever situation is encountered. Higher-order rules or instructions are used to determine the relative priority of different rules and resolve any contradiction.
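A minimal sketch of rule-bound behaviour, assuming some invented library-loan rules: several rules may fire at once and contradict each other, and a numeric priority stands in for the higher-order rule that settles the contradiction.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    priority: int  # set by a higher-order rule: larger numbers prevail
    applies: Callable[[dict], bool]
    action: str


def rule_bound(situation: dict, rules: list[Rule]) -> Optional[str]:
    """Apply every matching rule; resolve contradictions by priority."""
    matching = [rule for rule in rules if rule.applies(situation)]
    if not matching:
        return None
    return max(matching, key=lambda rule: rule.priority).action


# Hypothetical rules that can contradict one another.
RULES = [
    Rule("members may borrow", 1, lambda s: s["member"], "lend"),
    Rule("unpaid fines block loans", 2, lambda s: s["owes_fines"], "refuse"),
]

print(rule_bound({"member": True, "owes_fines": False}, RULES))  # lend
print(rule_bound({"member": True, "owes_fines": True}, RULES))  # refuse
```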

Rule-modifying behaviour builds on this level of autonomy, by making it possible to ‘learn’ how and when different rules should be applied. In practice, this means that priority between different rules is decided using relative weightings rather than absolute definitions, and that these weightings can be modified over time, depending on the quality of the results obtained. Neither rule-bound nor rule-modifying behaviour poses any fundamental problems in terms of automation.
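The same idea with learning added. In this sketch the rule names, starting weights and update rate are all assumptions: each rule carries a weight, the highest-weighted applicable rule is applied, and the weight drifts up or down according to whether the result was good or poor.

```python
def choose(weights: dict[str, float], candidates: list[str]) -> str:
    """Apply the candidate rule with the highest current weight."""
    return max(candidates, key=lambda name: weights[name])


def update(weights: dict[str, float], name: str, success: bool, rate: float = 0.1) -> None:
    """Move a rule's weight towards 1.0 after a good result, towards 0.0 after a poor one."""
    target = 1.0 if success else 0.0
    weights[name] += rate * (target - weights[name])


# Hypothetical rules for handling an ambiguous survey response.
weights = {"recontact_respondent": 0.6, "impute_from_similar": 0.5}
rules = list(weights)
print(choose(weights, rules))  # recontact_respondent
for _ in range(2):  # two poor outcomes in a row: 0.6 -> 0.54 -> 0.486
    update(weights, "recontact_respondent", success=False)
print(choose(weights, rules))  # impute_from_similar
```

Nothing here changes what the rules are; only their relative standing shifts, which is why this level poses no fundamental problem for automation.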

Rule-discovering behaviour, in addition, allows the existing body of rules to be extended in the light of previously unknown regularities which are encountered in practice (“it turns out that many Xs are also Y; when looking for Xs, it is appropriate to extend the search to include Ys”). This level of autonomy — combining rule observance with reflexive feedback — is fairly difficult to envisage in the context of artificial intelligence, but not impossible.
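In its most mechanical form, rule discovery can be sketched as co-occurrence counting. This is one technique among many, chosen for brevity; the labels and the 0.6 threshold are illustrative assumptions. A rule "X implies Y" is proposed when enough of the records labelled X are also labelled Y, and searches for X are then widened to include Y.

```python
from collections import Counter


def discover_rules(records: list[set[str]], threshold: float = 0.6) -> dict[str, set[str]]:
    """Propose 'many Xs are also Y' rules from labels that co-occur.

    "X implies Y" is proposed when at least `threshold` of the records
    labelled X are also labelled Y.
    """
    label_counts = Counter()
    pair_counts = Counter()
    for labels in records:
        label_counts.update(labels)
        for x in labels:
            for y in labels:
                if x != y:
                    pair_counts[(x, y)] += 1
    rules: dict[str, set[str]] = {}
    for (x, y), together in pair_counts.items():
        if together / label_counts[x] >= threshold:
            rules.setdefault(x, set()).add(y)
    return rules


def expand_search(term: str, rules: dict[str, set[str]]) -> set[str]:
    """When looking for Xs, extend the search to the Ys discovered for X."""
    return {term} | rules.get(term, set())


# Hypothetical incident records, each carrying a set of labels.
records = [
    {"mugging", "street"},
    {"mugging", "street"},
    {"mugging", "home"},
    {"burglary", "home"},
]
rules = discover_rules(records)
print(sorted(expand_search("mugging", rules)))  # ['mugging', 'street']
print(sorted(expand_search("home", rules)))  # ['home']
```

The counting is the easy part. Whether a discovered regularity deserves to become a search rule at all is a judgement the code cannot make.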

The level of autonomy assumed by human agents, however, is still higher, consisting of rule-interpreting behaviour. Rule-discovery allows us to develop an internalised body of rules which corresponds ever more closely to the shape of the data surrounding us. Rule-interpreting behaviour, however, enables us to continually and provisionally reshape that body of rules, highlighting or downgrading particular rules according to the demands of different situations. This is the type of behaviour which tells us whether a ban is worth challenging, whether a sales pitch is to be taken literally, whether a supplier is worth doing business with, whether a survey’s results are likely to be useful to us. This, in short, is the level of Shirky’s situational “shared context” — and of the final turtle.

We believe that there is a genuine semantic gap between the visions of Semantic Web advocates and the most basic applications of rule-interpreting human intelligence. Situational information is always local, experiential and contingent; consequently, the data of the social sciences require interpretation as well as measurement. Any purely technical solution to the problem of matching one body of social data to another is liable to suppress or exclude much of the information which makes it valuable.

We cannot endorse comments from e-social science advocates such as this:

variable A and variable B might both be tagged as indicating the sex of the respondent where sex of the respondent is a well defined concept in a separate classification. If Grid-hosted datasets were to be tagged according to an agreed classification of social science concepts this would make the identification of comparable resources extremely easy.
(Keith Cole et al)
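To make the quoted proposal concrete, here is a minimal Python sketch of concept-tagged variables. The dataset names, variable names and concept identifiers are invented, and this is not the schema Cole et al describe: variables carrying the same agreed tag are grouped as comparable across datasets.

```python
from collections import defaultdict

# Hypothetical variable-level metadata: (dataset, variable, concept tag).
# Dataset names, variable names and concept identifiers are invented.
VARIABLES = [
    ("survey_a", "RSEX", "respondent_sex"),
    ("survey_b", "q2_gender", "respondent_sex"),
    ("survey_b", "q3_age", "respondent_age"),
]


def comparable(variables):
    """Group variables by shared concept tag, keeping cross-dataset matches only."""
    groups = defaultdict(list)
    for dataset, name, concept in variables:
        groups[concept].append((dataset, name))
    return {
        concept: members
        for concept, members in groups.items()
        if len({dataset for dataset, _ in members}) > 1
    }


print(comparable(VARIABLES))
# {'respondent_sex': [('survey_a', 'RSEX'), ('survey_b', 'q2_gender')]}
```

Once the tags exist, the lookup really is extremely easy. The difficulty this paper is concerned with lies earlier, in the decision that RSEX and q2_gender record the same thing; the code simply takes that decision on trust.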

Or this:

work has been undertaken to assert the meaning of Web resources in a common data model (RDF) using consensually agreed ontologies expressed in a common language [...] Efforts have concentrated on the languages and software infrastructure needed for the metadata and ontologies, and these technologies are ready to be adopted.
(Carole Goble and David de Roure; emphasis added)

Statements like these suggest that semantics are being treated as a technical or administrative matter, rather than a problem in its own right; in short, that meaning is being treated as an add-on.

3. Google with Craig

To clarify these reservations, let’s look at a ‘semantic’ success story.

The service, called “Craigslist-GoogleMaps combo site” by its creator, Paul Rademacher, marries the innovative Google Maps interface with the classifieds of Craigslist to produce what is an amazing look into the properties available for rent or purchase in a given area. [...] This is the future….this is exactly the type of thing that the Semantic Web promised
(Joshua Porter)

‘This’ is an application which calculates the location of properties advertised on the ‘Craigslist’ site and then displays them on a map generated from Google Maps. In other words, it takes two sources of publicly available information and matches them up, automatically and reliably.

That’s certainly intelligent. But it’s also highly specialised, and there are reasons to be sceptical about how far this approach can be generalised. On one hand, the geographical base of the application obviates the issue of granularity. Granularity is the question of the ‘level’ at which an observation is taken: a town, an age cohort, a household, a family, an individual? a longitudinal study, a series of observations, a single survey? These issues are less problematic in a geographical context: in geography, nobody asks what the meaning of ‘is’ is. A parliamentary constituency; a census enumeration district; a health authority area; the distribution area of a free newspaper; a parliamentary constituency (1832 boundaries) — these are different ways of defining space, but they are all reducible to a collection of identifiable physical locations. Matching one to another, as in the CONVERTGRID application (Keith Cole et al) — or mapping any one onto a uniform geographical representation — is a finite and rule-bound task. At this level, geography is a physical rather than a social science.
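Why the geographical case is finite can be shown in a few lines. In this sketch, which is an illustration and not the CONVERTGRID method, each way of dividing space is reduced to the set of elementary locations it covers (hypothetical grid-cell ids), and matching one division to another is plain set intersection.

```python
# Two ways of dividing the same territory, each area reduced to the set of
# elementary locations (hypothetical grid-cell ids) that it covers.
CONSTITUENCIES = {"North": {1, 2, 3, 4}, "South": {5, 6, 7, 8}}
HEALTH_AREAS = {"HA1": {1, 2, 5}, "HA2": {3, 4, 6, 7, 8}}


def overlap_shares(
    source: dict[str, set[int]], target: dict[str, set[int]]
) -> dict[tuple[str, str], float]:
    """Share of each source area's cells that fall inside each target area."""
    shares = {}
    for source_name, source_cells in source.items():
        for target_name, target_cells in target.items():
            common = source_cells & target_cells
            if common:
                shares[(source_name, target_name)] = len(common) / len(source_cells)
    return shares


print(overlap_shares(CONSTITUENCIES, HEALTH_AREAS))
# {('North', 'HA1'): 0.5, ('North', 'HA2'): 0.5, ('South', 'HA1'): 0.25, ('South', 'HA2'): 0.75}
```

Reading these shares as shares of population assumes people are spread evenly across cells. That is an assumption, but a bounded and checkable one.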

The issue of trust is also potentially problematic. The Craigslist element of the Rademacher application brings the social element to bear, but does so in a way which minimises the risks of error (unintentional or intentional). There is a twofold verification mechanism at work. On one hand, advertisers — particularly content-heavy advertisers, like those who use the ‘classifieds’ and Craigslist — are motivated to provide a (reasonably) accurate description of what they are offering, and to use terms which match the terms used by would-be buyers. On the other hand, offering living space over Craigslist is not like offering video games over eBay: Craigslist users are not likely to rely on the accuracy of listings, but will subject them to in-person verification. In many disciplines, there is no possibility of this kind of ‘real-world’ verification; nor is there necessarily any motivation for a writer to use researchers’ vocabularies, or conform to their standards of accuracy.

In practice, the issues of granularity and trust both pose problems for social science researchers using multiple data sources, as concepts, classifications and units differ between datasets. This is not just an accident that could have been prevented with more careful planning; it is inherent in the nature of social science concepts, which are often inextricably contingent on social practice and cannot unproblematically be recorded as ‘facts’. The broad range covered by a concept like ‘anti-social behaviour’ means that coming up with a single definition would be highly problematic — and would ultimately be counter-productive, as in practice the concept would continue to be used to cover a broad range. On the other hand, concepts such as ‘anti-social behaviour’ cannot simply be discarded, as they are clearly produced within real — and continuing — social practices.

The meaning of a concept like this — and consequently the meaning of a fact such as the recorded incidence of anti-social behaviour — cannot be established by rule-bound or even rule-discovering behaviour. The challenge is to record both social ‘facts’ and the circumstances of their production, tracing recorded data back to its underlying topic area; to the claims and interactions which produced the data; and to the associations and exclusions which were effectively written into it.

4. Even better than the real thing

As an approach to this problem, we propose a repository of content-oriented metadata on social science datasets. The repository will encompass two distinct types of classification. Firstly, those used within the sources themselves; following Barney Glaser, we refer to these as ‘In Vivo Concepts’. Secondly, those brought to the data by researchers (including ourselves); we refer to these as ‘Organising Concepts’. The repository will include:

• relationships between Organising Concepts
‘theft from the person’ is a type of ‘theft’

• associations between In-Vivo Concepts and data sources
the classification of ‘Mugging’ appears in ‘British Crime Survey 2003’

• relationships between In-Vivo Concepts
‘Snatch theft’ is a subtype of the classification of ‘Mugging’

• relationships between Organising Concepts and In-Vivo Concepts
the classification of ‘Snatch theft’ corresponds to the concept of ‘theft from the person’

The combination of these relationships will make it possible to represent, within a database structure, a statement such as

Sources of information on Theft from the person include editions of the British Crime Survey between 1996 and the present; headings under which it is recorded in this source include Snatch theft, which is a subtype of Mugging
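The four relationship types in the list, and the way they combine into a statement like the one above, can be sketched with plain Python dictionaries. This is an illustration under assumptions, not the repository’s actual schema, and it holds only the examples from the list, so it knows about the 2003 survey rather than the full run of editions.

```python
# The four relationship types as plain mappings, holding only the examples
# from the list; a real repository would need a proper store.
ORGANISING_PARENT = {"Theft from the person": "Theft"}    # Organising -> Organising
APPEARS_IN = {"Mugging": {"British Crime Survey 2003"}}   # In-Vivo -> sources
IN_VIVO_PARENT = {"Snatch theft": "Mugging"}              # In-Vivo -> In-Vivo
CORRESPONDS_TO = {"Snatch theft": "Theft from the person"}  # In-Vivo -> Organising


def sources_for(organising: str) -> list[str]:
    """Describe where an Organising Concept is recorded, and under which headings."""
    statements = []
    for heading, concept in sorted(CORRESPONDS_TO.items()):
        if concept != organising:
            continue
        # A heading appears wherever its parent classification appears.
        parent = IN_VIVO_PARENT.get(heading)
        found = APPEARS_IN.get(heading, set()) | APPEARS_IN.get(parent, set())
        for source in sorted(found):
            suffix = f", which is a subtype of {parent}" if parent else ""
            statements.append(f"{organising}: recorded in {source} under {heading}{suffix}")
    return statements


def broader(organising: str) -> list[str]:
    """Walk 'is a type of' links upwards (assumes the hierarchy has no cycles)."""
    chain = []
    while organising in ORGANISING_PARENT:
        organising = ORGANISING_PARENT[organising]
        chain.append(organising)
    return chain


print(sources_for("Theft from the person"))
# ['Theft from the person: recorded in British Crime Survey 2003 under Snatch theft, which is a subtype of Mugging']
print(broader("Theft from the person"))  # ['Theft']
```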

The structure of the proposed repository has three significant features. Firstly, while the relationships between concepts are hierarchical, they are also multiple. In English law, the crime of Robbery implies assault (if there is no physical contact, the crime is recorded as Theft). The In-Vivo Concept of Robbery would therefore correspond both to the Organising Concept of Theft from the person and that of Personal violence. Since different sources may share categories but classify them differently, multiple relationships between In-Vivo Concepts will also be supported. Secondly, relationships between concepts will be meaningful: it will be possible to record that two concepts are associated as synonyms or antonyms, for example, as well as recording one as a sub-type of the other. Thirdly, the repository will not be delivered as an immutable finished product, but as an open and extensible framework. We shall investigate ways to enable qualified users to modify both the developed hierarchy of Organising Concepts and the relationships between these and In-Vivo Concepts.
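Multiple, labelled relationships suggest a graph rather than a tree. A minimal sketch, assuming string-labelled edges: the two Robbery mappings come from the text, while the synonym pair is a hypothetical example, and `relate` stands in for the open, extensible entry point.

```python
from collections import defaultdict

# Typed, multiple relationships: every edge carries a label, and a concept
# may have any number of outgoing edges.
edges: dict[str, set[tuple[str, str]]] = defaultdict(set)
SYMMETRIC = {"synonym", "antonym"}


def relate(a: str, relation: str, b: str) -> None:
    """Record a labelled relationship; qualified users could add edges this way."""
    edges[a].add((relation, b))
    if relation in SYMMETRIC:
        # Symmetric relations are stored in both directions.
        edges[b].add((relation, a))


def related(concept: str, relation: str) -> set[str]:
    """All concepts linked to `concept` by edges with the given label."""
    return {target for label, target in edges[concept] if label == relation}


relate("Robbery", "corresponds_to", "Theft from the person")
relate("Robbery", "corresponds_to", "Personal violence")
relate("Mugging", "synonym", "Street robbery")  # hypothetical synonym pair

print(sorted(related("Robbery", "corresponds_to")))
# ['Personal violence', 'Theft from the person']
print(related("Street robbery", "synonym"))  # {'Mugging'}
```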

In the context of the earlier discussion of semantic processing and rule-governed behaviour, this repository will demonstrate the ubiquity of rule-interpreting behaviour in the social world by exposing and ‘freezing’ the data which it produces. In other words, the repository will encode shifting patterns of correspondence, equivalence, negation and exclusion, demonstrating how the apparently rule-bound process of constructing meaning is continually determined by ‘shared context’.

The repository will thus expose and map the ways in which social data is structured by patterns of situational information. The extensible and modifiable structure of the repository will facilitate further work along these lines: the further development of the repository will itself be an example of rule-interpreting behaviour. The repository will not — and cannot — provide a seamless technological bridge over the semantic gap; it can and will facilitate the work of bridging the gap, but without substituting for the role of applied human intelligence.

We are bored in the city

And the swimming pool on the rue des Fillettes. And the police station on the rue du Rendez-Vous. The medical and surgical clinic and the free employment agency on the quai des Orfèvres. The artificial flowers of the rue du Soleil. The Hôtel des Caves du Château, the Bar de l’Océan and the Café du Va et Vient. The Hôtel de l’Epoque.

And the strange statue of Doctor Philippe Pinel, benefactor of the insane, in the last evenings of summer. To explore Paris.

The early situationists, following Chtcheglov‘s lead, turned urban wandering into a form of political/psychological exploration, a group encounter with the city mediated only by alcohol. At a less exalted level, I’ve long been fascinated by the kind of odd urban poetry evoked here, in Manchester as much as Paris, and by the changing articulation of city space: established cities are a slow-motion example of Marx’s dictum about how we make our lives within conditions we have inherited. So it’s easy to see how well this could work:

Socialight lets you put virtual “sticky” notes called StickyShadows anywhere in the real world. Share pictures, notes and more using your cell phone.

But – for all that the site says about restricting access to Groups and Contacts – it’s also easy to see how very badly it could work.

* I leave a note for all my friends at the mall to let them know where I’m hanging out. All my friends in the area see it.
* A woman shows all her close friends the tree under which she had her first kiss.
* An entire neighborhood gets together and documents all the unwanted litter they find in an effort to share ownership of a community problem.
* A food-lover uses Socialight to share her thoughts on the amazing vanilla milkshakes at a new shop.
* The neighborhood historian creates her own walking tour for others to follow.
* A group of friends create their own scavenger hunt.
* A tourist takes place-based notes about stores in a shopping district, only for himself, for a time when he returns to the same city.
* A small business places StickyShadows that its customers would be interested in finding.
* A band promotes an upcoming show by leaving a StickyShadow outside the venue.

It was all going so well (although I did wonder why that entire neighbourhood couldn’t just pick up the litter) right up to the last two. Advertising – yep, that’s just what we all want more of in our urban lives. Lots of nice intrusive advertising.

Anne:

The worst thing about taking-for-granted that our experiences with the city and each other will be “enriched” by more data, by more information, by making the invisible visible, etc., is that we never have to account for or be accountable to how.

More specifically, there’s a huge difference between enabling conversation and enabling people to be informed – in other words, between talking-with and being-talked-at. Social software is all about conversation – about enabling people to talk together. Moreover, any conversation is defined as much by what it shuts out as what it includes; it’s hard to listen to the people you want to talk with when you’re being talked at. Even setting aside the information-overload potential of all those overlapping groups (do I need to know where so-and-so had her first kiss? do I need to know now?), it’s clear that Socialight is trying to serve two ends which are not only incompatible but opposed – and only one of which pays money. Which is probably why, even though the technology is still in beta, I already feel that using it constructively would be going against the grain.

Cloudbuilding (2)

Here’s a problem I ran into, halfway through building my first ontology, and some thoughts on what the solution might be.

Question 47 of the Mixmag survey reads:

Have you ever had an instance [sic] where your drug use caused you to:
Get arrested?
Lose a job?
Fail an exam?
Crash a car/bike?
Be kicked out of a club?

What this tells us is that one of the things the Mixmag questionnaire is ‘about’ – one of the in vivo concepts (or groups of in vivo concepts) that we need to record – is misadventures consequent on drug use. The question is how we define this concept logically – and this isn’t just an abstract question, as the way that we define it will affect how people can access the information. There are three main possibilities.

1. Model the world
We could say that to have a job is to be a party to a contract of employment, which is a type of agreement between two parties, which is agreed on a set occasion and covers a set timespan. Hence to lose a job is to cease to be a party to a previously-agreed contract of employment; this may occur as a consequence of drug use (defined, in the Mixmag context, as the use of a psychoactive substance other than alcohol and tobacco).

This is all highly logical and would make it explicit that the Mixmag data contains some information on terminations of contracts of employment (as well as on drug-related stuff). However, the Mixmag survey isn’t actually about contracts of employment, and doesn’t mandate the definitional assumptions I made above. So this isn’t really legitimate. (It would also be incredibly laborious, particularly when we turn our attention away from the relatively succinct Mixmag survey and look at more typical social survey data: surveys of physical capacity, for example, routinely ask people whether they can (a) walk to the shops (b) walk to the Post Office (c) walk to the nearest bus stop, and so on down to (j) or (k). All, in theory, capable of being modelled logically – but perhaps only in theory.)

2. Stick to the theme
Alternatively, we could begin by taking a view as to the key concepts which a data source is about – in this case, psychoactive consumption, feelings about psychoactive consumption, consequences of psychoactive consumption, and sexual behaviour – and draw the line at anything beyond those concepts. On this assumption the fact that the survey covers misadventures consequent on drug use would be within scope, but the list of misadventures given above wouldn’t be: that’s part of the data that researchers will find when they look at the data source itself, not part of the conceptual ‘catalogue’ that we’re building. The advantage of this is that it’s conceptually very ‘clean’ and makes it that much clearer what a source is about; the disadvantage is obviously that it cuts off some ways in to the data and hides some information.

3. Include black boxes
What I’ve got at the moment – following the principle of using the definitions supplied by the source – is an ontology in which some concepts are defined and others are undefined (black boxes). For instance, I’ve got a concept of Job loss, but all that OWL ‘knows’ about it is that it’s a type of Misadventure (which may be consequent on drug use) – which is in turn a type of Life event (which is a type of event that happens to one person). This would allow anyone searching for events consequent on drug use to get to job loss as a type of misadventure, but wouldn’t let them get to drug-related misadventure from job loss – unless they happened to enter the exact name of the ‘job loss’ concept. I’m coming to believe that this is unsatisfactory: we should define the model in terms of what a data source is about. This means that we’ve got to either take a narrow, domain-specific view or take the view that each source gives us one piece of a much larger picture – in which case we’re inevitably committed to modelling the world. But the ‘black box’ option isn’t really sustainable.
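To make the black-box problem concrete, here is a minimal Python sketch. It is not OWL, and the concept names, the PARENT table and the find() helper are all hypothetical. Navigating down from Misadventure reaches Job loss; a search that starts from a related idea reaches Job loss only by its exact name, unless the concept has been given a definition (option 1) that mentions that idea.

```python
# A toy stand-in for an OWL class hierarchy: child concept -> parent concept.
# Names and structure are illustrative assumptions, not the actual ontology.
PARENT = {
    "Life event": "Event",
    "Misadventure": "Life event",
    "Job loss": "Misadventure",
    "Arrest": "Misadventure",
}


def subclasses(concept):
    """All concepts below `concept`, found by navigating down the hierarchy."""
    result = []
    for child, parent in PARENT.items():
        if parent == concept:
            result.append(child)
            result.extend(subclasses(child))
    return result


def find(term, definitions=None):
    """Concepts reachable from a search term.

    A concept is found if its name matches exactly (case-insensitively) or if
    its definition mentions the term. Black-box concepts have no definition,
    so only the exact-name route is open to them.
    """
    definitions = definitions or {}
    term = term.lower()
    hits = [c for c in PARENT if c.lower() == term]
    hits += [
        c for c, refs in definitions.items()
        if term in {r.lower() for r in refs} and c not in hits
    ]
    return hits


# Top-down navigation works: Job loss is reachable as a kind of Misadventure.
print(subclasses("Misadventure"))  # ['Job loss', 'Arrest']
# Starting from a related idea finds nothing while Job loss is a black box...
print(find("contract of employment"))  # []
# ...but it does once Job loss is defined in terms of that idea (option 1).
print(find("contract of employment", {"Job loss": {"Contract of employment"}}))  # ['Job loss']
```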

Cloudbuilding (1)

This one’s about work.

I’m currently documenting the concepts underlying the 2005 Mixmag Drug Survey using Protégé. Here’s why:

The documentation of social science datasets on a conceptual level, so as to make multiple datasets comprehensible within a shared conceptual framework, is inherently problematic: the concepts on which the data of the social sciences are constructed are imprecise, contested and mutable, with key concepts defined differently by different sources. When a major survey release is published, for example, the accompanying metadata often includes not only a definition of key terms, but discussion of how and why the definitions have changed since the previous release. This information is of crucial importance to the social scientist, both as a framework for understanding statistical data and as a body of social data in its own right.

It follows that we cannot think in terms of ironing out inconsistencies between social science datasets and resolving ambiguities. Rather, documenting the datasets must include documenting the definitions of the conceptual framework on which the datasets are built, however imprecise or inappropriate these concepts might appear in retrospect. This will also involve preserving – and exposing – the variations between different sources, or successive releases from a single source.

There are currently two main approaches to conceptually-oriented data documentation. A ‘top down’ approach is exemplified by the European Language Social Sciences Thesaurus (ELSST). The Madiera portal allows researchers to explore ELSST and access European survey data which has been linked to ELSST keywords. The limitations of the top-down approach can be gauged from ELSST’s concepts relating to drug use. Drug Abuse, Drug Addiction, Illegal Drugs and Drug Effects are all ‘leaf’ concepts – headings which have no subheadings under them. However, they are in different parts of the overall ELSST tree: for example, Drug Abuse is under Social Problems->Abuse, while Drug Effects is under Biology->Pharmacology. Although the hierarchy is augmented by a list of ‘related’ concepts, to some extent facilitating horizontal as well as vertical navigation, the hierarchy inevitably makes some types of enquiry easier than others. Anyone using the ELSST ‘tree’ will be visually reminded of the affinities identified by ELSST’s authors between Pharmacology and Physiology, or between Drug Abuse and Child Abuse. These problems follow from the initial design choice of a single conceptual hierarchy.
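A minimal Python sketch of why a single tree makes some enquiries harder than others. The fragment follows the two placements given above (Drug Abuse under Social Problems->Abuse, Drug Effects under Biology->Pharmacology); the ROOT node and the positions of Child Abuse and Physiology are assumptions for illustration, not ELSST's actual layout. Two drug-related leaves end up six steps apart, while Drug Abuse sits two steps from Child Abuse.

```python
# Hypothetical fragment of a single-hierarchy thesaurus: term -> broader term.
# "ROOT" and the placement of Child Abuse and Physiology are assumptions.
BROADER = {
    "Social Problems": "ROOT",
    "Abuse": "Social Problems",
    "Drug Abuse": "Abuse",
    "Child Abuse": "Abuse",
    "Biology": "ROOT",
    "Pharmacology": "Biology",
    "Physiology": "Biology",
    "Drug Effects": "Pharmacology",
}


def path_to_root(term):
    """The chain of broader terms from `term` up to the root, inclusive."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def tree_distance(a, b):
    """Steps from a to b, going up to their lowest common ancestor and down."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    common = next(t for t in up_a if t in up_b)
    return up_a.index(common) + up_b.index(common)


print(path_to_root("Drug Abuse"))                   # ['Drug Abuse', 'Abuse', 'Social Problems', 'ROOT']
print(tree_distance("Drug Abuse", "Drug Effects"))  # 6
print(tree_distance("Drug Abuse", "Child Abuse"))   # 2
```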

This approach to classification has recently come under criticism. Advocates of ‘bottom-up’ approaches argue that top-down taxonomies like the Dewey Decimal System or ELSST are an artificial imposition on the world of knowledge, which is better represented as a set of individual acts of labelling or ‘tagging’. It is argued that the ‘trees’ of hierarchical taxonomies can be replaced with a pile of ‘leaves’.

One successful ‘bottom-up’ approach is the framework for documenting survey data developed by the Data Documentation Initiative (DDI). The DDI standard makes it possible to search on keywords associated with surveys, sections of surveys and individual questions; the short text of individual questions is also searchable. Searches of DDI metadata can also be run from the Madiera portal: a search on ‘marijuana’, for instance, brings back short text items including the following:

CONSUMED HASHISH,MARIJUANA
- Health Behaviour in School-Aged Children (Switzerland, 1990)

Smoking cannabis should be legal? Q2.31
- Scottish Social Attitudes Survey (Scotland, 2001)

Q92C DRUGS EV B OFFERED – MARIJUANA
- Eurobarometer 37.0 (EU-wide, 1992)

Clearly, this way in to the data makes it easy for a well-prepared researcher to track the use of particular concepts ‘in the wild’ (in vivo concepts). However, this gain comes at the cost of some information. There is wide variation both in the terminology used in the surveys and in the concepts to which they refer. In one survey smoking cannabis might be a type of petty crime; in others it might figure as a type of leisure activity or a potential health risk. These conceptual differences are reflected in the vocabulary used by data sources – and by researchers. Depending on context, three researchers using ‘marijuana’, ‘hashish’ and ‘cannabis’ as search terms may be asking for the same data or for three different sets of data.
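The cost shows up even in a toy keyword search. This Python sketch searches only the three short-text items quoted above (not the real DDI index), using plain case-insensitive substring matching; keyword_search() is a hypothetical helper. Each term brings back a different set of surveys, and folding the three into one query quietly builds in the assumption that they name the same thing.

```python
# The three short-text items quoted above, paired with their source surveys.
QUESTIONS = [
    ("CONSUMED HASHISH,MARIJUANA",
     "Health Behaviour in School-Aged Children (Switzerland, 1990)"),
    ("Smoking cannabis should be legal? Q2.31",
     "Scottish Social Attitudes Survey (Scotland, 2001)"),
    ("Q92C DRUGS EV B OFFERED – MARIJUANA",
     "Eurobarometer 37.0 (EU-wide, 1992)"),
]


def keyword_search(*terms):
    """Surveys whose question text contains any of the terms (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    return [survey for text, survey in QUESTIONS
            if any(t in text.lower() for t in wanted)]


for term in ("marijuana", "hashish", "cannabis"):
    print(term, "->", keyword_search(term))

# Treating the three words as synonyms returns everything, but only because
# the searcher has decided that they name the same concept.
print(len(keyword_search("marijuana", "hashish", "cannabis")))  # 3
```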

Neither the ‘top-down’ nor the ‘bottom-up’ approach articulates the conceptual assumptions which underlie the construction of a dataset – assumptions expressed both in the definition of in vivo concepts and in relationships between them. Rather than leaving much of this conceptual information undocumented (the DDI approach) or encoding one ‘correct’ set of assumptions while excluding or sidelining others (the ELSST approach), we propose to offer a coherent hierarchy of in vivo concepts for each individual source, based on the definitions (explicit and implicit) used in each source. Comparing the in vivo conceptual hierarchies used in multiple datasets will enable researchers both to see where concepts are directly comparable and to see where – and how – their definitions diverge and overlap.

To document hierarchies of in vivo concepts, we shall use description logic and the Semantic Web language OWL-DL (Web Ontology Language – Description Logic). OWL-DL makes it possible to formulate a precise logical specification of concepts such as

- use of cannabis (either marijuana or hashish) in the month prior to the survey
- use of either Valium or temazepam, at any time
- seizures of Class A drugs by HM Customs in the financial year 2004/5
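OWL-DL builds specifications like these out of intersections (⊓) and unions (⊔) of classes. The Python sketch below only imitates that idea with composable predicates; it is not OWL syntax, and the DrugUse record, its fields, the 30-day reading of "the month prior to the survey" and the survey date are all assumptions made for illustration. In practice these would be OWL class expressions built in an editor such as Protégé.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class DrugUse:
    """Hypothetical record of one reported use: what was taken, and when."""
    substance: str
    when: date


Concept = Callable[[DrugUse], bool]


def any_of(*substances: str) -> Concept:
    """Union of substance classes, in the spirit of DL disjunction (A ⊔ B)."""
    wanted = {s.lower() for s in substances}
    return lambda use: use.substance.lower() in wanted


def within_days(survey_date: date, days: int) -> Concept:
    """Uses falling in the `days` days up to and including the survey date."""
    start = survey_date - timedelta(days=days)
    return lambda use: start <= use.when <= survey_date


def all_of(*concepts: Concept) -> Concept:
    """Intersection of concepts, in the spirit of DL conjunction (A ⊓ B)."""
    return lambda use: all(c(use) for c in concepts)


SURVEY_DATE = date(2005, 6, 1)  # assumed fieldwork date, for illustration only

cannabis = any_of("marijuana", "hashish")
cannabis_last_month = all_of(cannabis, within_days(SURVEY_DATE, 30))
valium_or_temazepam_ever = any_of("valium", "temazepam")

print(cannabis_last_month(DrugUse("hashish", date(2005, 5, 20))))        # True
print(cannabis_last_month(DrugUse("marijuana", date(2005, 3, 1))))       # False
print(valium_or_temazepam_ever(DrugUse("Temazepam", date(1999, 1, 1))))  # True
```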

At least, that’s the idea. Now wait for part 2…

Nor mine, now

I nearly installed Hyperwords this morning; the only reason I didn’t is that I haven’t moved to Firefox 1.5 yet (and don’t intend to until I’m confident it won’t break any of the extensions I’m already using). And, in principle, it looks great:

With the Hyperwords Firefox Extension installed just select any text and a menu appears. You can search major search engines, look things up in reference sites, check dictionary definitions, translate, email quickly and much more.

So why does the thought of actually using it give me the creeps? Alex is similarly ambivalent:

In principle, it’s a handy tool. But I would have to overcome a few personal adoption barriers before I started using it on a regular basis. As a consumer, I can see the appeal of opening up texts to interact with the rest of the Web; but as a writer, I instinctively bristle at the idea of giving up that kind of control. I suspect that disposition colors the way I read things on the Web; I like my documents to feel fixed, not fluid. And the Web feels squishy enough as it is. That, and somehow the premise of cracking open someone else’s document with a toolbox of Web services feels like a kind of violation. This is undoubtedly my own personal neurotic hangup.

Well, if it is, it’s mine too. Mark Bernstein gets some of it:

In the very early days of hypertext research, people worried a lot about hand-crafted links. “How will we ever afford to put in all those links?” We also worried about how we’d ever manage to afford to digitize stuff for the Web, not to mention paying people to create original Web pages. Overnight, we discovered that we’d got the sign wrong: people would pay for the privilege of making Web sites. The problem isn’t the ‘tyranny’ of the links, and replacing it with the tyranny of the link server might not be a great solution.

and

Authors don’t offer navigation options to be “useful”; thoughtful writers use links to express ideas. Argumentation seeks understanding, not merely access.

Let’s put some of that together: cracking open someone else’s document with a toolbox of Web services; the tyranny of the link server; thoughtful writers use links to express ideas. In other words, Hyperwords doesn’t extend existing hyperlink practice but undermines it. In the Hyperwords world you’ll no longer read a document, you’ll mine it for information – or rather, mine it for jumping-off points for retrieving information from authoritative sources. (Or retrieving whatever other stuff you may want to retrieve.)

Alex mentioned Xanadu, but I don’t think Hyperwords is a step in that direction. If anything, it’s a step backwards. (One of Xanadu’s key words is “author-based”.) Hyperlinks and the Web of dialogic, socially-produced content go together just fine; as Mark says, mass amateurism is already providing an answer to the question of where all those links are going to come from. It’s messy and incomplete, but it’s here – and it’s, well, ours (as a writer, I instinctively bristle at the idea of giving up that kind of control). You can see two visions of the Web here: the mass amateurisation of writing as against the ‘consumer’-oriented, authority-led, broadcast Web. Hyperwords ostensibly enhances horizontal, transverse linkage, but its effect would be to pull the Web further towards broadcast mode – albeit an ‘empowered’, roll-your-own broadcast mode.

Can’t keep quiet for long – I’m a human being!
Can’t help singing this song – I’m a human being!
You won’t listen to me,
I’m not an authority…

- Steve Mason, “Eclipse”

A mean idea to call my own

Technorati’s new “Filter by Authority” feature depresses me intensely – not least because I thought they’d abandoned the word ‘authority’ some time after my last rant on the subject. There are three problems here. Firstly, as I wrote last year:

Technorati is all about in-groups and out-groups. … authority directly tracks popularity – although this is ‘popularity’ in that odd American high-school sense of the word: ‘popular’ sites aren’t the ones with the most friends (most out-bound links, most distinct participants in Comments threads or even most traffic) but the ones with the most people envying them (hence: most in-bound links).

In other words, ‘authority’ is a really lousy synonym for ‘high inbound link count’, raising completely groundless expectations of quality and reliability. McDonald’s is a popular provider of hot food; it’s not an authority on cooking. The relative popularity (or enviability) of a site may signify many things, but it doesn’t signify that the site possesses absolute qualities like veracity, completeness, beauty – or authority.

But hold on – is it absurd to call McDonald’s authoritative? You’ve got to admit, they’re good at what they do… There’s a sense in which this is a tautology – because what they do is maximise the numbers who come through the doors – but never mind. Let’s say that we can identify the McDonald’s branch with the highest number of burgers sold (or repeat customers, or stars on uniforms – the precise metric doesn’t matter). There’s a good argument for using the word ‘best’: it looks like this is the best McDonald’s branch in the world. And the best fast food joint in the world? Well, maybe. The best restaurant in the world? Um, no. Quality tracks popularity, to some extent, but only within a given domain – otherwise USA Today would be the best newspaper in the USA. (To say it’s the best national mass-market tabloid would be less controversial.) [Edited with thanks to commenters who know about this stuff.]

This is the second problem with authority-as-link-count, and one which Technorati shows no sign of recognising, much less addressing. I can live with the idea that the Huffington Post is more popular than Beppe Grillo’s blog – but more authoritative? I really don’t think so. (Any right-wingers reading this may substitute Huffington for Grillo and Kos for Huffington, and re-read. And rest.) At bottom, Technorati’s ‘authority’ ranking is based on the laughably outdated idea that there is a single Blogosphere, within which we’re all talking to pretty much the same people about pretty much the same things. Abandon that assumption and the problems with an ‘authority’ metric are staringly apparent: who am I authoritative for? who am I more authoritative than?
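The complaint fits in a few lines of Python. This is a minimal sketch with invented blog names and links, not Technorati's actual method: "authority" is taken to be nothing more than a count of inbound links, first across one undifferentiated blogosphere and then counting only links from within a single community. The two rankings disagree, which is the point of asking who a blog is authoritative for.

```python
from collections import Counter

# Invented link data: (linking blog, linked-to blog, linking blog's community).
LINKS = [
    ("us_1", "us_blog", "us-politics"),
    ("us_2", "us_blog", "us-politics"),
    ("us_3", "us_blog", "us-politics"),
    ("it_1", "italian_blog", "italian-politics"),
    ("it_2", "italian_blog", "italian-politics"),
    ("it_3", "us_blog", "italian-politics"),
]


def authority(links, community=None):
    """Inbound link counts; if `community` is given, count only links from it."""
    return Counter(dst for _src, dst, comm in links
                   if community is None or comm == community)


# One undifferentiated blogosphere: the most-linked blog wins everywhere.
print(authority(LINKS).most_common(1))                      # [('us_blog', 4)]
# Within one community, the ranking comes out the other way round.
print(authority(LINKS, "italian-politics").most_common(1))  # [('italian_blog', 2)]
```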

But if this is an error it’s not an error of Dave Sifry’s invention. As I’ve said, within any given domain of ideas, it’s not entirely meaningless to say that authority tracks popularity: among academic authors, the author who sells books and fills halls is likely to be the author who is cited, even if he or she hasn’t written anything particularly inspired since Thatcher was in power. The question is whether this is a feature or a bug: if we’re going to read one writer rather than another, should we choose the popular dullard or the unknown genius? Put it another way: if we’re choosing who to read in the context of a new publication medium with massively lowered entry costs – and with an accompanying ideology rich in levelled playing-fields, smashed barriers and dismantled hierarchies – who should we be trying to seek out: Dullard (Popular) or Genius (Unknown)?

The third and most fundamental problem with ranking by ‘authority’ is that it brings to the Web one of the very features of offline life which Web evangelists told us we were leaving behind. This kind of ‘feature’ – and the buzz-chasing worldview that promotes it – is part of the problem, not part of the solution.

I find that it often helps me to also answer the question, “Who is the most influential blogger talking about XXX this week, and what did she say?”

- Dave Sifry

It’s just work

Suw Charman types too fast. She’s produced what looks like a fascinating record of the Future of Web Apps conference, but I can’t see myself ever reading the whole thing. But this jumped out at me (slight edits):

Joshua Schachter – The things we’ve learned
Tagging is not really about classification or organisation, it’s user interface. It’s a way to store your working state or context. Useful for recall. OK for discovery because someone might tag similarly to you. Bad for distribution. Not all metadata is tags. People ask for automatic metadata, but that’s not the value – the value is attention, that you saw it and decided that it was important enough to tag. Auto-tagging doesn’t help you do what you’re trying to do. … because there’s a small transaction cost that adds value. But don’t make them do too much work.

“the value is attention … because there’s a small transaction cost that adds value.” The value of tagging is in the meaning it encodes, and the meaning is created by people doing a bit of work. If you make things easy by automating the process of getting meaning out of data, that creativity is not called upon and what you get doesn’t have the same value.

This parallels my thoughts about the impoverishment of technology through the collapse of alternative ways of using it, often in the name of ease of use – not to mention the thoughts I put down on my other blog about how the best communication (and the best narrative) is gappy and open to multiple interpretations. One way of understanding why gappiness and plurivalence might be a positive virtue, finally, is suggested by Anne, who counterposes predictability and foretelling to potentiality and hope.

I think what all these arguments have in common is a sense of meaning as not-yet-(finally)-constructed. In this perspective the point of social software, in particular, is not to connect data but to enable people to talk about data – while preventing that talk from being entirely weightless by imposing a certain level of friction, a certain opportunity cost. (A cost which can always be raised or lowered. Thought experiment: Wikipedia makes it impossible to revert an article to a version less than a week old. What happens?) In the case of tagging systems, there has to be a reason why you would want to tag a resource, and want to tag it in ways that have meaning for you. Meaning is created through conversations that require a bit of effort, within the shared context of an open horizon: it’s work, but it’s work without a known outcome. A journey of hope, as someone wrote.

(My blogs are crossing over – I hate it when that happens…)

The shapes between us

Peter Campbell writes in the current LRB:

Inanimate things in museums – teacups from which no one drinks, pictures which will never again be bought and sold – can, as much as stuffed animals, make one think sadly of the time when they were alive. Modern curators know this and spend much time and money avoiding notions of dust, death and mummification. Even art museums do not cram everything in the reserve collection onto the walls. But in avoiding the confusion, heterogeneity and abundance of old-style museums like the Pitt Rivers in Oxford, some of what they shared with the street has gone: an ability to feed the imagination with unexplained, comical, sinister and melancholy juxtapositions, for example – the aspect of collecting the Surrealists exploited.

A well-designed and artistically curated set of exhibits, in other words, enables the viewer to experience the exhibition as a whole, rather than being constantly interrupted by lacrimae rerum for the lost use-value of each individual exhibit. However, in the exhibition that this form of curation creates – a single-minded, smoothly articulated conglomerate – more is lost than a melancholy evocation of the exhibits’ past life. This kind of exhibition turns the viewer into a passive spectator, receiving and absorbing an achieved whole rather than responding imaginatively to an assembly of disjointed parts.

This critique, it seems to me, is not that far from Adina’s review of Walk the Line:

the unimaginative or condescending literalness of the movie is a good reminder of what I can’t stand about Hollywood style. It’s not hatred of emotion, or even melodrama. I loved Farewell My Concubine, which featured a damaged artist, unrequited love, drug addiction fueled by rejection, beautiful photography, and plenty of tragedy per foot of celluloid. The bits that the viewer needs to infer make all the difference.

Or, for that matter, Ellis’s argument here:

The first author opens up the thoughts of both his characters. Everything is controlled and explained. Meaning is processed for the reader. When the character speaks in German, she then helpfully provides an instant translation into English. The first author duly goes on to supply the reader with a sex scene.

The second author seems about to supply a sex scene, then abruptly and unexpectedly denies that readerly expectation. Sketches displace sexual intercourse. Looking at sketches and making more sketches becomes more attractive than sex. What the woman thinks of this is withheld from the reader. We remain inside a single mind. There are no judgements made for us about the state of this mind. The reader has to process the writing and discover for herself where the meaning lies.

There is a difference in these two passages, I think, between writing (conventional, conformist, explanatory, offering the warmth of familiarity and shared values) and literature (incomplete, resonant, resisting familiarity and a single dimension of meaning).

(You’ll have to read the post to find out who the two writers are.)

The bits that the viewer needs to infer make all the difference. The meaning’s in the gaps – at least, that’s where you’re being treated as a thinking being, a participant in communication (which is always imperfect) and not a spectator of composed images.

And the high plains too

Tom comments on this post from last year:

Thoughts: (1) Pledgebank is about increasing the perceived effect of ones actions by connecting it to a larger purpose (2) Wikipedia already seems to have that mechanism but (3) I like the idea of building social processes alongside wikipedia a lot…

Yes and No to point 2. Wikipedia already has social reinforcement/reputation feedback effects built in, but they only really work once you’re on the inside. If you’re on the outside, the fabled dedication and energy of the Wikipedia community is actually a barrier – not least because, if you’re unlucky, all that dedication and energy will be applied to reversing your edits. (Think of Thomas Vander Wal’s discovery that he disagreed radically with Wikipedia’s definition of ‘folksonomy’, and his subsequent struggle to get the definition changed – the point here being that Thomas actually coined the term, and not that long ago.)

This isn’t a new discovery: reputation-based regulation inevitably creates a barrier to entry, as anyone who’s tried to get noticed on Usenet can confirm. Reputation adds a bit of friction to the weightless process of making your mark online, and adds a bit of glue to the shapeless aggregate of people who do it; the fact that you have to build up a bit of reputation before your words gain traction is, mostly, feature rather than bug.

So is the Pledgebank idea reinventing the wheel, simply trying to use reputation-based peer pressure to mobilise a group who could have been subjecting themselves to Wikipedia peer pressure all along? I don’t think so. Compared with a Usenet newsgroup or a Web board community, Wikipedia has a couple of curious and atypical aspects. Firstly, the currency of Wikipedia reputation-building is work, and plenty of it. I’ve known people make a reputation on Usenet with a single post. The size and complexity of Wikipedia makes that highly unlikely. Secondly, Wikipedia is unusual in parallelling areas where people already have reputations, built up through domain-specific conversations. As always, issues of authority and reliability come into sharpest focus when the area’s one that you know personally. I can say that, if you’re interested in processes of consensus-formation in an area of hotly contested political debate, the Wikipedia page on the Lega Nord makes fascinating reading. If you’re interested in getting some reasonably authoritative views on the Lega Nord, it’s no substitute for reading the literature. This isn’t to say that Wikipedia is wrong – but it’s less right than it could be. And this is partly because Wikipedia’s informal reputation management mechanisms are orthogonal to the mechanisms which produce subject area experts, and partly because Wikipedia’s mechanisms operate to repel anyone who isn’t committed to building a Wikipedia reputation – perhaps because they’re more interested in building one within their subject area.

Hence the proposed Wikipedant posse. If – like me and Tom and Thomas – you’ve seen something on Wikipedia & thought That’s just wrong, but it would take a long time to fix it; and if you not only (a) know stuff, but (ii) know when you don’t know something and (3) know how to find stuff out; then this could be your kind of thing. The idea is simple: we compile a list of wrong-but-timeconsuming Wikipedia pages (usually involving simplistic or tendentious renderings of a subject); we dish them out, presumably at random; and, when we get assigned a page, we take ownership of it and try to put it right. This wouldn’t be a lifetime commitment, but it would almost certainly involve a couple of months of checking back and reverting unhelpful edits, on top of the researching and writing time.
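The dealing-out step is simple enough to sketch. A minimal Python version, assuming the list of pages and the list of volunteers have already been compiled (deal_pages() and the page and volunteer names are hypothetical): shuffle the pages, then deal them round-robin so that nobody ends up with more than one page more than anyone else.

```python
import random


def deal_pages(pages, volunteers, seed=None):
    """Shuffle `pages` and deal them out round-robin to `volunteers`."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    shuffled = list(pages)
    rng.shuffle(shuffled)
    assignments = {v: [] for v in volunteers}
    for i, page in enumerate(shuffled):
        assignments[volunteers[i % len(volunteers)]].append(page)
    return assignments


pages = ["Page A", "Page B", "Page C", "Page D", "Page E"]
print(deal_pages(pages, ["volunteer_1", "volunteer_2", "volunteer_3"], seed=1))
```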

I’ll be appealing to pedants, autodidacts and (OK, I admit it) academics rather than Wikipedia enthusiasts, and I’ll be appealing on a strictly time-limited basis rather than trying to create new Wikipedians. It will, unavoidably, involve quite a lot of work, which is why I’ll be calling in aid an external source of peer pressure in the form of Pledgebank.

And I’ll be doing this… some time soon. This year, definitely. (Terrors of the earth, I’m telling you.)

Update. I wrote:

I’ll be appealing to pedants, autodidacts and (OK, I admit it) academics

and

Wikipedia’s mechanisms operate to repel anyone who isn’t committed to building a Wikipedia reputation – perhaps because they’re more interested in building one within their subject area.

Which perhaps isn’t precisely the impression I gave last September, when I wrote:

I’ll just reiterate that I’m not talking about people with expert knowledge, so much as perfectionists with inquiring minds.

What a difference a few months’ full-time employment makes. (I was a freelance journalist from 1999 to 2004, and kept it up on a part-time basis until last summer.) Let’s split the difference: subject experts will be welcome, just as long as they’re also perfectionists with inquiring minds. (Which of course they will be, what with being subject experts and everything.)

You may look like we do

David cites an empirical analysis of social network evolution in a large university community, based on a registry of e-mail interactions between more than 43,000 students, faculty, and staff. (“Hey, gang, let’s do the research right here!”)

The results show that at least in this particular environment, people were more likely to form ties with others when they had a shared “focus” such as a class that brought them together or a mutual acquaintance, but were less likely to interact solely on the basis of shared characteristics such as age or gender.

David headlines his post “Interests, not demographics”, but I don’t think the study is quite saying that. It’s true that demographics do not a network make – but then, I’ve known that ever since my mother first enjoined me to play with a complete stranger of my own age and sex while she talked to the kid’s mother, who wasn’t a complete stranger (to her).

But I don’t think the data’s there to conclude that ‘interests’ are key either, as much as I might like to. The reference to a shared “focus” such as a class that brought them together or a mutual acquaintance sounds more like history than interests. It may be a reasonable generalisation to say that enduring communities are interest-based – particularly if we include the granfalloonish limit case of communities which perpetuate themselves by making a shared interest of their own perpetuation. Conversations, though, just happen. A conversation starts for any number of reasons – not least because two people find each other simpatico/a – and once it’s started the participants generally want to carry it on. History, not interests.

From this it also follows that there are times when conversations just don’t happen, and all the shared interests in the world won’t make them happen. And, given that people who are having a conversation generally want it to continue, there are sometimes very few gaps in which a new conversation can get a foothold. Which brings us back to the granfalloons. Perhaps we can see some communities as large-scale conversations which have outlived any connection with interest, for many or most of the participants, but still persist – and, by persisting, prevent new and potentially interest-based conversations from arising.

(I can be a phenomenologist and a Marxist, can’t I?)

Home again

So, I’m a researcher. (At least until the money runs out next year; hopefully I’ll have something similar lined up by then.) Before I was a researcher I was a freelance journalist for about six years, while I did my doctorate; before that I was a full-time journalist for three years; and before that I worked in IT. Which is a whole other dark and backward abysm of time – I was a Unix sysadmin, and before that I was an Oracle DBA, and before that… database design, data analysis, Codasyl[1] database admin, a ghastly period running a PC support team, and before that systems analysis and if you go back far enough you get to programming, and frankly I still don’t trust any IT person who didn’t start in programming. (I’m getting better – at one time I didn’t trust anyone who didn’t start in programming.)

Now, there’s an odd kind of intellectual revelation which you sometimes get, when you’re a little way into a new field. It’s not so much a Eureka moment as a homecoming moment: you get it, but it feels as if you’re getting it because you knew it already. You feel that you understand what you’ve learnt so fully that you don’t need to think about it, and that everything that’s left to learn is going to follow on just as easily. Which usually turns out to be the case. The way it feels is that the structures you’re exploring are how your mind worked all along – or, perhaps, how your mind would have been working all along if you’d had these tools to play with. (Or: “It’s Unix! I know this!”)

I had that feeling a few times in my geek days – once back at the start, when I was loading BASIC programs off a cassette onto an Acorn Atom (why else would I have carried on?); once when I was introduced to Codasyl databases; and once (of course) when I met Unix, or rather when I understood piping and redirection. But the strongest homecoming moment was when, after being trained in data analysis, I saw a corporate information architecture chart (developed by my employer’s then parent company, with a bit of help from IBM). Data analysis hadn’t come naturally, but once I’d got it it was there – and, now that I had got it, just look what you could do with it! It was a sheet of A3 covered with lines and boxes, expressing propositions such as “a commercial transaction takes place between two parties, one of which is an organisational unit while the other may be an individual or an organisational unit”; propositions like that, but mostly rather more complex. I thought it was wonderful.

Fast forward again: database design, DBA, sysadmin, journalism, freelancing, PhD, research. Research which, for the last month or so, has involved using OWL (the ontology language formerly known as DAML+OIL) and the Protégé logical modelling tool, which has enabled me to produce stuff like this.

It’s not finished – boy, is it not finished. But it is rather lovely. (Perhaps I just like lines and boxes…)

[1] If you don’t know what this means, don’t worry about it. (And if you do, Hi!)

This is the new stuff

Thomas criticises Wikipedia’s entry on folksonomy – a term which was coined just over a year ago by, er, Thomas. As of today’s date, the many hands of Wikipedia say:

Folksonomy is a neologism for a practice of collaborative categorization using freely chosen keywords. More colloquially, this refers to a group of people cooperating spontaneously to organize information into categories, typically using categories or tags on pages, or semantic links with types that evolve without much central control. … In contrast to formal classification methods, this phenomenon typically only arises in non-hierarchical communities, such as public websites, as opposed to multi-level teams and hierarchical organization. An example is the way in which wikis organize information into lists, which tend to evolve in their inclusion and exclusion criteria informally over time.

Thomas:

Today, having seen a new academic endeavor related to folksonomy quoting the Wikipedia entry on folksonomy, I realize the definition of Folksonomy has become completely unglued from anything I recognize (yes, I did create the word to define something that was undefined prior). It is not collaborative, it is not putting things into categories, it is not related to taxonomy (more like the antithesis of a taxonomy), etc. The Wikipedia definition seems to have morphed into something that the people with Web 2.0 tagging tools can claim as something that can describe their tool.

I’m resisting the temptation to send Thomas the All-Purpose Wikipedia Snark Letter (“Yeah? Well, if you don’t like the wisdom of the crowds, Mr So-Called Authority…”). In fact, I’m resisting the temptation to say anything about Wikipedia; that’s another discussion. But I do want to say something about the original conception of ‘folksonomy’, and about how it’s drifted.

Firstly, another quote from Thomas’s post from today:

Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one’s own retrieval. The tagging is done in a social environment (shared and open to others). The act of tagging is done by the person consuming the information.

There is tremendous value that can be derived from this personal tagging when viewing it as a collective, when you have the three needed data points in a folksonomy tool: 1) the person tagging; 2) the object being tagged as its own entity; and 3) the tag being used on that object. … [by] keeping the three data elements you can use two of the elements to find a third element, which has value. If you know the object (in del.icio.us it is the web page being tagged) and the tag you can find other individuals who use the same tag on that object, which may lead (with a little more investigation) to somebody who has the same interest and vocabulary as you do. That person can become a filter for items on which they use that tag. You then know an individual and a tag combination to follow.

This is admirably clear and specific; it also fits rather well with the arguments I was making in two posts earlier this year:

[perhaps] the natural state of knowledge is to be ‘cloudy’, because it’s produced within continuing interactions within groups: knowledge is an emergent property of conversation, you could say … [This suggests that] every community has its own knowledge-cloud – that the production and maintenance of a knowledge-cloud is one way that a community defines itself.

If ‘cloudiness’ is a universal condition, del.icio.us and flickr and tag clouds and so forth don’t enable us to do anything new; what they are giving us is a live demonstration of how the social mind works. Which could be interesting, to put it mildly.

Thomas’s original conception of ‘folksonomy’ is quite close to my conception of a ‘knowledge cloud’: they’re both about the emergence of knowledge within a social interaction (a conversation).
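To make Thomas’s “three data points” concrete, here is a minimal sketch of how a tool might use any two of person, object and tag to find the third. It assumes taggings are stored as a flat list of (person, object, tag) triples; the names Tagging, people_using, objects_tagged_by and tags_for are hypothetical illustrations, not del.icio.us’s actual API or data model.

```python
from collections import namedtuple

# One tagging act: who tagged what, with which tag.
# (Hypothetical structure, assumed for illustration only.)
Tagging = namedtuple("Tagging", ["person", "obj", "tag"])


def people_using(taggings, obj, tag):
    """Object + tag -> the people who applied that tag to that object."""
    return {t.person for t in taggings if t.obj == obj and t.tag == tag}


def objects_tagged_by(taggings, person, tag):
    """Person + tag -> the objects that person filed under that tag,
    i.e. the 'filter' you can follow once you have found that person."""
    return {t.obj for t in taggings if t.person == person and t.tag == tag}


def tags_for(taggings, person, obj):
    """Person + object -> the tags that person used on that object."""
    return {t.tag for t in taggings if t.person == person and t.obj == obj}


taggings = [
    Tagging("alice", "http://example.org/a", "ethnoclassification"),
    Tagging("bob", "http://example.org/a", "ethnoclassification"),
    Tagging("bob", "http://example.org/b", "ethnoclassification"),
    Tagging("carol", "http://example.org/a", "tagging"),
]

# Alice starts from a page and a tag she used, and finds someone else
# who used the same tag on the same page...
peers = people_using(taggings, "http://example.org/a", "ethnoclassification") - {"alice"}
print(peers)  # {'bob'}

# ...then follows that person-and-tag combination to new material.
print(sorted(objects_tagged_by(taggings, "bob", "ethnoclassification")))
```

The point of keeping all three elements, rather than just object-to-tag counts, is that the person survives as a first-class entry point: drop the person column and only the aggregate cloud remains.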

The current Wikipedia version of ‘folksonomy’ is both fuzzier and more closely tied to existing technology. What’s happened seems to be a kind of vicious circle of hype and expectations management. It’s not a new phenomenon – anyone who’s been watching IT for any length of time has seen it happen at least once. (Not to worry anyone, but it happened quite a lot around 1999, as I remember…)

  1. There’s Vision: someone sees genuinely exciting new possibilities in some new technology and writes a paper on – oh, I don’t know, noetic telepresence or virtual speleology or network prosody…
  2. Then there’s Development: someone builds something that does, well, a bit of it. Quite significant steps towards supporting network prosody. More coming in the next release.
  3. Phase three is Hype. Hype, hype, hype. Mm-hmm. I just can’t get enough hype, can you?
  4. The penultimate phase is Dissemination: in which everyone’s trying to support network prosody. Or, at least, some of it. That stuff that those other people did with their tool. There we go, fully network prosody enabled – must get someone to do a writeup.
  5. Finally we’re into Hype II, also known as Marketing: ‘network prosody’ is defined less by the original vision than by the tools which have been built to support it. The twist is that it’s still being hyped in exactly the same way – tools which don’t actually do that much are being marketed as if they realised the original Vision. It’s a bit of a pain, this stage. Fortunately it doesn’t last forever. (Stage 6 is the Hangover.)

What’s to be done? As I said back here, personally I don’t use the term ‘folksonomy’; I prefer Peter Merholz’s term ‘ethnoclassification’. Two of my objections to ‘folksonomy’ were that it appears to denote an end result as well as a process, and that it’s become a term of (anti-librarian) advocacy as well as description; Thomas’s criticisms of Wikipedia seem to point in a similar direction. Where I do differ from Thomas is in the emphasis to be placed on online technologies. Ethnoclassification is – at least, as I see it – something that happens everywhere all the time: it’s an aspect of living in a human community, not an aspect of using the Web. If I’m right about where we are in the Great Cycle of Hype, this may soon be another point in its favour.
