Semiotics, which is clearly older than the semantic web, tells us you can’t always map signs to real-world objects. You can do it for things like, say, the Taj Mahal, but not for things like democracy, justice and so on. So they map to concepts. Trouble is, you’re then really talking about what’s inside someone else’s head, and you can’t ever be sure what that is. So, the argument goes, stuff like RDF is just “syntactic sugar”: neatly structured, but it can’t escape the fact that the tags, URNs and so forth have to have an agreed meaning.

I can’t bring myself to agree with this completely. In practice people seem to get by, and I think there must be a feedback loop involved. If you interpret a statement about X and act on it, and your interpretation is wrong, and the interpretation matters in this case, something bad will probably happen. You will then revise your understanding of what is meant by X.
This is all good phenomenological stuff – see the Schutz quote above. One of Schutz’s great arguments was that there is no definitional God’s eye view – there is only human social experience, including the experience of making and using signs.
So surely the semantic web can work in small ways where all parties are agreed on the meaning of the vocabulary.
The trouble is – as Clay pointed out back here – that if you’ve got that level of agreement among all participants, you don’t need the semantics. If you’re all using the same schema anyway, your respective schemas don’t need to describe themselves – and if they do need to describe themselves, there needs to be a common language they can do it in, and hence a higher level of shared context.
What you can do is say “I’m using [x] to mean $FOO, which is a subtype of $BAR but does not overlap with $BAZ; how about you?” Or rather, “On 2005-06-03, writing in Manchester (England/UK/EU), I used [x] to mean $FOO…” and so on. That, to me, is (or rather will be) where it gets interesting – the point is not to encode semiotics but to encode semantics in such a way that the semiotics can be inferred.
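To make that concrete, here’s a minimal sketch of what such a situated statement might look like as data. Everything here is hypothetical – the class names, fields and the `compatible` check are mine, not any real RDF or OWL API – but it captures the shape of “I used [x] to mean $FOO, a subtype of $BAR, disjoint from $BAZ, on this date, in this place”:

```python
from dataclasses import dataclass, field

@dataclass
class TermUse:
    """One situated use of a sign: who used [x] to mean what, where and when.
    All names here are illustrative, not a real vocabulary standard."""
    sign: str                                          # the token used, e.g. "x"
    sense: str                                         # the intended concept, e.g. "FOO"
    subtype_of: list = field(default_factory=list)     # e.g. ["BAR"]
    disjoint_with: list = field(default_factory=list)  # e.g. ["BAZ"]
    date: str = ""                                     # e.g. "2005-06-03"
    place: str = ""                                    # e.g. "Manchester (England/UK/EU)"

def compatible(a: TermUse, b: TermUse) -> bool:
    """Crude check: two uses of the same sign clash only if one's sense
    is explicitly declared disjoint with the other's."""
    return a.sign != b.sign or (
        a.sense not in b.disjoint_with and b.sense not in a.disjoint_with
    )

mine = TermUse(sign="x", sense="FOO", subtype_of=["BAR"],
               disjoint_with=["BAZ"],
               date="2005-06-03", place="Manchester (England/UK/EU)")
yours = TermUse(sign="x", sense="BAZ")

print(compatible(mine, yours))  # False: our two uses of "x" are declared disjoint
```

The point of the sketch is that the disagreement becomes detectable without either party having read the other’s mind: the declaration itself carries enough context for the clash to surface.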
Or rather, in such a way that the semiotics can’t not be inferred. Which they need to be. Once you get away from the physical sciences and their geek spinoffs, it’s very, very hard to reach a final level of granularity. You can map the physical contours of France in exactly the same way that you can map Britain – and with enough data you could map Britain 100 years ago and map France 100 years ago in exactly the same way. What you can’t do is chart the number of suicides or street thefts or families in poverty or users of illegal drugs or asylum applications or hospital admissions in Britain and compare them with the figures for Britain 100 years ago, let alone with French figures. This is not because the data isn’t there, but (in all those cases) because it’s the product of a complex set of social interactions – and, as such, it doesn’t have a stable meaning, in time or in space.
This is what I mean about inferring semiotics: figures on ‘drug use’, to take the most obvious example, are produced in particular ways and classified using particular criteria, which correspond to patterns of public health and law enforcement activity as well as to broader social attitudes. The data doesn’t contain or express those attitudes and patterns of activity – but if you don’t know about them it’s effectively meaningless. (“Hey, look, there are twice as many people using drugs! Oh, wait, there are twice as many substances classified as drugs. Never mind.”) The only way forward, it seems to me, is to (as it were) factory-stamp data with the conditions of its production, as far as they can be established: “this source on ‘drugs’ covers this period in this jurisdiction, and consequently uses definitions derived from this legislation, including this but excluding this and this“.
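A sketch of what that factory-stamping might look like in practice. The field names and the example figures are invented for illustration (the two Acts are real UK legislation, but their use here is purely illustrative); the idea is simply that a figure refuses comparison unless its conditions of production match:

```python
from dataclasses import dataclass, field

@dataclass
class Provenance:
    """The conditions under which a figure was produced."""
    period: str
    jurisdiction: str
    legislation: str
    includes: list = field(default_factory=list)
    excludes: list = field(default_factory=list)

@dataclass
class Figure:
    label: str
    value: int
    provenance: Provenance

def comparable(a: Figure, b: Figure) -> bool:
    """Two figures with the same label only bear comparison if they were
    produced under the same conditions."""
    return a.label == b.label and a.provenance == b.provenance

uk_2005 = Figure("drug users", 1000, Provenance(
    "2005", "England & Wales", "Misuse of Drugs Act 1971",
    includes=["Class A", "Class B"], excludes=["alcohol", "tobacco"]))
uk_1905 = Figure("drug users", 500, Provenance(
    "1905", "England & Wales", "Pharmacy Act 1868",
    includes=["opiates"], excludes=["cannabis"]))

print(comparable(uk_2005, uk_1905))  # False: same label, different conditions
```

Notice that the naive comparison (“drug use has doubled!”) is exactly what the stamp blocks: the two figures share a label but not a definition, which is the “twice as many substances classified as drugs” trap above.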
That’s what I’d like to do, anyway.