Here’s a problem I ran into halfway through building my first ontology, along with some thoughts on what the solution might be.
Question 47 of the Mixmag survey reads:
Have you ever had an instance[sic] where your drug use caused you to:
Lose a job?
Fail an exam?
Crash a car/bike?
Be kicked out of a club?
What this tells us is that one of the things the Mixmag questionnaire is ‘about’ – one of the in vivo concepts (or groups of in vivo concepts) that we need to record – is misadventures consequent on drug use. The question is how we define this concept logically – and this isn’t just an abstract question, as the way that we define it will affect how people can access the information. There are three main possibilities.
1. Model the world
We could say that to have a job is to be a party to a contract of employment, which is a type of agreement between two parties, which is agreed on a set occasion and covers a set timespan. Hence to lose a job is to cease to be a party to a previously-agreed contract of employment; this may occur as a consequence of drug use (defined, in the Mixmag context, as the use of a psychoactive substance other than alcohol and tobacco).
This is all highly logical, and it would make explicit that the Mixmag data contains some information on terminations of contracts of employment (as well as on drug-related stuff). However, the Mixmag survey isn’t actually about contracts of employment, and it doesn’t mandate the definitional assumptions I made above, so this isn’t really legitimate. It would also be incredibly laborious, particularly when we turn our attention away from the relatively succinct Mixmag survey and look at more typical social survey data: surveys of physical capacity, for example, routinely ask people whether they can (a) walk to the shops, (b) walk to the Post Office, (c) walk to the nearest bus stop, and so on down to (j) or (k). All of these could, in theory, be modelled logically – but perhaps only in theory.
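To make this option a bit more concrete, here is a minimal sketch in plain Python, with dictionaries standing in for OWL class definitions. It is not real OWL, and every concept name in it (`ContractOfEmployment`, `JobLoss` and so on) is a label I have made up for illustration. The idea is that each concept is defined in terms of other concepts, so a search on one of those other concepts can reach it.

```python
# Illustrative only: plain Python standing in for OWL class definitions.
# Every concept name is a hypothetical label, not part of the Mixmag survey.

# Each concept maps to the concepts that its definition refers to.
DEFINITIONS = {
    "Agreement": set(),
    "ContractOfEmployment": {"Agreement"},
    "Job": {"ContractOfEmployment"},
    "JobLoss": {"Job", "ContractOfEmployment"},
    "PsychoactiveSubstance": set(),
    "DrugUse": {"PsychoactiveSubstance"},
}


def concepts_referring_to(target):
    """Return every concept whose definition depends, directly or
    indirectly, on `target`."""
    found = set()
    for concept, refers_to in DEFINITIONS.items():
        if target in refers_to:
            found.add(concept)
            found |= concepts_referring_to(concept)
    return found


# Someone interested in employment contracts can now reach job loss.
print(sorted(concepts_referring_to("ContractOfEmployment")))
```

The price is visible even at this scale: five supporting definitions were needed to describe one answer option out of four.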
2. Stick to the theme
Alternatively, we could begin by taking a view as to the key concepts which a data source is about – in this case, psychoactive consumption, feelings about psychoactive consumption, consequences of psychoactive consumption, and sexual behaviour – and draw the line at anything beyond those concepts. On this assumption the fact that the survey covers misadventures consequent on drug use would be within scope, but the list of misadventures given above wouldn’t be: that’s part of the data that researchers will find when they look at the data source itself, not part of the conceptual ‘catalogue’ that we’re building. The advantage of this is that it’s conceptually very ‘clean’ and makes it that much clearer what a source is about; the disadvantage is obviously that it cuts off some ways in to the data and hides some information.
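A rough sketch of what this option would mean for the catalogue, again in plain Python rather than OWL and with concept names that are my own hypothetical labels: the entry for a source lists themes only, so the individual misadventures never appear in it.

```python
# Illustrative only: the catalogue records themes, not individual survey items.
# The concept names are hypothetical labels for the themes listed above.
CATALOGUE = {
    "Mixmag survey": {
        "PsychoactiveConsumption",
        "FeelingsAboutPsychoactiveConsumption",
        "ConsequencesOfPsychoactiveConsumption",
        "MisadventureConsequentOnDrugUse",  # in scope as a kind of consequence
        "SexualBehaviour",
    },
}


def sources_about(concept):
    """Return the names of sources whose catalogue entry lists `concept`."""
    return sorted(name for name, themes in CATALOGUE.items() if concept in themes)


# A search on the theme finds the survey.
print(sources_about("MisadventureConsequentOnDrugUse"))
# A search on one of the listed misadventures finds nothing, because that
# level of detail lives only in the data source itself.
print(sources_about("JobLoss"))
```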
3. Include black boxes
What I’ve got at the moment – following the principle of using the definitions supplied by the source – is an ontology in which some concepts are defined and others are left undefined (black boxes). For instance, I’ve got a concept of Job loss, but all that OWL ‘knows’ about it is that it’s a type of Misadventure (which may be consequent on drug use), which is in turn a type of Life event (a type of event that happens to one person). This would allow anyone searching for events consequent on drug use to get to job loss as a type of misadventure, but it wouldn’t let them get to drug-related misadventure from job loss – unless they happened to enter the exact name of the ‘job loss’ concept. I’m coming to believe that this is unsatisfactory: we should define the model in terms of what a data source is about. This means that we’ve got to either take a narrow, domain-specific view, or take the view that each source gives us one piece of a much larger picture – in which case we’re inevitably committed to modelling the world. Either way, the ‘black box’ option isn’t really sustainable.
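The asymmetry in the black-box approach can be sketched in a few lines of plain Python. This is a stand-in for the hierarchy described above, not the actual OWL, and the names for the other three misadventures are hypothetical labels of mine. Searching downward from Misadventure reaches job loss, but since Job loss has no definition of its own, a search starting from a related idea such as employment reaches nothing.

```python
# Illustrative only: a plain-dict stand-in for the subclass hierarchy.
SUBCLASS_OF = {
    "JobLoss": "Misadventure",
    "ExamFailure": "Misadventure",
    "VehicleCrash": "Misadventure",
    "ClubEjection": "Misadventure",
    "Misadventure": "LifeEvent",
    "LifeEvent": "Event",
}

# What each concept is defined in terms of. The four misadventures are
# black boxes: nothing is recorded beyond their place in the hierarchy.
DEFINED_USING = {
    "Misadventure": {"DrugUse"},  # may be consequent on drug use
    "LifeEvent": {"Person"},      # happens to one person
    "JobLoss": set(),
    "ExamFailure": set(),
    "VehicleCrash": set(),
    "ClubEjection": set(),
}


def subtypes(concept):
    """Return every direct or indirect subtype of `concept`."""
    found = set()
    for child, parent in SUBCLASS_OF.items():
        if parent == concept:
            found.add(child)
            found |= subtypes(child)
    return found


def concepts_defined_using(term):
    """Return the concepts whose own definition mentions `term`."""
    return {c for c, terms in DEFINED_USING.items() if term in terms}


print(sorted(subtypes("Misadventure")))      # top-down: job loss is reachable
print(concepts_defined_using("Employment"))  # from employment: nothing
```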