Following the publication of a number of papers, by Gary Merrill, by Michel Dumontier and Robert Hoehndorf (also as PDF), and by myself (also on PLoS One), there has been an enormous amount of discussion on what realism in ontology building is, and whether it is appropriate for use in scientific ontology building. As I have documented previously, I have now left the BFO discuss mailing list, and more lately OBO discuss, as I felt that these discussions had reached a finishing point. In this post, I want to spell out clearly my reasons for thinking that realism is not appropriate. I want to avoid re-iterating the positions in my paper and earlier postings, as well as provide a direct answer to David Sutherland, who has posted on why he is a realist.

What is realism?

Sadly, I need to start with a philosophical digression. At heart, I am not interested in philosophy, nor, I guess, are many in the bio-ontologies community. Those in this camp can safely skip to the next section.

At heart, realism is a metaphysical interpretation of an ontology. How are we to interpret the relationship between, for example, the ontology term Human and the things that exist in the real world? Realism asserts that the ontology term refers to a Universal, which exists in its own right, but not separately from the instances to which it refers.

Personally, I do not find these assertions of reality or truth very helpful. David Sutherland suggests that:

One possible reason is a failure of nerve. Many people become quite nervous at talk of truth and reality.

— David Sutherland

In my case, this is true, and it stems from my history. Like many people learning science, when I first heard of Mendel’s laws, or the exceptionally weird behaviour of light, my initial response was that they were not real, just part of the mathematical model that describes the experimental results. Later on, though, I realised that I had the same worries about other concepts. When I was first told that a table holding a weight was exerting a force on the weight, I didn’t believe it; after all, when I support a weight it costs me effort to do so, but the table was just sitting there. Many years before this, I didn’t believe the idea that I was surrounded by invisible things called gasses, although I did realise that it was a good way of explaining the wind. Eventually, however, I became so used to manipulating a force, or a gene, in the mathematical equations of physics or genetics that I just stopped worrying about it.

In his paper, Gary Merrill argues that, in practice, we don’t need a metaphysical interpretation anyway. I tend to agree. Consider this quote:

The next question was - what makes planets go around the sun? At the time of Kepler some people answered this problem by saying that there were angels behind them beating their wings and pushing the planets around an orbit. As you will see, the answer is not very far from the truth. The only difference is that the angels sit in a different direction and their wings push inward.

— Richard Feynman Character of Physical Law

Personally, I like to speak of models of data, rather than representations of reality. I find that talking of models reminds me that it is my job not to support models but to break them. I do not see the point of rebadging commonly used terms, such as model, with more complex ones, such as “representation of reality” (this rebadging is a theme of realism to which I will return). The bottom line, though, is that it doesn’t really matter. The statement that $F_g \propto 1/r^2$ is the same as $F_{\mathrm{wings\,of\,angels}} \propto 1/r^2$. So long as we agree that the angels behave in a precise, predictable way, there is no deep reason to distinguish between the two, except for simple pragmatism: “gravity” is shorter and easier to say than “the wings of angels”.

What realism is not

Realism has chosen wisely in its choice of name. Most scientists believe in reality so, when faced with realism vs conceptualism, their gut feeling is that the former must be right: they believe in a mind-independent reality, so conceptualism must be wrong. Others have argued convincingly that this is an inaccurate interpretation of conceptualism, so I will not repeat the discussion here, but will instead look at a more specific interpretation: that realism means building ontologies on the basis of experimental evidence. This conflation of “evidence-based ontologies” with realism can be seen in David Sutherland’s posting:

The results of those inferences will be judged by how they match reality. An inference that is demonstrably false indicates a problem with the initial assertions (or with the inference mechanism).

— David Sutherland

Similarly, Judy Blake makes the same conflation:

I strongly support the realist approach that facilitates the use of the ontology for science discovery. We represent in the ontologies what we know with some degree of certainty.

— Judy Blake

Of course, both of these positions are reasonable: we should judge ontologies by how well their inferences fit our experimental data and, further, for reference ontologies, we should represent knowledge for which we have very good evidence. But this is not realism. This can be shown with a straightforward argument.

While the definition of “science” is open to question, a reasonable working definition would be that “science is the interpretation of experimental data”. The idea that anyone who is not a realist therefore believes that we should not base ontologies on our experimental data, or on what we know, is either uncharitable or wrong. This conflation also, however, undermines the notion that realism is a useful methodology. If science is about modelling experimental data, and realism is a methodology for building ontologies based on experimental data, then “realism-based scientific ontology” is tautological; “realism-based” adds nothing at all to the statement, except to make it longer. In short, returning to the earlier theme, we have rebadged “scientific ontology” as “realism-based ontology”.

Believing in reality does not make you a realist. Believing that ontologies should be based on evidence does not make you a realist; it just means you are a scientist.

What are the pragmatic implications of realism?

One of the difficulties in addressing the pragmatic implications of realism is that many of the conclusions drawn do not seem to stem from the underlying philosophy. This makes it hard to judge what the implications of realism are in a given situation. The only recourse is to look at how realism has been practised in, er, reality, ignoring the philosophical underpinning. I’ve taken this approach here.

The first time that I heard of realism was at ISMB in Glasgow in 2004. One theme that came out there was the notion that all ontologies should use single inheritance, because in reality things can only be a kind of one other thing. BFO follows this, and the position was supported in many discussions on BFO-discuss. Ironically, though, with terms such as “Object Part”, any ontology that uses BFO is hard pressed to do likewise. I was a little surprised to be asked if I understood the strategy of asserting single inheritance and inferring the rest; surprised because a) this strategy is the normalisation pattern from Alan Rector, with whom I have worked for many years, and b) it represents a complete change. Normalisation results in a polyhierarchy; that some subsumption is inferred and some asserted is an engineering decision, not a question of underlying philosophy.

That realism is, apparently, capable of supporting such a shift is rather worrying. This cannot be put down to falsifiability (the notion that ontologies can be wrong and can change), as this is a change at the metaphysical level. It suggests that, in practice, realism is disconnected from its philosophical underpinning. It also suggests that realism is capable of justifying two quite different positions; in short, that realism actually has very little explanatory power. Currently, the realist answer is that the asserted relationships represent universals; but as there is no clear assay for what this means, I feel this doesn’t help. My own experience is that determining a privileged axis of inheritance is not, in most cases, possible. Ontologies are fundamentally multiply-inherited; normalisation is a simple engineering decision, which moves the load of maintaining the polyhierarchy from the human to the reasoner.
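
For those unfamiliar with normalisation, here is a minimal sketch in the same Manchester syntax used later in this post; the class and property names are hypothetical, invented purely for illustration. Each primitive class asserts a single parent, and a defined class lets the reasoner infer the rest:

Class: MembraneProtein
    EquivalentTo: Protein and part_of some Membrane

Class: Receptor
    SubClassOf: Protein

Class: InsulinReceptor
    SubClassOf: Receptor,
                part_of some Membrane

A reasoner will infer that InsulinReceptor is a MembraneProtein, producing the polyhierarchy without anyone having to assert or maintain it by hand.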

Another long-held tenet of realism is the assertion that the use of not represents bad ontology. Statements such as Fly all has_part not Wing assert a relationship with entities that don’t exist. However, many people find this sort of modelling useful. It was, therefore, a surprise to find that, following a lot of careful thought, realism does allow Fly lacks Wing. But winding not into the relationship in this way has a number of problems. First, it requires an alteration at the logical level of the ontology; the relationship has to be between the instance and the universal purely to satisfy realism, because the universal really exists. In doing so, a special-case instance-universal relationship is required. Secondly, the semantics of this relationship are now hidden, rather than being explicit in the ontological layer; the reader has to understand that some relationships are effectively positive, and some are negative. It’s unclear why it is necessary to jump over these hurdles, when it would have been far simpler to just use a not construction.
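
To make the contrast concrete, here is a sketch in Manchester syntax (the class WinglessFly and the property lacks_part are hypothetical). The straightforward construction places the negation explicitly in the logical layer:

Class: WinglessFly
    SubClassOf: Fly,
                not (has_part some Wing)

The realist alternative hides the same negative content inside a special-purpose property relating each instance to the universal Wing, which in OWL 2 would have to be punned as an individual:

Class: WinglessFly
    SubClassOf: Fly,
                lacks_part value Wing

The two are intended to say the same thing; but in the second, the negative semantics live outside the logic, and the reader must simply know that lacks_part is effectively negative.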

So, does realism produce good ontology? I have already spoken about this in my paper, but it seems fairly clear that it does not. BFO is mass-centric: waves, energy, force, and entropy all have no place. It makes unnecessary and meaningless distinctions, such as that between site and spatial region (one is a region of space with respect to an observer, the other an absolute region of space). It also encompasses some outright howlers, including the claim that a spatial region cannot have a length.

Does realism encourage good practice? Again, I think in many cases it does not. Firstly, it elevates “reality” above all else; so any distinction that can be made should be made, because that is reality, right? Taken to the extreme, this results in overly complex ontologies, suffering from analysis paralysis. Just because we can make a distinction does not mean that we should, unless there is a good use case and a clear reason why the distinction adds usefulness to the ontology. It also results in the use of overly complex, philosophical language, which is hard for those outside a small clique to understand; I do, now, understand the definitions in BFO, but in many ways I wish that I didn’t. As a trivial example, consider the modification of the standard definition form (“A is a B that has R”) into “A =def B that has R”, made to enable the distinction between defined and primitive classes. This reduces the readability of definitions. Readability is important; we should be willing to compromise precision in its favour.

Likewise, I worry when I see definitions such as

Class: planned_process
    SubClassOf:
        realizes some (is_concretization_of some ('plan specification'
            and has_part some 'objective specification'))

or even

Class: glucose_tolerance_test
    SubClassOf:
        assay,
        has_specified_output some ('information content entity'
            and is_proxy_for some 'insulin resistance'),
        realizes some (is_concretization_of some
            'independent variable specification'),
        realizes some (is_concretization_of some 'time series design'),
        achieves_planned_objective some
            'biological feature identification objective',
        achieves_planned_objective some 'assay objective',
        has_part some ('data transformation'
            and has_specified_input some 'measurement datum'
            and has_specified_output some graph),
        has_part some ('administering substance in vivo'
            and has_specified_input some glucose)
The distinctions being made here, and the properties realizes and is_concretization_of, stem from realism, and more specifically from the generically dependent continuant (GDC). With its mass-centric bias, BFO 1.0 could not represent many entities, such as information, a book, or this blog post; so GDC was added. But a dependent continuant is a thing that exists dependent on another: it comes into existence with the other, and disappears with it. GDC shares none of these characteristics. A book does not appear when it is first printed, nor does it disappear when the paper breaks down or the ink fades. But it was not possible to add something like an immaterial continuant, because a continuant had to depend on some mass. The convoluted nature of the ontology here exists to satisfy the requirements of realism, not those of the ontologists, developers, or users.

The alternative

So, what alternatives are there? I offer no alternative metaphysics because, as described earlier, I neither care, nor do I feel that a metaphysical interpretation is necessary. We are building ontologies in biomedicine for many reasons, but mostly they revolve around one thing: we need a structure to hold our knowledge, our theories, and our hypotheses which is computationally amenable, because there are too many of them to handle by hand. It’s an engineering task, and this is what I care about.

Ontology building, I would argue, is a hybrid, sitting somewhere between software engineering and statistical modelling. We need to borrow from the best of these worlds, to produce a good engineering methodology.

Actually, we have already borrowed from software engineering; OBO, for example, advises mailing lists, trackers, version control, releasing early and often, and tight user feedback. All of these stem directly from the agile techniques that have come to the fore in the last decade; all of them have been part of ontology building since well before realism appeared on the scene.

I think we need to take more account of use cases, or their light-weight manifestation as “user stories”. Realism, and the philosophical reflection that it inspires, seems to me to have more in common with the waterfall methodologies of an earlier era; thinking carefully up front to avoid having to fix things later sounds like a good idea, but history suggests that, in many cases, the thinking simply delays the point at which you discover you have to fix things anyway.

But agile software methodologies do not have all the answers; ontologies are not software. The key difference is that ontologies lack test frameworks. While it is sometimes possible to test our ontology automatically against experimental data, in most cases it is not. This is where I think we need to borrow more from statistics. For instance, one rule from statistical modelling is: do not add a new variable to a model, even if it increases the goodness of fit to the data, unless the increase is statistically significant. In ontological terms, this translates to: just because you can make a distinction does not mean you should.
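
To make the statistical rule concrete, consider the standard F-test for nested linear models (my example, not one from the realism discussion). An extra variable is admitted only when

$F = \dfrac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2)}$

exceeds the appropriate critical value, where $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ are the residual sums of squares of the smaller and larger models, $p_1$ and $p_2$ their parameter counts, and $n$ the number of observations. The ontological analogue of the test is a good use case: a new distinction earns its place only when it demonstrably improves the ontology’s performance.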

In his 2005 paper, Ingvar Johansson talks about the fallacy of mixing use and mention. As an example, he presents this (now changed) section of GO:

Gene_Ontology
  part_of
Biological process
  is_a
physiological process

The problem with this is that “biological process” is overloaded, referring both to a biological process and to the ontology term biological process. The link was originally put in place for engineering reasons; I used it, for instance, in the work for my own paper from 2002. I knew that semantic similarity (how similarly annotated two genes are) correlates with sequence similarity; the question was whether this works better if we consider all of GO, or the three aspects independently. The answer is the latter; in short, Biological Process part_of Gene Ontology has no explanatory power. So, is this an example of realism demonstrating an ontological problem? Sadly not. Consider this, slightly changed, ontology:

Universe
  part_of
Biological process
  is_a
physiological process

According to realism, this simple rebadging of the top-level term has fixed the problem, because all biological processes really are part of the universe. But computationally, we have the same ontology, so we still have a term with no explanatory power. In short, the uses and the use cases of our ontology define the best ontology; the experimental data is only a start.

Conclusion

I tend to agree with Nicolas Le Novere that this:

is an endless discussion because this is specifically the fundamental divergence between two schools of thoughts, both respectable, and both consistent, but irreconcilable.

— Nicolas Le Novere

I have written this post as an answer to David Sutherland and as a supplement to my paper, but most importantly as a way to remove myself from the discussion. I think, now, with three papers on the issue, I can move on with what I want to do: use ontologies to help with the analysis of our data, and to increase our understanding of biology.

I do not expect that the significant momentum realism has built up will be broken, but I do hope that it will cease to be advanced as proven best practice, or to be considered the only correct way forward. If this can be achieved, then it will help to avoid the unfortunate situation that some actually want: a fork in the community. I think that this would be a pity; in general, I tend to prefer OBO’s stated principle that “we would strive for community acceptance […] rather than encouraging rivalry”.

There are many points of agreement between the various sides of this argument: it is on these, the practical, pragmatic engineering decisions that we see in much of OBO and GO and in the original ten principles of OBO, that we should build.