An Exercise in Irrelevance - Barriers to Reuse

I greatly enjoyed going to a workshop organised by Uniprot on data reuse. It was nice to see again some faces that I have not seen for a while; it was also nice to get a mention from Alex Bateman of a paper that Michael Bell and I wrote quite a few years ago now. I hope I managed to contribute something useful, although I am sure I started to get incoherent toward 8pm, as my stomach told me I needed dinner and my brain told me it needed sofa time.

For me, it was a nice chance to think about where we have come from and how far we have come. Bioinformatics is now such a knowledge-rich domain, and we have built a complex knowledge ecosystem with data that is distributed around the world, kept up to date and freely accessible. The complexity of it all probably exceeds that of almost any other discipline although, of course, in terms of size we are dwarfed by many other areas.

Tying all of this together is a complex string of technologies, ranging from accession numbers and URLs to a whole variety of APIs, with XML and RDF thrown in there somewhere. And, of course, everywhere we go in biology, there is the inescapable reality that all of the context, and much of the actual data, is only available as free text, hidden in papers. The system has grown over the years, which has made it baroque, confusing and chaotic; but it is organised enough that you can do amazing things with it: a vast knowledge structure which underpins almost all of our attempts to understand biology these days.

The hope is to produce a white paper from the workshop, so I won't repeat the discussion beyond that, but I may also write up some thoughts about the technology that we use for reuse and the criteria by which we might judge it.