An Exercise in Irrelevance - The problem with institutional repositories

I don’t normally use my blog to engage in conversations the way that some people do. I already spend enough time on mailing lists, so using the blog for this seems redundant. However, I will change the habit of a lifetime this once, because of an interesting discussion on institutional repositories, a topic I have previously written about myself.

To me, the difficulty with institutional repositories (IRs) is this. First, they are a resource. Then someone says, this is good, everyone should do this. Then someone else says, hey, this is great, we could use this for our RAE (REF, whatever) return.

Now, you have to deposit things in your IR. But people object, on various “data is mine” grounds, so perhaps they make the IR non-public. The data model gets tweaked with various additional data (which school, who your line manager is) necessary for RAE. At the same time, your co-authors also have to deposit into their IR. And, if you move, you have to type your entire back catalogue into various repositories for your new institution.

Currently I am supposed to deposit papers in various IRs, including at University and school level, as well as add bibliographic information to various databases. And then, of course, there are the project wikis. And the funders want the information in various databases. All of this is very time-consuming and produces highly duplicated, often error-prone data. In short, it’s a bad thing.

The irony is that if you google for any of my papers, the main source from which they are scraped is my website. I set this up myself many years ago now; it’s a simple BibTeX-to-HTML thing (actually not so simple nowadays, as it grew over time). So the simplest and most straightforward solution also turns out to be the best. The most important thing is this: the BibTeX files are the ones that I use for my own work, for citing myself (which, like any good scientist, I do as often as possible, even when the citation is largely irrelevant). The website is what I use when on the road to get the PDFs of my own papers; if I want to give a reference to someone, I’ll email a link to my website. So I keep it up to date, because it’s to my benefit to do so.
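For illustration only, here is a minimal sketch of what a BibTeX-to-HTML step might look like. It is not the script behind the website described above: the sample entry, the function names (`parse_bibtex`, `entry_to_html`, `bibtex_to_html`) and the output layout are all assumptions, and the parser only copes with flat `field = {value}` entries without nested braces.

```python
import html
import re

# Hypothetical sample entry, invented for this sketch.
SAMPLE = """
@article{doe2009,
  author = {Jane Doe},
  title = {An Example Paper},
  journal = {Journal of Examples},
  year = {2009},
  url = {https://example.org/doe2009.pdf}
}
"""

# Assumes each entry closes with a "}" on a line of its own.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# Assumes brace-delimited values with no nested braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex(text):
    """Return one dict per entry, holding its type, key and fields."""
    entries = []
    for kind, key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value.strip()
                  for name, value in FIELD_RE.findall(body)}
        entries.append({**fields, "type": kind.lower(), "key": key})
    return entries


def entry_to_html(entry):
    """Render one entry as a list item, linking the title if a URL exists."""
    title = html.escape(entry.get("title", "Untitled"))
    if "url" in entry:
        href = html.escape(entry["url"], quote=True)
        title = '<a href="{}">{}</a>'.format(href, title)
    parts = [html.escape(entry.get("author", "Unknown author")), title]
    for name in ("journal", "year"):
        if name in entry:
            parts.append(html.escape(entry[name]))
    key = html.escape(entry["key"], quote=True)
    return '<li id="{}">{}</li>'.format(key, ", ".join(parts))


def bibtex_to_html(text):
    """Convert a BibTeX string into an HTML unordered list."""
    items = "\n".join(entry_to_html(e) for e in parse_bibtex(text))
    return "<ul>\n" + items + "\n</ul>"


if __name__ == "__main__":
    print(bibtex_to_html(SAMPLE))
```

The point of the sketch is that the BibTeX file stays the single source: the same file that feeds citations also produces the web page, so nothing has to be typed twice.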

We need a few simple, easy-to-use standards for bibliographic data. They have to be simple because they need to fit in with people’s current work practices; this means they need to be supported across a heterogeneous environment, by many different tools. And they won’t be, if the standard is hard to develop against.

For data, of course, the issues are somewhat different, mostly because data needs more structure than human-readable information, and because the data is often large. However, two issues remain: first, we still need to fit with people’s working practices; second, engaging in the institutional football we see with bibliographic data will still be a bad thing for data.

Again, simple data standards are what we need. After that, people will choose whatever they choose; the data standard will be enough to bring it all together in the best way that we can.