Neuroimaging Databases, Arthur Toga
This is a live blog from Neuroinformatics 2009.
All of our observations about the brain are in some sense reductionist: we are looking at only one thing at a time and hope to infer knowledge from this. The knowledge is multi-technique; no single experiment is going to give the entire answer, so we need to combine and integrate. Most of our data is descriptive: MRI is not that different from phrenology in one sense.
The process of dissemination (the web and its equivalents) has been transformative for the neurosciences. Large-scale consortia are also important; he has been involved in lots of these, sometimes painful but useful. Good to learn the lessons from them.
The biggest lesson from multisite brain mapping projects: the data needs to be open. If that data is open people will come, so long as it is well described.
New techniques are coming along all the time; every year there is a new way of looking at stuff. We need to combine these forms of data with knowledge from the past. There is a cost to this: digitizing and representing histology, for instance, creates a lot of data. A whole brain at 10 micrometre resolution currently runs to terabytes of data.
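As a rough back-of-envelope check on that terabyte figure (the brain volume and bytes per voxel below are my own assumptions, not numbers from the talk):

```python
# Rough estimate of whole-brain histology data volume at 10 micrometre
# isotropic resolution. Brain volume and bytes per voxel are assumed values.
brain_volume_cm3 = 1300                # roughly an adult human brain
voxels_per_cm = 10_000 // 10           # 1 cm = 10,000 um -> 1,000 voxels of 10 um
total_voxels = brain_volume_cm3 * voxels_per_cm ** 3
bytes_per_voxel = 3                    # e.g. 8-bit RGB for stained sections
total_tb = total_voxels * bytes_per_voxel / 1e12
print(f"{total_voxels:.1e} voxels, roughly {total_tb:.1f} TB")
# -> 1.3e+12 voxels, roughly 3.9 TB
```

So even a single digitized brain sits in the low terabytes before you add multiple stains or subjects.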
One of the big issues is that lots of the data is under patient confidentiality; often they can only store and check de-identified data. There are problems with metadata: some places have sent “phantom” images, which are used to calibrate the equipment, with a patient name on them. This sort of thing reduces the value. The data needs to be checked constantly.
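The kind of constant, automated check this implies could look something like the sketch below. This is my illustration, not the actual LONI pipeline, and the tag list is just a sample of identifying DICOM header fields:

```python
# Minimal sketch of a metadata screen that flags identifying fields in
# incoming DICOM files before they are stored or shared.
from pathlib import Path
import pydicom

PHI_TAGS = ["PatientName", "PatientBirthDate", "PatientID", "PatientAddress"]

def find_phi(dicom_dir: str) -> list[tuple[str, str, str]]:
    """Return (file, tag, value) for every non-empty identifying header field."""
    hits = []
    for path in Path(dicom_dir).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        for tag in PHI_TAGS:
            value = str(getattr(ds, tag, "") or "").strip()
            if value:
                hits.append((str(path), tag, value))
    return hits

if __name__ == "__main__":
    for hit in find_phi("incoming_scans/"):   # directory name is hypothetical
        print("possible identifying metadata:", hit)
```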
Data sharing and access control is a spectrum: the data can be released the instant it is produced, six months after deposition, after publication, or never. They have a system to support this, with the acquirer having control over the choice.
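A toy sketch of how that release-policy spectrum might be expressed (my own illustration, not the actual system; the policy names and six-month window are simply taken from the description above):

```python
# Each dataset carries one release policy chosen by the acquirer; access
# checks compare the policy against the relevant dates.
from datetime import date, timedelta
from enum import Enum

class ReleasePolicy(Enum):
    IMMEDIATE = "immediate"
    SIX_MONTHS_AFTER_DEPOSIT = "six_months_after_deposit"
    AFTER_PUBLICATION = "after_publication"
    NEVER = "never"

def is_released(policy: ReleasePolicy, deposited: date,
                published: date | None = None,
                today: date | None = None) -> bool:
    today = today or date.today()
    if policy is ReleasePolicy.IMMEDIATE:
        return True
    if policy is ReleasePolicy.SIX_MONTHS_AFTER_DEPOSIT:
        return today >= deposited + timedelta(days=182)
    if policy is ReleasePolicy.AFTER_PUBLICATION:
        return published is not None and today >= published
    return False  # NEVER

print(is_released(ReleasePolicy.SIX_MONTHS_AFTER_DEPOSIT, date(2009, 3, 1)))
```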
Hardware: spend lots of money and eventually it will work. They have a 4 PB system now, using a robotized tape system because spinning disks are too expensive.
My computer crashed at this point and I had to reboot, but he talked about Alzheimer's, giving a nice hypothesis that multi-image databases could potentially answer.
With BIRN, data does not necessarily need to be centralised; it is possible to support distributed but federated databases. They have managed to aggregate and bring together information from many different resources. Databases need a suite of ancillary tools which we can use to look at the data.
Last example: ADNI, the Alzheimer's Disease Neuroimaging Initiative, a naturalistic study of AD progression. About 800 individuals, studied with a variety of different techniques. Data is released immediately (often the same day). There are about 90,000 images in the database; downloads are highly periodic (not sure why!).
Data needs to be sufficiently well described, with integration across different datasets.
What works and what doesn't? Data: it must be well enough described that it can be understood. Coordination: the experiments need to be coordinated or they are hard to integrate. Tools: these must be good. Focus: there needs to be a clear focus. Size: the data needs to be big enough to have statistical power. Duration: databases must last, so they must have enough funding. Mission: is it well enough defined? People: a common purpose and leadership to carry things forward. Sociology: do people agree on what should be shared and when? Expertise: this is needed. Funding: sustainability is needed.