As some of you know, I contribute to the Seattle edition of Worldchanging. That has resulted in a lot of research into science for sustainable development. Since Just Science week starts tomorrow, I thought it would be good to get the ball rolling with something that combines sustainability with science.
A couple of weeks ago I came upon a paper entitled "The new bioinformatics: Integrating ecological data from the gene to the biosphere" by a group of researchers from UC Santa Barbara and UC Davis. The abstract was interesting enough that I went and got the paper to find out what the authors really had to say about the subject.
Bioinformatics has been the engine that made the entire genomics “revolution” possible. From the various databases hosted by the NCBI and the EBI, to tools like BLAST, and efforts like the HapMap project, bioinformatics is an essential part of modern biology. The huge amounts of data from genome sequencing efforts provide the fuel for follow-on bioinformatics efforts.
Ecology is also a field rich with information. Based on the paper, one could argue that it is a form of systems biology. Every ecological system is finely balanced, and understanding the factors involved, and how perturbing them might affect the ecology, is not a trivial problem. In the review, the authors talk about the demise of gorillas in Africa. There were a number of factors involved, including Ebola and hunting pressure. To understand the impact of all the factors would require understanding the epidemiology, genetics and transmission modes of Ebola, the nutritional status and various sociological factors of the local human population, and the population dynamics of the gorillas. I am sure there are many other factors involved. The kinds of data involved remind me of the challenges facing the field of biomedical informatics, which seeks to combine classic bioinformatics with healthcare and clinical information. In fact, the challenges in ecology are probably far greater, since the data are not as well understood and the uncertainties are significantly larger. Reading the paper, one gets a much better understanding of the challenges that the field faces. What makes the entire subject so fascinating in the end is the fact that ecological information is only really useful if it can be used predictively.
Right off the bat, the heterogeneity and nature of ecological data present the informatician with significant challenges, and that’s just in data management. Ecological information is also temporal, often over long timelines, which adds further complexity. However, being able to mine diverse studies is critical to the success of ecological studies and hypothesis generation. Multiple studies over different temporal points not only lead to better results, but re-using the data in combination with different studies at a later date can help ecologists gain better insight. For the uninitiated, i.e. yours truly, this screams for some data standards at the minimum and, in a perfect world, the development of an ecological ontology.
Currently ecological data is spreadsheet based, i.e. it is still document centric. A number of ecologists also use packages such as R and SAS, since most ecological hypotheses are generated via statistical modeling. Most people will tell you that keeping data in scattered documents is a recipe for data disaster. There is a need for quality databases and a data-centric approach. Whether the goal is integrative analysis or synthetic analysis, having data in well-designed databases will only help ecologists in the long run. The authors spend some time talking about metadata. In a field like ecology, metadata is critical, especially when studies are re-used at a later point in time in conjunction with newer studies. The authors seem to treat metadata-driven data collections as separate from vertical databases (data warehouses). That seems to be too simplistic a view. Combining the two paradigms is probably a more powerful approach, one that has been discussed here in the past. There is a role for core databases in the mode of GenBank, which can be combined with data on the edges. While the data on the edges might lack the structure of a comprehensive database, by building semantic intelligence and developing appropriate standards/ontologies, one can combine the knowledge in metadata-driven datasets with the core knowledge housed in structured data warehouses. Ecological projects are very diverse, crossing species, societies, data types, data volume and data quality. All of this makes the nature of the metadata rather complex. Ecologists will need to spend a considerable amount of time developing data standards and, better still, ontologies that make the datasets interoperable, so that high-quality data analysis becomes possible.
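To make the idea of combining metadata-driven datasets with a core database a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the field names, the tiny term mapping standing in for an ontology, the `harmonize` helper, and the made-up sample records. None of it comes from the paper or from any existing ecological resource.

```python
# Hypothetical sketch: harmonize "edge" datasets against a core table
# using their metadata. All names and values are illustrative only.

# A tiny stand-in for an ontology: maps local column labels to shared terms.
TERM_MAP = {
    "spp": "species",
    "species_name": "species",
    "cnt": "count",
    "n_individuals": "count",
}

# Core, well-structured records (the "warehouse" side).
CORE_SPECIES = {"Gorilla gorilla": {"common_name": "western gorilla"}}


def harmonize(dataset):
    """Rename columns to shared terms and attach core data where available."""
    meta = dataset["metadata"]
    rows = []
    for raw in dataset["rows"]:
        row = {}
        for label, value in zip(meta["columns"], raw):
            # Unmapped labels are kept as-is rather than silently dropped.
            row[TERM_MAP.get(label, label)] = value
        # Carry the study-level metadata along with every record.
        row["source"] = meta["study"]
        row["year"] = meta["year"]
        core = CORE_SPECIES.get(row.get("species"))
        if core:
            row.update(core)
        rows.append(row)
    return rows


# Two made-up studies that describe the same thing with different labels.
study_a = {
    "metadata": {"study": "survey_a", "year": 1995, "columns": ["spp", "cnt"]},
    "rows": [["Gorilla gorilla", 120]],
}
study_b = {
    "metadata": {
        "study": "survey_b",
        "year": 2005,
        "columns": ["species_name", "n_individuals"],
    },
    "rows": [["Gorilla gorilla", 45]],
}

combined = harmonize(study_a) + harmonize(study_b)
for record in combined:
    print(record["year"], record["species"], record["count"], record["common_name"])
```

The point of the sketch is only that the metadata (column labels, study, year) is what lets two differently shaped spreadsheets end up as comparable records linked to the core data. A real system would need a proper ontology and far richer metadata than a dictionary of synonyms.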
The good news for the field is that there are a number of existing resources, e.g. The Knowledge Network for Biocomplexity, which seems to be a fairly modern resource for ecological data. There are attempts to provide a unified interface to many ecological data sources, but one could argue that the time would be better spent enabling the creation of search engines and interfaces to any resource of ecological information, since information will be generated by a variety of sources. The dynamic nature of ecological data will be a significant challenge for data integration, especially since there is a lot of continuous modeling and re-modeling. There needs to be a way to store and version different studies, to make sure people are not making incorrect decisions.
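As a rough illustration of what storing and versioning studies could look like, here is a small Python sketch of an append-only store that keeps every revision of a study. The `StudyStore` class, the study identifier, and the parameter values are all hypothetical; this is one simple way to do it under those assumptions, not a description of how any existing ecological resource works.

```python
import hashlib
import json


class StudyStore:
    """Append-only store: each distinct revision of a study is kept as a version."""

    def __init__(self):
        self._versions = {}  # study_id -> list of (digest, data)

    def save(self, study_id, data):
        """Save a revision and return the 1-based number of the latest version."""
        # Canonical JSON so identical content always produces the same digest.
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        history = self._versions.setdefault(study_id, [])
        # Only add a new version if the content actually changed.
        if not history or history[-1][0] != digest:
            history.append((digest, json.loads(payload)))
        return len(history)

    def get(self, study_id, version=None):
        """Return a specific version, or the latest one if none is given."""
        history = self._versions[study_id]
        if version is None:
            version = len(history)
        if not 1 <= version <= len(history):
            raise KeyError(f"{study_id} has no version {version}")
        return history[version - 1][1]


store = StudyStore()
v1 = store.save("gorilla_model", {"growth_rate": 0.02})   # made-up parameter
v2 = store.save("gorilla_model", {"growth_rate": 0.015})  # re-modeled later
v_same = store.save("gorilla_model", {"growth_rate": 0.015})  # unchanged content

print(v1, v2, v_same)
print(store.get("gorilla_model", 1))
print(store.get("gorilla_model"))
```

Because old versions are never overwritten, anyone re-using a study can state exactly which revision a decision was based on, and can go back to it after the model has been updated.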
For someone with only a peripheral knowledge of ecology, but a good understanding of bioinformatics, the review by Jones et al. is a very useful and interesting read. Ecologists are trying to understand several critical problems facing our society and planet. How they access data, interpret it, and publish their results should be a problem with more eyeballs on it. Given the very public interest in sustainable development and the environment these days, hopefully there will be more informatics-savvy people working in the field to develop high-quality databases, data standards and ontologies.
Reference: Jones, M.B., Schildhauer, M.P., Reichman, O.J., and Bowers, S. The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere, Annual Review of Ecology, Evolution, and Systematics, 37: 519-544 (2006)
Picture: Via bprm2


