In a great blog post at Code for Life, Grant Jacobs writes:
By contrast, early bioinformatics work was almost invariably founded on biological concepts from the onset. A biological issue was raised and then a technique to address that issue was presented. That is, theoretical biology was the foundation on which [early] bioinformatics was built. I fear this is being lost in the mass-data and technology-hype driven bioinformatics. It seems to me that unless companies and research groups are careful many will waste time and money “stamp collecting and cataloging”. Certainly the organized data is useful, but only if it is applied with biological principles
Grant writes this in the context of the early days of bioinformatics, a time when a lot of theoretical biology was implicit in the methods being used. His point is that biologists today lack some of the theoretical knowledge their predecessors had (many of the early bioinformaticians came, as noted, from the hard sciences). He also goes on to comment that early bioinformatics could be called (and often is today) Computational Biology. Like him, I prefer that label. Like Neil on the Friendfeed thread, I don’t believe that organizing data is a waste of time and money, but it isn’t biology either.
This takes me back to the argument I make so often these days. Today’s data-driven biology has the following two aspects:
Data production. This is the part that generates a ton of data, often in what might be considered a “factory” environment, although the trend might be moving towards more systematic data production.
Data consumption. This is the part that I consider biology, the part where bench scientists and computational biologists get to work, trying to get meaning from the data. Here is where you develop new theoretical models which might be able to explain what the data means, start building network models which might explain certain diseases, and start developing and running assays which might test those models. In other words, the beauty of science.
But there is a part that has become more difficult as we generate more data, the part that requires an understanding of both biology and technology. This is the gray area in between, the domain of bioinformaticians and software developers (personally, I think the software people run the gamut from beginning to end). Here is where you learn how to organize your data, associate it with metadata, and develop tools to slice and dice data sets that can then be studied in more detail, to try to develop early insights based on past knowledge. Data Management, Data Warehousing, Data Pipelines. They might sound cold. They might not even sound like science, but I’d argue that they are a core part of science. As our data sets get larger and more diverse, and our need to understand the complexity of biology reaches new heights, we won’t get to where we need to be without good data platforms and the ability to query those platforms in biologically meaningful ways. Yes, I agree that you need a sound understanding of the underpinnings of biology, or at least an idea of the kinds of questions we need to ask, but you also need the technology to make the data useful for science; otherwise it’s just that, data.
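To make the idea of organizing data, attaching metadata, and slicing it by a biological question a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is hypothetical: the table names, the columns (tissue, phenotype), the gene labels, and the values are made up for illustration, and a real data platform would be far richer than this.

```python
import sqlite3


def build_store():
    """Create a tiny in-memory store: samples carry metadata, measurements carry data.

    The schema and every row below are illustrative assumptions, not real data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE samples (
            sample_id TEXT PRIMARY KEY,
            tissue    TEXT,
            phenotype TEXT
        );
        CREATE TABLE measurements (
            sample_id TEXT REFERENCES samples(sample_id),
            gene      TEXT,
            value     REAL
        );
    """)
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
        ("s1", "liver", "control"),
        ("s2", "liver", "disease"),
        ("s3", "brain", "disease"),
    ])
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", [
        ("s1", "GENE_A", 2.0),
        ("s2", "GENE_A", 6.0),
        ("s3", "GENE_A", 4.0),
        ("s1", "GENE_B", 1.0),
        ("s2", "GENE_B", 1.5),
    ])
    return conn


def mean_value(conn, gene, tissue, phenotype):
    """Average a gene's measurements over the samples matching the given metadata.

    Returns None when no sample matches the slice.
    """
    row = conn.execute("""
        SELECT AVG(m.value)
        FROM measurements AS m
        JOIN samples AS s ON s.sample_id = m.sample_id
        WHERE m.gene = ? AND s.tissue = ? AND s.phenotype = ?
    """, (gene, tissue, phenotype)).fetchone()
    return row[0]


if __name__ == "__main__":
    store = build_store()
    # Slice the data by a biological question rather than by file or run.
    print(mean_value(store, "GENE_A", "liver", "disease"))
    print(mean_value(store, "GENE_A", "liver", "control"))
```

The point of the sketch is only that the measurements become queryable in biological terms (which tissue, which phenotype) once they are tied to metadata; without that link they are, again, just data.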
Bioinformatics and mythology. You still need to manage the data