
Trendspotting: Molecular profiling data resources

Let us say you are a researcher doing a gene expression study on some tissue. Today, chances are that you will run some microarrays, look at the expression profiles, and then try to correlate the profiles of a number of samples with associated data.

Fast forward a few years. I am convinced that a lot of such data will be available via search engines or data portals. Already a number of commercial and public engines are coming to life (NextBio, Oncomine, etc.). Earlier this week I read an announcement (subscription required) by the NCI that it will create a Cancer Molecular Analysis Portal, which will integrate data sets from the Cancer Genome Atlas project and other cancer genomics studies.

The key here is that we already have a body of work using microarrays and other molecular profiling systems, and in many cases people are just repeating experiments that someone, somewhere has already carried out. Unless there is something inherently proprietary in those studies (e.g. specific dose-response studies), there is no reason to repeat an experiment, especially for technologies that are relatively stable and don’t have too much cross-platform or cross-lab variation (one of the goals of the MAQC projects has been to understand these variations). The second key, and perhaps the more important one, is how these data are made available. Personally, I really like the NextBio interface. Will the business model work? I am not sure, but the idea and concept definitely make a lot of sense.

It’s a sign of maturity in many ways, accelerated by the way the web has advanced in the past few years. If we trust data not generated internally enough to make key decisions on it, then a scenario where data and analysis results are served up via web services, allowing users to mash up different sources (including internal ones) and develop relevant scientific intelligence, is a distinct possibility. Personally, I would like to think that the value, and the user's expertise, comes from how they integrate all these resources into something that is a unique asset to them, i.e. the value of the results comes from the way the data are brought together and not from any individual data source.
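To make the mash-up idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the record fields, the source names (`public_portal`, `internal_lab`), the platform labels, and the expression values are all made up for the example and do not come from any real portal or web service. The point is simply that each value carries its provenance with it, so results from several sources can be grouped by gene without losing track of where they came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionRecord:
    """One expression result plus the provenance that travels with it."""
    gene: str
    log2_fold_change: float
    source: str    # hypothetical label for where the value came from
    platform: str  # hypothetical label for the array platform used


def mash_up(*sources):
    """Group records from any number of sources by gene symbol."""
    combined = {}
    for records in sources:
        for rec in records:
            combined.setdefault(rec.gene, []).append(rec)
    return combined


def shared_genes(combined, min_sources=2):
    """Return genes reported by at least `min_sources` distinct sources."""
    return sorted(
        gene
        for gene, recs in combined.items()
        if len({r.source for r in recs}) >= min_sources
    )


# Illustrative, made-up values; not real measurements.
public = [
    ExpressionRecord("TP53", -1.2, "public_portal", "platform_A"),
    ExpressionRecord("MYC", 2.1, "public_portal", "platform_A"),
]
internal = [
    ExpressionRecord("TP53", -0.9, "internal_lab", "platform_B"),
    ExpressionRecord("EGFR", 1.4, "internal_lab", "platform_B"),
]

combined = mash_up(public, internal)
for gene in shared_genes(combined):
    for rec in combined[gene]:
        print(gene, rec.source, rec.platform, rec.log2_fold_change)
```

Keeping the platform next to each value is deliberate: whether two numbers are actually comparable is a separate question, and the sketch only keeps the information needed to ask it.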


This entry was posted in Informatics, Omics.

5 Comments

  1. Sanjiv
    Posted July 21, 2008 at 13:46 | Permalink

    I fully agree with you. However, such databases, listed below, are already available for the scientific community –

    GEO – http://www.ncbi.nlm.nih.gov/geo/index.cgi?qb=pro

    ArrayExpress – http://www.ebi.ac.uk/microarray-as/ae/

    Phenogen Informatics – http://phenogen.uchsc.edu/

    Stanford Microarray Database – http://genome-www5.stanford.edu/

    Users can select any number of experiments/samples to carry out the meta-analysis using the available tools.

  2. Posted July 21, 2008 at 16:12 | Permalink

    Indeed, and growing, but most companies still run their own arrays and integrate public information as required. The future will be the opposite: you run your own arrays as required, but primarily use public (or private) resources. Additionally, these won't just be databases but web services, where the combination of resources and how they are mashed up will be the key value.

  3. Posted July 21, 2008 at 17:35 | Permalink

    While I generally agree with your thoughts, there are a number of issues with simply mashing up different gene expression data sources: platform, sample quality, normalization, reference and annotation are just a few that I can come up with off the top of my head.

    I've been involved with a microarray consortium project that is one of a small number of studies (that I'm aware of) that included a technical reference sample in each array processing batch to align the data. Alignment is a HUGE problem with expression data. We further performed extensive statistical analysis to identify potential outlier arrays prior to data analysis. Most gene expression profiling studies don't take these quality control steps. I think pooling data sets that aren't preprocessed in a similar manner is risky.

    In my experience, it's easy to show statistical significance with microarray data, especially with large data sets. However, statistical significance doesn't mean a result is biologically meaningful.

  4. Posted July 21, 2008 at 18:16 | Permalink

    Walter, very real concerns, and the kinds my previous company spent a lot of its time thinking about.

    Like most data types, microarray results will become more commoditized, i.e. the reliability and alignment issues will get resolved. The reference, annotation, and similar information will become metadata that accompanies your data sources. It's a question of making sure all the experimental details are included with the results being provided. The key is that there will be a body of work available that should make it unnecessary to do your own microarray experiments except when you really have to.

  5. Jorge Avilas
    Posted February 1, 2009 at 12:34 | Permalink

    nice article! nice site. you're in my rss feed now ;-)
    keep it up



  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present