The Open Data licensing issue
May 11, 2008
A little tied up this weekend, so I will keep it brief. I have added a number of comments on FriendFeed to posts I have shared from Google Reader about what the licensing of data should be.
The whole thing started with Antony Williams announcing CC support for data on ChemSpider. That was followed by a chain of events and a ton of confusion. Let me add my voice to this debate, since Open Data is near and dear to my heart.
I classify scientific data into the following categories:
- Raw data: This is the kind of data deposited in Tranche, or RCSB, or GenBank. Sequence data, structural data, raw proteomics data. There are associated metadata that are required for quality and reproducibility.
- Processed data: These are the results of doing something with the raw data, e.g. molecular simulation results from a PDB structure. These results form a continuum.
I can’t help but agree with John Wilbanks. Here is the part that all of us should read again and again:
The public domain is not an “unlicensed commons”. The public domain does not equal the BSD. It is not a licensing option.
It is the natural legal state of data.
It is a damn shame that we no longer think of the public domain as an option that is attractive. It’s a sign of the victory of the content holders that the free licensing movements work against that something without a license – something that is truly free, not just free “as in” – is somehow thought to be worse. We’ve bought into their games if we allow the public domain to be defined as the BSD. The idea of the public domain has been subjected to continuous erosion thanks to both the big content companies and our own movements, to the point where we think freedom only comes in a contract.
The public domain is not contractually constructed. It just is. It cannot be made more free, only less free. And if we start a culture of licensing and enclosing the public domain (stuff that is actually already free, like the human genome) in the name of “freedom” we’re playing a dangerous game.
The public domain is the natural place for raw scientific data. That’s where it belongs and always has been. We, myself included, have been guilty of making things more complicated than they need to be. There is a data commons already. Our goal should be to make sure people respect it, and make data available in ways that we can take advantage of it.
Our discussion on content licensing should be limited to processed data, i.e. what we do with data in the public domain. There, we need to allow people to make choices, but keep the raw data unfettered. Those who want to associate copyleft licenses with raw data are being dogmatic. Scientific data doesn’t have to be viral or anything like that; it’s there for the greater scientific good, and there’s only one logical mechanism for it. In fact, I would argue that putting copyleft on it (a sequenced genome doesn’t belong to anyone) is as wrong as full-on copy protection. You may have some embargo on making it publicly available, especially with things like structures where you might want to do something with it before anyone else, but in the end the data belong in the public domain.
I would like to thank John for putting this down so emphatically and clearly. A lot of us have been saying the same thing for a while, but this is the clearest distillation that I’ve read yet.
That does not mean we don’t have to have a discussion around how we make content (not raw data, but follow-on content) available and the implications. Antony was confused for good reason.
Further reading
More from John
Cameron Neylon
Egon Willighagen
More from Egon
Bill Hooker
Web as platform: Bret Taylor on Open Data
Open Science and licensing
Protocol for implementing open access data
bbgm post on protocol for open data
Does anyone have a clue who this could be?
May 7, 2008
You don’t get to see job descriptions like this too often in the life sciences. Have to love the What you get section.
What does the job description tell us? It’s a web-based, consumer-focused company with a focus on healthcare and an informatics backend. It comes out of Stanford and has a Nobel Prize winner advising it, which sounds very much like Andy Fire (based on the Stanford angle).
Let’s start the speculation.
Guess where I found this position? By tracking ‘bioinformatics’ on Twitter.
Technorati Tags: Andy Fire, Healthcare, Stealth Startup, Stanford, Xooglers
Discussion on business models around Open Data is building up
May 6, 2008
This post got deleted during a blog snafu. Reposting.
Many months ago, I started talking about the monetization of biological data, a theme that’s been present throughout the history of bbgm. In general, I have maintained that for the most part, the value lies not in the raw data, but in what we can do with the data. It looks like there is an interesting discussion brewing on the web around some of these ideas. Here are a couple of posts, I think in chronological order:
Peter Murray-Rust. The comment from Rich Apodaca is a must read. There is a follow up post from Antony Williams as well.
I will just reiterate a few generalizations, because I am only peripherally familiar with the specifics. On the web, data should be available as an addressable resource. The fact that data is available as RDF is great (and I wish more data was available as such). However, my personal preference is that data, especially open data, needs to be accompanied by APIs that allow the data to be accessed in a number of formats (not a dump per se). I think over time the acceptable formats will be established. The key aspect here is the business model. Is the business in providing a service on top of the data? For example, for more than X number of API calls, there could be a fee associated.
These business models are going to be the key. Just as Open Source and some web services have found business models, the models that allow people to build upon Open Data will be the key.
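To make the idea concrete, here is a minimal sketch of an addressable resource that can be fetched in more than one format, with a fee kicking in after X calls. Everything in it is an assumption for illustration only: the `RECORDS` store, the `get` function, the `FREE_CALLS` quota, and the `FEE_PER_CALL` price are hypothetical names and values, not the API of ChemSpider, CrystalEye, or any other real service.

```python
import json
from collections import Counter

# Hypothetical in-memory "open data" store, keyed by an addressable ID.
RECORDS = {"mol/1": {"name": "water", "formula": "H2O"}}

FREE_CALLS = 3        # hypothetical free quota (the "X" in the text)
FEE_PER_CALL = 0.01   # hypothetical fee for each call beyond the quota

calls = Counter()     # number of API calls made so far, per API key


def to_json(record):
    """Serialize one record as JSON with keys in sorted order."""
    return json.dumps(record, sort_keys=True)


def to_csv(record):
    """Serialize one record as a header line plus one value line."""
    keys = sorted(record)
    return ",".join(keys) + "\n" + ",".join(str(record[k]) for k in keys)


# The same data, available in more than one format.
FORMATS = {"json": to_json, "csv": to_csv}


def get(api_key, resource_id, fmt="json"):
    """Return (body, fee) for a single metered request."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    record = RECORDS[resource_id]
    calls[api_key] += 1
    # The data itself stays free; only heavy use of the service is charged.
    fee = FEE_PER_CALL if calls[api_key] > FREE_CALLS else 0.0
    return FORMATS[fmt](record), fee
```

Calling `get("some-key", "mol/1", "csv")` returns the record as CSV along with the fee owed for that call. The point of the design is that the charge attaches to the service (the number of calls), not to the data.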
Technorati Tags: Open Data, Web Services, CrystalEyes, ChemSpider
HPC and structure-based drug design
May 5, 2008
Here is the abstract of a paper in Hypertension entitled Structure-based identification of small-molecule angiotensin-converting enzyme 2 activators as novel antihypertensive agents.
Angiotensin-converting enzyme 2 (ACE2) is a key renin-angiotensin system enzyme involved in balancing the adverse effects of angiotensin II on the cardiovascular system, and its overexpression by gene transfer is beneficial in cardiovascular disease. Therefore, our objectives were 2-fold: to identify compounds that enhance ACE2 activity using a novel conformation-based rational drug discovery strategy and to evaluate whether such compounds reverse hypertension-induced pathophysiologies. We used a unique virtual screening approach. In vitro assays revealed 2 compounds (a xanthenone and resorcinolnaphthalein) that enhanced ACE2 activity in a dose-dependent manner. Acute in vivo administration of the xanthenone resulted in a dose-dependent transient and robust decrease in blood pressure (at 10 mg/kg, spontaneously hypertensive rats decreased 71+/-9 mm Hg and Wistar-Kyoto rats decreased 21+/-8 mm Hg; P<0.05). Chronic infusion of the xanthenone (120 microg/day) resulted in a modest decrease in the spontaneously hypertensive rat blood pressure (17 mm Hg; 2-way ANOVA; P<0.05), whereas it had no effect in Wistar-Kyoto rats. Strikingly, the decrease in blood pressure was also associated with improvements in cardiac function and reversal of myocardial, perivascular, and renal fibrosis in the spontaneously hypertensive rats. We conclude that structure-based screening can help identify compounds that activate ACE2, decrease blood pressure, and reverse tissue remodeling. Administration of ACE2 activators may be a valid strategy for antihypertensive therapy.
Here’s the HPCwire story, which doesn’t tell me much beyond the fact that this was really high-throughput docking, but they use words like:
That in itself is a significant accomplishment because no one has ever specifically identified a compound that enhances the activity of an enzyme using a rational structure-based approach
Anyone have a subscription to Hypertension? I am really curious because nothing I read screams “unique” to me. Of course, I can just wait till tomorrow and try to get to the paper from work.
Update: Got the paper, and still don’t get the fuss. It’s an elegant virtual screening strategy, but I wouldn’t say it’s revolutionary. I was hoping to see something more advanced, e.g. protein flexibility, better energy functions, etc.
Technorati Tags: Virtual Screening, Structure-based Drug Design, Hypertension, High Performance Computing
Sun and Amazon jump into the pool together
May 5, 2008
At JavaOne, one of the big announcements was a hookup between Amazon, specifically EC2, and OpenSolaris (finally generally released as a full open source OS). The collaboration between Amazon and OpenSolaris will give customers access to OpenSolaris (for free), MySQL premium technical support, and more. The key selling points are ZFS and DTrace. Now, I am a big Linux guy, but options are always good, and enterprise relationships/partnerships are just a sign of the maturing and relevance of cloud computing.
Aside: It’s interesting that Sun talks about OpenSolaris as the OpenSolaris community.
Technorati Tags: Cloud Computing, Amazon Web Services, Sun Microsystems, OpenSolaris, ZFS






