The Open Data licensing issue

May 11, 2008

A little tied up this weekend, so will keep it brief. I have added a number of comments on Friendfeed to posts I have shared from Google Reader about what the licensing of data should be.

The whole thing started with Antony Williams announcing CC support for data on ChemSpider. That was followed by a chain of events and a ton of confusion. Let me add my voice to this debate, since Open Data is near and dear to my heart.

I classify scientific data into the following categories:

  • Raw data: This is the kind of data deposited in Tranche, or RCSB, or GenBank. Sequence data, structural data, raw proteomics data. There are associated metadata that are required for quality and reproducibility.
  • Processed data: These are the results of doing something with the raw data, e.g. molecular simulation results from a PDB structure. These results form a continuum.

I can’t help but agree with John Wilbanks. Here is the part that all of us should read again and again:

The public domain is not an “unlicensed commons”. The public domain does not equal the BSD. It is not a licensing option.

It is the natural legal state of data.

It is a damn shame that we no longer think of the public domain as an option that is attractive. It’s a sign of the victory of the content holders that the free licensing movements work against that something without a license – something that is truly free, not just just free “as in” – is somehow thought to be worse. We’ve bought into their games if we allow the public domain to be defined as the BSD. The idea of the public domain has been subjected to continuous erosion thanks to both the big content companies and our own movements, to the point where we think freedom only comes in a contract.

The public domain is not contractually constructed. It just is. It cannot be made more free, only less free. And if we start a culture of licensing and enclosing the public domain (stuff that is actually already free, like the human genome) in the name of “freedom” we’re playing a dangerous game.

The public domain is the natural place for raw scientific data. That’s where it belongs and always has been. We, myself included, have been guilty of making things more complicated than they need to be. There is a data commons already. Our goal should be to make sure people respect it, and make data available in ways that we can take advantage of it.

Our discussion on content licensing should be limited to processed data, i.e. what we do with data in the public domain. There, we need to allow people to make choices, but keep the raw data unfettered. Those who want to associate copyleft licenses with raw data are being dogmatic. Scientific data doesn’t have to be viral or anything like that; it’s there for the greater scientific good, and there’s only one logical mechanism for it. In fact, I would argue that putting copyleft on it (a sequenced genome doesn’t belong to anyone) is as wrong as full-on copy protection. You may have some embargo on making it publicly available, especially with things like structures where you might want to do something with it before anyone else, but in the end the data belong in the public domain.

I would like to thank John for putting this down so emphatically and clearly. A lot of us have been saying the same thing for a while, but this is the clearest distillation that I’ve read yet.

That does not mean we don’t have to have a discussion around how we make content (not raw data, but follow-on content) available and the implications. Antony was confused for good reason.

Further reading

More from John
Cameron Neylon
Egon Willighagen
More from Egon
Bill Hooker
Web as platform: Bret Taylor on Open Data
Open Science and licensing
Protocol for implementing open access data
bbgm post on protocol for open data

Exit Larry, Enter Joi

April 1, 2008

As a sometime musician and a creator of online content, I find that Creative Commons means a lot to me. I have heard Larry Lessig speak, and while I don’t agree with everything he says, he says it with a conviction that has to be admired, and when it comes to creativity, etc., I find it difficult to disagree.

Well, today, Larry Lessig stepped down from the leadership of Creative Commons to focus on Change Congress. He will be replaced by Joi Ito, who has long been involved with Creative Commons and is a worthy successor.

Personally, my biggest interest in the Creative Commons effort is with Science Commons. If we are to build a true data commons, then Science Commons is likely to play an important role. The organization does not quite have the visibility in the scientific community that it should have, and I hope that changes over the next couple of years.

Here is the press release


Protocol for implementing open access data

December 16, 2007

On the 5th anniversary of Creative Commons, Science Commons just announced the Protocol for Implementing Open Access Data (“The Protocol”). Four months ago, I had wondered what the folks at Science Commons were planning for open data. I think we just got our answer.

The Protocol is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases. The Protocol is being submitted to the W3C for consideration. Here are some of the salient features (I quote copiously):

  1. Given the amount of legacy data, it is unlikely that a single license will work for scientific data. Therefore, the memo focuses on principles for open access data and a protocol for implementing those principles
  2. Tools created under conforming implementations will create the foundation to legally integrate a database or data product available under a tool conforming to the protocol with another database or data product available under a tool conforming to the protocol
  3. Patent rights are not covered
  4. Any implementation of the Science Commons Database Protocol may be submitted to Science Commons for certification as a conforming implementation.
  5. Implementations found to conform to the Protocol will be authorized to use the Science Commons Open Access Data trademarks (icons and phrases) and metadata on databases available under conforming implementations of the protocol. These marks will be maintained by Creative Commons and released in conjunction with the CC0 project icons and metadata
  6. To facilitate data integration and open access data sharing, any implementation of this protocol must waive all rights necessary for data extraction and re-use (including copyright, sui generis database rights, claims of unfair competition, implied contracts, and other legal rights), and must not apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution. Any implementation should define a non-legally binding set of citation norms in clear, lay-readable language
  7. To facilitate data integration and open access data sharing, any implementation must include waivers of sui generis and other legal grounds for database protection
  8. To facilitate data integration and open access data sharing, any implementation MUST affirmatively declare that contractual constraints do not apply to the database
  9. To provide for interoperation with non-open access data, any implementation of this protocol MUST NOT enable assertions of copyright, sui generis, or any other forms of contractual control on digital identifiers and metadata describing non-open access data
  10. Science Commons is withdrawing their recommendation that “copyrightable elements” of a database be made available under a copyright license like the CC licenses or the GNU Free Documentation License (FDL), mostly due to the difficulty in determining where copyright begins and ends in many databases

Much of this work has been done in collaboration with Talis. I haven’t quite managed to read through all the legal speak, but the move away from a copyleft license is very welcome and interesting. In other words, data being submitted to GenBank, etc. won’t suddenly be required to have a copyleft license. I wonder how interdependent this protocol and CC0 are?

Anyway, more of this later in the week. I just saw the blog post and had to share it. I consider just the announcement to be a monumental moment. Will it change how scientific information is shared and disseminated? I don’t know. But my hope has always been that Science Commons would lead the way.

In passing, I am waiting to read what Peter Suber, Glyn Moody, Bill Hooker and Peter Murray-Rust, all more involved with these issues than I am, have to say about this.


This work is licensed under a Creative Commons Attribution 3.0 Unported License.


Educating people about data ownership

December 15, 2007

I never got to watch the Bubble 2.0 video (I only heard it on net@nite). Before I could get to see it, it got taken down. Wired talks about the reasons behind the takedown. As a content producer who shares content online and as a scientist who has published papers and a not-so-casual observer of the entire content ownership debate, I am often torn by examples like this one.

What is important for the author? Is it monetary compensation? If content, whether scientific, media or otherwise, is your primary source of income, you can understand why people get a little antsy when someone uses the content without permission. I know too many people (journalists, musicians, etc.) for whom their creativity is the sole source of income, and they are all well meaning, even if they don’t always understand the environment that they operate in.

However, a lot of these issues date back to a world free of Creative Commons, which I believe is celebrating its 5th birthday this weekend. In today’s climate we have choice, so to some extent content owners need to make that choice and then live with its consequences. You can choose to publish your papers in a PLoS journal under a CC license, or you can choose to publish in a closed journal. Obviously, I belong to the open science camp, but I also believe that people are free to make their own decisions. They then must also live with the consequences of those decisions.

What we need is education. When Larry Lessig spoke at the University of Washington recently (I have the full recording if anyone is interested), I asked him a question on this very issue. How many people who upload pictures to Flickr really understand the licensing options available to them? How many people understand the pros/cons and implications? Most scientists I know don’t even know what Creative Commons is, Science Commons even less so. On the flip side, do the majority of people wanting to use pictures, etc. understand what they can do with media, the proper ways of attribution, etc.? I doubt it. Even I am not always sure.

We have a plethora of resources available to us for sharing data, media and information. Scientists have the PLoS and BMC journals. You have resources to share data, documents, pictures, videos, screencasts, etc. It is up to us to decide where we put our information and how it is managed. It is also important for everyone to understand and respect those choices. The dialog on the best approach to sharing data and the advantages of open data can continue as we go along.

Update: Two other posts on this topic are worth reading.

Peter raises an excellent point about scientific images. Scientific images are scientific data, and like all other scientific data need to be open.

One thing I don’t know too much about is fair use. The TechCrunch article is an interesting discussion around the issue of copyright. While I often don’t agree with Mike, I do like that particular post.


Peer-to-patent needs you

November 11, 2007

If I were a Photoshop geek, I’d probably mock up the Peer-to-Patent version of “Uncle Sam needs you”. I’ve talked about Peer-to-Patent before, and it’s definitely an interesting idea. Without a sufficient volume of users, the project won’t be successful, and it looks like it’s ready to take on peer reviewers.

