Scifoo: Google and large scientific datasets
GIVE US YOUR DATA!!!
Yes, that was the title of a session led by Google-ites, including Chris DiBona (I forget the name of the actual session leader). The following are my notes from that session, somewhat cleaned up.
Google's mission is to organize the world's information and make it universally accessible and useful. In keeping with that mission, it should come as little surprise that they have a tremendous interest in the sciences. Their current project has two goals:
1. Archive interesting scientific data
2. Distribute data to the people who need it
At this point, they essentially want to solve the engineering infrastructure problems, in a world where the cost of storage is dropping by 78% every year. (“Moore’s law is for wimps” was the title of that slide, I believe.)
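To put that figure in perspective: taking the 78% from the slide at face value, each year's cost is 22% of the previous year's, so the decline compounds. Here $C_0$ is today's cost per terabyte and $C_n$ the cost after $n$ years (both symbols are mine, not from the talk):

```latex
C_n = C_0\,(1 - 0.78)^n = C_0 \cdot 0.22^n,
\qquad C_2 = 0.0484\,C_0 \approx \tfrac{1}{20}\,C_0,
\qquad C_3 \approx 0.0106\,C_0 \approx 1\%\ \text{of}\ C_0
```

In other words, if the trend held, storing a dataset would cost about one-twentieth as much within two years.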
For now, Google is not trying to provide the following:
1. Access controls
2. Support for non-open data (they will support public domain and CC-licensed data)
3. Domain-specific tools
4. In situ computation
5. Profit
How are they getting there? They provide a 3TB drive array (Linux RAID5), packed in a “suitcase” and shipped to anyone who wants to send their data to Google. Anyone interested gives Google the file tree, and Google SLURPs the data off the drive. I believe they can extend this to a larger array (my memory says 20TB).
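Nobody described the exact format of the "file tree" a contributor hands over, so the following is only a hypothetical sketch of what one might send along with the drive: a manifest listing every file's relative path, size and SHA-256 checksum, plus a check that can be rerun after the suitcase arrives to catch files damaged or lost in shipping. The helper names (`build_manifest`, `verify_manifest`) and the manifest layout are my own assumptions, not anything Google specified.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so huge data files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> list:
    """Hypothetical helper: walk `root` and record path, size and SHA-256 per file.

    Paths are stored relative to `root` with forward slashes, and the walk is
    sorted so the same tree always yields the same manifest.
    """
    root_path = Path(root)
    entries = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # make os.walk descend in a deterministic order
        for name in sorted(filenames):
            full = Path(dirpath) / name
            entries.append({
                "path": full.relative_to(root_path).as_posix(),
                "bytes": full.stat().st_size,
                "sha256": sha256_of(full),
            })
    return entries


def verify_manifest(root: str, manifest: list) -> list:
    """Hypothetical helper: return the paths that are missing or no longer match."""
    root_path = Path(root)
    bad = []
    for entry in manifest:
        full = root_path / entry["path"]
        if (not full.is_file()
                or full.stat().st_size != entry["bytes"]
                or sha256_of(full) != entry["sha256"]):
            bad.append(entry["path"])
    return bad
```

A contributor could run `build_manifest("/mnt/suitcase")` (the mount point is made up), save the result with `json.dump`, and ship it alongside the array; the receiving end runs `verify_manifest` on the same tree and should get an empty list back. Checksums matter here because a file can keep its size and still be corrupted in transit.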
Challenges
1. Collecting vs. crawling
2. Culture of proprietorship/exclusivity
3. Licenses
4. International shipping is hard
5. What do you do with all the metadata?
6. What does it mean to index scientific data?
Chris DiBona wants to make all data open source. Go Chris!!! Remember, data is not knowledge, although for too many scientists the data are their research.
The data will probably be provided on a Google Code-like page, and anyone should be able to access it. There was talk of allowing people to build applications on top of the data. As Peter Murray-Rust noted, putting the data in the cloud is definitely enticing to some (I would add Amazon to his list as well). Like many others, I am curious to see where this goes. Quite a few people, and not just the astrophysics variety, were very interested in what Google has to offer.
Technorati Tags: Scifoo, Google, Science, Data, Storage, Open Science, Open Data