The Mostly Color Channel: The power of crowd sourcing

Monday, June 10, 2013

The power of crowd sourcing

This morning on local radio, Morning Edition ran a short piece on the new NSA data farm in Utah, which is supposed to go online this September. The piece stated that the data farm will store 5 zettabytes, and that the old data farm in Virginia, which will remain online, has about two-thirds of that capacity.

These roughly 8 zettabytes (5 in Utah plus about 3.3 in Virginia) are contributed by us aliens, i.e., non-citizens: this makes it crowd sourced data. How does this compare to the data that the best and brightest scientists in the world can create? At CERN, the CERN Data Centre has recorded over 100 petabytes of physics data over the last 20 years, and collisions in the Large Hadron Collider (LHC) generated about 75 petabytes of this data in the past three years. The bulk of the data (about 88 petabytes) is archived on tape using the CERN Advanced Storage system (CASTOR), and the rest (13 petabytes) is stored on the EOS disk pool system, a system optimized for fast analysis access by many concurrent users. For the EOS system, the data are stored on over 17,000 disks attached to 800 disk servers; these disk-based systems are replicated automatically after hard-disk failures, and a scalable namespace enables fast concurrent access to millions of individual files.

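In binary units, a zettabyte is 2⁷⁰ bytes and a petabyte is a paltry 2⁵⁰ bytes, so the two units differ by a factor of 2²⁰, about a million. Setting 8 zettabytes against 100 petabytes gives a ratio of roughly 84,000, which indicates that crowd sourcing can yield nearly 5 orders of magnitude more data than the best scientists can. And while the scientists use the most powerful particle smasher ever built by humankind, the crowd just uses their fingers on plain old keyboards.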
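The comparison can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the binary definitions used above and takes the round figures of 8 zettabytes and 100 petabytes at face value; the variable names are mine.

```python
import math

# Binary convention used in this post (an assumption; SI defines these as 10**21 and 10**15).
ZETTABYTE = 2**70
PETABYTE = 2**50

nsa_bytes = 8 * ZETTABYTE     # approximate combined capacity of the two data farms
cern_bytes = 100 * PETABYTE   # physics data recorded at CERN over 20 years

ratio = nsa_bytes / cern_bytes
orders = math.log10(ratio)

print(f"ratio ≈ {ratio:,.0f}, i.e. about {orders:.1f} orders of magnitude")
# prints: ratio ≈ 83,886, i.e. about 4.9 orders of magnitude
```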

The more mind-boggling data point is that at some point the NSA may want to synchronize the data in the two farms. To get an idea of the required bandwidth, consider that backing up a 1 terabyte (2⁴⁰ bytes) solid state disk to a top-of-the-line external disk over a FireWire 800 connection takes 5 hours, 39 minutes, and 39 seconds…
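To make the scale concrete, here is a rough extrapolation in Python. It assumes, purely for illustration, that the whole 5 zettabytes would be pushed through a single link at the throughput of my FireWire backup; a real data centre would of course use many far faster links in parallel, so this is not an estimate of what the NSA would actually need. The helper `transfer_seconds` is hypothetical.

```python
ZETTABYTE = 2**70
TERABYTE = 2**40
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Measured backup time for 1 TB: 5 h 39 min 39 s.
backup_seconds = 5 * 3600 + 39 * 60 + 39
throughput = TERABYTE / backup_seconds   # effective bytes per second

def transfer_seconds(n_bytes, rate=throughput):
    """Time to move n_bytes at a constant rate (bytes per second)."""
    return n_bytes / rate

utah_bytes = 5 * ZETTABYTE               # capacity quoted for the Utah farm
years = transfer_seconds(utah_bytes) / SECONDS_PER_YEAR

print(f"effective throughput: {throughput / 1e6:.1f} MB/s")
print(f"time to copy 5 ZB over one such link: {years:,.0f} years")
```

At about 54 MB/s, the copy would take on the order of a few million years, which is the sense in which the required bandwidth is mind-boggling.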


Servers at the CERN Data Centre collected 75 petabytes of LHC data in the last three years, bringing the total recorded physics data to over 100 petabytes (Image: CERN)
