Battle begins for control of hidden internet information

Wired on Friday: It's the dark matter of the digital world: it's invisible, ever-present and immeasurable, but it holds the visible world of data together. It's metadata - data about data - and it lies at the heart of the next commercial and legal battles on the internet, writes Danny O'Brien.

When your computer processes a chunk of bits and bytes, metadata is the part of those numbers that describes what the contents are supposed to be. To you, the content of an e-mail is the message, but the e-mail also carries metadata: the date, the subject, and who the message is for and from.
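To make the split concrete, here is a minimal sketch using Python's standard `email` module. The message, addresses and the `split_message` helper are invented for illustration; the point is only that the headers (metadata) and the body (data) travel together and can be pulled apart.

```python
from email import message_from_string

# A made-up message: the header lines are metadata, the body is the data.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday?
Date: Fri, 02 Mar 2007 12:30:00 +0000

Are you free at one o'clock?
"""


def split_message(raw):
    """Return (metadata, data) for a raw single-part e-mail message."""
    msg = message_from_string(raw)
    # Date, subject, sender and recipient: the data about the data.
    metadata = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    # The message text itself.
    data = msg.get_payload()
    return metadata, data


metadata, data = split_message(RAW_MESSAGE)
print(metadata["Subject"])
print(data.strip())
```

A real mail program reads many more headers than these four, but the division is the same.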

The picture concealed in an image file is the data; the part of an image file that tells the computer it is an image file is the metadata. Your digital camera stamps each of your photographs with an invisible bounty of metadata: the time the picture was taken, the exposure, even the serial number of the camera.

Some metadata is easy to find - any program can peer at your snaps and deduce the date each was taken. Other metadata is far harder to find. And the harder it is to find, the more lucrative the business in gathering or exploiting it.

Think of the metadata that drives Amazon. In the beginning, the company had to buy in much of its vital metadata - the catalogue of ISBN numbers, scans of the prices, publication dates. Now the company's crown jewels lie in the exclusive treasure trove it has built.

Some was collected through expensive donkey work: scans of book covers (and later contents). Some was produced as a side-effect of Amazon's book sales: the ratings customers give their purchases are some of the most precious metadata Amazon owns.

Google is another company whose core market is metadata. When Google started, its unique selling point was just one number per website: the "PageRank", a number between zero and 10 that reflected how many other sites linked to it, and therefore how respected that site was by the rest of the internet. From the database of that metadata came the company's ability to outsmart its competition and cement its lead as the top search engine.
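The idea of boiling a site down to one number can be sketched in a few lines of Python. This is a deliberately crude stand-in, not Google's actual method: it simply counts inbound links and scales the count to a zero-to-10 range. The link graph, the site names and the `link_scores` function are all hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each site maps to the sites it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}


def link_scores(links):
    """Score each listed site from 0 to 10 by its share of inbound links."""
    inbound = Counter()
    for source, targets in links.items():
        # Count each linking site once, and ignore links to itself.
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    top = max(inbound.values(), default=0)
    # Scale so the most-linked site gets 10; a site with no links gets 0.
    return {
        site: (round(10 * inbound[site] / top) if top else 0)
        for site in links
    }


scores = link_scores(LINKS)
print(scores)
```

Even this toy version shows why the number is metadata: it says nothing about what is on a page, only how the rest of the web treats it.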

But what is peculiar about metadata, at least in the US, is that it has such weak protection in law against copying and replication. In Europe, companies have had a "database right" over collections of data since 1997 (it was implemented in the Republic in 2000). This right prevents others from copying or sampling that data for 15 years.

No such right exists in the US. In theory, such a lack of legal protection seems to give Europeans a head start in the metadata markets. They could, for instance, simply "steal" US companies' metadata knowledge and use it as the basis of their own works, just as the US benefited from its first 100 years as a pirate nation without protection for foreign copyrights.

And yet, companies like Google and Amazon have managed to profit from their metadata far in excess of any European company. The reason seems to be clear. Metadata collections profit from their size and completeness, and are also easier to defend from copiers if they are so big and so swiftly changing that they cannot be reproduced.

These companies have made money from their metadata by hiding it in plain view. Every time you conduct a Google search, you catch a glimpse of what they know about the world. But to collect all of that data, even from Google alone, would require practically another Google.

So metadata, protected by law or not, can make you rich. But there's another side to metadata, one which can seriously challenge existing business models in the digital world, as well as create new ones.

Take a look at www.yes.com. On its front page you can see a map of the US, with song titles flashing up from state to state. The track names are a sample of what is playing at this moment on more than 2,500 US radio stations. It's an incredible repository of fascinating facts, but more importantly, it's a new revenue opportunity.

Radio listeners visit the site if they want to find out the name of a song they just missed. Yes.com provides a direct link to iTunes so they can buy it within seconds.

You'd imagine the music industry would welcome this use of metadata with open arms. On the contrary: the Recording Industry Association of America (RIAA) is terrified it will fall into the wrong hands.

Connect a list of what's playing on the radio together with a PC that has an FM radio receiver, and you have a way to record and keep songs forever: the PC records the lot, then names these music files using the website's information about what was playing at the moment of recording.
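The naming step is simple enough to sketch in Python. Everything here is assumed for illustration: the playlist is invented, times are seconds since midnight, and `title_at` and `name_recording` are hypothetical helpers rather than anything a real website or radio set provides. The sketch looks up which track had most recently started at the moment of recording.

```python
from bisect import bisect_right

# Hypothetical playlist for one station, sorted by start time:
# (start time in seconds since midnight, track title).
PLAYLIST = [
    (3600, "Song A"),
    (3810, "Song B"),
    (4050, "Song C"),
]


def title_at(playlist, moment):
    """Return the title that was playing at `moment`, or None if too early."""
    starts = [start for start, _ in playlist]
    # Index of the last track that started at or before `moment`.
    index = bisect_right(starts, moment) - 1
    if index < 0:
        return None
    return playlist[index][1]


def name_recording(playlist, moment):
    """Build a file name for a recording made at `moment`."""
    title = title_at(playlist, moment)
    return f"{title or 'unknown'}.mp3"


print(name_recording(PLAYLIST, 3900))
```

The audio is the data; the playlist is the metadata that turns an anonymous recording into a named track.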

Of course, the idea of people seriously doing this is somewhat hard to credit: editing out DJs' endless prattle is probably worth the few dollars it takes to buy such tracks online. But the RIAA is still seriously considering placing restrictions on the use of metadata in new digital radio sets, and is presumably at this moment puzzling over what to do with loose metadata in the FM world.

The battle over metadata is only just beginning. In this "Web 2.0" world, the best plan for collecting valuable metadata is to encourage users to contribute it. The website Digg.com chooses its front-page stories by getting its readers to vote on the best articles. Some users are beginning to realise that Digg gains its value from their contributions, even though they receive no monetary payment for the work of voting.

Should metadata be "owned" in the same way copyrighted works are commonly thought of as "owned"? And if so, could Amazon or Google have to beg every customer or website for the right to collect metadata in the future? Watch this space. Or rather, watch the information being collected about this space.

Danny O'Brien is activism co-ordinator for the Electronic Frontier Foundation