Making sense of 260 gigabytes of data

Information on leaked disc drive equivalent to half a million books

The International Consortium of Investigative Journalists' exploration of offshore secrets began when a computer hard drive packed with corporate data arrived in the post. Gerard Ryle, ICIJ's director, obtained the small black box as a result of his three-year investigation of Australia's Firepower scandal, a case involving offshore havens and corporate fraud.

The hard drive contained more than 260 gigabytes of data, the equivalent of half a million books. Its files included two million emails and four large databases. There were details of more than 122,000 offshore companies or trusts, and nearly 12,000 intermediaries.

Unlike the smaller cache of US cables and war logs passed in 2010 to WikiLeaks, the offshore data was not structured or clean, but an unsorted collation of memos and instructions, official documents, emails, large and small databases and spreadsheets, scanned passports and accounting ledgers.


Text retrieval
Analysing the immense quantity of information required "free text retrieval" software, which can work with huge volumes of unsorted data. Such high-end systems have been sold for more than a decade to intelligence agencies, law firms and commercial corporations. Journalism is just catching up. The named people who administered offshore companies included shareholders, directors, secretaries, lawyers, accountants, nominees and trustees. But many such structures were simply legal devices designed to conceal.

The real beneficial owners proved often to be the so-called “settlors” or “protectors” of offshore trusts, and those holding powers of attorney to exert secret control over the accounts.

China, Hong Kong, Taiwan, the Russian Federation and former Soviet republics appeared to provide the majority of secret offshore owners. The British Virgin Islands are the second-largest source of capital investment in China – on paper at least. Cyprus, an offshore island currently in financial crisis as a result, is also identified in the data as a huge source of Russian investment.

ICIJ’s collaborating journalists from 46 countries constituted one of the largest groups ever to have worked together on a data project. Marina Walker Guevara, Michael Hudson, Nicky Hager and Stefan Candea worked from the US, New Zealand and Romania.

Others who contributed included Mar Cabra, Kimberley Porteous, Frederic Zalac, Alex Shprintsen, Prangtip Daorueng, Roel Landingin, Francois Pilet, Emilia Diaz-Struck, Roman Shleynov, Harry Karanikas, Sebastian Mondial and Emily Menkes.

Interestingly, the team’s attempts to use encrypted email systems such as PGP (“Pretty Good Privacy”) were abandoned because of complexity and unreliability that slowed them down.


Data mining
Meanwhile, programmers in Germany, the UK and Costa Rica designed sophisticated data mining and cleaning software for ICIJ. Manual analysis in New Zealand proved crucial in decisions on which countries ICIJ needed reporters in.

ICIJ's own search system – named Interdata – was developed by a British programmer as dozens of new journalists joined the expanding project. Interdata allowed them to download copies of whichever of the 2.5 million offshore documents were relevant to their countries.

ICIJ rebuilt some of the databases in an effort to run them in their original format. There were surprises. The databases were formatted to record who really lay behind each entity, as required by international regulations on money laundering and “due diligence”. Journalists hoped the truth was just a click away.

Entries for "beneficial owners" were often empty. The offshore agencies had frequently passed off their supposed legal responsibility to intermediaries in other countries. The empty fields were not an accident; they were by design.
Guardian service