British company CACI aims to have first set of definitive results ready by April 2007, a year after the count took place, writes Robin O'Brien Lynch
The 2006 census is the most important this country has had, we have repeatedly been told. Social upheaval, a huge rise in the non-national population and the fracturing of the nuclear family mean that it is absolutely vital that we get exact figures from every single household.
So to whom has this mammoth task been entrusted, and how do they count more than four million people?
London-based information solutions company CACI and its partner, Top Images Systems (Tis), form the same alliance that brought you the last census in 2001. Five years after they won the original contract, the partners were awarded a contract extension worth €3.5 million last August.
For their client, the Central Statistics Office (CSO), nationality matters little and results mean everything.
"We're not in a position to have a preference for an Irish company," says Gerry Walker, senior statistician with the CSO and head of operations for the 2006 census.
"We look at the bids which are most economically viable and then take other factors into account. One factor which was high up on our list at the time was previous experience working with censuses, and Tis, who made a joint bid with CACI, had significant experience in the area. The original process was for the 2001 census and there was an option for an eight-year contract to cover both 2001 and 2006."
Matthew Cooper is project manager and an executive consultant with CACI. He feels that the nationality of the team has little bearing on the project.
"We realise that this is a very important census for Ireland, given the changes of the past few years, but that doesn't really impact on our work," he says. "We are building and designing a system that conforms to CSO requirements and we will deliver the best quality solution possible.
"I wouldn't say we are at a disadvantage coming in as a British company. We work for local government and within the public sector in the UK, mostly on benefit solutions for claim forms. This is our only project in Ireland; we see it as a natural extension of our work in the UK, where we are utilising the same core technologies.
"In fact our business partner, Tis, which provides the eFlow intelligent character recognition (ICR) software that we integrate into our census processing solution, has its headquarters in Israel."
CACI's remit is to build a system that will scan every page of every form, intelligently recognise and validate the data, and present data that has failed validation rules for operator repair.
The CSO is also able to search and retrieve scanned images via a custom-built census document management system (CDMS).
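The validate-then-repair flow described above can be pictured with a short Python sketch. It is an illustration only, not CACI's or Tis's code: the field names, the rules and the `split_for_repair` helper are all hypothetical, and the real CSO validation rules are not given in this article.

```python
# Hypothetical validation rules: each maps a field name to a check.
# The actual CSO business rules are not published here.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "sex": lambda v: v in {"M", "F"},
}


def failed_fields(record, rules):
    """Return the names of the fields that fail their validation rule."""
    return [name for name, check in rules.items() if not check(record.get(name))]


def split_for_repair(records, rules):
    """Separate clean records from those an operator must repair."""
    accepted, repair_queue = [], []
    for record in records:
        failures = failed_fields(record, rules)
        if failures:
            # Keep the failing field names so the operator sees what to fix.
            repair_queue.append((record, failures))
        else:
            accepted.append(record)
    return accepted, repair_queue
```

Only records with at least one failing field reach the repair queue, so operators spend their time on the forms the software could not settle by itself.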
"We started this project in early 2005. It's a complete turnkey solution in which we developed the business design, technical design and application design. We set up the server infrastructure, the storage area network - and we even primed a contract to print and quality assure the census forms with a local Irish printing firm, DC Kavanagh," says Cooper.
"Our role is to develop an end-to-end census processing system. We scan the census form and eFlow intelligently recognises and validates the information on the form, be it optical mark recognition (OMR), alpha characters, numeric characters or barcode. We're dealing with a very high number of forms here and we need to provide the best solution possible. We also deliver complete process tracking, enabling the CSO to track the life cycle of forms throughout the whole process."
When the CSO designed the questions on the forms, simplicity and legibility were its priorities, so that the public would have no problem understanding the questions and the data collected would be free from ambiguity. For CACI, the layout needed the same qualities so the technology could run as quickly and accurately as possible.
It works by scanning the marks made on the forms and using a defined set of business rules to convert them into data for the CSO. The same rules are used when an anomaly arises, such as a double tick, a smudge or a fold.
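A rule of this kind might look something like the following Python sketch. It assumes, purely for illustration, that the scanner reports how much of each tick box is filled in as a ratio between 0 and 1; the `interpret_tick_boxes` function, the 0.2 threshold and the status labels are hypothetical rather than taken from eFlow.

```python
def interpret_tick_boxes(fill_ratios, options, threshold=0.2):
    """Turn the fill ratio of each tick box into an answer or an anomaly flag.

    fill_ratios: fraction of each box covered by ink (assumed scanner output).
    options: the answer each box stands for, in the same order.
    threshold: hypothetical cut-off above which a box counts as marked.
    """
    marked = [opt for opt, ratio in zip(options, fill_ratios) if ratio >= threshold]
    if len(marked) == 1:
        return {"value": marked[0], "status": "ok"}
    if not marked:
        return {"value": None, "status": "blank"}
    # Two or more marked boxes (a double tick, say) cannot be resolved here.
    return {"value": None, "status": "multiple_marks", "candidates": marked}
```

A faint smudge falls below the threshold and is ignored, while a double tick comes back flagged with both candidate answers rather than being guessed at.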
"The eFlow ICR technology recognises the type of census form/page to be processed and uses character-recognition technology using logical rules, keywords, and sophisticated 'fuzzy matching' algorithms," says Cooper.
"Our solution identifies and interprets the captured information while validating it against the CSO business rules and data dictionaries. It can interpret handwriting, machine print, bar codes and mark fields. It also offers statistics in real time on documents being processed.
"It allows the implementation of a variety of rules. Certain fields, as well as unrecognised data, are manually verified and completed, with or without the help of data dictionaries and CSO business rules to obtain the highest level of accuracy."
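The idea of fuzzy matching against a data dictionary, with a fallback to manual verification, can be sketched in Python using the standard library's `difflib`. This is a stand-in technique, not the algorithm eFlow uses, which the article does not describe; the `COUNTIES` list, the `match_to_dictionary` function and the 0.8 cut-off are assumptions made for the example.

```python
import difflib

# Hypothetical data dictionary; the CSO's real dictionaries are not shown here.
COUNTIES = ["Cork", "Kerry", "Galway", "Dublin", "Donegal"]


def match_to_dictionary(raw, dictionary, cutoff=0.8):
    """Return (value, needs_manual_check) for one recognised text field.

    A close enough dictionary entry replaces the raw text; otherwise the
    raw text is returned unchanged and flagged for an operator.
    """
    lookup = {entry.lower(): entry for entry in dictionary}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    if hits:
        return lookup[hits[0]], False
    return raw, True
```

For instance, a misread such as "Dubl1n" is close enough to "Dublin" to be corrected automatically, whereas text that resembles nothing in the dictionary is passed to an operator.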
CACI is training operators and will go live on July 3rd, when all forms have been collected. The end of December is the deadline for processing all 36 million pages, two months quicker than the last census, despite an increase of 200,000 forms.
Preliminary results, based on the enumerators' summaries, should determine the number of males and females in each electoral area by July 19th, and after quality checks and tabulation, the first set of definitive results is due on April 23rd, 2007, a year after the original count.
And after all the high-end solutions and complex software systems, the forms are dealt with the old-fashioned way: bound up in dusty boxes in Government buildings for a century or so.
"Once the forms are scanned, they are collected and sent to the National Archive, and then released to the public after 100 years," says Walker.
"There is huge interest in the forms, mostly for genealogical reasons. No forms have yet been released under this Census Act, but the ones from 1911 are available. There is more demand for census forms than anything else at the National Archive."