Automating drudgery in computing


As software and hardware become more complex, the concept of autonomic computing is used to carry out many basic operations, writes Karlin Lillington

Autonomic computing sounds a bit like a Dr Who plot element, in which the Daleks get computers to take over the world.

Not exactly - but in a more benign way, not too far off the mark either. Autonomic computing means the automation of many of the basic operations of computers, from routine tasks to maintenance and troubleshooting.

The concept comes from IBM research senior vice-president Paul Horn, who spoke in 2001 of getting computer systems to take over certain basic self-regulatory tasks, just as the human autonomic nervous system enables our basic "operations" - breathing, digesting, blinking - to continue in the background without us having to take much notice. That leaves our brains free to do other things.


This month, some 120 computer scientists gathered for a major annual industry conference on the topic, held this year at UCD, the first time it has been outside the US.

"The motivation for autonomic computing was really that the complexity our customers were dealing with was becoming a constraint," explains Alan Ganek, chief technology officer of IBM's Tivoli Software and IBM vice-president of autonomic computing, who chaired a session of the event and led the closing discussion session.

"Technology is now inherent in every new revenue-generating area for companies. Increasingly, the ability to deploy technology quickly is a key element of competitiveness," he says.

You could argue that perhaps this is all just a ploy to sell more software - and Ganek acknowledges that complexity does inhibit customers from, as he says, "consuming more technology".

IBM and other companies obviously want to keep selling software and hardware. But the issues that underlie the autonomic computing concept have been gaining force for much of the past decade. Software and hardware have grown more sophisticated and complex and computers have become more powerful, cheaper, and ubiquitous. Gathering together many disciplines within computer science under the aegis of autonomic computing is helping to bring a coherence - and more specifically, some key concepts and standards - to addressing the complexity in a structured way.

This is the huge challenge for an industry that keeps pushing Moore's Law forward, stretching the abilities of hardware and applications and allowing computers to handle tasks that only the most powerful mainframes could manage in the past.

On top of that has been a more recent proliferation of devices that want to get onto and into networks. And to top it off, the internet has made it possible to move more information faster to more places and more people in an instant.

Some of the enterprise sectors facing major IT complexity problems are financial services (just think of the complexity of a single ATM network, Ganek notes), governments, telecommunications and huge website operators, he says.

But this isn't just a big company problem. As Ganek notes, even the smallest companies can buy a server or two and make use of software that - while it may be a stripped down version of the programs that large enterprises are running - is still enormously complex.

And while an enterprise is likely to have a full IT department to manage its software and hardware, a small or medium-sized company may only have one or two people.

Hence, he argues, people need simplified ways of dealing with that complexity and need as much of the complexity as possible managed out of sight: automated.

That, in turn, requires software and hardware to mesh properly, applications to work together, and common, well-understood system architectures.

"We have to have a very co-ordinated approach if we're going to deal with this," says Ganek. "There is complexity in all aspects of a computer, and there isn't any feasible way to put a veneer over the system. We need overarching standards and we need co-operation."

That's already beginning to happen. Last year, more than 33 international conferences were held on the subject of autonomic computing, and it has long since moved from being a project managed - or even driven - by IBM to an international challenge, although the company takes a strong interest in supporting ongoing efforts and contributing to the process, Ganek says.

So how is all of this supposed to work? First, more elements of a computer system could become self-defining. If you add a new server to a network, the server knows how to tell the network that it is there, and the network recognises it and incorporates it into the system.
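In miniature, that self-defining behaviour might look something like the Python sketch below. It is purely illustrative - the class names, the registry, and the describe/register exchange are all invented for this example, not taken from any real autonomic system.

```python
# Illustrative sketch: a "self-defining" server announces itself to a
# hypothetical network registry, which recognises it and adds it to the
# system. All names here are invented for illustration.

class Server:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities  # what this server can do

    def describe(self):
        # The server knows how to tell the network what it is.
        return {"name": self.name, "capabilities": self.capabilities}

class NetworkRegistry:
    def __init__(self):
        self.members = {}

    def register(self, server):
        # The network recognises the new server and incorporates it.
        info = server.describe()
        self.members[info["name"]] = info["capabilities"]

registry = NetworkRegistry()
registry.register(Server("db-01", ["storage", "replication"]))
print(sorted(registry.members))  # the new server is now part of the system
```

The point of the exchange is that neither side needs a human operator: the server carries its own description, and the network only needs to understand the common format it arrives in.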

Many researchers are working on ways that systems and networks could self-diagnose problems and then self-heal.

One approach is to have error logs self-analyse. When a problem is detected, a network could search old error logs for similar incidents, see how the problem was fixed at the time, and apply that solution. A big challenge, though, is that different applications generate error logs in different ways, so the logs first need to be translated into a common 'language'.
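The two steps described above - translating logs into a common form, then matching against past incidents - can be sketched in a few lines of Python. This is only a toy illustration: the normalisation rule and the incident store are invented, and a real system would need far richer matching.

```python
# Minimal sketch of self-analysing error logs: normalise log lines from
# different applications into a common form, then look up past incidents
# with the same signature and reuse the recorded fix.

import re

def normalise(log_line):
    # Strip variable parts (timestamps, IDs, counts) so that similar
    # errors from different times produce the same signature.
    return re.sub(r"\d+", "N", log_line).lower().strip()

# Hypothetical store of past incidents and how each was fixed.
history = {
    normalise("2006-06-12 ERR 503: backend pool 7 exhausted"):
        "restart backend pool and raise connection limit",
}

def suggest_fix(new_log_line):
    # A new error with a matching signature gets the old solution.
    return history.get(normalise(new_log_line), "no matching incident found")

print(suggest_fix("2007-01-03 ERR 503: backend pool 9 exhausted"))
```

Here the common 'language' is just the normalised signature; the hard part in practice is agreeing a normalisation that works across applications that log in very different formats.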

Autonomic computing should be able to take control of many processes that are repeated over and over - even something as simple as automating password resets for the help desk of a company whose main plea for help came from people who had forgotten their passwords.

Ganek says IBM enabled a company to do this, at a saving of about $1 million (€790,000) a year.
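A self-service reset of the kind that replaces those help-desk calls might, in outline, work like the sketch below. The user store, the security question, and the reset policy are all assumptions made for illustration, not details of the IBM deployment mentioned above.

```python
# Illustrative sketch of automating a help-desk password reset: verify the
# user against a stored security answer, then issue a temporary password,
# with no human operator involved. The user store and policy are invented.

import secrets

users = {"alice": {"security_answer": "tivoli"}}

def reset_password(username, security_answer):
    user = users.get(username)
    if user is None or user["security_answer"] != security_answer:
        return None  # verification failed; escalate to a human
    temp = secrets.token_urlsafe(8)   # random temporary password
    user["password"] = temp
    user["must_change"] = True        # force a change on first login
    return temp

temp = reset_password("alice", "tivoli")
print(temp is not None)  # reset handled without help-desk staff
```

Only the failures fall through to a person, which is the general autonomic pattern: automate the repetitive majority and escalate the exceptions.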

General network management issues are also part of the brief for autonomic computing, which could handle problems such as outages. But Ganek says: "For most customers, the issue isn't unexpected outages, but problems that happen when you need to change something on the network."

This is why a self-defining network and self-defining devices are part of the bigger picture, he says.

A technician cannot always see clearly why errors are being generated - an application may appear to run smoothly, for example, while the error logs fail to indicate that a server is down and causing the problem. A server that talks to the network, and vice versa, can avoid such a situation.

The purpose of conferences such as the one at UCD is to get the industry to examine how to make all these elements work together, decide on standards, and work towards that big picture, says Ganek.

Looking forward to his closing session, he noted impishly that all the presenters so far had discussed scenarios where a system is already up and running.

He planned to ask them to consider how they would set up the system in the first place, and get everything to work together.

"They all assume the system is up and running and can easily forget about that big picture," he laughs.