Faced with a baby screaming the house down and throwing food on the floor, frazzled parents may be surprised to hear that their beloved offspring is probably the smartest learner in the known universe.
But some computer scientists have long recognised that reality and are now trying to mimic babies’ extraordinary processing powers to develop artificial intelligence models. In some respects, our latest technological creations appear near-magical in their capabilities, but when it comes to autonomous learning, they are as dumb as a diaper. Can they be trained to learn as baby processing units do by exploring partial, messy, real-world data?
A team of researchers at New York University has been trying to do just that and this month published their findings in the journal Science. Their experiment drew data from a lightweight camera attached to the head of a baby in Adelaide called Sam, recording 61 hours of his life from the age of 6 to 25 months.
The video stream, including jumbled-up images and sounds of parents, cats, play and toys, was then processed into 600,000 video frames and 37,500 transcribed “utterances” and fed into a neural network. The challenge was to match what Sam saw during approximately 1 per cent of his waking hours with the sounds he heard, to create a multimodal AI model.
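For readers curious what that matching step looks like in code, here is a minimal, hypothetical Python sketch (using PyTorch) of the general contrastive approach the article describes: pair each video frame with the utterance heard at the same moment, and train two small encoders so that matching frame-utterance pairs land close together in a shared embedding space while mismatched pairs are pushed apart. The tiny encoders, vocabulary size and random toy batch below are illustrative assumptions, not the NYU team's actual architecture.

```python
# A toy sketch of contrastive frame-utterance matching (not the NYU code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFrameEncoder(nn.Module):
    """Maps a 64x64 RGB video frame into a shared embedding space."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class TinyUtteranceEncoder(nn.Module):
    """Averages word embeddings of a transcribed utterance."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, token_ids, offsets):
        return F.normalize(self.embed(token_ids, offsets), dim=-1)

def contrastive_loss(frame_emb, text_emb, temperature=0.07):
    """Matching pairs sit on the diagonal of the similarity matrix;
    every other pairing in the batch serves as a negative example."""
    logits = frame_emb @ text_emb.t() / temperature
    targets = torch.arange(len(frame_emb))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 random frames, each paired with a single-word "utterance".
frames = torch.randn(8, 3, 64, 64)
tokens = torch.randint(0, 1000, (8,))
offsets = torch.arange(8)

f_enc, t_enc = TinyFrameEncoder(), TinyUtteranceEncoder()
loss = contrastive_loss(f_enc(frames), t_enc(tokens, offsets))
print(f"contrastive loss on a random toy batch: {loss.item():.3f}")
```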
But how does a baby understand that the word “ball” relates to very different types of round, bouncy, multicoloured objects? Cognitive scientists are divided on the explanation but all agree that babies are amazingly adept learners, generalising from limited inputs. Between six and nine months, babies begin to connect words with images. Before the age of two, they have learned an average of 300 words, mostly nouns.
Until now, attempts to build multimodal AI models that can combine text, images, audio and video have mostly relied on the application of large computing power to vast amounts of curated data. But the NYU researchers found their model could successfully associate images and sounds with substantially less data from one baby’s video feed. Their model had an accuracy rate of 61.6 per cent when it came to classifying 22 “visual concepts”.
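That 61.6 per cent figure comes from a classification-style test: show the model a frame and ask which of the 22 concept words sits closest to it in the learned embedding space, then count how often the answer matches the true label. The sketch below, a hedged toy illustration rather than the study's actual evaluation code, uses random embeddings standing in for real model output.

```python
# A toy sketch of nearest-concept classification accuracy (illustrative only).
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify_frames(frame_emb, concept_emb):
    """Assign each frame to the concept with the most similar embedding."""
    sims = frame_emb @ concept_emb.t()   # (n_frames, n_concepts)
    return sims.argmax(dim=-1)           # predicted concept index per frame

@torch.no_grad()
def accuracy(predictions, labels):
    return (predictions == labels).float().mean().item()

# Random embeddings stand in for a trained model's output.
n_frames, n_concepts, dim = 100, 22, 128
frame_emb = F.normalize(torch.randn(n_frames, dim), dim=-1)
concept_emb = F.normalize(torch.randn(n_concepts, dim), dim=-1)
labels = torch.randint(0, n_concepts, (n_frames,))

preds = classify_frames(frame_emb, concept_emb)
print(f"toy accuracy: {accuracy(preds, labels):.1%}")  # near chance (1/22) here
```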
“We were very surprised that the model could exhibit a pretty remarkable degree of learning given the limited data it had,” Wai Keen Vong, the lead author of the NYU paper, told me in a video interview.
These findings are an encouraging prompt for the development of future AI models. But, as Vong said, they also underscore the phenomenal learning abilities of babies themselves, who can respond to visual signals and develop their own learning hypotheses. Part of the reason for their precociousness is that human babies spend an uncommonly long time actively exploring the world before they have to fend for themselves. “Children are the R&D department of the human species — the blue-sky guys, the brainstormers. Adults are production and marketing,” as Alison Gopnik memorably wrote in her book The Philosophical Baby.
According to Gopnik, a psychology professor at the University of California, Berkeley, babies have three core skills that AI systems lack. First, babies excel at imaginative model building, creating a conceptual framework to explain the world. They are also curious, adventure-loving and embodied learners, actively exploring new environments rather than being passively encased in lines of code. And babies are social animals, learning from everyone they interact with, which helps them develop empathy, altruism and a moral sensibility.
In an email, Gopnik says that the “fascinating and very clever” NYU study shows that AI models can extract linguistic information from the data that babies experience. But, as the paper’s authors acknowledge, babies also use different data, which they gain from active exploration and social interaction. “The success of the models, which is still very limited, may take advantage of the exploration and social learning capacities of babies, but that doesn’t mean that they themselves have those abilities,” Gopnik writes.
It will take a lot more research to replicate computationally what babies learn naturally. In particular, how can we build machines that exhibit common sense and social reasoning? AI models may be able to learn nouns associated with physical objects, but they still struggle with abstract concepts and verbs. In spite of the stunning advances in AI, we still have much to learn from the tiny bag of biological wetware that is a baby’s brain. – © 2024 The Financial Times