How many parks are near the new home you’re thinking of buying? What’s the best dinner-wine pairing at a restaurant? These everyday questions require relational reasoning, an important component of higher thought that has been difficult for artificial intelligence (AI) to master. Now, researchers at Google’s DeepMind have developed a simple algorithm to handle such reasoning—and it has already beaten humans at a complex image comprehension test.
Humans are generally pretty good at relational reasoning, a kind of thinking that uses logic to connect and compare places, sequences, and other entities. But the two main types of AI—statistical and symbolic—have been slow to develop similar capacities. Statistical AI, or machine learning, is great at pattern recognition, but not at using logic. And symbolic AI can reason about relationships using predetermined rules, but it’s not great at learning on the fly.
The new study proposes a way to bridge the gap: an artificial neural network for relational reasoning. Similar to the way neurons are connected in the brain, neural nets stitch together tiny programs that collaboratively find patterns in data. They can have specialized architectures for processing images, parsing language, or even learning games. In this case, the new “relation network” is wired to compare every pair of objects in a scenario individually. “We’re explicitly forcing the network to discover the relationships that exist between the objects,” says Timothy Lillicrap, a computer scientist at DeepMind in London who co-authored the paper.
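To make the "compare every pair" idea concrete, here is a minimal sketch in plain Python. It is not DeepMind's code: the helper names (`make_layer`, `dense`, `relation_network`), the layer sizes, and the random untrained weights are all assumptions chosen for illustration. Each object is a short list of numbers, a small function `g` scores each ordered pair, the scores are summed, and a second function `f` turns that sum into an answer.

```python
import itertools
import math
import random


def make_layer(n_in, n_out, rng):
    """Hypothetical helper: random, untrained weights for one dense layer."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x):
    """Apply one dense layer followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def relation_network(objects, g_weights, f_weights):
    """Score every ordered pair with g, sum the results, then apply f.

    Assumption: pairs include an object paired with itself; a variant
    could skip those or use unordered pairs instead.
    """
    pooled = [0.0] * len(g_weights)
    for a, b in itertools.product(objects, repeat=2):
        pair_features = dense(g_weights, a + b)  # a + b concatenates the two lists
        pooled = [p + q for p, q in zip(pooled, pair_features)]
    return dense(f_weights, pooled)


# Usage: three toy "objects", each described by two numbers.
rng = random.Random(0)
g_w = make_layer(4, 3, rng)  # input is two objects of size 2, so 4 values
f_w = make_layer(3, 2, rng)
objs = [[0.1, 0.2], [0.5, -0.3], [0.9, 0.4]]
out = relation_network(objs, g_w, f_w)
out_rev = relation_network(objs[::-1], g_w, f_w)
print(out)
```

Because the pair scores are summed, the order in which the objects are listed should not matter, which is why `out` and `out_rev` agree up to floating-point rounding. In a real system the weights would be learned from data rather than drawn at random.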
He and his team challenged their relation network with several tasks. The first was to answer questions about relationships between objects, such as cubes, balls, and cylinders, in a single image. For example: “There is an object in front of the blue thing; does it have the same shape as the tiny cyan thing that is to the right of the gray metal ball?” For this task, the relation network was combined with two other types of neural nets: one for recognizing objects in the image, and one for interpreting the question. Over many images and questions, other machine-learning algorithms were right 42% to 77% of the time. Humans scored a respectable 92%. The new relation network combo was correct 96% of the time, a superhuman score, the researchers report in a paper posted last week on the preprint server arXiv.
The DeepMind team also tried its neural net on a language-based task, in which it received sets of statements such as, “Sandra picked up the football” and “Sandra went to the office.” These were followed by questions like: “Where is the football?” (the office). It performed about as well as competing AI algorithms on most types of questions, but it really shone on so-called inference questions: “Lily is a swan. Lily is white. Greg is a swan. What color is Greg?” (white). On those questions, the relation network scored 98%, whereas its competitors each scored about 45%. Finally, the algorithm analyzed animations in which 10 balls bounced around, some connected by invisible springs or rods. Using the patterns of motion alone, it was able to identify more than 90% of the connections. It then used the same training to identify human forms represented by nothing more than moving dots.
“One of the strengths of their approach is that it’s conceptually quite simple,” says Kate Saenko, a computer scientist at Boston University who was not involved in the new work but has also just co-developed an algorithm that can answer complex questions about images. That simplicity—Lillicrap says most of the advance is captured in a single equation—allows it to be combined with other networks, as it was in the object comparison task. The paper calls it “a simple plug-and-play module” that allows other parts of the system to focus on what they’re good at.
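For readers who want to see it, the equation in the preprint has roughly the following form. The symbols are explained here rather than in the article: $O$ is the set of objects $o_1, \dots, o_n$, $g_\theta$ is a small network that scores one pair of objects, and $f_\phi$ is a second network that turns the summed scores into an answer.

```latex
\mathrm{RN}(O) = f_\phi\!\left( \sum_{i,j} g_\theta(o_i, o_j) \right)
```

The sum runs over pairs of objects, so the same $g_\theta$ is reused for every pair and the result does not depend on the order in which the objects are listed. That is part of what lets the module sit on top of other networks that supply the objects.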
“I was pretty impressed by the results,” says Justin Johnson, a computer scientist at Stanford University in Palo Alto, California, who co-developed the object comparison task—and also co-developed an algorithm that does well on it. Saenko adds that a relation network could one day help study social networks, analyze surveillance footage, or guide autonomous cars through traffic.
To approach humanlike flexibility, though, it will have to learn to answer more challenging questions, Johnson says. Doing so might require comparing not just pairs of things, but triplets, pairs of pairs, or only some pairs in a larger set (for efficiency). “I’m interested in moving toward models that come up with their own strategy,” he says. “DeepMind is modeling a particular type of reasoning and not really going after more general relational reasoning. But it is still a superimportant step in the right direction.”