Computer Vision: Recognize objects faster and more accurately with CNNs


Despite constant movements of the body, head, and eyes, our visual perception of the objects around us remains stable, even though the physical information hitting our retinas is constantly changing. Scientists at the RIKEN Institute in Japan have studied the many unnoticed eye movements we make and shown that they allow us to recognize objects in a stable way. These findings can be applied to computer vision and could be particularly useful for autonomous driving systems. The study, entitled “Motor-related signals support localization invariance for stable visual perception,” was published in the scientific journal PLOS Computational Biology.

RIKEN, Japan’s largest comprehensive research institution, is internationally recognized for its high-quality research across a wide range of scientific disciplines. Within its brain sciences division, Andrea Benucci directs the Neural Circuits and Behavior Laboratory and is the author of the paper.

He explains:

“Our lab studies the neural basis of sensory processing with a particular focus on vision. We are particularly interested in understanding the computational rules used by populations of neurons in the visual cortex to process visual information: how does the coordinated activity of groups of neurons, ‘talking’ to each other via action potentials, produce a visual percept? What are the relevant spatial and temporal scales used to process visual information? To answer these questions, we use as a model system the primary visual cortex of mice trained in behavioral tasks. The experimental tools we use are based on state-of-the-art methods in optogenetics, optical imaging and electrode recording.”

The research

Our ability to perceive a stable visual world despite continuous body, head, and eye movements has long intrigued neuroscience researchers. Investigations of perceptual stability have revealed a multiplicity of computational and physiological phenomena operating across multiple spatiotemporal scales and brain regions. Neural copies of motor commands, sent throughout the brain each time we move, could allow the brain to account for our own movements and keep our perception stable.

In addition to this stable perception, eye movements and their motor copies could also help us stably recognize objects in the world, but how this happens remains a mystery.

The convolutional neural network

Andrea Benucci and his team designed a CNN, with an architecture inspired by the hierarchical signal processing of the mammalian visual system, to optimize the classification of objects in a visual scene during movements.
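As a concrete illustration, here is a minimal sketch of such a network in PyTorch. The layer sizes, the 28x28 grayscale input, and the class names are illustrative assumptions, not the architecture reported in the paper; only the hierarchical conv-pool structure reflects the idea described above.

```python
import torch
import torch.nn as nn

class HierarchicalCNN(nn.Module):
    """Toy hierarchical CNN: stacked conv/pool stages loosely mirror the
    mammalian visual pathway, from local edge detectors in early layers
    to larger, more abstract patterns in later ones."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))
```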

To begin, the CNN was trained to classify 60,000 black-and-white images into 10 categories, which it did successfully. But when it was tested with shifted images that mimic the naturally altered visual input accompanying eye movements, its performance dropped dramatically, to chance level. The researchers solved this problem by retraining the network with shifted images while also providing it with the direction and size of the eye movement that caused each shift. Adding the eye movements and their motor copies to the network model in this way allowed the system to better handle the visual noise in the images.
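A hedged sketch of that training idea in PyTorch: each image is shifted to mimic the retinal displacement caused by an eye movement, and the shift vector, standing in for the motor efference copy, is fed to the classifier alongside the image. All names, sizes, and the training step shown are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def shift_image(img: torch.Tensor, dx: int, dy: int) -> torch.Tensor:
    """Translate a (C, H, W) image by (dx, dy) pixels, zero-filling the
    border, to mimic the image displacement caused by an eye movement."""
    _, H, W = img.shape
    padded = F.pad(img, (abs(dx), abs(dx), abs(dy), abs(dy)))  # l, r, t, b
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[:, top:top + H, left:left + W]

class ShiftAwareCNN(nn.Module):
    """Classifier that receives the (dx, dy) motor signal, analogous to an
    efference copy, concatenated with the visual features."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two extra inputs carry the motor copy alongside the visual features.
        self.classifier = nn.Linear(64 * 7 * 7 + 2, num_classes)

    def forward(self, x: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(torch.cat([h, shift], dim=1))

# One training step (sketch): sample a random "eye movement", apply it to
# the image, and let the network see both the shifted image and the shift.
model = ShiftAwareCNN()
img = torch.rand(1, 28, 28)                      # placeholder image
dx, dy = torch.randint(-3, 4, (2,)).tolist()
shifted = shift_image(img, dx, dy).unsqueeze(0)  # add batch dimension
shift_vec = torch.tensor([[dx, dy]], dtype=torch.float32)
logits = model(shifted, shift_vec)               # (1, 10) class scores
```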

Andrea Benucci states:

“This advance will help avoid dangerous errors in computer vision. With more efficient and robust computer vision, it is less likely that pixel alterations, also known as ‘adversarial attacks,’ will cause, for example, autonomous cars to label a stop sign as a streetlight, or military drones to misclassify a hospital building as an enemy target.”

According to Andrea Benucci, these results could be carried over to real-world computer vision. He explains:

“The benefits of mimicking eye movements and their efference copies could be obtained by ‘forcing’ a computer vision sensor to make controlled types of movements, while informing the vision network in charge of processing the associated images about these self-generated movements. This would make computer vision more robust and more similar to what is experienced in human vision.”
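Using the hypothetical shift_image and ShiftAwareCNN from the sketch above, one plausible reading of this setup at inference time, not the paper’s exact procedure, is to jitter the sensor by known, self-generated amounts and report each jitter to the network along with the frame it produced:

```python
# Inference sketch: deliberately "move" the sensor by known amounts, tell
# the network about each self-generated movement, and average predictions
# over the resulting views.
moves = [(0, 0), (2, -1), (-1, 2)]  # controlled sensor movements, in pixels
logits_sum = torch.zeros(1, 10)
with torch.no_grad():
    for dx, dy in moves:
        frame = shift_image(img, dx, dy).unsqueeze(0)  # stands in for moving the camera
        motor_copy = torch.tensor([[dx, dy]], dtype=torch.float32)
        logits_sum += model(frame, motor_copy)
prediction = logits_sum.argmax(dim=1)  # class chosen across the self-generated views
```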

This research will continue in collaboration with colleagues of Andrea Benucci who work with neuromorphic technologies. The idea is to implement real silicon-based circuits based on the principles highlighted in this study and to test whether they improve computer vision capabilities in real-world applications.

Article source:

Benucci A (2022) Motor-related signals support localization invariance for stable visual perception. PLoS Comput Biol 18(3): e1009928. doi: 10.1371/journal.pcbi.1009928

Translated from Vision par ordinateur : Reconnaître les objets plus rapidement et plus précisément grâce aux CNN