Robots That See The Way We Do

When you look down your street, you may notice that "your eye is drawn to something." This expression is very apt. Humans do not process the detail of an entire image the way a computer does today. Instead, we rapidly move our eyes around (in jumps called "saccades") to spot items of interest, and our focus is "drawn" to those areas. This is efficient and fast. Yet most computer-based image processing systems do not work that way: they process the entire image at equal resolution, as if "every pixel counts." Our experience tells us the opposite: we work better by focusing our attention and vision on a restricted portion of the image and carrying out one task at a time.

To illustrate how the human brain and eye work, try an experiment. The next time you are a passenger in a car driving down the street, keep your eyes straight ahead. Notice a stop sign in your peripheral vision, but do not move your eyes. Can you read the letters STOP? Probably not. That is because the brain is not expending energy processing the detail from that part of your visual field. To read the sign, glance at it quickly and then look ahead again. You will notice that even with that brief glance, your eyes see the letters and your brain reads them.

Neurala, with the NSF-sponsored CELEST Neuromorphics Lab at Boston University, is working on a brain-inspired neural model that controls an active visual system: it saccades, or moves, a robot’s camera "eyes" to create a new type of efficient object detection and vision system. The goal is to make image processing more efficient and the identification of critical objects faster. The work is partially funded by NASA for planetary exploration, where processing and battery efficiency are critical, and by the Air Force Research Lab, where efficient sensemaking is key to interpreting large volumes of real-time data. It also has general application to all types of vision systems.

The following videos demonstrate how the system works:

The first video demonstrates how a human eye works. It shows a chameleon and other animals in a natural habitat. (The video is from a BBC wildlife series, and the eye-movement tracks are courtesy of the DIEM project.) The highlight in the image illustrates where the human eye is drawn. As additional information is seen, the eye jumps around. It may look at one thing, jump to something else that could be important, and then jump back to the original item. This jumping process allows a human to attend to several objects in quick succession and quickly understand a visual scene.

For space exploration, or even on Earth, similar mechanisms can be used to emulate the extremely efficient ways humans make sense of the visual world. The next video shows how the process works in a simulated Mars environment.

The white dots show where the robot eye spots items of interest in the entire frame. Note how the eye saccades, or jumps, from one interesting point to the next. When an object (in this case, a rock) is successfully learned or classified, it is “blasted” with a white blob. This mechanism is much more efficient than trying to process an entire image: it uses less processing power and battery life, while providing much more information than a static method (e.g., the position of the object relative to others). The video then shows how the process would work in an actual Mars setting.

[video width="640" height="360" mp4=""][/video]
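For readers who like to see the idea in code, here is a minimal sketch of a saccade loop. It is not Neurala's model. It assumes a simple, hypothetical saliency measure (how much a pixel differs from its neighbours), and the names `local_contrast`, `saliency_map`, and `saccade` are invented for illustration. The sketch still makes one cheap pass over the whole frame to find candidate points, but the "detailed" look happens only in a small window around each fixation, and each visited window is then suppressed so the eye moves on to the next interesting point.

```python
"""Illustrative saccade loop: fixate the most salient point, examine only a
small window there, then suppress it so the next fixation lands elsewhere."""


def local_contrast(image, r, c):
    """Hypothetical saliency: absolute difference from the 4-neighbour mean."""
    rows, cols = len(image), len(image[0])
    neighbours = [
        image[rr][cc]
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= rr < rows and 0 <= cc < cols
    ]
    return abs(image[r][c] - sum(neighbours) / len(neighbours))


def saliency_map(image):
    """Cheap pass over the whole frame that scores every pixel."""
    return [
        [local_contrast(image, r, c) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def saccade(image, num_fixations=3, radius=1):
    """Return (fixations in visiting order, window pixels examined in detail)."""
    rows, cols = len(image), len(image[0])
    saliency = saliency_map(image)
    fixations = []
    pixels_examined = 0
    for _ in range(num_fixations):
        # Jump to the currently most salient location.
        r, c = max(
            ((rr, cc) for rr in range(rows) for cc in range(cols)),
            key=lambda p: saliency[p[0]][p[1]],
        )
        if saliency[r][c] <= 0:
            break  # nothing interesting is left in the frame
        fixations.append((r, c))
        # Examine only the window around the fixation, then zero its saliency
        # so the eye does not immediately return to the same spot.
        for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
            for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                pixels_examined += 1
                saliency[rr][cc] = 0.0
    return fixations, pixels_examined


# Toy 8x8 frame with two bright "rocks" on a flat background.
frame = [[0.0] * 8 for _ in range(8)]
frame[1][1] = 10.0
frame[5][6] = 5.0
points, examined = saccade(frame)
print(points, examined, "of", 8 * 8, "pixels examined in detail")
```

In this toy frame the loop visits the brighter spot first, then the dimmer one, and stops once nothing salient remains, having looked closely at only a fraction of the frame.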

How would this work in an actual robot down on Earth? The third video shows the process working on a test robot. The round ball on top of the robot is the camera that saccades. The image on the top right shows what the robot sees as it looks around the room. Notice how it spots important objects and then returns to looking where it is going.

[video width="640" height="480" mp4=""][/video]

Neurala continues its work to make the process more efficient so that it can lower the cost of robots by using brain-like processes.