Google DeepMind AI can imagine a 3D model from a 2D image

June 29, 2018

One of the difficulties in building visual recognition systems for AI is programming what the human brain does effortlessly. When a person enters an unfamiliar space, it's easy to recognize and categorize what's there. Our brains take in a scene at a glance, make inferences based on prior knowledge, and can picture it from a different angle or recreate it in our heads. The team at Google's DeepMind is working on a neural network that can do similar things.

The problem is that to train an AI to make these sorts of inferences, researchers have to feed it tremendous amounts of carefully labeled data. And neural networks often have trouble applying the lessons learned from one scene to another. The key, then, was creating a neural network that could understand its surroundings.

Enter the Generative Query Network (GQN) from DeepMind. This neural network differs from others because it is designed to observe its surroundings and train only on that data, not on data labeled by researchers. As a result, GQN learns to make sense of the world and applies those observations to new scenes it encounters.

After exposing the GQN to controlled environments, researchers exposed it to randomly generated ones. It was able to imagine the scene from different angles and create a three-dimensional rendering from a 2D image. It was also able to identify and classify objects without being given labels for them, and to infer what it couldn't see from what it could.

Findings were published in the journal Science, and a full PDF is also available (see Source below). The researchers note that there are, of course, some limitations to GQN. So far, it has only been trained on synthetic scenes; it's unclear how it would do with real-world images. The blog post notes, "While there is still much more research to be done before our approach is ready to be deployed in practice, we believe this work is a sizeable step towards fully autonomous scene understanding."

Via: Ars Technica

Source: DeepMind Blog, DeepMind (PDF)

via Engadget RSS Feed https://ift.tt/2lH8vkQ