As e-commerce orders arrive, a warehouse robot picks cups off a shelf and places them in boxes for shipping. Everything runs smoothly until the warehouse processes a change and the robot must now grasp taller, narrower cups that are stored upside down.
Reprogramming that robot involves hand-labeling thousands of images that show it how to grasp these new cups, then training the system all over again.
But a new technique developed by MIT researchers would require only a handful of human demonstrations to reprogram the robot. This machine learning method enables a robot to pick up and place never-before-seen objects that are in random poses it has never encountered. Within 10 to 15 minutes, the robot would be ready to perform a new pick-and-place task.
The technique uses a neural network specially designed to reconstruct the shapes of objects in 3D. With just a few demonstrations, the system uses what the neural network has learned about 3D geometry to pick up new objects similar to those in the demonstrations.
In simulations and using a real robotic arm, the researchers show that their system can effectively manipulate never-before-seen cups, bowls, and bottles, arranged in random poses, using only 10 demonstrations to teach the robot.
“Our main contribution is the general ability to provide new skills much more efficiently to robots that need to operate in less structured environments where there could be a lot of variability. The concept of generalization by construction is a fascinating capability because this problem is often so much more difficult,” says Anthony Simeonov, a graduate student in electrical engineering and computer science (EECS) and co-lead author of the paper.
Simeonov wrote the paper with co-lead author Yilun Du, an EECS graduate student; Andrea Tagliasacchi, a research scientist at Google Brain; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Alberto Rodríguez, the Class of 1957 Associate Professor in the Department of Mechanical Engineering; and senior authors Pulkit Agrawal, a professor in CSAIL, and Vincent Sitzmann, an incoming assistant professor in EECS. The research will be presented at the International Conference on Robotics and Automation.
Enter the geometry
A robot can be trained to pick up a specific object, but if that object is lying on its side (perhaps it fell over), the robot sees this as a completely new scenario. This is one reason it is so hard for machine learning systems to generalize to new object orientations.
To overcome this challenge, the researchers created a new type of neural network model, a neural descriptor field (NDF), that learns the 3D geometry of a class of items. The model computes the geometric representation of a specific item using a 3D point cloud, which is a set of data points or coordinates in three dimensions. The data points can be obtained from a depth camera, which provides information on the distance between the object and a viewpoint. Although the network was trained in simulation on a large dataset of synthetic 3D shapes, it can be applied directly to real-world objects.
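To make the point cloud idea concrete, here is a minimal sketch of how a depth image can be turned into 3D points. It assumes an ideal pinhole camera; the function name and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not details from the researchers' system.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an (N, 3) point cloud.

    Assumes an ideal pinhole camera. Pixels with depth <= 0 are treated
    as missing measurements and dropped.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Synthetic example: a flat surface 0.5 m from the camera, one missing pixel.
depth = np.full((4, 6), 0.5)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
print(cloud.shape)  # (23, 3): 24 pixels minus the missing one
```

Each row of `cloud` is one 3D coordinate in the camera frame, which is the kind of input a model working on point clouds would consume.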
The team designed the NDF with a property known as equivariance. With this property, if the model is shown an image of an upright cup and then an image of the same cup on its side, it understands that the second cup is the same object, just rotated.
“This equivariance is what allows us to handle cases where the object you are observing is in an arbitrary orientation much more efficiently,” says Simeonov.
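The property described above (usually called equivariance in the machine learning literature) can be illustrated with a toy example. The sketch below does not use the learned NDF network. It substitutes a hand-crafted descriptor built from distances, which is an assumption chosen only because distances do not change when an object and a query point are rotated and shifted together.

```python
import numpy as np

def toy_descriptor(query, cloud, k=8):
    """Hand-crafted stand-in for a learned descriptor (not the NDF itself):
    distance from the query to the cloud centroid, followed by the sorted
    distances to the k nearest cloud points."""
    d = np.linalg.norm(cloud - query, axis=1)
    centroid_dist = np.linalg.norm(cloud.mean(axis=0) - query)
    return np.concatenate([[centroid_dist], np.sort(d)[:k]])

def random_rotation(rng):
    """Random 3D rotation from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:   # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))      # stand-in for an object's point cloud
query = np.array([0.3, -0.1, 0.2])     # e.g. a point near a cup handle
R, t = random_rotation(rng), rng.normal(size=3)

before = toy_descriptor(query, cloud)
# Move the object and the query point together with the same rigid motion.
after = toy_descriptor(R @ query + t, cloud @ R.T + t)
print(np.allclose(before, after))  # True: the descriptor ignores the pose
```

A learned descriptor with this behavior lets a point near the handle of an upright cup and the corresponding point on a tipped-over cup receive the same description, so the orientation no longer looks like a new scenario.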
As the NDF learns to reconstruct the shapes of similar objects, it also learns to associate related parts of those objects. For example, it learns that the handles of cups are similar, even if some cups are taller or wider than others, or have shorter or longer handles.
“If you wanted to do it with a different approach, you would have to label all the pieces by hand. Instead, our approach automatically discovers these parts from the reconstruction of the shape,” says Du.
The researchers use this trained NDF model to teach a robot a new skill with just a few physical examples. They move the robot’s hand onto the part of an object they want it to grasp, such as the rim of a bowl or the handle of a cup, and record the locations of the fingertips.
Because the NDF has learned so much about 3D geometry and how to reconstruct shapes, it can infer the structure of a new shape, which allows the system to transfer the demonstrations to new objects in arbitrary poses, explains Du.
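The transfer step can be sketched in a heavily simplified form. The example below is an illustration under stated assumptions, not the researchers' procedure: it reuses a hand-crafted distance-based descriptor in place of the learned NDF, handles only the same shape in a new pose rather than a genuinely new object, and picks the best match from a list of candidate points. The names `toy_descriptor` and `transfer_contact` are hypothetical.

```python
import numpy as np

def toy_descriptor(query, cloud, k=8):
    """Hand-crafted stand-in for a learned descriptor (not the NDF itself)."""
    d = np.linalg.norm(cloud - query, axis=1)
    centroid_dist = np.linalg.norm(cloud.mean(axis=0) - query)
    return np.concatenate([[centroid_dist], np.sort(d)[:k]])

def transfer_contact(demo_contact, demo_cloud, new_cloud, candidates):
    """Return the candidate point whose descriptor on the new object best
    matches the descriptor of the demonstrated contact point."""
    target = toy_descriptor(demo_contact, demo_cloud)
    errors = [np.linalg.norm(toy_descriptor(c, new_cloud) - target)
              for c in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(1)
demo_cloud = rng.normal(size=(150, 3))   # object seen during the demonstration
demo_contact = demo_cloud[42]            # hypothetical recorded fingertip location

# New scene: the same shape in a different pose, with its points shuffled.
theta = 1.1
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.4, -0.2, 0.1])
new_cloud = rng.permutation(demo_cloud @ R.T + t)

found = transfer_contact(demo_contact, demo_cloud, new_cloud, candidates=new_cloud)
# Compare against where the demonstrated contact ended up after the pose change.
print(np.allclose(found, R @ demo_contact + t))
```

The design choice worth noting is that nothing in `transfer_contact` knows the rotation or translation: the matching point is found purely by comparing descriptors, which is the intuition behind transferring a demonstration to an object in an arbitrary pose.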
Choose a winner
The researchers tested their model in simulations and on a real robotic arm using cups, bowls, and bottles as objects. Their method had a success rate of 85 percent on pick-and-place tasks with new objects in new orientations, while the best baseline achieved a success rate of only 45 percent. Success means grasping a new object and placing it at a target location, such as hanging cups on a rack.
Many baselines use 2D image information rather than 3D geometry, which makes it harder to build equivariance into these methods. This is one reason the NDF technique performed so much better.
Although the researchers were happy with its performance, their method only works for the particular object category on which it is trained. A robot taught to pick up cups will not be able to pick up boxes or headphones, because these objects have geometric features that are too different from those the network was trained on.
“In the future, extending it to many categories or completely abandoning the notion of category would be ideal,” says Simeonov.
They also plan to adapt the system to non-rigid objects and, in the longer term, enable it to perform pick-and-place tasks when the target area changes.
This work is supported in part by the Defense Advanced Research Projects Agency, the Singapore Defense Science and Technology Agency and the National Science Foundation.