MIT CSAIL Develops AI to Help Household Robots Halve Planning Time

PIGINet is a model that shortens the iterative process of a robot learning how best to conduct a task.

Alex Shipps, MIT CSAIL


PIGINet predicts the feasibility of a task plan given images of objects, goal description, and initial state descriptions. It can reduce the planning time of a task and motion planner by 50% to 80%.
MIT CSAIL said its PIGINet uses machine learning to streamline household robots' task and motion planning by assessing and filtering feasible solutions in complex environments.

Your brand-new household robot is delivered to your house, and you ask it to make you a cup of coffee. Although it knows some basic skills from previous practice in simulated kitchens, there are way too many actions it could possibly take—turning on the faucet, flushing the toilet, emptying out the flour container, and so on. But there’s a tiny number of actions that could possibly be useful. How is the robot to figure out what steps are sensible in a new situation?

It could use PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are using machine learning to cut down on the typical iterative process of task planning that considers all possible actions. PIGINet eliminates task plans that can’t satisfy collision-free requirements, and it reduces planning time by 50% to 80% when trained on only 300 to 500 problems. 

Typically, robots attempt various task plans and iteratively refine their moves until they find a feasible solution, which can be inefficient and time-consuming, especially when there are movable and articulated obstacles. Maybe after cooking, for example, you want to put all the sauces in the cabinet.

That problem might take two to eight steps, depending on what the world looks like at that moment. Does the robot need to open multiple cabinet doors, or are there any obstacles inside the cabinet that need to be relocated in order to make space? You don’t want your robot to be annoyingly slow — and it will be worse if it burns dinner while it’s thinking.


PIGINet uses images to decide which sequences of actions are feasible in rearrangement tasks involving storage space and cluttered surfaces. Source: MIT CSAIL

PIGINet avoids recipes for motion planning

Household robots are usually thought of as following predefined recipes for performing tasks, which isn’t always suitable for diverse or changing environments. So, how does PIGINet avoid those predefined rules? PIGINet is a neural network that takes in “Plans, Images, Goal, and Initial facts,” then predicts the probability that a task plan can be refined to find feasible motion plans.

In simple terms, it employs a transformer encoder, a versatile and state-of-the-art model designed to operate on data sequences. The input sequence, in this case, is information about which task plan it is considering, images of the environment, and symbolic encodings of the initial state and the desired goal. The encoder combines the task plan, images, and text to generate a prediction regarding the feasibility of the selected task plan.
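As a rough illustration of that input sequence, the toy sketch below concatenates plan, image, goal, and initial-state tokens, mixes them with one round of self-attention, and squashes the pooled result into a probability. It is not PIGINet's implementation: the `embed` function is a hypothetical stand-in for real image and text encoders, the weights are untrained, and every name and dimension is an assumption made for this example.

```python
import math
import random

DIM = 8  # embedding width; an arbitrary choice for this sketch


def embed(token, kind):
    """Hypothetical stand-in for real encoders: a fixed pseudo-random vector per token."""
    rng = random.Random(f"{kind}:{token}")
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def build_sequence(plan, image_ids, goal, initial_facts):
    """Concatenate the four input groups into one sequence of vectors."""
    seq = [embed(action, "plan") for action in plan]
    seq += [embed(image, "image") for image in image_ids]
    seq += [embed(fact, "goal") for fact in goal]
    seq += [embed(fact, "init") for fact in initial_facts]
    return seq


def self_attention(seq):
    """One head of scaled dot-product self-attention with identity projections."""
    mixed = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in seq]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        mixed.append(
            [sum(w / total * value[d] for w, value in zip(weights, seq)) for d in range(DIM)]
        )
    return mixed


def feasibility(plan, image_ids, goal, initial_facts, head_weights):
    """Mean-pool the attended sequence and map it to a probability with a sigmoid."""
    seq = self_attention(build_sequence(plan, image_ids, goal, initial_facts))
    pooled = [sum(token[d] for token in seq) / len(seq) for d in range(DIM)]
    logit = sum(w * x for w, x in zip(head_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative inputs only; the action and fact names are made up for this example.
plan = ["open(left_fridge_door)", "pick(cabbage)", "place(cabbage, fridge)"]
images = ["fridge.png", "cabbage.png"]
goal = ["in(cabbage, fridge)"]
init = ["closed(left_fridge_door)", "in(cabbage, pot)"]
head = embed("head", "weights")  # untrained output weights

p = feasibility(plan, images, goal, init, head)
print(f"predicted feasibility (untrained, illustrative only): {p:.3f}")
```

A trained model would learn the embeddings, attention projections, and output weights from labeled feasible and infeasible plans; here the number printed only shows the data flow from a mixed sequence to a single score.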

Keeping things in the kitchen, the team created hundreds of simulated environments, each with different layouts and specific tasks that require objects to be rearranged among counters, fridges, cabinets, sinks, and cooking pots. By measuring the time taken to solve problems, they compared PIGINet against prior approaches.

One correct task plan may include opening the left fridge door, removing a pot lid, moving the cabbage from pot to fridge, moving a potato to the fridge, picking up the bottle from the sink, placing the bottle in the sink, picking up the tomato, or placing the tomato. PIGINet reduced planning time by 80% in simpler scenarios and by 20% to 50% in more complex scenarios that have longer plan sequences and less training data.

“Systems such as PIGINet, which use the power of data-driven methods to handle familiar cases efficiently, but can still fall back on 'first-principles' planning methods to verify learning-based suggestions and solve novel problems, offer the best of both worlds, providing reliable and efficient general-purpose solutions to a wide variety of problems,” said Leslie Pack Kaelbling, an MIT professor and CSAIL principal investigator.

Data needed for good decisions

PIGINet's use of multimodal embeddings in the input sequence allowed for better representation and understanding of complex geometric relationships. Using image data helped the model grasp spatial arrangements and object configurations without needing the objects' 3D meshes for precise collision checking, enabling fast decision-making in different environments.

One of the major challenges faced during the development of PIGINet was the scarcity of good training data, as all feasible and infeasible plans need to be generated by traditional planners, which are slow in the first place. However, by using pretrained vision-language models and data augmentation tricks, the team was able to address this challenge, showing impressive reductions in planning time not only on problems with previously seen objects, but also, through zero-shot generalization, on problems with previously unseen objects.

“Because everyone’s home is different, robots should be adaptable problem-solvers instead of just recipe followers,” said Zhutian Yang, MIT CSAIL Ph.D. student and lead author on the work. “Our key idea is to let a general-purpose task planner generate candidate task plans and use a deep learning model to select the promising ones.”

“The result is a more efficient, adaptable, and practical household robot, one that can nimbly navigate even complex and dynamic environments,” she added. “Moreover, the practical applications of PIGINet are not confined to households.”

“Our future aim is to further refine PIGINet to suggest alternate task plans after identifying infeasible actions, which will further speed up the generation of feasible task plans without the need of big datasets for training a general-purpose planner from scratch,” Yang said. “We believe that this could revolutionize the way robots are trained during development and then applied to everyone’s homes.”
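The key idea Yang describes, letting a planner propose candidate task plans and a learned model pick the promising ones, might be sketched as follows. This is a minimal illustration under stated assumptions, not the team's code: `plan_with_filter`, `predict_feasibility`, and `refine` are hypothetical names, and the scores are hard-coded rather than produced by a trained network. Plans are tried in order of predicted feasibility, and low-scoring plans are still attempted last, so a wrong prediction costs time rather than a solution.

```python
from typing import Callable, Iterable, List, Optional, Sequence

Plan = Sequence[str]


def plan_with_filter(
    candidates: Iterable[Plan],
    predict_feasibility: Callable[[Plan], float],
    refine: Callable[[Plan], Optional[List[str]]],
) -> Optional[List[str]]:
    """Attempt motion refinement on candidate plans, most promising first."""
    scored = [(predict_feasibility(plan), plan) for plan in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    for _, plan in scored:
        motion = refine(plan)  # the expensive step a learned filter tries to avoid wasting
        if motion is not None:
            return motion
    return None  # no candidate admitted a motion plan


# Toy example: hypothetical scores standing in for a trained feasibility model.
toy_scores = {
    ("pick(sauce)", "place(sauce, cabinet)"): 0.1,
    ("open(cabinet_door)", "pick(sauce)", "place(sauce, cabinet)"): 0.9,
}
refine_calls = []


def toy_refine(plan):
    """Pretend only the plan that opens the door first has a collision-free motion."""
    refine_calls.append(plan)
    if plan[0] == "open(cabinet_door)":
        return [f"trajectory for {action}" for action in plan]
    return None


result = plan_with_filter(list(toy_scores), lambda plan: toy_scores[tuple(plan)], toy_refine)
print(len(refine_calls), result[0])
```

In this toy run the higher-scored plan is refined first, so the motion planner is called once instead of twice. With many candidates, skipping failed refinements in this way is where the time savings would come from.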


PIGINet is a Transformer-based architecture that fuses features of images, text, and values describing the problem and plan. Source: MIT CSAIL

Paper addresses general-purpose robotics challenge

“This paper addresses the fundamental challenge in implementing a general-purpose robot: how to learn from past experience to speed up the decision-making process in unstructured environments filled with a large number of articulated and movable obstacles,” said Beomjoon Kim, Ph.D. ’20, assistant professor in the Graduate School of AI at Korea Advanced Institute of Science and Technology (KAIST).

“The core bottleneck in such problems is how to determine a high-level task plan such that there exists a low-level motion plan that realizes the high-level plan,” he wrote. “Typically, you have to oscillate between motion and task planning, which causes significant computational inefficiency. Zhutian's work tackles this by using learning to eliminate infeasible task plans, and is a step in a promising direction.”

Yang wrote the paper with NVIDIA research scientist Caelan Garrett, SB ’15, MEng ’15, Ph.D. ’21; Tomás Lozano-Pérez and Kaelbling, professors in the MIT Department of Electrical Engineering and Computer Science (EECS) and CSAIL members; and Dieter Fox, senior director of robotics research at NVIDIA and a professor at the University of Washington.

The team was supported by AI Singapore and grants from the National Science Foundation, the U.S. Air Force Office of Scientific Research, and the U.S. Army Research Office. This project was partially conducted while Yang was an intern at NVIDIA Research. Their research was presented at the Robotics: Science and Systems conference this month.

Rachel Gordon, MIT CSAIL

About the author

Rachel Gordon is communications manager at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory. This article is reposted with permission.

PIGINet: Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning



