To work in a wide range of real-world conditions, robots need to learn generalist policies. To that end, researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory, or MIT CSAIL, have created a Real-to-Sim-to-Real model.
The goal of many developers is to create hardware and software so that robots can work everywhere under all conditions. However, a robot that operates in one person’s home doesn’t need to know how to operate in all of the neighboring homes.
MIT CSAIL’s team chose to focus on RialTo, a method to easily train robot policies for specific environments. The researchers said it improved policies by 67% over imitation learning with the same number of demonstrations.
The team taught the system to perform everyday tasks, such as opening a toaster, placing a book on a shelf, putting a plate on a rack, placing a mug on a shelf, opening a drawer, and opening a cabinet.
“We aim for robots to perform exceptionally well under disturbances, distractions, varying lighting conditions, and changes in object poses, all within a single environment,” said Marcel Torne Villasevil, MIT CSAIL research assistant in the Improbable AI lab and lead author on a new paper about the work.
“We propose a method to create digital twins on the fly using the latest advances in computer vision,” he explained. “With just their phones, anyone can capture a digital replica of the real world, and the robots can train in a simulated environment much faster than the real world, thanks to GPU parallelization. Our approach eliminates the need for extensive reward engineering by leveraging a few real-world demonstrations to jumpstart the training process.”
RialTo builds policies from reconstructed scenes
Torne’s vision is exciting, but RialTo is more complicated than just waving your phone and having a home robot on call. First, users scan the chosen environment with their device, using tools like NeRFStudio, ARCode, or Polycam.
Once the scene is reconstructed, users can upload it to RialTo’s interface to make detailed adjustments, add necessary joints to the robots, and more.
Next, the refined scene is exported and brought into the simulator. Here, the goal is to create a policy based on real-world actions and observations. These real-world demonstrations are replicated in the simulation, providing valuable data for reinforcement learning (RL).
“This helps in creating a strong policy that works well in both the simulation and the real world,” said Torne. “An enhanced algorithm using reinforcement learning helps guide this process, to ensure the policy is effective when applied outside of the simulator.”
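As a rough illustration of why a few demonstrations can jumpstart RL when rewards are sparse, consider the toy sketch below. It is not RialTo's algorithm: the chain environment, the tabular Q-learning update, and every name in it (`step`, `demo_transitions`, `seed_q_from_demos`, `greedy_rollout`) are assumptions made up for this example. It only shows the general idea that replaying demonstration data can give a learner a useful starting point without a hand-shaped reward.

```python
# Toy illustration only: a 1-D chain with a sparse reward. NOT RialTo's algorithm.
N_STATES = 8            # states 0..7; the goal is the last state
LEFT, RIGHT = 0, 1
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Move one cell; the reward is 1.0 only on reaching the goal (sparse)."""
    nxt = min(state + 1, N_STATES - 1) if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def demo_transitions():
    """One hypothetical demonstration: walk right from the start to the goal."""
    transitions, state = [], 0
    while state != N_STATES - 1:
        nxt, reward = step(state, RIGHT)
        transitions.append((state, RIGHT, nxt, reward))
        state = nxt
    return transitions


def seed_q_from_demos(transitions, passes=20):
    """Replay the demo through the Q-learning update to seed value estimates."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(passes):
        for s, a, s2, r in transitions:
            done = s2 == N_STATES - 1
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the greedy policy from the start; True if the goal is reached."""
    state = 0
    for _ in range(max_steps):
        # Ties fall back to LEFT, so an unseeded table never leaves the start.
        action = RIGHT if q[state][RIGHT] > q[state][LEFT] else LEFT
        state, _ = step(state, action)
        if state == N_STATES - 1:
            return True
    return False


q_seeded = seed_q_from_demos(demo_transitions())
q_blank = [[0.0, 0.0] for _ in range(N_STATES)]
print(greedy_rollout(q_seeded), greedy_rollout(q_blank))  # prints "True False"
```

In this toy setting, the demo-seeded table reaches the goal right away, while the blank table has no signal to follow. A real system would continue with RL in simulation from that starting point rather than stop at the demonstrations.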
Researchers test model’s performance
In testing, MIT CSAIL found that RialTo created strong policies for a variety of tasks, whether in controlled lab settings or in more unpredictable real-world environments. For each task, the researchers tested the system’s performance under three increasing levels of difficulty: randomizing object poses, adding visual distractors, and applying physical disturbances during task execution.
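An evaluation protocol of this shape could be organized roughly as in the sketch below. This is a hypothetical harness, not the researchers' code. It assumes the three levels are cumulative (each level keeps the perturbations of the one before), which the description above does not confirm. `toy_episode` is a stand-in for a real robot rollout, and its probabilities are arbitrary placeholders, not reported results.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalConfig:
    randomize_poses: bool
    add_distractors: bool
    apply_disturbances: bool


def config_for_level(level):
    """Assumed cumulative levels: 1 = poses, 2 = + distractors, 3 = + disturbances."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2, or 3")
    return EvalConfig(level >= 1, level >= 2, level >= 3)


def success_rate(run_episode, config, trials, seed=0):
    """Run `trials` episodes under `config` and return the fraction that succeed."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(trials) if run_episode(config, rng))
    return successes / trials


def toy_episode(config, rng):
    """Hypothetical stand-in for a rollout; the odds below are arbitrary."""
    p = 0.95
    if config.add_distractors:
        p -= 0.10
    if config.apply_disturbances:
        p -= 0.10
    return rng.random() < p


for level in (1, 2, 3):
    rate = success_rate(toy_episode, config_for_level(level), trials=200)
    print(f"level {level}: success rate {rate:.2f}")
```

Keeping the perturbation settings in one config object makes it straightforward to run the same policy under every level and compare success rates side by side.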
“To deploy robots in the real world, researchers have traditionally relied on methods such as imitation learning from expert data which can be expensive, or reinforcement learning, which can be unsafe,” said Zoey Chen, a computer science Ph.D. student at the University of Washington who wasn’t involved in the paper. “RialTo directly addresses both the safety constraints of real-world RL, and efficient data constraints for data-driven learning methods, with its novel real-to-sim-to-real pipeline.”
“This novel pipeline not only ensures safe and robust training in simulation before real-world deployment, but also significantly improves the efficiency of data collection,” she added. “RialTo has the potential to significantly scale up robot learning and allows robots to adapt to complex real-world scenarios much more effectively.”
When paired with real-world data, the system outperformed traditional imitation-learning methods, especially in situations with lots of visual distractions or physical disruptions, the researchers said.
MIT CSAIL continues work on robot training
While the results so far are promising, RialTo isn’t without limitations. Currently, the system takes three days to be fully trained. To speed this up, the team hopes to improve the underlying algorithms using foundation models.
Training in simulation also has limitations: sim-to-real transfer remains difficult, as does simulating deformable objects or liquids. The MIT CSAIL team said it plans to build on previous efforts by preserving robustness against various disturbances while improving the model’s adaptability to new environments.
“Our next endeavor is this approach to using pre-trained models, accelerating the learning process, minimizing human input, and achieving broader generalization capabilities,” said Torne.
Torne wrote the paper alongside senior authors Abhishek Gupta, assistant professor at the University of Washington, and Pulkit Agrawal, an assistant professor in the department of Electrical Engineering and Computer Science (EECS) at MIT.
Four other CSAIL members are also credited: EECS Ph.D. student Anthony Simeonov SM ’22, research assistant Zechu Li, undergraduate student April Chan, and Tao Chen Ph.D. ’24. This work was supported, in part, by the Sony Research Award, the U.S. government, and Hyundai Motor Co., with assistance from the WEIRD (Washington Embodied Intelligence and Robotics Development) Lab.