Lessons learned from the SLIM project

What is SLIM?
SLIM is a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation.
For more details, please refer to our RA-L paper.
Lessons
Policy behavior
PID control is critical for manipulation when using low-cost robotic arms.
Keeping the action delta (the change between consecutive actions) small is important for policy stability; how that small action delta is trained into the policy matters even more.
An under-constrained simulation task generates behaviors that do not transfer well to the real world. This is a matter of reward engineering.
Sometimes a simple prior can significantly help regulate behavior and speed up training without sacrificing generality.
The most efficient behavior in a simulation is usually not robust in a real-world environment.
Inference speed is important, so using a small model is essential. High-end hardware can compensate for a slow model, but an efficient model remains key.
Handling latency is tricky, especially model inference latency.
Scene and task randomization are crucial for policy robustness, which is difficult to achieve with pure imitation learning.
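The first point above mentions PID control for low-cost arms. As an illustration only, here is a minimal sketch of a discrete PID loop tracking a position target for a single joint. The class name, the gains, the output clamp, and the toy plant (joint velocity equal to the command) are all assumptions made for this example; they are not SLIM's controller or its tuning.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller with a symmetric output clamp."""
    kp: float
    ki: float
    kd: float
    out_limit: float          # hypothetical command limit (e.g. rad/s)
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float) -> float:
        error = target - measured
        self.integral += error * dt                  # accumulate steady-state error
        derivative = (error - self.prev_error) / dt  # finite-difference damping term
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so a large error never requests more than the limit.
        return max(-self.out_limit, min(self.out_limit, out))


def simulate(target: float = 1.0, steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a toy joint (velocity = command) toward `target`; return final position."""
    pid = PID(kp=4.0, ki=0.5, kd=0.05, out_limit=2.0)  # illustrative gains only
    position = 0.0
    for _ in range(steps):
        command = pid.step(target, position, dt)
        position += command * dt
    return position


if __name__ == "__main__":
    print(f"final position: {simulate():.3f}")
```

On real low-cost servos the plant is far less ideal (backlash, friction, load-dependent sag), so the gains would need to be tuned on the hardware rather than copied from a sketch like this.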
Sim2Real
Extensive domain randomization is the key to success.
Handling real-world lighting is more difficult than expected.
Simulation performance is not a reliable indicator of real-world performance.
Improvement in simulation performance is not always proportional to improvement in real-world performance.
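To make the domain randomization point concrete, here is a minimal sketch of sampling a fresh set of simulation parameters at the start of every training episode. The parameter names and ranges are hypothetical placeholders chosen for illustration (lighting and latency are included because they are called out as pain points in these lessons); they are not the values used in SLIM.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """One draw of randomized simulation settings (all fields hypothetical)."""
    friction: float
    payload_mass_kg: float
    light_intensity: float
    action_latency_s: float
    camera_pitch_offset_deg: float


# Hypothetical (low, high) ranges; real ranges should bracket what the
# physical robot and its environment can actually exhibit.
RANGES = {
    "friction": (0.4, 1.2),
    "payload_mass_kg": (0.05, 0.5),
    "light_intensity": (0.3, 1.5),
    "action_latency_s": (0.0, 0.08),
    "camera_pitch_offset_deg": (-3.0, 3.0),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw every parameter independently and uniformly from its range."""
    return EpisodeParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a failing episode can be reproduced
    for episode in range(3):
        print(episode, sample_episode_params(rng))
```

Passing an explicit seeded generator keeps each draw reproducible, which helps when a particular randomized episode needs to be replayed during debugging.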
Hardware
For your first real-world robot, always choose established hardware with a large user base.
Prioritize hardware safety features, such as torque-off functions and breakable camera mounts.
Regularly check the hardware’s condition, including screws, cables, ports, and indicator lights, so that hardware faults are not mistaken for model performance problems.
Always prepare plenty of spare hardware parts.
Be open-minded when designing a hardware system; appropriate hardware selection makes the learning process much easier.
Maximizing a robot’s generality requires minimizing hardware-specific assumptions.
For Sim2Real, hardware selection is also dictated by the capabilities of your simulation.
Debugging
Make sure every intermediate result can be visualized.
Run comparable checkpoints on the hardware on the same day and under the same environmental conditions.
Maintain strict version management for code corresponding to different checkpoints.
Logging metrics for each individual subtask helps predict the overall task outcome. A single aggregate metric can be easily hacked by an RL policy.
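As a rough illustration of per-subtask logging, the sketch below tracks attempts and successes for each stage of a sequential task, so a drop at one stage stays visible instead of being hidden inside one overall number. The class, the stage names, and the rule that stages after the first failure count as not attempted are assumptions made for this example, not the SLIM codebase.

```python
from collections import defaultdict


class SubtaskMetrics:
    """Per-subtask success tracking for a task made of ordered stages."""

    def __init__(self, subtasks):
        self.subtasks = list(subtasks)     # stages in execution order
        self.attempts = defaultdict(int)
        self.successes = defaultdict(int)

    def record_episode(self, results: dict) -> None:
        """`results` maps stage name -> bool for the stages that were run.

        Stages after the first failure (or missing from `results`) are
        treated as not attempted.
        """
        for name in self.subtasks:
            if name not in results:
                break
            self.attempts[name] += 1
            if not results[name]:
                break
            self.successes[name] += 1

    def success_rates(self) -> dict:
        """Success rate per stage, conditioned on the stage being attempted."""
        return {
            name: self.successes[name] / self.attempts[name]
            for name in self.subtasks
            if self.attempts[name] > 0
        }


if __name__ == "__main__":
    # Hypothetical stage names for a pick-and-place style task.
    metrics = SubtaskMetrics(["search", "grasp", "place"])
    metrics.record_episode({"search": True, "grasp": True, "place": False})
    metrics.record_episode({"search": True, "grasp": False})
    print(metrics.success_rates())
```

Reporting rates conditioned on a stage being attempted separates "the grasp got worse" from "the robot reached the grasp stage less often", which a single end-to-end success rate cannot distinguish.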
General
Verify an easier subsystem as a first step (e.g., Tabletop -> Box Gripper -> SLIM).
The real world is a much more brutal evaluator than a friendly digital world.
Focus on the specific novel component to be tackled and reuse existing results as much as possible.
Working with real robots usually takes twice as much time as planned.
Conducting serious real-world evaluation is difficult.
A large robotic system requires strong software engineering.
Anyone who wants to build a serious robot must understand its hardware embodiment well. As the saying goes: “People who are really serious about software should make their own hardware.”