Lessons learned from the SLIM project

What is SLIM?
SLIM is a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. Some deployment demos of SLIM are below:
For more details, please refer to our paper.
Lessons
Policy behavior
PID is critical for manipulation with cheap arms.
A small action delta is important; training the policy properly to produce small action deltas is even more important.
An under-constrained simulation task produces policy behaviors that don’t transfer well. Reward engineering is needed.
Sometimes a simple prior can help a lot in regulating behaviors and speeding up training, without sacrificing generality.
The most efficient behavior in simulation is usually not robust in the real world.
Inference speed is important, so a small model is important. Slower inference can be compensated for by high-end hardware.
Handling latency is tricky, especially model inference latency.
Scene and task randomization is crucial for policy robustness, and it is difficult to achieve with pure imitation learning.
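To make the first two lessons concrete, here is a minimal sketch of a discrete PID loop whose joint target is moved by clipped action deltas. It is an illustration under stated assumptions, not the SLIM implementation: the gains, the `max_delta` limit, the toy velocity-controlled joint, and the names `PID`, `apply_action_delta`, and `simulate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller for a single joint (gains are illustrative)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float) -> float:
        """Return the control command for one time step of length dt."""
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def apply_action_delta(target: float, action: float, max_delta: float) -> float:
    """Shift the joint target by the policy action, clipped to +/- max_delta."""
    delta = max(-max_delta, min(max_delta, action))
    return target + delta


def simulate(steps: int = 2000, dt: float = 0.01) -> float:
    """Drive a toy velocity-controlled joint to 0.5 rad; return the final position."""
    pid = PID(kp=4.0, ki=1.0, kd=0.0)  # hypothetical gains
    position, target = 0.0, 0.0
    for _ in range(steps):
        # Stand-in for a policy that always asks for a large positive move.
        target = apply_action_delta(target, action=1.0, max_delta=0.01)
        target = min(target, 0.5)  # assumed joint limit / goal
        velocity = pid.step(target, position, dt)
        position += velocity * dt  # toy plant: command is a joint velocity
    return position
```

The clipping keeps each target update small even when the policy output is large, and the PID loop absorbs the tracking error that a cheap actuator would otherwise leave behind.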
Sim2Real
Lots of domain randomization is the key.
Handling real-world lighting is trickier than expected.
Simulation performance is not an indicator of real performance.
Sim performance is not always proportional to real performance.
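As a rough illustration of per-episode domain randomization, the sketch below samples a fresh set of simulation parameters at every reset. The parameter names and ranges (lighting, friction, camera offset, action latency) are hypothetical placeholders chosen for the example, not the values used in SLIM.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """One sampled simulation configuration (all fields are illustrative)."""
    light_intensity: float
    light_color_temp_k: float
    floor_friction: float
    camera_pitch_offset_deg: float
    action_latency_steps: int


# Hypothetical ranges; real ranges should bracket what the robot will see.
RANGES = {
    "light_intensity": (0.3, 1.5),
    "light_color_temp_k": (2700.0, 6500.0),
    "floor_friction": (0.4, 1.2),
    "camera_pitch_offset_deg": (-3.0, 3.0),
    "action_latency_steps": (0, 3),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw every parameter independently and uniformly from its range."""
    return EpisodeParams(
        light_intensity=rng.uniform(*RANGES["light_intensity"]),
        light_color_temp_k=rng.uniform(*RANGES["light_color_temp_k"]),
        floor_friction=rng.uniform(*RANGES["floor_friction"]),
        camera_pitch_offset_deg=rng.uniform(*RANGES["camera_pitch_offset_deg"]),
        action_latency_steps=rng.randint(*RANGES["action_latency_steps"]),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a run can be reproduced
    for _ in range(3):
        print(sample_episode_params(rng))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when a particular randomized episode needs to be replayed for debugging.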
Hardware
For your first real-world robot, always choose established hardware with a good user base.
Plan for hardware safety (torque off, breakable camera mount, etc.).
Regularly checking the hardware condition (screws, cables, ports, indicator lights, etc.) rules out hardware faults as a source of confusing performance changes.
Prepare lots of spare hardware parts!
Be open-minded when designing a hardware system. Appropriate hardware selection makes the learning much easier.
Maximizing the generality of the robot requires minimizing the hardware assumptions.
Hardware selection is also constrained by what can be simulated.
Debugging
Visualize intermediate results.
Run comparable checkpoints on the hardware on the same day and under the same environmental conditions.
Manage the code version associated with each checkpoint.
Logging individual subtask metrics is helpful for predicting overall training outcome.
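One simple way to log per-subtask metrics is to record, for each episode, which subtasks were attempted and whether each succeeded, then aggregate success rates per subtask. The sketch below assumes that record format; the subtask names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One episode is an ordered list of (subtask_name, succeeded) pairs.
Episode = List[Tuple[str, bool]]


def subtask_success_rates(episodes: Iterable[Episode]) -> Dict[str, float]:
    """Return the success rate of each subtask over all recorded attempts."""
    attempts: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for episode in episodes:
        for name, succeeded in episode:
            attempts[name] += 1
            successes[name] += int(succeeded)
    return {name: successes[name] / attempts[name] for name in attempts}


if __name__ == "__main__":
    # Hypothetical log: an episode ends at its first failed subtask.
    log = [
        [("search", True), ("grasp", True), ("drop", True)],
        [("search", True), ("grasp", False)],
        [("search", False)],
        [("search", True), ("grasp", True), ("drop", False)],
    ]
    for name, rate in subtask_success_rates(log).items():
        print(f"{name}: {rate:.2f}")
```

Because later subtasks are only attempted after earlier ones succeed, a low rate on an early subtask points to where the overall task is bottlenecked.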
General
Verify an easier subsystem as the first step (tabletop -> box_gripper -> SLIM)
The real world is a much more brutal evaluator than the friendly digital world.
Focus on the novel component to be tackled and reuse others’ results as much as possible.
Working with real robots usually takes twice as long as planned.
Serious real-world evaluation is difficult.
A large robotic system requires strong software engineering.
Anyone who wants to build a serious robot must know its hardware embodiment well. “People who are really serious about software should make their own hardware.”