Async RL training emerges as dominant paradigm
Several open-source libraries have converged on disaggregating inference from training onto separate GPU pools, connecting them with a rollout buffer, and letting both sides run concurrently. A survey of 16 libraries compared them across seven axes, including orchestration primitives and buffer design. TRL is developing a new async trainer, guided by this survey.
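To make the pattern concrete, here is a minimal sketch of a rollout buffer connecting an inference loop to a training loop, using Python threads and a bounded queue as stand-ins for the two GPU pools. Every name here (`generate_rollout`, `train_step`, the buffer size) is a hypothetical placeholder, not the API of TRL or any surveyed library.

```python
# Minimal sketch of the disaggregated pattern: an "inference pool" produces
# rollouts into a bounded buffer while the trainer consumes them concurrently.
# All names are hypothetical placeholders, not any library's API.
import queue
import threading

rollout_buffer = queue.Queue(maxsize=8)  # bounded: caps how stale rollouts get

def generate_rollout(step):
    # Stand-in for sampling and scoring completions on the inference GPUs.
    return {"prompt_ids": [step], "completion_ids": [step + 1], "reward": 1.0}

def inference_loop(num_rollouts):
    for step in range(num_rollouts):
        rollout_buffer.put(generate_rollout(step))  # blocks while buffer is full
    rollout_buffer.put(None)  # sentinel: generation finished

def train_step(batch):
    # Stand-in for one optimizer step on the training GPUs.
    print(f"trained on reward={batch['reward']}")

producer = threading.Thread(target=inference_loop, args=(32,))
producer.start()
while (batch := rollout_buffer.get()) is not None:  # training loop
    train_step(batch)
producer.join()
```

The bound on the queue is doing real work: blocking generation when the buffer fills limits how stale the trainer's rollouts can get, which is exactly the kind of trade-off a buffer design has to settle.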
Adopting this pattern is a significant refactor for existing codebases: inference must be split out of the training loop, and a rollout buffer has to mediate between the two sides. Rather than building everything from scratch, developers can lean on the existing open-source libraries and the design principles distilled in the survey.
Try implementing async training in a small-scale RL project using TRL's new async trainer, focusing on overlapping generation with training to improve GPU utilization.
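For a rough intuition on why overlap improves utilization, the toy script below (not TRL code; `time.sleep()` stands in for GPU work and the durations are illustrative) times synchronous alternation against concurrent generation and training:

```python
# Toy timing comparison: synchronous alternation vs. overlapped generation
# and training. time.sleep() stands in for GPU work; numbers are illustrative.
import threading
import time

GEN_TIME, TRAIN_TIME, STEPS = 0.05, 0.05, 20

def sync_run():
    for _ in range(STEPS):
        time.sleep(GEN_TIME)    # training GPUs idle while generating
        time.sleep(TRAIN_TIME)  # inference GPUs idle while training

def async_run():
    gen = threading.Thread(
        target=lambda: [time.sleep(GEN_TIME) for _ in range(STEPS)]
    )
    gen.start()
    for _ in range(STEPS):
        time.sleep(TRAIN_TIME)  # overlaps with generation on the other pool
    gen.join()

for name, fn in [("sync", sync_run), ("async", async_run)]:
    start = time.perf_counter()
    fn()
    print(f"{name}: {time.perf_counter() - start:.2f}s")  # async ~ half of sync
```

With generation and training taking comparable time, full overlap roughly halves wall-clock time per step; when the two stages are unbalanced, the gain shrinks toward the duration of the longer stage.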