Centralized-to-Decentralized Policy Distillation for Efficient Multi-Agent Reinforcement Learning
- Speaker(s): Maciej Wojtala
- Affiliation: University of Warsaw
- Language of the talk: English
- Date: May 28, 2026, 12:30 p.m.
- Room: room 4060
- Seminar: Seminar Algorithmic Economics
Centralized Training with Decentralized Execution (CTDE) is the dominant paradigm in multi-agent reinforcement learning (MARL), enabling agents to act independently at test time while leveraging additional information during training. However, the most prominent methods within CTDE, based on value decomposition, are limited in learning efficiency and final performance by partial observability in both training and execution.
In this work, we propose a simple yet highly effective framework that lifts this restriction during training by allowing agents to share information in the latent space, thereby combining the full history of observations and actions. This produces better-informed behavioral policies that train faster and achieve stronger performance. To recover decentralized execution, we concurrently distill these policies into counterparts that rely solely on local observations, using the form of the value decomposition architecture and a simple cross-entropy loss.
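The distillation step described above can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes that the student's per-agent utilities (computed from local observations) are treated as logits over actions, and that the target is the action chosen by the better-informed teacher. The names `distillation_loss`, `student_utilities`, and `teacher_actions` are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def distillation_loss(student_utilities, teacher_actions):
    """Mean cross-entropy between student action distributions and teacher actions.

    Assumptions (not taken from the talk):
      student_utilities: float array of shape (batch, n_agents, n_actions),
          per-agent scores computed from local observations only.
      teacher_actions: int array of shape (batch, n_agents),
          actions selected by the teacher that shares latent information.
    """
    log_probs = log_softmax(student_utilities)
    # Pick the log-probability the student assigns to each teacher action.
    picked = np.take_along_axis(log_probs, teacher_actions[..., None], axis=-1)
    return float(-picked.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utilities = rng.normal(size=(8, 3, 5))      # 8 samples, 3 agents, 5 actions
    actions = rng.integers(0, 5, size=(8, 3))   # teacher's chosen actions
    print(distillation_loss(utilities, actions))
```

In a concurrent setup such as the one the abstract describes, a loss of this kind would be minimized for the student alongside the teacher's own training objective; how the two are weighted or scheduled is not specified here.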
Empirical results on SMAC (with more difficult settings than the standard ones) and SMACv2 demonstrate that policies distilled in this manner consistently outperform standard CTDE baselines and often approach the performance of their centralized counterparts. These findings suggest that using better-informed agents as teachers offers a practical and scalable approach to improving the efficiency and effectiveness of MARL under decentralized execution constraints.