Speaker(s): Maciej Wojtala
Affiliation: University of Warsaw
Language of the talk: English
Date: May 28, 2026, 12:30 p.m.
Room: room 4060
Seminar: Seminar Algorithmic Economics

Centralized Training with Decentralized Execution (CTDE) is the dominant paradigm in multi-agent reinforcement learning (MARL), enabling agents to act independently at test time while leveraging additional information during training. However, the most prominent methods within CTDE, based on value decomposition, are limited in learning efficiency and final performance by partial observability in both training and execution.

In this work, we propose a simple yet highly effective framework that lifts this restriction during training by allowing agents to share information in the latent space, thereby combining the full history of observations and actions. This yields better-informed behavioral policies that train faster and achieve stronger performance. To recover decentralized execution, we concurrently distill these policies into counterparts that rely solely on local observations, take the form of the value decomposition architecture, and are trained with a simple cross-entropy loss.
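
The distillation step described above can be illustrated with a minimal sketch. The abstract does not say how the teacher targets or the student outputs are constructed, so the following are assumptions made only for illustration: each teacher contributes the index of its chosen action as a hard label, and each student's per-action utilities (computed from local observations only) are treated as logits. The names `distillation_loss`, `student_utilities`, and `teacher_actions` are hypothetical and are not taken from the talk.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distillation_loss(student_utilities, teacher_actions):
    """Mean cross-entropy between student action distributions and teacher actions.

    student_utilities[i]: per-action utilities of agent i (assumed to be
        computed from that agent's local observations only).
    teacher_actions[i]: index of the action chosen by agent i's
        better-informed teacher (assumed hard label).
    """
    total = 0.0
    for utilities, action in zip(student_utilities, teacher_actions):
        # Cross-entropy with a one-hot target is the negative log-probability
        # that the student assigns to the teacher's action.
        total -= log_softmax(utilities)[action]
    return total / len(teacher_actions)


# Hypothetical toy data: two agents with three actions each.
student_utilities = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
teacher_actions = [0, 2]
loss = distillation_loss(student_utilities, teacher_actions)
```

In this sketch the loss shrinks as a student places more probability on its teacher's action. In an actual training loop it would presumably be minimized by gradient descent alongside the teachers' own reinforcement learning objective, since the abstract describes the distillation as concurrent.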

Empirical results on SMAC (with more difficult settings than the standard ones) and SMACv2 demonstrate that policies distilled in this manner consistently outperform standard CTDE baselines and often approach the performance of their centralized counterparts. These findings suggest that using better-informed agents as teachers offers a practical and scalable approach to improving the efficiency and effectiveness of MARL under decentralized execution constraints.

Title: Centralized-to-Decentralized Policy Distillation for Efficient Multi-Agent Reinforcement Learning
