Multi-agent reinforcement learning for partially observable cooperative systems with acyclic dependence structure
Abstract
Single-agent reinforcement learning algorithms can be applied directly to multi-agent systems in an independent-learning approach, but they then lose their convergence guarantees because of non-stationarity. We prove that, in transition-independent Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs), non-stationarity can be mitigated by a multi-scale approach when the interdependence of the agents' dynamics can be represented by a directed acyclic graph (DAG). We propose a multi-scale Q-learning algorithm (MQL) in which agents update local Q-learning iterates at different timescales, without communication, and still converge. To this end, we first show that the loss of information about the global state can be modeled as state-dependent Markovian noise. We then show that results from stochastic approximation theory can be used to prove the convergence of MQL under partial state observability. Next, we give practical ways to exploit knowledge of agent interactions to assign learning rates that ensure convergence, and propose a NetworkMQL algorithm that achieves convergence in Networked Distributed POMDPs (ND-POMDPs). Finally, we validate both MQL and NetworkMQL on a wind-farm control problem from the energy industry.
Origin: files produced by the author(s)