Multi-area load frequency cooperative control based on multi-agent deep reinforcement learning

doi:10.12158/j.2096-3203.2026.05.007

2026, Vol. 45, No. 5, pp. 69-80.

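Fund Project: National Natural Science Foundation of China (52406227); Guizhou Provincial Science and Technology Support Program (Qiankehe Chengguo LH[2025] Key 014)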



Abstract:

With the integration of large-scale renewable energy, power system frequency stability faces severe challenges. This paper proposes an expert-prefilled multi-agent twin delayed deep deterministic policy gradient (EP-MATD3) algorithm for multi-area load frequency control. First, a multi-area frequency response model comprising thermal power units, wind turbines, photovoltaic (PV) systems, and energy storage systems is established. Building on the conventional multi-area tie-line power model, the model incorporates coordinated control among the regional controllers, which strengthens the interconnection between areas and reduces unscheduled power exchange. Second, the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm is adopted to mitigate the Q-value overestimation of conventional reinforcement learning algorithms and to improve policy stability and convergence. Furthermore, within a centralized-training, decentralized-execution cooperative control framework, an expert data pre-filling mechanism is introduced in the centralized training stage to reduce the invalid actions generated during random exploration and thereby accelerate the convergence of agent training. In the decentralized execution stage, each trained agent adjusts its units' power output autonomously using only the real-time system state of its own area, effectively suppressing frequency fluctuations. Simulation results on a three-area power system show that, compared with conventional methods, the proposed EP-MATD3 control strategy significantly shortens algorithm convergence time and effectively reduces system frequency deviation under continuous step-load disturbances and PV fluctuation disturbances, verifying the effectiveness and superiority of the method for frequency control of complex power systems.
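The multi-area model described above rests on tie-line coupling and the area control error (ACE). The sketch below is a minimal, assumed illustration of that structure, not the authors' model: it keeps only thermal units (governor plus non-reheat turbine), omits the wind, PV, and storage components, and uses hypothetical per-unit parameters (`H`, `D`, `R`, `TG`, `TT`, `TS`). Each area obeys 2H dΔf_i/dt = ΔP_m,i − ΔP_L,i − ΔP_tie,i − DΔf_i, each tie line i→j integrates TS(Δf_i − Δf_j), and ACE_i = BETA·Δf_i + ΔP_tie,i. A plain integral-of-ACE controller (gain `k_i`, hypothetical) stands in for the trained agents.

```python
# Minimal linearized three-area LFC sketch (illustrative, thermal units only;
# every parameter value below is an assumed textbook-style per-unit number).
H, D, R = 5.0, 0.8, 0.05         # inertia constant (s), load damping, droop (pu)
TG, TT = 0.2, 0.5                # governor and turbine time constants (s)
TS = 2.0                         # tie-line synchronizing coefficient (assumed)
BETA = D + 1.0 / R               # frequency bias factor used in the ACE
TIES = [(0, 1), (1, 2), (0, 2)]  # each pair (i, j) carries flow from i to j
N_AREAS = 3


def net_tie(p_line):
    """Net tie-line export of each area; every line adds +p to i and -p to j."""
    p_tie = [0.0] * N_AREAS
    for (i, j), p in p_line.items():
        p_tie[i] += p
        p_tie[j] -= p
    return p_tie


def simulate(load_step, k_i=0.0, t_end=60.0, dt=0.005):
    """Forward-Euler run after a step load change applied at t = 0.

    k_i is an integral gain on each area's ACE; it is a hypothetical
    placeholder for the secondary controller (e.g. a trained agent).
    Returns final frequency deviations, net tie-line powers, and ACEs.
    """
    df = [0.0] * N_AREAS   # frequency deviation (pu)
    pg = [0.0] * N_AREAS   # governor output (pu)
    pm = [0.0] * N_AREAS   # turbine mechanical power (pu)
    pc = [0.0] * N_AREAS   # secondary-control set-point (pu)
    p_line = {edge: 0.0 for edge in TIES}
    for _ in range(int(round(t_end / dt))):
        p_tie = net_tie(p_line)
        ace = [BETA * df[i] + p_tie[i] for i in range(N_AREAS)]
        # Evaluate every derivative on the current state, then update.
        d_df = [(pm[i] - load_step[i] - p_tie[i] - D * df[i]) / (2.0 * H)
                for i in range(N_AREAS)]
        d_pg = [(pc[i] - df[i] / R - pg[i]) / TG for i in range(N_AREAS)]
        d_pm = [(pg[i] - pm[i]) / TT for i in range(N_AREAS)]
        d_pc = [-k_i * ace[i] for i in range(N_AREAS)]
        d_line = {(i, j): TS * (df[i] - df[j]) for (i, j) in TIES}
        for i in range(N_AREAS):
            df[i] += dt * d_df[i]
            pg[i] += dt * d_pg[i]
            pm[i] += dt * d_pm[i]
            pc[i] += dt * d_pc[i]
        for edge in TIES:
            p_line[edge] += dt * d_line[edge]
    p_tie = net_tie(p_line)
    ace = [BETA * df[i] + p_tie[i] for i in range(N_AREAS)]
    return df, p_tie, ace


if __name__ == "__main__":
    df, _, _ = simulate([0.2, 0.0, 0.0])
    print("primary control only:", [round(x, 5) for x in df])
    df, _, _ = simulate([0.2, 0.0, 0.0], k_i=0.2)
    print("with ACE integral control:", [round(x, 5) for x in df])
```

In this sketch, with primary control only, all three frequencies settle at the same negative offset −ΔP_L/Σ(D + 1/R) and area 0 keeps importing power over the tie lines; with integral ACE feedback in every area, both the frequency deviations and the unscheduled tie-line exchange are driven back toward zero.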
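MATD3 extends TD3 to several agents trained with centralized critics and executed with local actors. The following is a minimal sketch of the TD3 ingredients credited above with curbing Q-value overestimation: clipped double-Q targets, target policy smoothing, and delayed actor updates. Actors and critics are passed in as plain callables (assumed stand-ins for neural networks), and the function names and default hyperparameters (`gamma`, `noise_std`, `noise_clip`, `policy_delay`) are illustrative choices, not values reported in the paper.

```python
import random


def smoothed_target_action(target_actor, obs, noise_std, noise_clip,
                           a_low, a_high, rng):
    """TD3 target policy smoothing: clipped Gaussian noise, then action bounds."""
    action = []
    for a in target_actor(obs):
        eps = rng.gauss(0.0, noise_std)
        eps = max(-noise_clip, min(noise_clip, eps))
        action.append(max(a_low, min(a_high, a + eps)))
    return action


def matd3_critic_target(reward, done, next_obs_all, target_actors,
                        target_q1, target_q2, gamma=0.99, noise_std=0.2,
                        noise_clip=0.5, a_low=-1.0, a_high=1.0, rng=None):
    """Bellman target for one agent's pair of centralized critics.

    next_obs_all[k] is agent k's next local observation; the centralized
    target critics see every agent's observation and smoothed action.
    Taking the minimum of the two critics is the clipped double-Q trick
    that counters Q-value overestimation.
    """
    rng = rng if rng is not None else random.Random(0)
    next_actions = [
        smoothed_target_action(actor, obs, noise_std, noise_clip,
                               a_low, a_high, rng)
        for actor, obs in zip(target_actors, next_obs_all)
    ]
    q_next = min(target_q1(next_obs_all, next_actions),
                 target_q2(next_obs_all, next_actions))
    return reward + gamma * (1.0 - float(done)) * q_next


def actor_update_due(step, policy_delay=2):
    """Delayed policy update: actors and target nets refresh every few critic steps."""
    return step % policy_delay == 0
```

During training, each agent would regress both of its critics toward this target; at execution time only the local actor is needed, which is what allows each area to act on its own measurements.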
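The expert data pre-filling mechanism seeds the replay buffer with transitions from a reference controller before random exploration begins. The abstract does not say which expert is used, so the sketch below assumes a hypothetical proportional controller (`expert_policy`) acting on a hypothetical scalar area model (`toy_area_step`); only the buffer-filling pattern is meant to carry over.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "obs action reward next_obs done")


class ReplayBuffer:
    """Fixed-capacity FIFO replay buffer."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def toy_area_step(df, action, load, rng):
    """Hypothetical scalar stand-in for one area's frequency response."""
    next_df = 0.9 * df + 0.1 * (action - load) + rng.gauss(0.0, 1e-3)
    return next_df, -abs(next_df)  # reward penalizes frequency deviation


def expert_policy(df, kp=5.0):
    """Hypothetical expert: proportional feedback clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -kp * df))


def prefill_with_expert(buffer, n_episodes, episode_len, rng):
    """Fill the buffer with expert transitions before RL exploration starts."""
    for _ in range(n_episodes):
        df, load = 0.0, rng.uniform(-0.1, 0.1)
        for t in range(episode_len):
            action = expert_policy(df)
            next_df, reward = toy_area_step(df, action, load, rng)
            # The last step is flagged as an episode end for simplicity.
            buffer.add(Transition(df, action, reward, next_df,
                                  t == episode_len - 1))
            df = next_df
```

Because the buffer is FIFO, the expert transitions in this sketch are gradually displaced as the agents add their own exploratory experience; whether the paper keeps, replaces, or reweights expert samples is not stated in the abstract.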


Cite this article:

XU Qingyu, HE Yu, ZHANG Jing, SHEN Tao, QI Yue, ZHAO Jian. Multi-area load frequency cooperative control based on multi-agent deep reinforcement learning[J]. Electric Power Engineering Technology, 2026, 45(5): 69-80.


History

Received: 2025-09-02
Last revised: 2025-12-07
Published online: 2026-05-27
Publication date: 2026-05-28
