Abstract
Taking the joint scheduling of heterogeneous telemetry, track and command (TT&C) network resources as the research object, a deep Q network (DQN) algorithm based on reinforcement learning is proposed. After the characteristics of the joint scheduling problem for heterogeneous TT&C resources are fully analyzed, the constraints affecting the solution are described in mathematical language and a joint resource scheduling model is established. From the perspective of applying reinforcement learning, the problem is first formulated as a Markov decision process; two neural networks with the same structure and an action selection strategy based on the ε-greedy algorithm are then designed, and a DQN solution framework is established. Simulation results show that the DQN-based scheduling method for heterogeneous TT&C resources can find a TT&C scheduling scheme with better scheduling revenue than the genetic algorithm.
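For readers unfamiliar with the framework described in the abstract, the following is a minimal sketch (in PyTorch) of the two core ingredients it names: two identically structured Q-networks (policy and target) and ε-greedy action selection. The state/action dimensions, network sizes, and hyperparameters are illustrative assumptions only and do not reflect the authors' scheduling model or environment.

# Minimal DQN sketch: two identically structured networks plus epsilon-greedy
# action selection. All sizes and hyperparameters below are assumptions for
# illustration, not the paper's actual TT&C scheduling setup.
import random
import torch
import torch.nn as nn
import torch.optim as optim

class QNet(nn.Module):
    """Fully connected Q-network; the same class is instantiated twice."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

state_dim, action_dim = 16, 8                         # assumed sizes
policy_net = QNet(state_dim, action_dim)
target_net = QNet(state_dim, action_dim)              # same structure as policy_net
target_net.load_state_dict(policy_net.state_dict())   # synchronise weights
optimizer = optim.Adam(policy_net.parameters(), lr=1e-3)
gamma = 0.99

def select_action(state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(action_dim)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())

def td_update(states, actions, rewards, next_states, dones) -> float:
    """One temporal-difference update from a batch of transitions.
    actions: LongTensor of indices; dones: float tensor of 0/1 flags."""
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1.0 - dones)
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

In a full training loop, transitions sampled from the scheduling environment would be stored in a replay buffer, and target_net would periodically copy policy_net's weights; those details are omitted here.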
Authors
Xue Naiyang, Ding Dan, Jia Yutong, Wang Zhiqiang, Liu Yuan (Graduate School, Space Engineering University, Beijing 101416, China; Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China; PLA 61646 Troops, Beijing 100192, China)
Source
Journal of System Simulation
Indexed in: CAS, CSCD, Peking University Core Journals
2023, No. 2, pp. 423-434 (12 pages)
Keywords
telemetry, track and command (TT&C)
joint scheduling of heterogeneous TT&C resources
deep Q network (DQN)
scheduling revenue
reinforcement learning