Tube-based robust reinforcement learning for autonomous maneuver decision for UCAVs


Abstract Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by using tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight of robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.
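
The abstract does not give the TRRL update rule, so the following is only a minimal sketch of one plausible reading of "tube size as a quantitative indicator" weighted by a "laziness factor": each candidate maneuver's value estimate is penalized by a precomputed tube size looked up from an offline library. The function `select_action`, the dictionary layout of `tube_library`, the maneuver names, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict


def select_action(q_values: Dict[str, float],
                  tube_library: Dict[str, float],
                  laziness: float) -> str:
    """Return the action maximizing Q(a) - laziness * tube_size(a).

    Assumption (not from the paper): robustness enters as a linear
    penalty on the value estimate, with tube sizes precomputed offline.
    """
    if laziness < 0:
        raise ValueError("laziness must be non-negative")
    return max(q_values,
               key=lambda a: q_values[a] - laziness * tube_library[a])


# Hypothetical value estimates and tube sizes, for illustration only.
q_values = {"hard_turn": 1.0, "gentle_turn": 0.8, "level_flight": 0.3}
tube_library = {"hard_turn": 0.9, "gentle_turn": 0.4, "level_flight": 0.1}

for laziness in (0.0, 1.0, 3.0):
    print(laziness, select_action(q_values, tube_library, laziness))
```

With these illustrative numbers, a larger laziness factor shifts the choice toward maneuvers with smaller tubes, which mirrors the abstract's observation that smaller tube sizes correspond to more cautious actions.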

Authors Lixin WANG, Sizhuang ZHENG, Haiyin PIAO, Changqian LU, Ting YUE, Hailiang LIU

Affiliations School of Aeronautical Science and Engineering; Shenyang Aircraft Design & Research Institute

Source Chinese Journal of Aeronautics (中国航空学报, English edition; indexed in SCIE, EI, CAS, CSCD), 2024, Issue 7, pp. 391-405 (15 pages)

Keywords Air combat; Autonomous maneuver decision; Robust reinforcement learning; Tube-based algorithm; Combat simulation

Classification code U463.6 [Mechanical Engineering: Vehicle Engineering]

