Improving deep reinforcement learning by safety guarding model via hazardous experience planning | Pai PENG; Fei ZHU; Xinghong LING; Peiyao ZHAO; Quan LIU

Representative article

Improving deep reinforcement learning by safety guarding model via hazardous experience planning

Abstract:

1 Introduction and main contributions

Deep reinforcement learning, which combines the advantages of deep learning and reinforcement learning, has achieved success in many fields [1]. However, during learning the agent may still fail its task by taking improper actions that lead it into hazardous states. A mechanism that avoids these hazardous states and actions can therefore improve the agent's success rate [2-4]. Ensuring the safety of policies remains a fundamental obstacle to the practical application of artificial intelligence [5]. Because experience samples are obtained through trial and error without an early warning model, agents may enter local hazardous states [6]. It is therefore necessary to construct an effective safety guarding model that provides danger warning information and improves the robustness of the system.

Keywords:

Chinese Library Classification (CLC) numbers:

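[1] Automation technology, computer technology (TP) / Computing technology, computer technology (TP3) / Computer applications (TP39) / Information processing (TP391)

[2] Automation technology, computer technology (TP) / Basic theory of automation (TP1) / Artificial intelligence theory (TP18)

[3] Medicine and health (R) / Basic medicine (R3) / Pathology (R36) / Pathological processes (R364)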

Authors:

Pai PENG; Fei ZHU; Xinghong LING; Peiyao ZHAO; Quan LIU

Affiliation:

School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Source:

Frontiers of Computer Science (计算机科学前沿)

Citation format:

[1] Pai PENG; Fei ZHU; Xinghong LING; Peiyao ZHAO; Quan LIU. Improving deep reinforcement learning by safety guarding model via hazardous experience planning [J]. Frontiers of Computer Science (计算机科学前沿), 2022(04): 214-216

Similar articles

Choice of discount rate in reinforcement learning with long-delay rewards

LIN Xiangyang; XING Qinghua; LIU Fuxian - Department of Air Defense and Anti-Missile, Air Force Engineering University, Xi'an 710051, China

Knowledge transfer in multi-agent reinforcement learning with incremental number of agents

LIU Wenzhang; DONG Lu; LIU Jian; SUN Changyin - School of Automation, Southeast University, Nanjing 210096, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China

Research on virtual entity decision model for LVC tactical confrontation of army units

GAO Ang; GUO Qisheng; DONG Zhiming; TANG Zaijiang; ZHANG Ziwei; FENG Qiqi - Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China

Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks

Xiaoyu LIU; Chi XU; Haibin YU; Peng ZENG - State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; University of Chinese Academy of Sciences, Beijing 100049, China

Minimax Q-learning design for H∞ control of linear discrete-time systems