
[Paper Reading] Search-Based Testing Approach for Deep Reinforcement Learning Agents

Source: original post, 2024/5/20 21:34:00

Contents

I. Paper Information
II. Paper Structure
III. Paper Content
- Abstract
- Introduction

I. Paper Information

Title:
Search-Based Testing Approach for Deep Reinforcement Learning Agents

Year of publication:
2022

Journal/Conference:
arXiv

Paper link:
http://arxiv.org/abs/2206.07813

Authors:
Amirhossein Zolfagharian, Manel Abdellatif, Lionel Briand, Mojtaba Bagherzadeh and Ramesh S

II. Paper Structure

1.Introduction
2.Background
	2.1 Definitions
	2.2 State Abstraction	
3.Problem Definition
	3.1 RL Agent Testing Challenges
	3.2 Assumptions
4.Approach
	4.1 Reformulation as a Search Problem
	4.2 Overview of the Approach
	4.3 Initial Population
	4.4 Fitness Computations
	4.5 Search Operators
	4.6 Execution of Final Results
5.Empirical Evaluation
	5.1 Research Questions
	5.2 Case Study
	5.3 Implementation
	5.4 Evaluation and Results
6.Discussions
7.Threats to Validity
8.Related Work
9.Conclusion

III. Paper Content

Abstract

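During the last decade, deep reinforcement learning (DRL) algorithms have been increasingly used to solve various decision-making problems such as autonomous driving, trading decisions, and robotics. However, these algorithms face great challenges when deployed in safety-critical environments, because they often exhibit erroneous behaviors that can lead to potentially critical errors.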

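One of the ways to assess the safety of DRL agents is to test them in order to detect possible faults leading to critical failures during their execution. This raises the question of how we can efficiently test DRL policies to ensure their correctness and adherence to safety requirements.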

Most existing works on testing DRL agents use adversarial attacks that perturb the states or actions of the agent. However, such attacks often lead to unrealistic states of the environment. Furthermore, their main goal is to test the robustness of DRL agents rather than the compliance of the agents' policies with respect to requirements.

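Due to the huge state space of DRL environments, the high cost of test execution, and the black-box nature of DRL algorithms, exhaustive testing of DRL agents is impossible. This paper proposes a Search-based Testing Approach for Reinforcement Learning Agents (STARLA) that tests the policy of a DRL agent by effectively searching for failing executions of the agent within a limited testing budget. It relies on machine learning models and a dedicated genetic algorithm to narrow the search toward faulty episodes (i.e., sequences of states and actions produced by the DRL agent). Applied to a widely used Deep Q-Learning agent as a benchmark, STARLA significantly outperforms random testing by detecting more faults related to the agent's policy.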
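
To make the idea of searching for failing executions concrete, here is a minimal sketch of a genetic search over test inputs. It is not STARLA itself: the braking environment, the fixed `policy`, the fitness function (final distance to the obstacle), and every constant and function name below are assumptions made up for illustration. The search also runs over initial states, not over episodes guided by machine learning models as the abstract describes.

```python
import random

BRAKE_DISTANCE = 10.0           # the policy brakes only inside this distance
DISTANCE_RANGE = (20.0, 50.0)   # allowed initial distances to the obstacle
SPEED_RANGE = (0.0, 4.5)        # allowed initial speeds
MAX_STEPS = 100


def policy(state):
    """Hypothetical agent under test. Planted fault: it ignores speed."""
    distance, _speed = state
    return "brake" if distance < BRAKE_DISTANCE else "coast"


def step(state, action):
    """Toy dynamics: braking removes 1 unit of speed per step."""
    distance, speed = state
    if action == "brake":
        speed = max(0.0, speed - 1.0)
    return (distance - speed, speed)


def run_episode(initial_state):
    """Return the episode (list of (state, action) pairs) and the final state."""
    episode, state = [], initial_state
    for _ in range(MAX_STEPS):
        action = policy(state)
        episode.append((state, action))
        state = step(state, action)
        if state[0] <= 0.0 or state[1] == 0.0:  # collided or stopped
            break
    return episode, state


def fitness(individual):
    """Final distance to the obstacle; lower is closer to a failure."""
    return run_episode(individual)[1][0]


def is_failure(individual):
    return fitness(individual) <= 0.0


def clamp(value, bounds):
    return min(bounds[1], max(bounds[0], value))


def random_individual(rng):
    return (rng.uniform(*DISTANCE_RANGE), rng.uniform(*SPEED_RANGE))


def crossover(a, b):
    """Take the distance from one parent and the speed from the other."""
    return (a[0], b[1])


def mutate(individual, rng):
    """Add small Gaussian noise, keeping the state inside the allowed ranges."""
    distance, speed = individual
    return (clamp(distance + rng.gauss(0.0, 2.0), DISTANCE_RANGE),
            clamp(speed + rng.gauss(0.0, 0.3), SPEED_RANGE))


def genetic_search(population, rng, generations=15):
    """Evolve initial states toward low fitness; the best half always survives."""
    population = sorted(population, key=fitness)
    for _ in range(generations):
        parents = population[: len(population) // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
                    for _ in range(len(population) - len(parents))]
        population = sorted(parents + children, key=fitness)
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    start = [random_individual(rng) for _ in range(20)]
    final = genetic_search(start, rng)
    print("failing initial states found:", [s for s in final if is_failure(s)])
```

With these made-up dynamics a collision needs an initial speed of roughly 4 or more, so uniformly random initial states fail only occasionally. The search spends its budget near that region by keeping the individuals whose episodes end closest to the obstacle.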

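We also investigate how to use the search results to extract rules that characterize the faulty episodes of the DRL agent. Such rules can be used to understand the conditions under which the agent fails and thus to assess the risks of deploying it.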
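
The abstract does not say how the rules are extracted, so the following is only a sketch of the general idea: label each episode as failing or not, describe it with a few features, and fit an interpretable model. The simplest such model, a single-threshold rule (a decision stump), is swapped in here. The feature names and the six labelled episodes are invented for illustration.

```python
def learn_stump(samples):
    """Find the single rule `feature >= threshold -> failure` with the best accuracy.

    `samples` is a list of (features, failed) pairs, where `features` maps a
    feature name to a number and `failed` is True for a failing episode.
    Returns (feature name, threshold, training accuracy).
    """
    best = None
    for name in samples[0][0]:
        # Every observed value of the feature is a candidate threshold.
        for threshold in sorted({features[name] for features, _ in samples}):
            correct = sum((features[name] >= threshold) == failed
                          for features, failed in samples)
            accuracy = correct / len(samples)
            if best is None or accuracy > best[2]:
                best = (name, threshold, accuracy)
    return best


# Invented data: two features per episode plus a failure label.
EPISODES = [
    ({"speed": 4.4, "distance": 28.0}, True),
    ({"speed": 4.2, "distance": 41.0}, True),
    ({"speed": 3.1, "distance": 25.0}, False),
    ({"speed": 1.0, "distance": 45.0}, False),
    ({"speed": 3.9, "distance": 33.0}, False),
    ({"speed": 4.3, "distance": 22.0}, True),
]

name, threshold, accuracy = learn_stump(EPISODES)
print(f"IF {name} >= {threshold} THEN failure (training accuracy {accuracy:.2f})")
```

A rule of this shape is easy to read and to check against domain knowledge, which is the point of extracting rules at all. A real study would need far more episodes and a held-out set to estimate how well the rule generalizes.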

Introduction

However, like for DNN components, their application in production environments requires effective and systematic testing, especially when used in safety-critical applications. For instance, deploying a reinforcement learning agent in autonomous driving systems entails major concerns around safety as we should pay attention not only to the extent to which the agent’s objectives are met, but also to damage avoidance [5].
One of the ways to assess the safety of DRL agents is to test them in order to detect possible faults leading to critical failures during their execution. (Key phrase: detect faults.)
By definition, a fault in DRL-based systems corresponds to a problem in the RL policy that may lead to the agent’s failure during the execution. (Key term: fault.)
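
A toy illustration of the fault/failure distinction, under made-up assumptions: a corridor of cells 0 to 5 where cell 5 is the goal and cell 0 is a hazard, and a lookup-table policy with one wrong entry. The wrong entry is the fault; a failure is only observed in episodes that actually visit that state. None of the names below come from the paper.

```python
HAZARD, GOAL = 0, 5

# Hypothetical policy: state -> move. The entry for state 1 is the planted
# fault (it steps toward the hazard instead of toward the goal).
POLICY = {1: -1, 2: +1, 3: +1, 4: +1}


def run_episode(start, max_steps=10):
    """Execute the policy from `start`; return the (state, action) pairs and the final state."""
    episode, state = [], start
    for _ in range(max_steps):
        if state in (HAZARD, GOAL):
            break
        action = POLICY[state]
        episode.append((state, action))
        state += action
    return episode, state


def is_failure(final_state):
    """A failure is an observed violation: the episode ended in the hazard."""
    return final_state == HAZARD


for start in (1, 2, 3, 4):
    episode, final_state = run_episode(start)
    print(start, episode, "FAILURE" if is_failure(final_state) else "ok")
```

Three of the four start states reach the goal even though the policy is faulty, which is why a test method has to look for the executions that expose the fault instead of sampling a few episodes at random.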