
2.2. What?
Although it might be considered beyond the scope of this paper, it is worth noting the discussion held around general theories of explanation in the realm of philosophy [23]. Many proposals have been made in this regard, suggesting the need for a general, unified theory that approximates the structure and intent of an explanation. However, no such general theory has withstood criticism when presented. For the time being, the most agreed-upon view blends together different approaches to explanation drawn from diverse knowledge disciplines. A similar problem arises when addressing interpretability in AI. It appears from the literature that there is not yet a common understanding of what interpretability or explainability are. Nevertheless, many contributions claim to achieve interpretable models and techniques that empower explainability.

To shed some light on this lack of consensus, a useful starting point is the definition of the term Explainable Artificial Intelligence (XAI) given by D. Gunning in [7]:

“XAI will create a suite of machine learning techniques that enables human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners”


This definition brings together two concepts (understanding and trust) that need to be addressed in advance. However, it fails to consider other purposes motivating the need for interpretable AI models, such as causality, transferability, informativeness, fairness and confidence [5, 24, 25, 26]. We will delve into these topics later, mentioning them here as a supporting example of the incompleteness of the above definition.

As exemplified by the definition above, a thorough, complete definition of explainability in AI still slips through our fingers. A broader reformulation of this definition (e.g., “An explainable Artificial Intelligence is one that produces explanations about its functioning”) would fail to fully characterize the term in question, leaving aside important aspects such as its purpose. To build toward completeness, a definition of explanation is first required.

As extracted from the Cambridge Dictionary of English Language, an explanation is “the details or reasons that someone gives to make something clear or easy to understand” [27]. In the context of an ML model, this can be rephrased as: “the details or reasons a model gives to make its functioning clear or easy to understand”. It is at this point that opinions start to diverge. Stemming from the previous definitions, two ambiguities can be pointed out. First, the details or reasons used to explain are completely dependent on the audience to which they are presented. Second, whether the explanation has left the concept clear or easy to understand also depends entirely on the audience. Therefore, the definition must be rephrased to reflect explicitly the dependence of the model's explainability on the audience. To this end, a reworked definition could read as:

Given a certain audience, explainability refers to the details and reasons a model gives to make its functioning clear or easy to understand.


Since explaining, as a form of argumentation, may involve weighing, comparing, or convincing an audience with logic-based formalizations of (counter)arguments [28], explainability might carry us into the realm of cognitive psychology and the psychology of explanations [7], since measuring whether something has been understood or put clearly is hard to gauge objectively. However, measuring to what extent the internals of a model can be explained could be tackled objectively. Any means of reducing the complexity of the model or of simplifying its outputs should be considered an XAI approach. How large this leap is in terms of complexity or simplicity corresponds to how explainable the resulting model is. An underlying problem that remains unsolved is that the interpretability gain provided by such XAI approaches may not be straightforward to quantify: for instance, a model simplification can be evaluated by the reduction in the number of architectural elements or parameters of the model itself (as is often done, for instance, for DNNs). On the contrary, the use of visualization methods or natural language for the same purpose does not lend itself to a clear quantification of the improvements gained in terms of interpretability. The derivation of general metrics to assess the quality of XAI approaches remains an open challenge that should be under the spotlight of the field in forthcoming years. We will further discuss this research direction in Section 5.
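To ground the parameter-count proxy mentioned above, here is a minimal sketch assuming nothing beyond the Python standard library: it scores a model simplification by the reduction in the number of parameters of a fully connected network. The layer sizes and the helper dense_param_count are hypothetical illustrations, not taken from the paper.

```python
# Minimal sketch of the parameter-count proxy for interpretability gain.
# The architectures below are hypothetical; only standard Python is used.

def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

original = dense_param_count([784, 512, 512, 10])   # hypothetical black-box DNN
simplified = dense_param_count([784, 32, 10])       # hypothetical simplified model

# The size of the "leap" in complexity: one crude, objective proxy
# for the interpretability gained by the simplification.
reduction = 1 - simplified / original
print(f"original:   {original:,} parameters")
print(f"simplified: {simplified:,} parameters")
print(f"complexity reduction: {reduction:.1%}")
```

As the text notes, such a count only covers simplification-based approaches; it says nothing about explanations delivered through visualization or natural language.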

Explainability is linked to post-hoc explainability, since it covers the techniques used to convert a non-interpretable model into an explainable one (a sketch of one such conversion is given after the definition below). In the remainder of this manuscript, explainability will be considered the main design objective, since it represents a broader concept: a model can be explained, but the interpretability of a model comes from the design of the model itself. Bearing these observations in mind, explainable AI can be defined as follows:

Given an audience, an explainable Artificial Intelligence is one that produces details or reasons to make its functioning clear or easy to understand.
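To make the notion of post-hoc conversion concrete, the following is a minimal sketch, assuming scikit-learn and synthetic data, of one common post-hoc technique: a global surrogate, in which an interpretable model (here a shallow decision tree) is trained to mimic an opaque model's predictions. The paper does not prescribe this particular method; it is only one illustrative instance.

```python
# Minimal global-surrogate sketch: explain an opaque model post hoc by
# fitting an interpretable decision tree to the opaque model's predictions.
# Assumes scikit-learn; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)

# The non-interpretable model whose functioning we want to explain.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is fit to the black box's *outputs*, not to the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box. High fidelity
# means the readable rules below faithfully summarize the model's functioning.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate))  # human-readable decision rules
```

Note that the audience dependence stressed above survives intact: a depth-3 tree is "clear or easy to understand" for a practitioner reading decision rules, while a lay audience might require a different kind of explanation altogether.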


This definition, posed here as a first contribution of the present overview, implicitly assumes that the ease of understanding and clarity targeted by XAI techniques for the model at hand serve different application purposes, such as improving the audience's trust in the model's output.
