This book surveys the state of the art in explainable and interpretable reinforcement learning (RL) as relevant to robotics. While RL in general has grown in popularity and been applied to increasingly complex problems, several challenges have impeded the real-world adoption of RL algorithms for robotics and related areas. These include the difficulty of ensuring that safety constraints are not violated and the needs of system operators, who want explainable policies and actions. Robotics applications present a unique set of considerations and give rise to a number of opportunities related to their physical, real-world sensory inputs and interactions.
The authors review the classification schemes used in past surveys and papers and attempt to unify terminology across the field. The book provides an in-depth exploration of 12 attributes that can be used to classify explainable/interpretable techniques: whether the RL method is model-agnostic or model-specific, whether it is self-explainable or post-hoc, and the further attributes of scope, when-produced, format, knowledge limits, explanation accuracy, audience, predictability, legibility, readability, and reactivity. The book is organized around a discussion of these methods, broken down into 42 categories and subcategories, each of which can be classified according to some of the attributes. The authors close by identifying gaps in current research and highlighting areas for future investigation.
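As a rough, illustrative sketch (not taken from the book), these 12 attributes can be thought of as fields of a record describing a single technique. All field names, enum values, and the example technique below are hypothetical, chosen only to show how such a classification might be encoded:

```python
from dataclasses import dataclass
from enum import Enum


class ModelRelation(Enum):
    MODEL_AGNOSTIC = "model-agnostic"   # applies to any underlying RL method
    MODEL_SPECIFIC = "model-specific"   # tied to one class of models


class ExplanationSource(Enum):
    SELF_EXPLAINABLE = "self-explainable"  # the policy itself is interpretable
    POST_HOC = "post-hoc"                  # explanations generated after the fact


@dataclass
class TechniqueClassification:
    """One technique's position along the 12 classification attributes.

    The two binary attributes are typed as enums; the remaining ten are
    kept as free-form strings since they are discussed qualitatively.
    (All names here are hypothetical.)
    """
    name: str
    model_relation: ModelRelation
    explanation_source: ExplanationSource
    scope: str                 # e.g. local (one action) vs. global (whole policy)
    when_produced: str         # e.g. during training vs. at decision time
    format: str                # e.g. text, saliency map, decision tree
    knowledge_limits: str      # whether the method knows what it does not know
    explanation_accuracy: str  # fidelity of the explanation to the true policy
    audience: str              # e.g. end user, operator, developer
    predictability: str
    legibility: str
    readability: str
    reactivity: str


# Illustrative example only: classifying a hypothetical technique.
example = TechniqueClassification(
    name="decision-tree policy distillation",
    model_relation=ModelRelation.MODEL_AGNOSTIC,
    explanation_source=ExplanationSource.POST_HOC,
    scope="global",
    when_produced="after training",
    format="decision tree",
    knowledge_limits="not addressed",
    explanation_accuracy="approximate (distilled surrogate)",
    audience="system operator",
    predictability="high",
    legibility="high",
    readability="moderate",
    reactivity="static",
)
```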