Title | Deep Reinforcement Learning with Temporal Logics |
Publication Type | Conference Proceedings |
Year of Conference | 2020 |
Authors | Hasanbeig M, Kroening D, Abate A |
Conference Name | International Conference on Formal Modeling and Analysis of Timed Systems |
Publisher | Springer |
Abstract | The combination of data-driven learning methods with formal reasoning has seen a surge of interest, as each area has the potential to bolster the other. For instance, formal methods promise to expand the use of state-of-the-art learning approaches in the direction of certification and sample efficiency. In this work, we propose a deep Reinforcement Learning (RL) method for policy synthesis in continuous-state/action unknown environments, under requirements expressed in Linear Temporal Logic (LTL). We show that this combination lifts the applicability of deep RL to complex temporal and memory-dependent policy synthesis goals. We express an LTL specification as a Limit Deterministic Büchi Automaton (LDBA) and synchronise it on-the-fly with the agent/environment. In practice, the LDBA monitors the environment, acting as a modular reward machine for the agent: accordingly, a modular Deep Deterministic Policy Gradient (DDPG) architecture is proposed to generate a low-level control policy that maximises the probability of satisfying the given LTL formula. We evaluate our framework in a cart-pole example and in a Mars rover experiment, where we achieve near-perfect success rates, while baselines based on standard RL are shown to fail in practice. |
URL | https://link.springer.com/chapter/10.1007/978-3-030-57628-8_1 |
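The abstract's central idea — an LDBA that is synchronised on-the-fly with the environment and acts as a modular reward machine — can be illustrated with a toy sketch. This is not the authors' implementation; the automaton, its state names, and the labels below are hypothetical, standing in for a simple LTL-style property ("eventually reach the goal, then remain safe"):

```python
# Illustrative sketch (hypothetical, not the paper's code): an LDBA-like
# automaton monitors the labels observed along an environment trace and
# emits a reward whenever an accepting state is reached, so that an RL
# agent maximising reward also maximises satisfaction of the property.

class LDBA:
    def __init__(self, transitions, accepting):
        self.transitions = transitions  # (state, label) -> next state
        self.accepting = accepting      # set of accepting states
        self.state = 0                  # initial automaton state

    def step(self, label):
        """Advance on the observed label; return an on-the-fly reward
        of 1.0 if the automaton lands in an accepting state."""
        self.state = self.transitions.get((self.state, label), self.state)
        return 1.0 if self.state in self.accepting else 0.0

# Toy automaton: reach "goal" (state 0 -> 1); "unsafe" leads to a trap.
ldba = LDBA(
    transitions={(0, "goal"): 1, (1, "unsafe"): 2},
    accepting={1},
)

trace = ["safe", "goal", "safe"]            # labels from the environment
rewards = [ldba.step(lbl) for lbl in trace]
print(rewards)  # -> [0.0, 1.0, 1.0]
```

In the paper this automaton state would additionally modulate a modular DDPG architecture, selecting which sub-policy is active; here it only shapes the reward.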