Beware Untrusted Simulators – Reward-Free Backdoor Attacks in Reinforcement Learning (2026)

ICLR
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision-making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. In this work we highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions in an agent when it observes a predefined “trigger”, leading to potentially dangerous consequences. Traditional backdoor attacks are limited by their strong threat models, which assume the adversary has near-full control over an agent’s training pipeline, enabling them to both alter and observe the agent’s rewards. As these assumptions are infeasible to implement within a simulator, we propose a new attack, “Daze”, which is able to reliably and stealthily implant backdoors into RL agents trained for real-world tasks without altering or even observing their rewards. We provide formal proof of Daze’s effectiveness in guaranteeing attack success across general RL tasks, along with extensive empirical evaluations on both discrete and continuous action space domains. We additionally provide the first example of RL backdoor attacks transferring to real robotic hardware. These developments motivate further research into securing all components of the RL training pipeline to prevent malicious attacks.

TBD (2025)

TBD
In Review

Toward Life-Long Creative Problem Solving: Using World Models for Increased Performance in Novelty Resolution (2022)

ICCC
Creative problem solving (CPS) is a skill which enables innovation, oftentimes through repeated exploration of an agent’s world. In this work, we investigate methods for life-long creative problem solving (LLCPS), with the goal of increasing CPS capability over time. We develop two world models to facilitate LLCPS which use sub-symbolic action and object information to predict symbolic meta-outcomes of actions. We experiment with three CPS scenarios run sequentially and in simulation. Results suggest that LLCPS is possible through the use of a world model, which can be trained on CPS exploration trials and used to guide future CPS exploration.

A framework for creative problem solving through action discovery (2021)

RSS Workshop
Creative problem solving (CPS) is a process through which an agent discovers previously unknown information about itself and its environment in order to accomplish an otherwise unsolvable task. In this paper, we introduce a unified framework for CPS through action discovery. We describe two methods which enable action discovery at a declarative and a neurosymbolic level, namely action primitive segmentation and behavior babbling, respectively. We review experimental evaluations of our framework, and end with a discussion of limitations and future work considerations for CPS.

Toward creative problem solving agents: Action discovery through behavior babbling (2021)

IEEE Conference
Creative problem solving (CPS) is the process by which an agent discovers unknown information about itself and its environment, allowing it to accomplish a previously impossible goal. We propose a framework for CPS by robots for discovering novel actions via behavior babbling, capable of learning a representation of novel actions at both a symbolic planning level and a sub-symbolic action controller level. Our framework employs two modes of discovery – a focused incubation method that restricts its search to the actions and entities composing the failed plan, and a defocused incubation method that enables exploration of actions and entities outside of the failed plan. We implemented and tested our framework using a Baxter robot in a 3D physics-based simulation environment, where we ran three proof-of-concept object manipulation scenarios. Results suggest that it is possible to use behavior babbling as a method for the autonomous discovery of flexible and reusable actions.