Collection
Faculty of Computer Science and Engineering: Journal Articles
Details

Modelling and quantifying numerical integration errors in deep reinforcement learning for propulsion dynamics

Journal
Aerospace Science and Technology
Date Issued
2026-10
Author(s)
Bajrami, Enes
Bajrami, Ensar
DOI
10.1016/j.ast.2026.112209
Abstract
This study investigates how numerical integration accuracy influences the training dynamics and control performance of deep reinforcement learning controllers applied to propulsion system simulations. The propulsion dynamics are represented by a continuous second-order thrust-driven model that is discretised using four numerical integration configurations: Euler (coarse, medium, and fine time steps) and fourth-order Runge-Kutta (RK4). Three widely used model-free reinforcement learning algorithms, Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), are evaluated together with a linear proportional-derivative baseline controller. A large experimental campaign comprising more than 50,000 simulated episodes was conducted across three training phases to quantify the influence of discretisation accuracy on reward convergence, trajectory stability, and control energy. The results demonstrate that numerical integration fidelity significantly shapes the optimisation landscape experienced by reinforcement learning agents. Under coarse Euler discretisation, PPO exhibits unstable learning behaviour and large oscillatory trajectories, while SAC maintains improved robustness but still shows sensitivity to large time steps. TD3 demonstrates the highest tolerance to discretisation error, maintaining stable closed-loop dynamics even under coarse integration. Higher-accuracy numerical schemes substantially improve learning efficiency. The RK4 configuration produces smoother trajectories, reduced control energy, and faster convergence across all reinforcement learning algorithms. Quantitative analysis of trajectory stability, integrated error metrics, and reward statistics confirms that discretisation error directly propagates through the learning process and alters the resulting control policies.
These findings provide new empirical evidence that numerical integration fidelity is a critical design factor for reinforcement learning environments involving dynamical systems. The study highlights the necessity of carefully selecting integration schemes when training reinforcement learning controllers for propulsion dynamics and other physics-based control applications.
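The Euler-versus-RK4 comparison at the heart of the abstract can be sketched on a simplified stand-in system. This is a minimal illustration, not the paper's model: the damped thrust dynamics x'' = (u - c*x')/m, the mass and damping values, and the time steps below are all assumptions chosen only to show how coarse Euler discretisation accumulates more error than RK4 at the same step size.

```python
import numpy as np

# Hypothetical second-order thrust-driven model (illustrative, not the
# paper's dynamics): x'' = (u - c*x') / m, with state s = [position, velocity].
M, C = 1.0, 0.5  # assumed mass and damping

def deriv(s, u):
    x, v = s
    return np.array([v, (u - C * v) / M])

def euler_step(s, u, dt):
    # Forward Euler: one derivative evaluation per step, O(dt) local accuracy.
    return s + dt * deriv(s, u)

def rk4_step(s, u, dt):
    # Classical fourth-order Runge-Kutta: four evaluations, O(dt^4) accuracy.
    k1 = deriv(s, u)
    k2 = deriv(s + 0.5 * dt * k1, u)
    k3 = deriv(s + 0.5 * dt * k2, u)
    k4 = deriv(s + dt * k3, u)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(step, dt, t_end=5.0, u=1.0):
    # Integrate from rest under constant thrust command u.
    s = np.zeros(2)
    for _ in range(int(round(t_end / dt))):
        s = step(s, u, dt)
    return s

# For constant u the velocity has the closed form v(t) = (u/C)(1 - e^{-C t / M}),
# so the terminal-velocity error of each scheme can be measured exactly.
v_exact = (1.0 / C) * (1.0 - np.exp(-C * 5.0 / M))

for name, step, dt in [("Euler coarse", euler_step, 0.5),
                       ("Euler fine",   euler_step, 0.01),
                       ("RK4 coarse",   rk4_step,   0.5)]:
    v = simulate(step, dt)[1]
    print(f"{name:12s} dt={dt:<5} |v_err|={abs(v - v_exact):.2e}")
```

At the same coarse step size, RK4's terminal-velocity error is orders of magnitude below Euler's, which mirrors the abstract's point that the integration scheme, not just the step size, shapes the simulated trajectories an agent learns from.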

