Collection
Faculty of Computer Science and Engineering: Journal Articles
Details

Modelling and quantifying numerical integration errors in deep reinforcement learning for propulsion dynamics

Journal
Aerospace Science and Technology
Date Issued
2026-10
Author(s)
Bajrami, Enes
Bajrami, Ensar
DOI
10.1016/j.ast.2026.112209
Abstract
This study investigates how numerical integration accuracy influences the training dynamics and control performance of deep reinforcement learning controllers applied to propulsion system simulations. The propulsion dynamics are represented by a continuous second-order thrust-driven model that is discretised using four numerical integration configurations: Euler (coarse, medium, and fine time steps) and fourth-order Runge-Kutta (RK4). Three widely used model-free reinforcement learning algorithms, Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), are evaluated together with a linear proportional-derivative baseline controller. A large experimental campaign comprising more than 50,000 simulated episodes was conducted across three training phases to quantify the influence of discretisation accuracy on reward convergence, trajectory stability, and control energy. The results demonstrate that numerical integration fidelity significantly shapes the optimisation landscape experienced by reinforcement learning agents. Under coarse Euler discretisation, PPO exhibits unstable learning behaviour and large oscillatory trajectories, while SAC maintains improved robustness but still shows sensitivity to large time steps. TD3 demonstrates the highest tolerance to discretisation error, maintaining stable closed-loop dynamics even under coarse integration. Higher-accuracy numerical schemes substantially improve learning efficiency. The RK4 configuration produces smoother trajectories, reduced control energy, and faster convergence across all reinforcement learning algorithms. Quantitative analysis of trajectory stability, integrated error metrics, and reward statistics confirms that discretisation error directly propagates through the learning process and alters the resulting control policies.
These findings provide new empirical evidence that numerical integration fidelity is a critical design factor for reinforcement learning environments involving dynamical systems. The study highlights the necessity of carefully selecting integration schemes when training reinforcement learning controllers for propulsion dynamics and other physics-based control applications.
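The Euler-versus-RK4 comparison at the heart of the abstract can be sketched on a simplified stand-in system. This is a minimal illustration, not the paper's model: the damped thrust dynamics x'' = (u - c*x')/m, the mass and damping values, and the time steps below are all assumptions chosen only to show how coarse Euler discretisation accumulates more error than RK4 at the same step size.

```python
import numpy as np

# Hypothetical second-order thrust-driven model (illustrative, not the
# paper's dynamics): x'' = (u - c*x') / m, with state s = [position, velocity].
M, C = 1.0, 0.5  # assumed mass and damping

def deriv(s, u):
    x, v = s
    return np.array([v, (u - C * v) / M])

def euler_step(s, u, dt):
    # Forward Euler: one derivative evaluation per step, O(dt) local accuracy.
    return s + dt * deriv(s, u)

def rk4_step(s, u, dt):
    # Classical fourth-order Runge-Kutta: four evaluations, O(dt^4) accuracy.
    k1 = deriv(s, u)
    k2 = deriv(s + 0.5 * dt * k1, u)
    k3 = deriv(s + 0.5 * dt * k2, u)
    k4 = deriv(s + dt * k3, u)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(step, dt, t_end=5.0, u=1.0):
    # Integrate from rest under constant thrust command u.
    s = np.zeros(2)
    for _ in range(int(round(t_end / dt))):
        s = step(s, u, dt)
    return s

# For constant u the velocity has the closed form v(t) = (u/C)(1 - e^{-C t / M}),
# so the terminal-velocity error of each scheme can be measured exactly.
v_exact = (1.0 / C) * (1.0 - np.exp(-C * 5.0 / M))

for name, step, dt in [("Euler coarse", euler_step, 0.5),
                       ("Euler fine",   euler_step, 0.01),
                       ("RK4 coarse",   rk4_step,   0.5)]:
    v = simulate(step, dt)[1]
    print(f"{name:12s} dt={dt:<5} |v_err|={abs(v - v_exact):.2e}")
```

At the same coarse step size, RK4's terminal-velocity error is orders of magnitude below Euler's, which mirrors the abstract's point that the integration scheme, not just the step size, shapes the simulated trajectories an agent learns from.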

