QUBE Servo Pendulum - Deep Reinforcement Learning
Trained reinforcement learning agents to swing up and balance a Quanser QUBE-Servo 2 rotary inverted pendulum, comparing PPO in Isaac Lab against Soft Actor-Critic (SAC) and Q-Learning trained in a custom Furuta pendulum dynamics simulator.
Duration
Spring 2025
Role
Developer
Institution
NTNU
Status
Completed
Overview
This AIS4002 Intelligent Machines Module 2 project explored three reinforcement learning approaches to the QUBE-Servo 2 pendulum swing-up and stabilization task: (1) PPO trained in Isaac Lab with GPU-accelerated parallel simulation, (2) Soft Actor-Critic variants trained in a custom Python simulator built on Lagrangian dynamics, and (3) tabular Q-Learning over a discretized state-action space. The project also included 3D modeling of the QUBE in Fusion 360, URDF/USD conversion, and custom reward function design.
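To make the simulator side concrete, here is a minimal sketch of a Lagrangian-based Furuta pendulum simulator: the pendulum is treated as a point mass on a massless rod attached to the end of the rotary arm, and the resulting equations of motion are integrated with RK4. The parameter values and damping model are illustrative assumptions, not the project's identified QUBE-Servo 2 constants.

```python
import numpy as np

# Illustrative parameters (assumed, not identified QUBE-Servo 2 values).
J_R = 1e-4          # arm inertia about its vertical axis [kg m^2]
M_P = 0.024         # pendulum mass [kg]
R_A = 0.085         # arm length from motor axis to pendulum pivot [m]
L_P = 0.065         # distance from pivot to pendulum centre of mass [m]
B_T, B_P = 5e-4, 1e-5   # viscous damping on arm and pendulum (assumed)
G = 9.81

def furuta_derivatives(state, tau):
    """Solve M(q) qdd = f(q, qd, tau) for a point-mass Furuta pendulum.

    state = [theta, phi, theta_dot, phi_dot]; phi = 0 means upright.
    """
    _, phi, dtheta, dphi = state
    s, c = np.sin(phi), np.cos(phi)

    # Mass matrix from the Lagrangian of a point mass on a rotating arm.
    M = np.array([
        [J_R + M_P * R_A**2 + M_P * L_P**2 * s**2, M_P * R_A * L_P * c],
        [M_P * R_A * L_P * c,                      M_P * L_P**2],
    ])
    # Coriolis/centrifugal, gravity, damping and motor torque on the right-hand side.
    f = np.array([
        tau
        - 2 * M_P * L_P**2 * s * c * dtheta * dphi
        + M_P * R_A * L_P * s * dphi**2
        - B_T * dtheta,
        M_P * L_P**2 * s * c * dtheta**2
        + M_P * G * L_P * s
        - B_P * dphi,
    ])
    ddtheta, ddphi = np.linalg.solve(M, f)
    return np.array([dtheta, dphi, ddtheta, ddphi])

def rk4_step(state, tau, dt=1e-3):
    """Advance the state by one RK4 integration step."""
    k1 = furuta_derivatives(state, tau)
    k2 = furuta_derivatives(state + 0.5 * dt * k1, tau)
    k3 = furuta_derivatives(state + 0.5 * dt * k2, tau)
    k4 = furuta_derivatives(state + dt * k3, tau)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```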
Problem Statement
The Furuta pendulum (rotary inverted pendulum) is a classic nonlinear control problem. The goal is to train an RL agent that swings the pendulum up from its hanging rest position and stabilizes it in the inverted (upright) position using only motor torque commands on the rotary arm, while handling partial observability and physical constraints.
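To show how this task can be framed for an RL agent, the sketch below wraps the simulator from the sketch above (the rk4_step integrator) in a Gymnasium-style environment: episodes start with the pendulum hanging down, the only action is the arm torque, and the agent observes angles (as sine/cosine pairs) and angular velocities. The observation layout, torque bounds, episode length, and placeholder reward are assumptions for illustration, not the project's actual environment definition.

```python
import gymnasium as gym
import numpy as np

class FurutaSwingUpEnv(gym.Env):
    """Swing-up task: start hanging down, balance upright using arm torque only."""

    def __init__(self, max_steps=2000):
        # Action: a single motor torque on the rotary arm (bounds assumed).
        self.action_space = gym.spaces.Box(-0.05, 0.05, shape=(1,), dtype=np.float32)
        # Observation: [cos(theta), sin(theta), cos(phi), sin(phi), theta_dot, phi_dot].
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.max_steps = max_steps

    def _obs(self):
        theta, phi, dtheta, dphi = self.state
        return np.array([np.cos(theta), np.sin(theta),
                         np.cos(phi), np.sin(phi), dtheta, dphi], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Start hanging down (phi = pi) with a small random perturbation.
        self.state = np.array([0.0, np.pi, 0.0, 0.0]) + self.np_random.normal(0.0, 0.01, 4)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        tau = float(np.clip(action[0], -0.05, 0.05))
        self.state = rk4_step(self.state, tau)   # dynamics step from the simulator sketch
        self.steps += 1
        phi, dphi = self.state[1], self.state[3]
        # Placeholder reward: closeness to upright minus a small velocity penalty.
        reward = float(np.cos(phi) - 0.01 * dphi**2)
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, False, truncated, {}
```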
Challenges & Solutions
| Challenge | Solution | Outcome |
|---|---|---|
| Sim-to-Real Transfer Gap | Implemented domain randomization and parameter variation during training | Improved robustness, though transfer to real hardware remains challenging |
| Reward Function Design | Designed a multi-component reward with an upright bonus, velocity penalties, and energy shaping (see the sketch below the table) | Successful swing-up and stabilization within 3 seconds |
| Sample Efficiency | Used GPU-accelerated parallel training with 4096 environments in Isaac Lab | Training completed in ~20 minutes with PPO |
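As a rough illustration of the reward row above, the following sketch combines an upright bonus, velocity penalties, energy shaping toward the upright energy level, and a small action penalty. The weights and the point-mass energy expression are illustrative assumptions, not the project's tuned reward.

```python
import numpy as np

# Illustrative weights and parameters (assumed, not the project's tuned values).
W_UPRIGHT, W_VEL, W_ENERGY, W_ACTION = 1.0, 0.02, 0.5, 0.01
M_P, L_P, G = 0.024, 0.065, 9.81
E_UPRIGHT = M_P * G * L_P       # pendulum energy at rest in the upright position

def swing_up_reward(phi, dphi, dtheta, tau):
    """Multi-component reward: upright bonus, velocity penalties, energy shaping.

    phi is the pendulum angle measured from upright (phi = 0 means inverted).
    """
    # Upright bonus: +1 when inverted, -1 when hanging straight down.
    upright = np.cos(phi)
    # Velocity penalty discourages spinning the arm and whipping the pendulum.
    velocity_penalty = dphi**2 + 0.5 * dtheta**2
    # Energy shaping: drive the pendulum's mechanical energy toward its upright value,
    # which guides the swing-up phase before the upright bonus dominates.
    energy = 0.5 * M_P * L_P**2 * dphi**2 + M_P * G * L_P * np.cos(phi)
    energy_error = abs(energy - E_UPRIGHT)
    # Small action penalty to encourage smooth torque commands.
    return (W_UPRIGHT * upright
            - W_VEL * velocity_penalty
            - W_ENERGY * energy_error
            - W_ACTION * tau**2)
```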