
QUBE Servo Pendulum - Deep Reinforcement Learning

Trained reinforcement learning agents to control a Quanser QUBE-Servo 2 inverted pendulum using three approaches: PPO in Isaac Lab, Soft Actor-Critic (SAC) in a custom Furuta pendulum dynamics simulation, and tabular Q-Learning.

Duration

Spring 2025

Role

Developer

Institution

NTNU

Status

Completed

Technologies Used

Python, PyTorch, Isaac Lab, Stable-Baselines3, PPO, SAC, Q-Learning, NVIDIA Omniverse, ROS

Overview

This AIS4002 Intelligent Machines Module 2 project explored three different reinforcement learning approaches for the QUBE-Servo 2 pendulum swing-up and stabilization task: (1) PPO training in Isaac Lab with GPU-accelerated parallel simulation, (2) Soft Actor-Critic variants in a custom Python simulator with Lagrangian dynamics, and (3) Q-Learning with discretized state-action spaces. The project included 3D modeling of the QUBE in Fusion 360, URDF/USD conversion, and custom reward function design.

Problem Statement

The Furuta pendulum (inverted rotary pendulum) is a classic nonlinear control problem. The goal is to train an RL agent to swing up the pendulum from its resting position and stabilize it in the inverted (upright) position using only motor torque commands, while handling partial observability and physical constraints.
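To make the control problem concrete, here is a minimal simulation sketch of the pendulum link. It deliberately drops the arm-pendulum cross-coupling terms of the full Furuta dynamics, and all physical parameters are illustrative rather than QUBE-Servo 2 datasheet values:

```python
import numpy as np

# Simplified planar approximation of the pendulum link. The full Furuta
# dynamics couple the rotary arm and the pendulum; the cross-coupling
# terms are dropped here for clarity. Parameters are illustrative only.
G = 9.81    # gravity (m/s^2)
L = 0.129   # pendulum length (m), illustrative
M = 0.024   # pendulum mass (kg), illustrative
B = 5e-4    # viscous damping coefficient, illustrative
DT = 0.002  # integration step (s)

def step(theta, theta_dot, torque):
    """One explicit-Euler step; theta = 0 is the upright position."""
    # Inverted pendulum: gravity destabilizes the upright equilibrium,
    # the motor torque and damping act on the link.
    theta_ddot = (G / L) * np.sin(theta) \
        + (torque - B * theta_dot) / (M * L**2)
    theta_dot = theta_dot + DT * theta_ddot
    theta = theta + DT * theta_dot
    # Wrap the angle to [-pi, pi).
    theta = (theta + np.pi) % (2 * np.pi) - np.pi
    return theta, theta_dot

# From the resting (hanging) position with zero torque, the pendulum
# stays at the stable equilibrium.
th, thd = np.pi, 0.0
for _ in range(1000):
    th, thd = step(th, thd, 0.0)
```

The swing-up task is then: starting from `theta = pi` (hanging), find a torque sequence that brings the state to `theta = 0` and keeps it there despite the unstable equilibrium.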

Challenges & Solutions

Challenge: Sim-to-Real Transfer Gap
Solution: Implemented domain randomization and parameter variation during training
Outcome: Improved robustness, though real-hardware transfer remains challenging

Challenge: Reward Function Design
Solution: Designed a multi-component reward with an upright bonus, velocity penalties, and energy shaping
Outcome: Successful swing-up and stabilization within 3 seconds

Challenge: Sample Efficiency
Solution: Used GPU-accelerated parallel training with 4096 environments in Isaac Lab
Outcome: PPO training completed in ~20 minutes
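The domain randomization mentioned above can be sketched as resampling physical parameters around nominal values at each episode reset, so the policy never overfits one exact model. The parameter names, nominal values, and ±10% spread below are assumptions for illustration, not the project's tuned ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal physical parameters (illustrative, not QUBE-Servo 2 values).
NOMINAL = {"pendulum_mass": 0.024, "pendulum_length": 0.129, "motor_gain": 1.0}

def randomized_params(spread=0.1):
    """Uniformly perturb each parameter by up to +/- spread at episode reset."""
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread)
            for k, v in NOMINAL.items()}

# Each reset draws a slightly different plant model for the agent to control.
params = randomized_params()
```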
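A multi-component reward of the kind described (upright bonus, velocity penalties, energy shaping) might look like the sketch below. The weights and the zero-energy reference are assumptions, not the project's tuned values:

```python
import numpy as np

def reward(alpha, alpha_dot, theta_dot, m=0.024, l=0.129, g=9.81):
    """Illustrative multi-component reward.

    alpha: pendulum angle (0 = upright), theta_dot: arm velocity.
    Weights are assumed, not the project's tuned values.
    """
    upright_bonus = np.cos(alpha)                        # +1 upright, -1 hanging
    velocity_penalty = 0.01 * alpha_dot**2 + 0.005 * theta_dot**2
    # Energy shaping: total pendulum energy is 0 at upright rest, so
    # penalizing |E| drives the swing-up toward the right energy level.
    energy = 0.5 * m * l**2 * alpha_dot**2 + m * g * l * (np.cos(alpha) - 1.0)
    return upright_bonus - velocity_penalty - 0.1 * abs(energy)

# Upright at rest scores the maximum; hanging at rest scores near -1.
r_up, r_down = reward(0.0, 0.0, 0.0), reward(np.pi, 0.0, 0.0)
```

The energy term rewards pumping the right amount of energy into the pendulum during swing-up, while the velocity penalties discourage the agent from spinning through the upright position instead of stabilizing there.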

Progress

3D modeling and URDF/USD conversion
Isaac Lab environment setup with PPO
Custom Python simulator with Lagrangian dynamics
SAC variants implementation (baseline, frame stacking, RNN)
Q-Learning with discretized state space
Reward function design and optimization
Training and evaluation across methods
TRPO presentation video
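The Q-Learning approach (discretized state-action spaces) can be sketched as a tabular update over binned observations. Bin counts, state bounds, torque levels, and hyperparameters below are illustrative, not the project's values:

```python
import numpy as np

# Discretization of the 4D state (alpha, theta, alpha_dot, theta_dot)
# and the torque action. All sizes and bounds are illustrative.
N_BINS = (17, 17, 9, 9)
LOWS = np.array([-np.pi, -np.pi, -8.0, -8.0])
HIGHS = np.array([np.pi, np.pi, 8.0, 8.0])
ACTIONS = np.linspace(-2.0, 2.0, 5)   # discrete torque levels

q_table = np.zeros(N_BINS + (len(ACTIONS),))

def discretize(obs):
    """Map a continuous observation to integer bin indices."""
    scaled = (np.asarray(obs) - LOWS) / (HIGHS - LOWS)
    idx = (scaled * (np.array(N_BINS) - 1)).astype(int)
    return tuple(np.clip(idx, 0, np.array(N_BINS) - 1))

def update(obs, a, r, next_obs, lr=0.1, gamma=0.99):
    """One tabular Q-learning backup: Q += lr * (r + gamma*max Q' - Q)."""
    s, s_next = discretize(obs), discretize(next_obs)
    td_target = r + gamma * q_table[s_next].max()
    q_table[s + (a,)] += lr * (td_target - q_table[s + (a,)])
```

Coarse bins keep the table small enough to fill with experience, at the cost of control resolution near the upright equilibrium, which is the usual trade-off that motivated the continuous-action SAC and PPO approaches.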