
QUBE Servo Pendulum - Deep Reinforcement Learning

Trained reinforcement learning agents to control a Quanser QUBE-Servo 2 inverted pendulum using three approaches: PPO in Isaac Lab, Soft Actor-Critic (SAC) in a custom Furuta pendulum dynamics simulation, and tabular Q-Learning.

Duration

Spring 2025

Role

Developer

Institution

NTNU

Status

Completed

Technologies Used

Python, PyTorch, Isaac Lab, Stable-Baselines3, PPO, SAC, Q-Learning, NVIDIA Omniverse, ROS

Overview

This AIS4002 Intelligent Machines Module 2 project explored three different reinforcement learning approaches for the QUBE-Servo 2 pendulum swing-up and stabilization task: (1) PPO training in Isaac Lab with GPU-accelerated parallel simulation, (2) Soft Actor-Critic variants in a custom Python simulator with Lagrangian dynamics, and (3) Q-Learning with discretized state-action spaces. The project included 3D modeling of the QUBE in Fusion 360, URDF/USD conversion, and custom reward function design.

Problem Statement

The Furuta pendulum (inverted rotary pendulum) is a classic nonlinear control problem. The goal is to train an RL agent to swing up the pendulum from its resting position and stabilize it in the inverted (upright) position using only motor torque commands, while handling partial observability and physical constraints.
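To make the control problem concrete, here is a minimal simulation sketch of the pendulum link. It deliberately drops the arm-pendulum cross-coupling terms of the full Furuta dynamics, and all physical parameters are illustrative rather than QUBE-Servo 2 datasheet values:

```python
import numpy as np

# Simplified planar approximation of the pendulum link. The full Furuta
# dynamics couple the rotary arm and the pendulum; the cross-coupling
# terms are dropped here for clarity. Parameters are illustrative only.
G = 9.81    # gravity (m/s^2)
L = 0.129   # pendulum length (m), illustrative
M = 0.024   # pendulum mass (kg), illustrative
B = 5e-4    # viscous damping coefficient, illustrative
DT = 0.002  # integration step (s)

def step(theta, theta_dot, torque):
    """One explicit-Euler step; theta = 0 is the upright position."""
    # Inverted pendulum: gravity destabilizes the upright equilibrium,
    # the motor torque and damping act on the link.
    theta_ddot = (G / L) * np.sin(theta) \
        + (torque - B * theta_dot) / (M * L**2)
    theta_dot = theta_dot + DT * theta_ddot
    theta = theta + DT * theta_dot
    # Wrap the angle to [-pi, pi).
    theta = (theta + np.pi) % (2 * np.pi) - np.pi
    return theta, theta_dot

# From the resting (hanging) position with zero torque, the pendulum
# stays at the stable equilibrium.
th, thd = np.pi, 0.0
for _ in range(1000):
    th, thd = step(th, thd, 0.0)
```

The swing-up task is then: starting from `theta = pi` (hanging), find a torque sequence that brings the state to `theta = 0` and keeps it there despite the unstable equilibrium.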

Challenges & Solutions

Challenge: Sim-to-Real Transfer Gap
Solution: Implemented domain randomization and parameter variation during training
Outcome: Improved robustness, though real-hardware transfer remains challenging

Challenge: Reward Function Design
Solution: Designed a multi-component reward with an upright bonus, velocity penalties, and energy shaping
Outcome: Successful swing-up and stabilization within 3 seconds

Challenge: Sample Efficiency
Solution: Used GPU-accelerated parallel training with 4096 environments in Isaac Lab
Outcome: PPO training completed in ~20 minutes
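The domain randomization mentioned above can be sketched as resampling physical parameters around nominal values at each episode reset, so the policy never overfits one exact model. The parameter names, nominal values, and ±10% spread below are assumptions for illustration, not the project's tuned ranges:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nominal physical parameters (illustrative, not QUBE-Servo 2 values).
NOMINAL = {"pendulum_mass": 0.024, "pendulum_length": 0.129, "motor_gain": 1.0}

def randomized_params(spread=0.1):
    """Uniformly perturb each parameter by up to +/- spread at episode reset."""
    return {k: v * rng.uniform(1.0 - spread, 1.0 + spread)
            for k, v in NOMINAL.items()}

# Each reset draws a slightly different plant model for the agent to control.
params = randomized_params()
```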
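A multi-component reward of the kind described (upright bonus, velocity penalties, energy shaping) might look like the sketch below. The weights and the zero-energy reference are assumptions, not the project's tuned values:

```python
import numpy as np

def reward(alpha, alpha_dot, theta_dot, m=0.024, l=0.129, g=9.81):
    """Illustrative multi-component reward.

    alpha: pendulum angle (0 = upright), theta_dot: arm velocity.
    Weights are assumed, not the project's tuned values.
    """
    upright_bonus = np.cos(alpha)                        # +1 upright, -1 hanging
    velocity_penalty = 0.01 * alpha_dot**2 + 0.005 * theta_dot**2
    # Energy shaping: total pendulum energy is 0 at upright rest, so
    # penalizing |E| drives the swing-up toward the right energy level.
    energy = 0.5 * m * l**2 * alpha_dot**2 + m * g * l * (np.cos(alpha) - 1.0)
    return upright_bonus - velocity_penalty - 0.1 * abs(energy)

# Upright at rest scores the maximum; hanging at rest scores near -1.
r_up, r_down = reward(0.0, 0.0, 0.0), reward(np.pi, 0.0, 0.0)
```

The energy term rewards pumping the right amount of energy into the pendulum during swing-up, while the velocity penalties discourage the agent from spinning through the upright position instead of stabilizing there.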

Progress

3D modeling and URDF/USD conversion
Isaac Lab environment setup with PPO
Custom Python simulator with Lagrangian dynamics
SAC variants implementation (baseline, frame stacking, RNN)
Q-Learning with discretized state space
Reward function design and optimization
Training and evaluation across methods
TRPO presentation video
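The Q-Learning approach (discretized state-action spaces) can be sketched as a tabular update over binned observations. Bin counts, state bounds, torque levels, and hyperparameters below are illustrative, not the project's values:

```python
import numpy as np

# Discretization of the 4D state (alpha, theta, alpha_dot, theta_dot)
# and the torque action. All sizes and bounds are illustrative.
N_BINS = (17, 17, 9, 9)
LOWS = np.array([-np.pi, -np.pi, -8.0, -8.0])
HIGHS = np.array([np.pi, np.pi, 8.0, 8.0])
ACTIONS = np.linspace(-2.0, 2.0, 5)   # discrete torque levels

q_table = np.zeros(N_BINS + (len(ACTIONS),))

def discretize(obs):
    """Map a continuous observation to integer bin indices."""
    scaled = (np.asarray(obs) - LOWS) / (HIGHS - LOWS)
    idx = (scaled * (np.array(N_BINS) - 1)).astype(int)
    return tuple(np.clip(idx, 0, np.array(N_BINS) - 1))

def update(obs, a, r, next_obs, lr=0.1, gamma=0.99):
    """One tabular Q-learning backup: Q += lr * (r + gamma*max Q' - Q)."""
    s, s_next = discretize(obs), discretize(next_obs)
    td_target = r + gamma * q_table[s_next].max()
    q_table[s + (a,)] += lr * (td_target - q_table[s + (a,)])
```

Coarse bins keep the table small enough to fill with experience, at the cost of control resolution near the upright equilibrium, which is the usual trade-off that motivated the continuous-action SAC and PPO approaches.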