NVIDIA Kaya Robot - Computer Vision for Autonomous Navigation
Developed the computer vision system for an autonomous mobile robot, combining YOLOv8 object detection, Intel RealSense depth sensing, and an Extended Kalman Filter for cube detection and 3D positioning.
Duration
Spring 2025
Role
Computer Vision Lead
Institution
NTNU
Status
Completed
Technologies Used
YOLOv8, Intel RealSense D435, Extended Kalman Filter, ROS2, NVIDIA Jetson Orin Nano
Overview
This AIS4104 group project (Part 2) involved building and programming an NVIDIA Kaya robot for autonomous cube manipulation. My contribution was the complete computer vision pipeline: training a YOLOv8 model for cube detection, implementing depth-based 3D positioning with the Intel RealSense D435, and integrating an Extended Kalman Filter for stable state estimation. The vision system achieved accurate cube localization, enabling the robot to autonomously approach and push cubes to designated positions.
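The depth-based positioning step can be sketched as a pinhole back-projection: the detector's pixel coordinates plus the aligned depth reading give a 3D point in the camera frame (the same math librealsense performs in `rs2_deproject_pixel_to_point` for an undistorted stream). The intrinsics below are illustrative placeholders, not the D435's actual calibration values:

```python
def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into camera-frame
    3D coordinates using the pinhole camera model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# Illustrative intrinsics (placeholders, not real D435 calibration)
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0

# The principal point at 1 m depth lies on the optical axis
point = deproject(320, 240, 1.0, fx, fy, cx, cy)  # -> (0.0, 0.0, 1.0)
```

In practice the pixel fed in would be the centroid of the YOLOv8 bounding box, and the depth value the median over a small patch to reject dropout pixels.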
Problem Statement
The Kaya robot needed to autonomously detect colored cubes in its environment, determine their 3D positions, and navigate to manipulate them. This required a robust perception system that could work in real-time with the ROS2-based motion planning and control systems developed by team members.
Challenges & Solutions
| Challenge | Solution | Outcome |
|---|---|---|
| Real-time Detection Performance | Trained YOLOv8-nano model on custom dataset for optimal speed/accuracy trade-off | Achieved real-time detection at 30+ FPS on Jetson Orin Nano |
| Noisy Depth Measurements | Implemented Extended Kalman Filter for state estimation and smoothing | Stable 3D position estimates with reduced jitter |
| Camera-Robot Coordinate Transform | Performed camera calibration and implemented proper coordinate transformations | Accurate cube positions in robot base frame for motion planning |
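When the cube is modelled as stationary and the measurement is the 3D position itself, the EKF in the second row reduces to a linear Kalman filter that can be run per axis. A minimal per-axis sketch under that simplifying assumption (the noise variances are illustrative, not the tuned values):

```python
class ScalarKalman:
    """1-D Kalman filter for one position axis. The cube is modelled as
    stationary, so the predict step only inflates the variance by the
    process noise q; the update step blends in each new measurement."""

    def __init__(self, q=1e-4, r=1e-2):
        self.x = None   # state estimate (metres)
        self.p = 1.0    # estimate variance
        self.q = q      # process noise variance (illustrative)
        self.r = r      # measurement noise variance (illustrative)

    def update(self, z):
        if self.x is None:               # initialise on first measurement
            self.x = z
            return self.x
        self.p += self.q                 # predict: uncertainty grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward measurement
        self.p *= (1.0 - k)
        return self.x

# Smooth jittery depth readings around a true value of 0.50 m
kf = ScalarKalman()
for z in [0.52, 0.48, 0.51, 0.49, 0.50]:
    est = kf.update(z)
```

Running one filter per axis (x, y, z) suppresses frame-to-frame jitter while still tracking slow changes as the robot moves relative to the cube.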
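For the third row, a point in the camera frame is mapped into the robot base frame by a rigid 4x4 homogeneous transform obtained from calibration. A sketch with an illustrative camera pose (translation only, identity rotation; the real extrinsics would include the camera's mounting rotation):

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform (row-major nested lists) to a 3D point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))

# Illustrative base<-camera transform: camera mounted 0.10 m forward of and
# 0.20 m above the base origin, axes aligned (identity rotation). These
# numbers are placeholders, not the Kaya's calibrated extrinsics.
T_base_cam = [
    [1.0, 0.0, 0.0, 0.10],
    [0.0, 1.0, 0.0, 0.00],
    [0.0, 0.0, 1.0, 0.20],
    [0.0, 0.0, 0.0, 1.00],
]

cube_cam = (0.0, 0.0, 0.5)                         # cube 0.5 m along camera z
cube_base = transform_point(T_base_cam, cube_cam)  # -> (0.10, 0.0, 0.70)
```

In a ROS2 system this transform would typically be published via tf2 rather than hard-coded, so the motion planner can query it at the timestamp of each detection.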