Ainex Vision Navigation
Humanoid demo • Vision → decision → one ROS motion primitive (safety-first)
I built a safety-first navigation demo for an Ainex humanoid where the robot captures a camera frame, sends it to a vision-language decision module, then executes one small, controlled motion primitive (forward/turn/stop) via ROS. The point is to separate responsibilities: the AI makes high-level decisions; the robot controller keeps motion stable.
TL;DR
Camera snapshot → VLM decision JSON → exactly one ROS motion primitive. Conservative defaults, predictable behavior, easier debugging.
My role
Designed the closed decision loop and its safety constraints (single-step execution + STOP fallbacks), integrated the ROS motion primitives, and built test tooling for camera capture and command execution.
Overview
This project explores vision-based navigation for an Ainex humanoid robot using ROS (Noetic). The robot captures images onboard, sends them to a vision-language decision module, then executes one safe, discrete motion primitive at a time. The result is a full autonomy loop — see → decide → move → see again — with optional user prompts at ambiguous intersections. Conducted as research under Professor Myung (Michael) Cho.
Workflow
- Capture: Save a reliable frame from /camera/image_rect_color (with fallbacks if needed).
- Decide: Model returns a strict JSON action from a constrained action set.
- Execute: Run one motion primitive through the Ainex walking stack (ROS topic/service).
- Repeat: Loop until STOP, or ask the user for input at uncertain junctions (see the loop sketch below).
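A minimal sketch of this loop, assuming ROS Noetic with rospy and cv_bridge; decide() and run_primitive() are illustrative stand-ins for the actual decision client and walking commands, not the project's real API:

```python
#!/usr/bin/env python3
# Minimal closed-loop sketch: capture one frame -> ask the decision module ->
# run one primitive -> repeat. decide()/run_primitive() are placeholders.
import rospy
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

def capture_frame(path="/tmp/frame.jpg", timeout=2.0):
    """Grab a single frame from the rectified camera topic; None on timeout."""
    try:
        msg = rospy.wait_for_message("/camera/image_rect_color", Image, timeout=timeout)
    except rospy.ROSException:
        return None
    cv2.imwrite(path, CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8"))
    return path

def decide(frame_path):
    """Stand-in for the VLM call; the real module returns a strict JSON action."""
    return "STOP"  # placeholder

def run_primitive(action):
    """Stand-in for the executor that triggers one Ainex walking primitive."""
    rospy.loginfo("executing %s", action)

def main():
    rospy.init_node("vision_nav_loop")
    while not rospy.is_shutdown():
        frame = capture_frame()
        action = decide(frame) if frame else "STOP"  # no frame -> conservative STOP
        if action == "STOP":
            break
        run_primitive(action)   # exactly one motion primitive per iteration
        rospy.sleep(1.0)        # let the gait settle before re-evaluating

if __name__ == "__main__":
    main()
```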
Constrained action set
- MOVE_FORWARD_SHORT
- TURN_LEFT_90
- TURN_RIGHT_90
- STOP
- ASK_USER("Left or right?")
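In code, the contract can be as small as a set of allowed names plus the strict JSON shape the model must return; the field names below ("action", "confidence", "question") are assumptions for illustration, not the project's exact schema:

```python
# Constrained action set and example decision payloads (field names are illustrative).
ACTIONS = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP", "ASK_USER"}

# Replies the parser would accept, e.g.:
#   {"action": "MOVE_FORWARD_SHORT", "confidence": 0.9}
#   {"action": "ASK_USER", "question": "Left or right?"}
# Anything else is rejected and collapsed to STOP.
```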
Design principles
- The “brain” makes high-level decisions only.
- Low-level walking stability remains inside the existing Ainex controller.
- Default behavior is conservative: when uncertain → STOP or ASK_USER.
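A sketch of that split: the decision layer only names an action, and a thin executor translates it into the walking controller's interface. The /cmd_vel topic, Twist message, and velocity values below are assumptions for illustration; the real Ainex walking stack exposes its own topics and services.

```python
import rospy
from geometry_msgs.msg import Twist

def _twist(x=0.0, yaw=0.0):
    t = Twist()
    t.linear.x, t.angular.z = x, yaw
    return t

# Action name -> (velocity command, duration in seconds). Values are illustrative.
PRIMITIVES = {
    "MOVE_FORWARD_SHORT": (_twist(x=0.05), 2.0),
    "TURN_LEFT_90":       (_twist(yaw=0.6), 2.6),
    "TURN_RIGHT_90":      (_twist(yaw=-0.6), 2.6),
}

def execute(action, pub):
    """Run exactly one primitive, then always command a stop (single-step rule)."""
    if action not in PRIMITIVES:      # STOP, ASK_USER, or anything unknown
        pub.publish(Twist())
        return
    cmd, seconds = PRIMITIVES[action]
    pub.publish(cmd)
    rospy.sleep(seconds)
    pub.publish(Twist())              # zero twist = stop

# Usage (after rospy.init_node):
#   pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
```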
What I implemented
- Motion primitives as scripts: single-purpose commands (forward step, 90° turns, stop).
- Reliable camera capture: scripts to create a consistent capture image.
- Decision module: strict JSON parsing + defensive fallbacks to STOP + env-based config (model, tokens, temperature).
- Demo modes: single-step, closed-loop, and DRY_RUN to test decisions without moving hardware (see the decision-module sketch below).
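A sketch of the decision-module plumbing, with env-based configuration, strict parsing that collapses to STOP, and a DRY_RUN guard; the environment variable names and the ask_vlm() stub are illustrative, not the actual client code:

```python
import json
import os

# Env-based configuration (variable names are illustrative).
MODEL       = os.getenv("NAV_MODEL", "some-vision-language-model")
MAX_TOKENS  = int(os.getenv("NAV_MAX_TOKENS", "64"))
TEMPERATURE = float(os.getenv("NAV_TEMPERATURE", "0.0"))
DRY_RUN     = os.getenv("NAV_DRY_RUN", "0") == "1"

ACTIONS = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP", "ASK_USER"}

def ask_vlm(frame_path, model, max_tokens, temperature):
    """Stand-in for the real API client; returns the model's raw reply text."""
    return '{"action": "STOP"}'

def parse_decision(raw_text):
    """Strict parse of the model reply; any problem collapses to STOP."""
    try:
        action = json.loads(raw_text).get("action")
    except (json.JSONDecodeError, TypeError, AttributeError):
        return "STOP"
    return action if action in ACTIONS else "STOP"

def decide(frame_path):
    raw = ask_vlm(frame_path, model=MODEL, max_tokens=MAX_TOKENS,
                  temperature=TEMPERATURE)
    action = parse_decision(raw)
    if DRY_RUN:
        print(f"[DRY_RUN] would execute: {action}")
        return "STOP"   # never move hardware in dry runs
    return action
```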
Results / demo behavior
- Capture frames and perform inference
- Select a discrete action from a constrained action set
- Execute a safe step and re-evaluate the environment
- Support human-in-the-loop navigation: at uncertain junctions, ask for user input rather than guessing
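For the human-in-the-loop case, an ASK_USER decision can be resolved with a simple terminal prompt before any motion is issued; a minimal sketch (the answer-to-primitive mapping is an assumption):

```python
def resolve_ask_user(question="Left or right?"):
    """Turn an ASK_USER decision into a concrete primitive via a terminal prompt."""
    answer = input(f"{question} [l/r/s(top)]: ").strip().lower()
    if answer.startswith("l"):
        return "TURN_LEFT_90"
    if answer.startswith("r"):
        return "TURN_RIGHT_90"
    return "STOP"   # anything else: stay conservative
```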
Key challenges (and what I learned)
- ROS environment issues: mismatched ROS_MASTER_URI, missing nodes, unreachable services → learned fast debugging patterns for ROS networks and startup order.
- Walking drift / curvature: small gait asymmetries turn straight steps into arcs → learned to isolate whether a problem came from the decision, the execution, or gait tuning.
- Reliability beats cleverness: the best demo is the one that runs 20 times in a row.
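A small pre-flight check in the spirit of those debugging patterns, using the standard rosgraph and rosnode Python modules (a sketch, not the project's actual tooling):

```python
import os
import rosgraph
import rosnode

def preflight():
    """Fail fast on the usual ROS networking problems before starting the demo."""
    print("ROS_MASTER_URI =", os.getenv("ROS_MASTER_URI"))
    if not rosgraph.is_master_online():
        raise SystemExit("roscore not reachable; check ROS_MASTER_URI / network")
    print("nodes:", rosnode.get_node_names())

if __name__ == "__main__":
    preflight()
```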
Tech stack
- ROS Noetic
- Ainex walking stack (topics/services)
- Python (decision loop + client integration)
- Bash scripts (motion/capture primitives)
- Windows dev workflow (GitHub app + sync scripts to robot)
Safety considerations
- Single-step execution (no continuous uncontrolled walking)
- Conservative defaults: STOP if parsing fails or confidence is low
- Optional user confirmation at ambiguous scenes
- Clear separation between decision and actuation layers
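Most of these rules reduce to one small gate between the decision and actuation layers; a sketch, where the confidence field and 0.6 threshold are assumptions:

```python
ACTIONS = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP", "ASK_USER"}

def safe_action(decision, min_confidence=0.6):
    """Anything malformed, unknown, or low-confidence collapses to STOP."""
    if not isinstance(decision, dict):
        return "STOP"
    if decision.get("action") not in ACTIONS:
        return "STOP"
    if decision.get("confidence", 0.0) < min_confidence:  # missing confidence = low
        return "STOP"
    return decision["action"]
```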
Future work
- Run the “brain” locally on-device (replace API calls with an onboard model)
- Improve gait stability and reduce drift (calibration + parameter tuning + feedback)
- Add simple state (e.g., avoid repeating the same turn immediately)
- Add basic obstacle safety checks (e-stop, distance thresholds, or vision heuristics)
Repository
Explore the code on GitHub: Ainex Vision Navigation