Ainex-vision-nav

I built a safety-first navigation demo for an Ainex humanoid where the robot captures a camera frame, sends it to a vision-language model for a high-level decision, then executes one small, controlled motion primitive (forward/turn/stop) via ROS. The key idea is separating responsibilities: the AI does reasoning, while the robot controller handles execution stability. This keeps behavior predictable and makes debugging much easier.


Overview

This project explores vision-based navigation for an Ainex humanoid robot using ROS (Noetic). The robot captures images onboard, sends them to a vision-language decision module (currently via OpenAI), then executes one safe, discrete motion primitive at a time through ROS. The result is a full autonomy loop—see → decide → move → see again—with optional user prompts at ambiguous intersections. Conducted as research under Professor Myung (Michael) Cho.

Project Workflow:

    System Architecture:

    • Capture: save a reliable frame from /camera/image_rect_color (or fallback topics if needed).

    • Decide: model returns a strict JSON action like:
      • MOVE_FORWARD_SHORT
      • TURN_LEFT_90
      • TURN_RIGHT_90
      • STOP
      • ASK_USER("Left or right?")
    • Execute: run one motion primitive through the Ainex walking stack (ROS topic/service).
    • Repeat: loop until stopped.
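
Below is a minimal sketch of how this capture → decide → execute loop could be wired together. It assumes the ROS Noetic Python bindings (rospy, cv_bridge, OpenCV) and treats each motion primitive as a single-purpose shell script, as described later; the script path scripts/<action>.sh and the placeholder decide() are illustrative, not the repo's exact files.

```python
#!/usr/bin/env python3
"""Illustrative see -> decide -> move loop (not the repo's exact code)."""
import os
import subprocess

import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

ALLOWED = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP"}


def capture_frame(path="captures/capture.jpg", topic="/camera/image_rect_color"):
    """Block until one frame arrives on the camera topic, then save it to disk."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    msg = rospy.wait_for_message(topic, Image, timeout=5.0)
    cv2.imwrite(path, CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8"))
    return path


def decide(image_path):
    """Placeholder: the real module sends the image to a vision-language model
    and returns one strict-JSON action (see the parsing sketch further down)."""
    return "STOP"


def execute_primitive(action):
    """Run exactly one single-purpose motion script (hypothetical script paths)."""
    subprocess.run(["bash", "scripts/%s.sh" % action.lower()], check=True)


def main():
    rospy.init_node("vision_nav_loop")
    while not rospy.is_shutdown():
        action = decide(capture_frame())
        if action not in ALLOWED:
            action = "STOP"        # conservative default when the reply is unexpected
        execute_primitive(action)  # one discrete step, then re-evaluate
        if action == "STOP":
            break


if __name__ == "__main__":
    main()
```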

    Design principle

    • The "brain" makes high-level decisions only.
    • Low-level walking stability remains inside the existing Ainex controller.
    • Default behavior is conservative: when uncertain → STOP or ASK_USER.

    What I implemented

    • Motion primitives as scripts: consistent, single-purpose commands (forward step, 90° turns, stop).
    • Reliable camera capture: one-liner scripts to create captures/capture.jpg every time.
    • Vision-language decision module:
      • Strict JSON output parsing
      • Defensive fallbacks to STOP
      • Environment-based config (model, tokens, temperature)
    • Demo modes:
      • Single-step decision + execute
      • Closed-loop demo loop (capture → decide → execute → repeat)
      • DRY_RUN mode to test decisions without moving hardware
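
The strict JSON parsing, STOP fallback, environment-based config, and DRY_RUN flag above could look roughly like the sketch below. The environment variable names and the JSON schema (an "action" field plus an optional "question" for ASK_USER) are assumptions for illustration, not the repo's exact interface.

```python
import json
import os

ALLOWED_ACTIONS = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP", "ASK_USER"}

# Environment-based config (variable names are illustrative).
MODEL = os.getenv("NAV_MODEL", "gpt-4o-mini")
MAX_TOKENS = int(os.getenv("NAV_MAX_TOKENS", "100"))
TEMPERATURE = float(os.getenv("NAV_TEMPERATURE", "0"))
DRY_RUN = os.getenv("NAV_DRY_RUN", "0") == "1"   # decide without moving hardware


def parse_action(raw_reply):
    """Map the model's reply onto one allowed action; anything unexpected becomes STOP."""
    try:
        data = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return "STOP"                 # defensive fallback: reply was not valid JSON
    if not isinstance(data, dict):
        return "STOP"                 # defensive fallback: JSON, but not an object
    action = data.get("action", "STOP")
    if action not in ALLOWED_ACTIONS:
        return "STOP"                 # defensive fallback: unknown action
    if action == "ASK_USER":
        return 'ASK_USER("%s")' % data.get("question", "Left or right?")
    return action


# A valid strict-JSON reply maps straight through; anything else maps to STOP.
print(parse_action('{"action": "TURN_LEFT_90"}'))   # TURN_LEFT_90
print(parse_action("sorry, I cannot decide"))        # STOP
```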

    Results / demo behavior

    The robot can repeatedly:

    • Capture frames and perform inference
    • Select a discrete action from a constrained action set
    • Execute a safe step and re-evaluate the environment

    The system also supports human-in-the-loop navigation: at uncertain junctions, the model can request user input rather than guessing.
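
That human-in-the-loop path could be handled with a simple console prompt like the sketch below; the exact prompt text and mapping are illustrative, not the repo's implementation.

```python
def resolve_ask_user(question="Left or right?"):
    """Hypothetical console prompt for ambiguous junctions: the operator's answer
    is mapped onto one of the discrete turn primitives, and anything else stops."""
    answer = input(question + " [l/r/other=stop] ").strip().lower()
    if answer.startswith("l"):
        return "TURN_LEFT_90"
    if answer.startswith("r"):
        return "TURN_RIGHT_90"
    return "STOP"   # conservative default for unclear answers
```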

    Key challenges (and what I learned)

    • ROS environment issues: mismatched ROS_MASTER_URI, missing nodes, and services not reachable. Learned fast debugging patterns for ROS networks and startup order.
    • Walking drift / curvature: even small asymmetries in gait parameters can cause arc turns. Learned to isolate the problem: decision vs execution vs gait tuning.
    • Reliability matters more than cleverness: the best demo is the one that runs 20 times in a row.

Tech stack

  • ROS Noetic
  • Ainex walking stack (topics/services)
  • Python (decision loop + OpenAI client)
  • Bash scripts (motion/capture primitives)
  • Windows dev workflow (GitHub app + sync scripts to robot)


Project Media

  • Me testing the camera functionality
  • Screenshot of the camera test command-line interface
  • Demo of the navigation system in action

Safety considerations

  • Single-step execution (no continuous uncontrolled walking)
  • Conservative defaults: STOP if parsing fails or confidence is low
  • Optional user confirmation at ambiguous scenes
  • Clear separation between decision and actuation layers
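
One way the last two points could fit together is a single choke point between the decision layer and the walking stack, sketched below with an optional confirmation prompt. The function names are illustrative, not the repo's actual API.

```python
def confirmed_execute(action, executor, require_confirm=False):
    """Gate every actuation through one function: the decision layer never talks to
    the walking stack directly, and only one single-step primitive passes at a time."""
    if require_confirm:
        reply = input("Execute %s? [y/N] " % action).strip().lower()
        if reply != "y":
            print("Skipped; robot stays put.")
            return
    executor(action)


# During DRY_RUN-style testing, the executor can simply print the decision.
confirmed_execute("MOVE_FORWARD_SHORT", executor=print)
```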

Future work

  • Run the "brain" locally on-device (replace API calls with an onboard model)
  • Improve gait stability and reduce drift (calibration + parameter tuning + feedback)
  • Add simple state (e.g., "if we just turned left, avoid turning left again immediately"; see the sketch after this list)
  • Add basic obstacle safety checks (e-stop, distance thresholds, or vision heuristics)
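
The simple-state idea above might look like a small history filter; the names and thresholds below are made up for illustration, since this part is not implemented yet.

```python
from collections import deque


def filter_action(action, history, max_repeat_turns=1):
    """Veto an immediate repeat of the same turn so the robot does not spin in place."""
    recent = list(history)[-max_repeat_turns:]
    if action in ("TURN_LEFT_90", "TURN_RIGHT_90") and recent.count(action) >= max_repeat_turns:
        action = "STOP"    # fall back to the conservative default instead of guessing again
    history.append(action)
    return action


history = deque(maxlen=5)
print(filter_action("TURN_LEFT_90", history))   # TURN_LEFT_90
print(filter_action("TURN_LEFT_90", history))   # STOP (we just turned left)
```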

Repository Link

Explore the code and data in the GitHub repository: GitHub - Ainex Vision Navigation