Ainex-vision-nav
I built a safety-first navigation demo for an Ainex humanoid where the robot captures a camera frame, sends it to a vision-language model for a high-level decision, then executes one small, controlled motion primitive (forward/turn/stop) via ROS. The key idea is separating responsibilities: the AI does reasoning, while the robot controller handles execution stability. This keeps behavior predictable and makes debugging much easier.
Overview
This project explores vision-based navigation for an Ainex humanoid robot using ROS (Noetic). The robot captures images onboard, sends them to a vision-language decision module (currently via OpenAI), then executes one safe, discrete motion primitive at a time through ROS. The result is a full autonomy loop—see → decide → move → see again—with optional user prompts at ambiguous intersections. Conducted as research under Professor Myung (Michael) Cho.
Project Workflow (a minimal code sketch follows the list):
- Capture: save a reliable frame from /camera/image_rect_color (or a fallback topic if needed).
- Decide: the model returns a strict JSON action such as:
- MOVE_FORWARD_SHORT
- TURN_LEFT_90
- TURN_RIGHT_90
- STOP
- ASK_USER("Left or right?")
- Execute: run one motion primitive through the Ainex walking stack (ROS topic/service).
- Repeat: loop until stopped.
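The loop itself is small. Below is a minimal sketch of it; the helper names (capture_frame, decide, execute_primitive) and the DRY_RUN environment variable are placeholders for illustration, not necessarily the repo's actual entry points.

```python
import os
import time

DRY_RUN = os.environ.get("DRY_RUN", "1") == "1"  # default to not moving hardware

ACTIONS = {"MOVE_FORWARD_SHORT", "TURN_LEFT_90", "TURN_RIGHT_90", "STOP"}

def capture_frame() -> str:
    """Placeholder: save one camera frame and return its path (see the capture sketch below)."""
    return "captures/capture.jpg"

def decide(image_path: str) -> dict:
    """Placeholder: ask the vision-language model for one action (see the decision sketch below)."""
    return {"action": "STOP", "reason": "placeholder"}

def execute_primitive(action: str) -> None:
    """Placeholder: run exactly one motion primitive through the walking stack."""
    print(f"[execute] {action}")

def main() -> None:
    while True:
        image_path = capture_frame()
        decision = decide(image_path)
        action = decision.get("action", "STOP")

        if action == "ASK_USER":
            # Human-in-the-loop: ask instead of guessing at an ambiguous intersection.
            answer = input(decision.get("question", "Left or right? ")).strip().lower()
            action = "TURN_LEFT_90" if answer.startswith("l") else "TURN_RIGHT_90"

        if action not in ACTIONS:
            action = "STOP"  # conservative default for anything unexpected

        if DRY_RUN:
            print(f"[dry-run] would execute {action}")
        else:
            execute_primitive(action)

        if action == "STOP":
            break
        time.sleep(1.0)  # let the robot settle before the next capture

if __name__ == "__main__":
    main()
```

Defaulting DRY_RUN to "on" keeps the conservative behavior: nothing moves until the operator explicitly enables execution.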
- The "brain" makes high-level decisions only.
- Low-level walking stability remains inside the existing Ainex controller.
- Default behavior is conservative: when uncertain → STOP or ASK_USER.
What I implemented
- Motion primitives as scripts: consistent, single-purpose commands (forward step, 90° turns, stop).
- Reliable camera capture: one-liner scripts that produce captures/capture.jpg every time (a capture sketch follows this list).
- Vision-language decision module (also sketched after this list):
- Strict JSON output parsing
- Defensive fallbacks to STOP
- Environment-based config (model, tokens, temperature)
- Demo modes:
- Single-step decision + execute
- Closed-loop demo loop (capture → decide → execute → repeat)
- DRY_RUN mode to test decisions without moving hardware
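As an illustration of the capture step, here is a minimal sketch that saves one frame from /camera/image_rect_color to captures/capture.jpg. It assumes rospy, cv_bridge, and OpenCV are available on the robot; the repo's actual one-liner scripts may do this differently.

```python
#!/usr/bin/env python3
# Save a single frame from the camera topic to captures/capture.jpg.
import os

import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

def capture_one(topic: str = "/camera/image_rect_color",
                out_path: str = "captures/capture.jpg",
                timeout: float = 5.0) -> str:
    rospy.init_node("capture_once", anonymous=True)
    # Block until one image arrives (or raise after `timeout` seconds).
    msg = rospy.wait_for_message(topic, Image, timeout=timeout)
    frame = CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8")
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    cv2.imwrite(out_path, frame)
    return out_path

if __name__ == "__main__":
    print(capture_one())
```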
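The decision module might look roughly like the sketch below. It assumes the v1-style OpenAI Python client and hypothetical environment variables (OPENAI_MODEL, MAX_TOKENS, TEMPERATURE) for configuration; the exact message shape depends on the client version, and any call or parsing failure falls back to STOP.

```python
import base64
import json
import os

from openai import OpenAI  # pip install openai

SYSTEM_PROMPT = (
    "You are a navigation planner for a small humanoid robot. "
    "Reply with ONLY a JSON object of the form "
    '{"action": "MOVE_FORWARD_SHORT" | "TURN_LEFT_90" | "TURN_RIGHT_90" | "STOP" | "ASK_USER", '
    '"question": "...optional..."}'
)

def decide(image_path: str) -> dict:
    """Send one frame to the model and return an action dict (STOP on any failure)."""
    model = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")    # assumed env-based config
    max_tokens = int(os.environ.get("MAX_TOKENS", "100"))
    temperature = float(os.environ.get("TEMPERATURE", "0"))

    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")

    try:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model=model,
            max_tokens=max_tokens,
            temperature=temperature,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": [
                    {"type": "text", "text": "What should the robot do next?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ]},
            ],
        )
        return json.loads(resp.choices[0].message.content)
    except Exception:
        # Defensive fallback: if the request or the strict-JSON parse fails, do nothing.
        return {"action": "STOP", "reason": "decision failure"}
```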
Results / demo behavior
The robot can repeatedly:
- Capture frames and perform inference
- Select a discrete action from a constrained action set
- Execute a safe step and re-evaluate the environment
The system also supports human-in-the-loop navigation: at uncertain junctions, the model can request user input rather than guessing.
Key challenges (and what I learned)
- ROS environment issues: mismatched ROS_MASTER_URI, missing nodes, and services not reachable. Learned fast debugging patterns for ROS networks and startup order.
- Walking drift / curvature: even small asymmetries in gait parameters can cause arc turns. Learned to isolate the problem: decision vs execution vs gait tuning.
- Reliability matters more than cleverness: the best demo is the one that runs 20 times in a row.
Tech stack
- ROS Noetic
- Ainex walking stack (topics/services); a bounded-step rospy sketch follows this list
- Python (decision loop + OpenAI client)
- Bash scripts (motion/capture primitives)
- Windows dev workflow (GitHub app + sync scripts to robot)
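To make "one bounded primitive at a time" concrete, here is a rospy sketch that streams a short walking command and then explicitly zeroes it. The /cmd_vel topic and geometry_msgs/Twist message are assumptions for illustration; the actual Ainex walking stack exposes its own topics/services.

```python
#!/usr/bin/env python3
# Issue one short, bounded walking command, then explicitly stop.
import rospy
from geometry_msgs.msg import Twist

def step_once(linear_x: float = 0.0, angular_z: float = 0.0, duration: float = 1.5) -> None:
    rospy.init_node("step_once", anonymous=True)
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)  # assumed topic name
    rospy.sleep(0.5)  # give the publisher time to connect

    cmd = Twist()
    cmd.linear.x = linear_x
    cmd.angular.z = angular_z

    rate = rospy.Rate(10)  # 10 Hz command stream for `duration` seconds
    end = rospy.Time.now() + rospy.Duration(duration)
    while rospy.Time.now() < end and not rospy.is_shutdown():
        pub.publish(cmd)
        rate.sleep()

    pub.publish(Twist())  # zero command: never leave the robot walking

if __name__ == "__main__":
    step_once(linear_x=0.1)  # e.g. MOVE_FORWARD_SHORT
    # step_once(angular_z=0.5, duration=3.0)  # e.g. TURN_LEFT_90 (timing needs calibration)
```

Always publishing the zero command at the end is the design choice that keeps execution single-step: a primitive either completes its bounded duration or is stopped, never left running.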
Project Media
This is me testing the camera functionality
This is a screenshot of the camera test command line interface
This is a demo of the navigation system in action
Safety considerations
- Single-step execution (no continuous uncontrolled walking)
- Conservative defaults: STOP if parsing fails or confidence is low
- Optional user confirmation at ambiguous scenes
- Clear separation between decision and actuation layers
Future work
- Run the "brain" locally on-device (replace API calls with an onboard model)
- Improve gait stability and reduce drift (calibration + parameter tuning + feedback)
- Add simple state (e.g., "if we just turned left, avoid turning left again immediately"); one possible shape is sketched below
- Add basic obstacle safety checks (e-stop, distance thresholds, or vision heuristics)
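One possible shape for the "simple state" item above (illustrative only, not implemented): remember the last few executed actions and veto an immediate repeat of the same turn.

```python
from collections import deque

# Remember the last few executed actions (illustrative only).
history = deque(maxlen=3)

def filter_action(action: str) -> str:
    """Veto an immediate repeat of the same turn; otherwise pass the action through."""
    if action in ("TURN_LEFT_90", "TURN_RIGHT_90") and history and history[-1] == action:
        return "ASK_USER"  # or STOP: let a human break the loop instead of spinning in place
    history.append(action)
    return action
```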
Repository Link
Explore the code and data in the GitHub repository: GitHub - Ainex Vision Navigation