NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation


Computer Science > Artificial Intelligence


arXiv:2606.18271 (cs)


Submitted on 5 Jun 2026


Authors: Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson


Abstract


Earth Observation data generation now outpaces both downlink bandwidth and manual processing capacity, creating a growing gap between onboard data collection and actionable ground-level intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. The system uses a local vision-language model, Gemma 3, to classify each captured scene, generate a text description of its content and inter-feature relationships, and respond to follow-up operator queries via natural-language dialogue. Rather than relying on conventional command sequences, NAVI-Orbital is re-tasked through plain-English prompts and orchestrated by a graph-based state machine (LangGraph) that coordinates dedicated agents for detection and dialogue. Results are reported from three stages: ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live on-orbit captures of newly acquired, previously unseen Earth imagery. The on-orbit captures include uncorrected YAM-9 images processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument. Together, these results demonstrate the feasibility of deploying foundation models on satellite-class edge computers. This approach inverts the traditional "acquire-then-downlink-everything" bandwidth profile by performing semantic compression of Earth observations in orbit.
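
The orchestration pattern named in the abstract (a graph-based state machine that routes each scene through a detection agent and then a dialogue agent) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use the LangGraph API. It is a plain-Python stand-in, and every name in it (SceneState, stub_vlm, detect, dialogue, run) is hypothetical. The model call is replaced by a stub that returns canned text, so no real inference takes place.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SceneState:
    """Hypothetical shared state passed between graph nodes."""
    image_id: str
    label: Optional[str] = None
    description: Optional[str] = None
    queries: List[str] = field(default_factory=list)  # pending operator questions
    answers: List[str] = field(default_factory=list)


def stub_vlm(prompt: str, image_id: str) -> str:
    """Assumed stand-in for the onboard vision-language model (canned text only)."""
    return f"[model output for {image_id}: {prompt}]"


Vlm = Callable[[str, str], str]


def detect(state: SceneState, vlm: Vlm) -> str:
    """Detection node: classify and describe the scene, then pick the next node."""
    state.label = vlm("Classify this scene.", state.image_id)
    state.description = vlm(
        "Describe the scene and how its features relate.", state.image_id
    )
    return "dialogue" if state.queries else "done"


def dialogue(state: SceneState, vlm: Vlm) -> str:
    """Dialogue node: answer one pending operator query per visit."""
    question = state.queries.pop(0)
    state.answers.append(vlm(question, state.image_id))
    return "dialogue" if state.queries else "done"


# Node table: each node mutates the state and returns the name of the next node.
NODES: Dict[str, Callable[[SceneState, Vlm], str]] = {
    "detect": detect,
    "dialogue": dialogue,
}


def run(state: SceneState, vlm: Vlm = stub_vlm, start: str = "detect") -> SceneState:
    """Walk the graph from `start` until a node returns the terminal name 'done'."""
    node = start
    while node != "done":
        node = NODES[node](state, vlm)
    return state


result = run(SceneState("scene-001", queries=["Is there a runway in view?"]))
```

In this sketch, re-tasking amounts to changing the prompt strings or the queued operator questions rather than the control flow, which is the property the abstract attributes to plain-English prompting.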

