Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts

TL;DR

Domain ARiThmetic (DART) adapts VLA models under environmental shifts in a one-shot manner through subspace-aligned weight arithmetic.

Abstract

Vision-Language-Action (VLA) models often fail to perform the same learned tasks under environmental shifts, such as changes in camera pose and shifts to a different but similar robot (e.g., from Panda to UR5e). Adapting these models to the shifted environment (i.e., target domain) often requires training on multiple demonstrations for each task, which are costly to collect.

To reduce the burden of data curation and training, we propose Domain ARiThmetic (DART), a simple analogy-based method that adapts VLA models under environmental shifts by adding domain-specific information through weight vector arithmetic. Unlike prior approaches, DART requires only a single demonstration, enabling efficient adaptation. To isolate the domain-specific information accurately, DART aligns the subspaces spanned by the singular components of the weight vectors and filters out noisy components.

In both simulated and real-world experiments, DART outperforms existing VLA adaptation methods in one-shot scenarios across diverse visual and embodiment shifts.

Motivation

The Robustness Challenge in VLA Models

Why is adapting VLA policies to a new environment still expensive?

Vision-Language-Action (VLA) models can solve many learned tasks in their source environment, but camera-pose changes, sensor noise, or embodiment shifts can degrade performance.

DART teaser figure — Environmental shifts make a source-trained VLA model fail, while task-wise fine-tuning is costly and one-shot fine-tuning often does not transfer across tasks. DART adapts the base policy by extracting a reusable domain direction from a single demonstration.

Full-Data FT Is Costly

Adapting the policy task by task requires costly target-domain demonstrations for every task.

One-Shot FT Overfits

One-shot fine-tuning improves the demonstrated task but fails to generalize to other tasks.

DART Transfers Domains

DART extracts domain knowledge from one demo and applies it across tasks.

Fine-tuned Weights Analysis

Task and Domain Directions in One-Shot Updates

One-shot updates mainly capture task behavior, but also contain reusable domain directions.

Additive alignment analysis — Task and domain prototypes compose additively to reconstruct one-shot update vectors, suggesting a linear structure in weight space.

Domain heatmap analysis — Similar shifts produce aligned directions, while composed shifts partially reuse directions from each shift.

Finding #1: Task-Domain Decomposition

Update vectors from the same task align strongly across domains, but they also contain consistent domain-shared components that can be separated and reused.

Finding #2: Composable Domain Directions

Viewpoint shifts, camera noise, lighting changes, and their combinations induce structured directions rather than arbitrary fine-tuning noise.
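The two findings above can be illustrated on synthetic data. The following is a minimal sketch, not the authors' analysis code: it assumes each one-shot update can be flattened into a vector, builds toy updates as task prototype plus domain prototype plus noise, fits the prototypes with a simple two-way mean decomposition, and then measures how well the additive reconstruction matches each update. The function names (`cosine`, `additive_fit`), the dimensions, and the noise levels are all hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flattened update vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def additive_fit(updates):
    """Fit updates[t, d] ~ task[t] + domain[d] via a two-way mean decomposition.

    updates has shape (num_tasks, num_domains, dim). This estimator is an
    assumption made for illustration, not the procedure from the paper.
    """
    grand = updates.mean(axis=(0, 1))        # overall mean update, shape (dim,)
    task = updates.mean(axis=1)              # per-task prototype (keeps the grand mean)
    domain = updates.mean(axis=0) - grand    # per-domain prototype, centered
    recon = task[:, None, :] + domain[None, :, :]
    return task, domain, recon

# Synthetic stand-in for one-shot update vectors (hypothetical sizes).
rng = np.random.default_rng(0)
T, D, dim = 3, 4, 256
true_task = rng.normal(size=(T, dim))
true_domain = 0.5 * rng.normal(size=(D, dim))
noise = 0.05 * rng.normal(size=(T, D, dim))
updates = true_task[:, None, :] + true_domain[None, :, :] + noise

task, domain, recon = additive_fit(updates)

# How well does "task prototype + domain prototype" reconstruct each update?
recon_cos = np.array(
    [[cosine(updates[t, d], recon[t, d]) for d in range(D)] for t in range(T)]
)

# Pairwise similarity between domain prototypes, analogous to a domain heatmap.
domain_heatmap = np.array(
    [[cosine(domain[i], domain[j]) for j in range(D)] for i in range(D)]
)
```

On real checkpoints, `updates[t, d]` would be the flattened difference between the base weights and the weights fine-tuned on task `t` in domain `d`; high values in `recon_cos` would be consistent with the linear structure described above.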

Method

Domain Arithmetic for One-Shot Adaptation

DART extracts a target-domain vector by subtracting shared task directions, then injects it into the base policy.

DART method overview — DART fine-tunes source and target one-shot models on the same task, subtracts task directions, filters noisy source components, and adds the refined domain vector to the base policy.

1. Fine-tune the base VLA model on one source-domain and one target-domain demonstration of the same task.

2. Compute source and target update vectors from the base model, then subtract the source update to cancel task-specific directions.

3. Use subspace filtering to subtract only aligned source components and suppress source-domain artifacts.

4. Scale the refined domain vector by alignment quality and add it to the base model for target-domain adaptation.
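Steps 2 to 4 above can be sketched for a single weight matrix as follows. This is a hedged illustration under stated assumptions, not the published algorithm: the page does not give the exact filtering rule or scaling formula, so the sketch assumes that "aligned" source components are those lying in the span of the top singular vectors of the target update, and that "alignment quality" is the fraction of the source update's norm retained by that projection. The function `dart_layer`, its `rank` parameter, and the toy matrices are hypothetical.

```python
import numpy as np

def dart_layer(w_base, w_src, w_tgt, rank=4):
    """Hypothetical per-layer domain arithmetic with subspace filtering."""
    tau_src = w_src - w_base   # source one-shot update: task + source artifacts
    tau_tgt = w_tgt - w_base   # target one-shot update: task + target domain

    # Assumption: the top-`rank` left singular vectors of the target update
    # define the subspace in which source components count as "aligned".
    u, _, _ = np.linalg.svd(tau_tgt, full_matrices=False)
    basis = u[:, :rank]

    # Keep only the source components inside that subspace.
    tau_src_aligned = basis @ (basis.T @ tau_src)

    # Subtract the aligned source part to cancel shared task directions.
    domain_vec = tau_tgt - tau_src_aligned

    # Assumption: alignment quality is the retained fraction of the source norm.
    src_norm = np.linalg.norm(tau_src)
    alignment = float(np.linalg.norm(tau_src_aligned) / src_norm) if src_norm > 0 else 0.0

    w_adapted = w_base + alignment * domain_vec
    return w_adapted, domain_vec, alignment

# Toy example: low-rank task and domain updates plus a noisy source artifact.
rng = np.random.default_rng(0)
rows, cols = 32, 16
w_base = rng.normal(size=(rows, cols))
task = rng.normal(size=(rows, 2)) @ rng.normal(size=(2, cols))
domain = rng.normal(size=(rows, 2)) @ rng.normal(size=(2, cols))
artifact = 0.3 * rng.normal(size=(rows, cols))

w_src = w_base + task + artifact   # fine-tuned on the source-domain demo
w_tgt = w_base + task + domain     # fine-tuned on the target-domain demo

w_adapted, domain_vec, alignment = dart_layer(w_base, w_src, w_tgt, rank=4)

# Compare against plain subtraction (target update minus full source update).
naive_vec = (w_tgt - w_base) - (w_src - w_base)
filtered_err = np.linalg.norm(domain_vec - domain)
naive_err = np.linalg.norm(naive_vec - domain)
```

In this toy setting the filtered vector is closer to the planted domain update than plain subtraction, because the projection discards the part of the source artifact that lies outside the target subspace. A full model would apply such a routine to each adapted weight matrix, and the choice of `rank` would need tuning.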

Results

Experimental Results

DART consistently improves one-shot adaptation in simulation and real-world robot experiments.

Novel Visual Domains

On LIBERO with π0.5, DART reaches 79.1% average success across Small, Medium, and Large viewpoint shifts, outperforming FLA and RETAIN.

Cross-Embodiment Transfer

On MimicGen, DART transfers a Panda-trained policy to UR5e and improves average success to 69.4% across Stack and Stack Three.

Real-World Adaptation

Using one Stack Cube target demonstration, DART achieves 81.7% average success across five UR10e real-world tasks.

DART experimental results — Performance on LIBERO tasks across diverse visual shifts, including novel viewpoints and combined camera noise / illumination changes with π0.5.

Stack task progress and success rate table — Performance on MimicGen Stack tasks under Panda-to-UR5e cross-embodiment transfer.

Novel viewpoint success rate table — Performance on real-world UR10e manipulation tasks under novel viewpoint adaptation.

BibTeX

@inproceedings{kang2026dart,
  author    = {Kang, Taewook and Kim, Taeheon and Shin, Donghyun and Choi, Jonghyun},
  title     = {Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts},
  booktitle = {ECCV},
  year      = {2026},
}