ADR-0001: Separate Raw Assets From Canonical Datasets

Status

Accepted

Context

EmboFlow must support both structured embodied dataset formats and unstructured or semi-structured delivery-style raw assets, including:

  • RLDS
  • LeRobot v2/v3
  • HDF5
  • Rosbag
  • Raw video directories
  • Archive packages

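If the platform treats every input as an already-standardized dataset, ingestion and delivery workflows become awkward and lossy: the original structure, paths, naming, and metadata layout of an input are lost once it is normalized.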

Decision

The platform will model:

  • Raw assets as first-class resources
  • Canonical datasets as derived semantic resources

Raw assets preserve original structure, paths, naming, and metadata layout. Canonical datasets provide normalized semantics for conversion, workflow execution, and export logic.
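As an illustration of the two resource kinds, here is a minimal Python sketch. It is not EmboFlow's actual object model: the class names, field names, and the `RawFormat` enum are all hypothetical, chosen only to show a raw asset keeping its original file layout while a canonical dataset points back to the asset it was derived from.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath


class RawFormat(Enum):
    """Hypothetical tags for the input kinds listed in the Context section."""
    RLDS = "rlds"
    LEROBOT = "lerobot"
    HDF5 = "hdf5"
    ROSBAG = "rosbag"
    VIDEO_DIR = "video_dir"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class RawAsset:
    """First-class resource: records the input exactly as it was delivered."""
    asset_id: str
    source_format: RawFormat
    root: PurePosixPath                      # original top-level path
    files: tuple[PurePosixPath, ...]         # original relative paths and names
    metadata: dict = field(default_factory=dict)  # original metadata, unmodified


@dataclass(frozen=True)
class CanonicalDataset:
    """Derived resource: normalized semantics, linked back to its raw asset."""
    dataset_id: str
    derived_from: str                        # RawAsset.asset_id (assumed link field)
    episodes: tuple[str, ...]                # normalized episode identifiers


raw = RawAsset(
    asset_id="raw-001",
    source_format=RawFormat.ROSBAG,
    root=PurePosixPath("deliveries/batch_01"),
    files=(PurePosixPath("run_a.bag"),),
)
canonical = CanonicalDataset(
    dataset_id="ds-001",
    derived_from=raw.asset_id,
    episodes=("run_a",),
)
```

Keeping `derived_from` on the canonical side means the raw asset never needs to change when a new canonical dataset is derived from it, which fits the idea that raw assets are preserved as delivered.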

Consequences

Positive

  • Supports customer delivery package workflows
  • Supports embodied dataset conversion workflows
  • Preserves original structure for inspection and debugging
  • Avoids forcing visualization to depend on a lossy normalized format

Negative

  • Adds one more layer to the object model
  • Requires readers and mappers instead of direct format-to-format conversion
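The reader-and-mapper cost can be sketched as follows. This is a hedged illustration, not the platform's API: the `READERS` and `EXPORTERS` registries, the function names, and the in-memory dictionaries standing in for real HDF5 files or video directories are all assumptions. It shows conversion always passing through a canonical episode list rather than going format to format.

```python
from typing import Callable, Dict, List

Episode = Dict[str, object]  # hypothetical canonical record


def read_hdf5_like(raw: dict) -> List[Episode]:
    """Reader for a stand-in payload shaped like {"episodes": {name: {"actions": [...]}}}."""
    return [
        {"episode_id": name, "actions": list(group["actions"])}
        for name, group in sorted(raw["episodes"].items())
    ]


def read_video_dir_like(raw: dict) -> List[Episode]:
    """Reader for a stand-in payload shaped like {"files": ["cam0/run_a.mp4", ...]}."""
    return [
        {"episode_id": path.rsplit("/", 1)[-1].rsplit(".", 1)[0], "video": path}
        for path in raw["files"]
    ]


def export_manifest(episodes: List[Episode]) -> dict:
    """Exporter that only needs canonical fields, whatever the source format was."""
    return {
        "count": len(episodes),
        "episode_ids": [e["episode_id"] for e in episodes],
    }


READERS: Dict[str, Callable[[dict], List[Episode]]] = {
    "hdf5": read_hdf5_like,
    "video_dir": read_video_dir_like,
}
EXPORTERS: Dict[str, Callable[[List[Episode]], dict]] = {
    "manifest": export_manifest,
}


def convert(source_format: str, target_format: str, raw: dict) -> dict:
    """Read raw input into canonical episodes, then export from the canonical form."""
    episodes = READERS[source_format](raw)
    return EXPORTERS[target_format](episodes)
```

With this shape, supporting a new input format means adding one reader, and supporting a new output means adding one exporter, instead of writing a converter for every source and target pair.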

Notes

Visualization may operate on raw assets directly. Processing and export should operate on canonical semantics wherever possible.