EmboFlow/design/08-decisions/adr-0001-raw-asset-and-canonical-dataset.md

# ADR-0001: Separate Raw Assets From Canonical Datasets
## Status
Accepted
## Context
EmboFlow must support both structured embodied dataset formats and unstructured or semi-structured delivery-style raw assets, including:
- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video directories
- Archive packages
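If the platform treats every input as an already-standardized dataset, it has to normalize at ingestion time. That discards the original structure, paths, and naming, which makes ingestion and delivery workflows awkward and lossy.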
## Decision
The platform will model:
- Raw assets as first-class resources
- Canonical datasets as derived semantic resources
Raw assets preserve original structure, paths, naming, and metadata layout. Canonical datasets provide normalized semantics for conversion, workflow execution, and export logic.
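
A minimal sketch of how the two resource types could be modeled, for illustration only. The class names, field names, and example values (`RawAsset`, `CanonicalDataset`, `derived_from`, the bucket URI, and so on) are assumptions and are not defined by this ADR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class RawAsset:
    """Hypothetical first-class resource: an input kept exactly as delivered."""
    asset_id: str
    source_format: str                  # e.g. "rosbag", "hdf5", "video_dir", "archive"
    root_uri: str                       # where the untouched files live
    relative_paths: Tuple[str, ...]     # original paths and names, preserved as-is
    metadata: Dict[str, Any] = field(default_factory=dict)  # original metadata layout


@dataclass(frozen=True)
class CanonicalDataset:
    """Hypothetical derived resource: normalized semantics over a raw asset."""
    dataset_id: str
    derived_from: str                   # asset_id of the raw asset it was derived from
    episode_count: int
    modalities: Tuple[str, ...]         # e.g. ("rgb", "joint_state")


# Illustrative values only.
raw = RawAsset(
    asset_id="asset-001",
    source_format="rosbag",
    root_uri="s3://example-bucket/delivery-001",
    relative_paths=("session_a/run1.bag",),
)
canonical = CanonicalDataset(
    dataset_id="ds-001",
    derived_from=raw.asset_id,
    episode_count=1,
    modalities=("rgb", "joint_state"),
)
```

In this sketch the canonical dataset only references the raw asset by ID, so the original files stay available for inspection after normalization.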
## Consequences
### Positive
- Supports customer delivery package workflows
- Supports embodied dataset conversion workflows
- Preserves original structure for inspection and debugging
- Avoids forcing visualization to depend on a lossy normalized format
### Negative
- Adds one more layer to the object model
- Requires readers and mappers instead of direct format-to-format conversion
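
The reader-and-mapper requirement could look roughly like the sketch below, where each source format registers one mapper into the canonical model instead of one converter per format pair. Everything here is hypothetical: the `MAPPERS` registry, the `register_mapper` decorator, the simplified resource classes, and the rule that each `.mp4` file is one episode are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RawAsset:
    asset_id: str
    source_format: str
    relative_paths: List[str]


@dataclass(frozen=True)
class CanonicalDataset:
    derived_from: str
    episodes: List[str]


Mapper = Callable[[RawAsset], CanonicalDataset]

# Hypothetical registry: one mapper per source format.
MAPPERS: Dict[str, Mapper] = {}


def register_mapper(source_format: str) -> Callable[[Mapper], Mapper]:
    """Register a function as the mapper for one source format."""
    def decorator(fn: Mapper) -> Mapper:
        MAPPERS[source_format] = fn
        return fn
    return decorator


@register_mapper("video_dir")
def map_video_dir(asset: RawAsset) -> CanonicalDataset:
    # Assumption for illustration: each .mp4 file is treated as one episode.
    episodes = sorted(p for p in asset.relative_paths if p.endswith(".mp4"))
    return CanonicalDataset(derived_from=asset.asset_id, episodes=episodes)


def to_canonical(asset: RawAsset) -> CanonicalDataset:
    """Look up the mapper for the asset's format and apply it."""
    try:
        mapper = MAPPERS[asset.source_format]
    except KeyError:
        raise ValueError(
            f"no mapper registered for {asset.source_format!r}"
        ) from None
    return mapper(asset)


asset = RawAsset("asset-001", "video_dir", ["b.mp4", "a.mp4", "notes.txt"])
dataset = to_canonical(asset)
```

The trade-off is the extra layer noted above. In exchange, supporting a new input format means adding one mapper rather than a converter for every export target.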
## Notes
Visualization may operate on raw assets directly. Processing and export should primarily operate on canonical semantics where possible.