# ADR-0001: Separate Raw Assets From Canonical Datasets

## Status

Accepted

## Context

EmboFlow must support both structured embodied dataset formats and unstructured or semi-structured, delivery-style raw assets. Inputs include:

- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video directories
- Archive packages

If the platform treats every input as an already-standardized dataset, ingestion and delivery workflows become awkward and lossy. A raw video directory or an archive package has to be forced into a dataset schema before it can be stored, and its original paths, file names, and metadata layout are lost in the process.

## Decision

The platform will model:

- Raw assets as first-class resources
- Canonical datasets as derived semantic resources

Raw assets preserve the original directory structure, file paths, naming, and metadata layout. Canonical datasets are derived from raw assets and provide normalized semantics for format conversion, workflow execution, and export.
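
The two resource kinds could be modeled roughly as follows. This is a minimal sketch, not EmboFlow's actual object model: the class names `RawAsset`, `CanonicalDataset`, `SourceFormat` and the `derive` helper are all hypothetical. It shows the core idea of the decision: a raw asset keeps what was delivered verbatim, and a canonical dataset is a separate resource that records which raw assets it was derived from.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceFormat(Enum):
    # The input kinds listed under Context (hypothetical identifiers).
    RLDS = "rlds"
    LEROBOT = "lerobot"
    HDF5 = "hdf5"
    ROSBAG = "rosbag"
    VIDEO_DIR = "video_dir"
    ARCHIVE = "archive"


@dataclass
class RawAsset:
    """First-class resource: the input exactly as delivered."""
    asset_id: str
    source_format: SourceFormat
    root_path: str                                  # original location, kept verbatim
    files: tuple[str, ...] = ()                     # original relative paths and file names
    metadata: dict = field(default_factory=dict)    # original metadata layout, not normalized


@dataclass
class CanonicalDataset:
    """Derived semantic resource: a normalized view that points back to its sources."""
    dataset_id: str
    derived_from: list[str]                         # asset_id of every RawAsset used
    episodes: list[dict] = field(default_factory=list)


def derive(dataset_id: str, assets: list[RawAsset]) -> CanonicalDataset:
    # This sketch only records lineage; the raw assets are never modified or replaced.
    # A fuller version would run format-specific readers to populate `episodes`.
    return CanonicalDataset(dataset_id=dataset_id, derived_from=[a.asset_id for a in assets])
```

Because derivation produces a new resource instead of rewriting the input, the original layout remains available for inspection and debugging after conversion.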

## Consequences

### Positive

- Supports customer delivery package workflows
- Supports embodied dataset conversion workflows
- Preserves original structure for inspection and debugging
- Avoids forcing visualization to depend on a lossy normalized format

### Negative

- Adds one more layer to the object model
- Requires readers and mappers instead of direct format-to-format conversion
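
The reader/mapper cost is easier to weigh with a concrete shape in mind. The sketch below is illustrative only. It assumes a reader extracts source records from a raw asset, a mapper renames them into canonical fields, and an exporter consumes canonical episodes alone. The raw asset is stood in for by an already-loaded dict, and all function names, registry names, field names (`obs`, `action`, `observations`, `actions`), and the `summary` export target are hypothetical. Real HDF5 parsing is omitted.

```python
from typing import Any, Callable

# Hypothetical stand-ins: a raw asset is an already-loaded dict, and a canonical
# episode is a dict with "observations" and "actions". This ADR does not define either schema.
RawAsset = dict[str, Any]
Episode = dict[str, Any]


def read_hdf5(raw: RawAsset) -> list[dict]:
    # Reader: pull source records out of the raw structure without reinterpreting them.
    return list(raw["episodes"])


def map_hdf5(records: list[dict]) -> list[Episode]:
    # Mapper: translate source-specific keys into canonical semantics.
    return [{"observations": r["obs"], "actions": r["action"]} for r in records]


def export_summary(episodes: list[Episode]) -> dict:
    # Exporter: sees only canonical episodes, so it works for every source format.
    return {
        "num_episodes": len(episodes),
        "num_steps": sum(len(e["actions"]) for e in episodes),
    }


READERS: dict[str, Callable[[RawAsset], list[dict]]] = {"hdf5": read_hdf5}
MAPPERS: dict[str, Callable[[list[dict]], list[Episode]]] = {"hdf5": map_hdf5}
EXPORTERS: dict[str, Callable[[list[Episode]], dict]] = {"summary": export_summary}


def to_canonical(raw: RawAsset, source_fmt: str) -> list[Episode]:
    return MAPPERS[source_fmt](READERS[source_fmt](raw))


def export(raw: RawAsset, source_fmt: str, target: str) -> dict:
    # Every source/target pair is routed through the canonical form.
    return EXPORTERS[target](to_canonical(raw, source_fmt))
```

For example, `export({"episodes": [{"obs": [1, 2, 3], "action": [0.1, 0.2, 0.3]}, {"obs": [4], "action": [0.4]}]}, "hdf5", "summary")` returns `{"num_episodes": 2, "num_steps": 4}`. With N source formats and M export targets, this layout needs N reader/mapper pairs plus M exporters, rather than N × M direct converters. That is the trade made in exchange for the extra layer.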

## Notes

Visualization may operate on raw assets directly. Processing and export should primarily operate on canonical semantics where possible.
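
One way to express this routing rule is sketched below. This is an interpretation, not a specified API. The `Resources` record and the `input_for` function are hypothetical, and falling back to the raw asset when no canonical dataset exists is an assumed reading of "where possible".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resources:
    # Hypothetical pairing of a raw asset with the canonical dataset derived from it, if any.
    raw_asset_id: str
    canonical_dataset_id: Optional[str] = None


def input_for(operation: str, res: Resources) -> str:
    """Pick which resource an operation should read."""
    if operation == "visualize":
        # Visualization may read the raw asset directly, preserving original structure.
        return res.raw_asset_id
    if operation in ("process", "export"):
        # Prefer canonical semantics; fall back to the raw asset only when
        # no canonical dataset has been derived (assumed meaning of "where possible").
        return res.canonical_dataset_id or res.raw_asset_id
    raise ValueError(f"unknown operation: {operation}")
```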