# ADR-0001: Separate Raw Assets From Canonical Datasets

## Status

Accepted

## Context

EmboFlow must support both structured embodied dataset formats and unstructured or semi-structured, delivery-style raw assets, including:

- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video directories
- Archive packages

If the platform treats every input as an already-standardized dataset, ingestion and delivery workflows become awkward and lossy, because the original directory structure, file naming, and metadata layout are discarded at import time.

## Decision

The platform will model:

- Raw assets as first-class resources
- Canonical datasets as derived semantic resources

Raw assets preserve the original structure, paths, naming, and metadata layout. Canonical datasets provide normalized semantics for conversion, workflow execution, and export logic.

## Consequences

### Positive

- Supports customer delivery package workflows
- Supports embodied dataset conversion workflows
- Preserves original structure for inspection and debugging
- Avoids forcing visualization to depend on a lossy normalized format

### Negative

- Adds one more layer to the object model
- Requires readers and mappers instead of direct format-to-format conversion

## Notes

Visualization may operate on raw assets directly. Processing and export should primarily operate on canonical semantics where possible.
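The two-layer model and its reader/mapper consequence can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not EmboFlow's actual object model: every class, field, format tag, and registry name (`RawAsset`, `CanonicalDataset`, `MAPPERS`, `WRITERS`, `"video_dir"`, `"manifest"`) is hypothetical. Reading bytes from storage is omitted; a `RawAsset` here is just the listing of delivered paths, and each mapper plays the combined reader/mapper role.

```python
# Hypothetical sketch: all names below are illustrative, not EmboFlow's API.
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict, List


@dataclass
class RawAsset:
    """First-class raw resource: original paths, naming, and metadata kept as delivered."""
    asset_id: str
    source_format: str                      # assumed tags, e.g. "rosbag", "hdf5", "video_dir"
    files: List[str]                        # original relative paths, never rewritten
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Episode:
    episode_id: str
    streams: Dict[str, str]                 # semantic stream name -> original file path


@dataclass
class CanonicalDataset:
    """Derived semantic resource; always points back to the raw asset it came from."""
    dataset_id: str
    derived_from: str                       # asset_id of the source RawAsset
    episodes: List[Episode]


Mapper = Callable[[RawAsset], CanonicalDataset]
Writer = Callable[[CanonicalDataset], dict]

MAPPERS: Dict[str, Mapper] = {}             # one entry per raw/source format
WRITERS: Dict[str, Writer] = {}             # one entry per export target


def register(registry, key):
    """Decorator that stores a mapper or writer under `key`."""
    def wrap(fn):
        registry[key] = fn
        return fn
    return wrap


@register(MAPPERS, "video_dir")
def map_video_dir(asset: RawAsset) -> CanonicalDataset:
    # Assumed convention: one episode per .mp4 file, named after the file stem.
    episodes = [
        Episode(PurePosixPath(p).stem, {"rgb": p})
        for p in sorted(asset.files)        # sorted() copies; asset.files is untouched
        if p.endswith(".mp4")
    ]
    return CanonicalDataset(f"{asset.asset_id}-canonical", asset.asset_id, episodes)


@register(WRITERS, "manifest")
def write_manifest(ds: CanonicalDataset) -> dict:
    # Export logic sees only canonical semantics, never the raw layout.
    return {
        "dataset": ds.dataset_id,
        "source": ds.derived_from,
        "episodes": [{"id": e.episode_id, "streams": e.streams} for e in ds.episodes],
    }


def to_canonical(asset: RawAsset) -> CanonicalDataset:
    mapper = MAPPERS.get(asset.source_format)
    if mapper is None:
        raise ValueError(f"no mapper registered for {asset.source_format!r}")
    return mapper(asset)


def export(asset: RawAsset, target: str) -> dict:
    # N mappers + M writers compose, instead of N x M direct converters.
    return WRITERS[target](to_canonical(asset))
```

For a raw video directory listing `cam/ep_002.mp4`, `cam/ep_001.mp4`, and `README.txt`, `to_canonical` yields episodes `ep_001` and `ep_002`, while the raw asset still holds the original order and the non-video file, so visualization can keep working on it directly. In this sketch, adding an input format means registering one mapper, after which every existing writer applies to it.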