# EmboFlow Platform Overview

## Positioning

EmboFlow is a browser-based platform for embodied data engineering. It covers the full path from raw assets to training handoff: ingesting raw assets, organizing dataset workflows on a visual canvas, processing and converting data, annotating and inspecting results, exporting normalized artifacts, and generating downstream training configurations.

The platform is designed around plugin-based extensibility, but the first version should deliver a stable built-in core before opening broader extension surfaces.

## Primary Users

- Individual engineers building embodied datasets
- Team operators managing collection, preprocessing, delivery, and annotation workflows
- Data engineering teams that need repeatable conversion and packaging pipelines
- Teams preparing datasets for external training systems

## V1 Product Goal

Build a usable end-to-end platform that allows users to:

1. Log into a personal or team workspace
2. Create a project
3. Upload or import raw embodied data assets
4. Auto-detect asset structure and generate preview summaries
5. Compose processing pipelines on a canvas
6. Configure node parameters and inject code into processing nodes
7. Execute workflows asynchronously and inspect logs and outputs
8. Export normalized delivery packages, training datasets, or training config files
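
Steps 5 to 7 can be illustrated with a minimal sketch of dependency-ordered, asynchronous execution. This is not EmboFlow's actual engine: the names `RunLog`, `run_node`, and `run_workflow` are hypothetical, and the node body is a placeholder. The sketch assumes a canvas can be reduced to a mapping from each node to the nodes it depends on.

```python
import asyncio
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class RunLog:
    """Collects per-node log lines so a run stays traceable (hypothetical)."""
    lines: list[str] = field(default_factory=list)

    def write(self, node: str, message: str) -> None:
        self.lines.append(f"[{node}] {message}")


async def run_node(name: str, log: RunLog) -> str:
    # Placeholder for real work such as conversion, validation, or export.
    log.write(name, "started")
    await asyncio.sleep(0)  # yield control, as real I/O would
    log.write(name, "finished")
    return f"{name}:ok"


async def run_workflow(edges: dict[str, set[str]], log: RunLog) -> dict[str, str]:
    """Run nodes in dependency order; independent nodes run concurrently.

    `edges` maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(edges)
    sorter.prepare()  # raises graphlib.CycleError if the canvas has a cycle
    outputs: dict[str, str] = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # nodes whose dependencies are all done
        results = await asyncio.gather(*(run_node(n, log) for n in ready))
        for node, result in zip(ready, results):
            outputs[node] = result
            sorter.done(node)
    return outputs
```

A caller would pass something like `{"upload": set(), "probe": {"upload"}, "convert": {"probe"}}` to `asyncio.run(run_workflow(...))` and then read `log.lines` to inspect what ran and in which order.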

## Supported Input Formats in V1

- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video folders and delivery-style directory packages
- Compressed archives containing the above
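
As a rough illustration of how these formats might be told apart during auto-detection, the sketch below applies simple heuristics. The function name `detect_format`, the returned labels, and the marker files it looks for are assumptions made for illustration, not EmboFlow's documented probing logic; a real probe would likely inspect contents more deeply and unpack archives before classifying them.

```python
from pathlib import Path

# 8-byte signature found at offset 0 of HDF5 files that have no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

# Assumed suffix sets; adjust to the formats actually accepted.
ROSBAG_SUFFIXES = {".bag", ".db3", ".mcap"}
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz", ".tgz"}
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def detect_format(path: Path) -> str:
    """Return a best-guess format label for an uploaded asset."""
    if path.is_dir():
        # Assumption: a LeRobot dataset carries a meta/info.json file.
        if (path / "meta" / "info.json").is_file():
            return "lerobot"
        # Assumption: an RLDS dataset built with TFDS ships dataset_info.json.
        if any(path.rglob("dataset_info.json")):
            return "rlds"
        if any(p.suffix.lower() in VIDEO_SUFFIXES for p in path.rglob("*")):
            return "raw_video_folder"
        return "unknown_directory"

    suffix = path.suffix.lower()
    if suffix in ARCHIVE_SUFFIXES:
        return "archive"  # contents would be probed after extraction
    if suffix in ROSBAG_SUFFIXES:
        return "rosbag"
    # Fall back to content sniffing so misnamed HDF5 files are still caught.
    with path.open("rb") as fh:
        if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return "hdf5"
    return "unknown"
```

Checking directory markers before file suffixes keeps dataset-style folders from being misread as plain video folders when they also contain video files.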

## Core Product Principles

- Raw assets are first-class objects
- Canonical semantic datasets are derived, not assumed
- Visualization can operate directly on raw assets
- Workflow execution is asynchronous and traceable
- Plugins are versioned and managed
- User-injected code is supported with strict runtime boundaries
- Training execution is out of scope for V1, but training handoff (exporting training datasets and configuration packages for downstream training systems) is in scope
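
One way to picture the first principles is a data model in which raw assets are stored as their own records and every canonical dataset records what it was derived from. The classes and field names below (`RawAsset`, `PluginRef`, `DerivedDataset`, `derive`) are hypothetical and only sketch the idea; they are not EmboFlow's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawAsset:
    """A raw upload, kept as a first-class record rather than a temp file."""
    asset_id: str
    storage_uri: str
    detected_format: str


@dataclass(frozen=True)
class PluginRef:
    """A plugin pinned to an explicit version."""
    name: str
    version: str


@dataclass(frozen=True)
class DerivedDataset:
    """A canonical dataset that always points back to its sources."""
    dataset_id: str
    source_asset_ids: tuple[str, ...]
    produced_by: PluginRef
    run_id: str  # links the dataset to the workflow run that produced it


def derive(dataset_id: str, sources: list[RawAsset],
           plugin: PluginRef, run_id: str) -> DerivedDataset:
    # A dataset with no sources would be "assumed", not derived, so reject it.
    if not sources:
        raise ValueError("a derived dataset needs at least one source asset")
    return DerivedDataset(
        dataset_id=dataset_id,
        source_asset_ids=tuple(a.asset_id for a in sources),
        produced_by=plugin,
        run_id=run_id,
    )
```

Making the records immutable (`frozen=True`) is a design choice in this sketch: lineage that cannot be edited after the fact is easier to trust when tracing an export back to its raw inputs.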

## Major Workspaces

- Asset Workspace: upload, import, scan, probe, browse
- Canvas Workspace: build and run workflows
- Explore Workspace: inspect raw assets and processed outputs
- Label Workspace: create and review annotation tasks
- Admin Workspace: users, workspaces, plugins, storage, runtime settings

## V1 Output Types

- Standardized embodied dataset exports
- Customer delivery packages
- Validation and quality reports
- Annotation artifacts
- Training configuration packages for downstream training systems

## Non-Goals for V1

- Built-in training execution orchestration
- Real-time collaborative editing on the same canvas
- Public plugin marketplace
- Fully generalized MLOps lifecycle management
- Advanced distributed scheduling