
# EmboFlow Platform Overview

## Positioning

EmboFlow is a browser-based platform for embodied data engineering. It covers the path from raw assets to training handoff: ingesting raw assets, organizing dataset workflows on a visual canvas, processing and converting data, annotating and inspecting results, exporting normalized artifacts, and generating downstream training configurations.

The platform is designed around plugin-based extensibility, but the first version should deliver a stable built-in core before opening broader extension surfaces.

The current V1 implementation exposes that core through four first-class product objects:

- `Project`
- `Asset`
- `Dataset`
- `WorkflowTemplate`
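
To make the relationship between these four objects concrete, here is a minimal sketch of how they might be modeled. Only the four type names come from this document. Every field name, the choice to nest assets, datasets, and templates under `Project`, and the `derive_dataset` helper are assumptions made for illustration, not the actual V1 schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    asset_id: str              # hypothetical identifier
    source_uri: str            # hypothetical: local path or object-storage URI
    detected_format: str = "unknown"


@dataclass
class Dataset:
    dataset_id: str
    name: str
    # A dataset is derived from project assets, so it records where it came from.
    source_asset_ids: List[str] = field(default_factory=list)


@dataclass
class WorkflowTemplate:
    template_id: str
    name: str
    node_types: List[str] = field(default_factory=list)  # hypothetical field


@dataclass
class Project:
    project_id: str
    name: str
    assets: List[Asset] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    templates: List[WorkflowTemplate] = field(default_factory=list)

    def derive_dataset(self, dataset_id: str, name: str, asset_ids: List[str]) -> Dataset:
        """Create a dataset that references existing assets of this project."""
        known = {asset.asset_id for asset in self.assets}
        missing = [asset_id for asset_id in asset_ids if asset_id not in known]
        if missing:
            raise KeyError(f"unknown asset ids: {missing}")
        dataset = Dataset(dataset_id, name, list(asset_ids))
        self.datasets.append(dataset)
        return dataset
```

Keeping `source_asset_ids` on the dataset rather than copying asset data reflects the idea that datasets are derived from raw assets, which remain the source of truth.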

## Primary Users

- Individual engineers building embodied datasets
- Team operators managing collection, preprocessing, delivery, and annotation workflows
- Data engineering teams that need repeatable conversion and packaging pipelines
- Teams preparing datasets for external training systems

## V1 Product Goal

Build a usable end-to-end platform that allows users to:

1. Log in to a personal or team workspace
2. Create a project
3. Configure project storage connections for local paths or object storage
4. Upload or import raw embodied data assets
5. Derive reusable datasets from project assets
6. Auto-detect asset structure and generate preview summaries
7. Start a workflow from a reusable template or compose one from a blank canvas
8. Configure node parameters and inject code into processing nodes
9. Execute workflows asynchronously and inspect logs and outputs
10. Export normalized delivery packages, training datasets, or training config files
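
Steps 7 to 9 can be illustrated with a small sketch of a workflow as a dependency graph of nodes that runs asynchronously and records a log. This is not EmboFlow's executor: `Node`, `RunRecord`, and `execute_workflow` are hypothetical names, and a real deployment would likely hand work to a job queue rather than to threads in one process.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    name: str
    # The callable receives upstream outputs keyed by node name.
    run: Callable[[Dict[str, Any]], Any]
    depends_on: List[str] = field(default_factory=list)


@dataclass
class RunRecord:
    logs: List[str] = field(default_factory=list)
    outputs: Dict[str, Any] = field(default_factory=dict)


async def _run_node(node: Node, record: RunRecord):
    record.logs.append(f"start {node.name}")
    upstream = {dep: record.outputs[dep] for dep in node.depends_on}
    # Run the blocking node body in a worker thread so the event loop stays free.
    result = await asyncio.to_thread(node.run, upstream)
    record.logs.append(f"done {node.name}")
    return node.name, result


async def execute_workflow(nodes: List[Node]) -> RunRecord:
    record = RunRecord()
    pending = {node.name: node for node in nodes}
    while pending:
        # A node is ready once every dependency has produced an output.
        ready = [
            node for node in pending.values()
            if all(dep in record.outputs for dep in node.depends_on)
        ]
        if not ready:
            raise ValueError("workflow has a cycle or a missing dependency")
        finished = await asyncio.gather(*(_run_node(node, record) for node in ready))
        for name, result in finished:
            record.outputs[name] = result
            del pending[name]
    return record
```

A two-node chain could then be run with `asyncio.run(execute_workflow([Node("load", lambda up: [1, 2, 3]), Node("double", lambda up: [x * 2 for x in up["load"]], depends_on=["load"])]))`, after which the returned record holds both the per-node outputs and the ordered log.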

## Supported Input Formats in V1

- RLDS
- LeRobot v2/v3
- HDF5
- ROS bag files (rosbag)
- Raw video folders and delivery-style directory packages
- Compressed archives containing the above
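
As a rough illustration of how structure auto-detection over these formats might begin, the sketch below classifies a path using file suffixes and marker files. The heuristics are assumptions, not EmboFlow's probe logic: it treats `meta/info.json` as a LeRobot marker, a `dataset_info.json` anywhere in the tree as a hint of a TFDS-style RLDS dataset, and `.bag`, `.mcap`, or `.db3` as ROS bag storage. A production probe would also need to open files and validate their contents.

```python
from pathlib import Path

# Suffix lists are illustrative assumptions, not an exhaustive specification.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
HDF5_SUFFIXES = (".h5", ".hdf5")
ROSBAG_SUFFIXES = (".bag", ".mcap", ".db3")
VIDEO_SUFFIXES = (".mp4", ".avi", ".mkv", ".mov")


def probe_format(path: Path) -> str:
    """Return a coarse format label for a file or directory."""
    name = path.name.lower()
    if path.is_file():
        if name.endswith(ARCHIVE_SUFFIXES):
            return "archive"
        if name.endswith(HDF5_SUFFIXES):
            return "hdf5"
        if name.endswith(ROSBAG_SUFFIXES):
            return "rosbag"
        return "unknown"
    if path.is_dir():
        # Marker-file checks come first because these layouts may also hold videos.
        if (path / "meta" / "info.json").is_file():
            return "lerobot"
        if next(path.rglob("dataset_info.json"), None) is not None:
            return "rlds"
        if any(p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES for p in path.rglob("*")):
            return "video_folder"
    return "unknown"
```

Archives are reported as `archive` without unpacking. Detecting the format inside an archive would mean extracting it to a scratch location and probing again.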

## Core Product Principles

- Raw assets are first-class objects
- Canonical semantic datasets are derived, not assumed
- Visualization can operate directly on raw assets
- Workflow execution is asynchronous and traceable
- Plugins are versioned and managed
- User-injected code is supported with strict runtime boundaries
- Training execution is out of scope for V1, but training handoff is in scope
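
One way to picture "strict runtime boundaries" for user-injected code is to run it in a separate interpreter process with a hard timeout and a JSON-only interface. The sketch below is an assumption about what such a boundary could look like, not the platform's actual mechanism. The `run_injected_code` function and the convention that injected code defines `transform(payload)` are both hypothetical. A separate process with a timeout is not a full sandbox; real isolation would also need filesystem, network, and memory limits.

```python
import json
import subprocess
import sys


def run_injected_code(source: str, payload: dict, timeout_s: float = 5.0) -> dict:
    """Run user code in a child interpreter and return a status dictionary.

    Assumed contract: `source` defines transform(payload) returning JSON-serializable data.
    """
    wrapper = (
        "import json, sys\n"
        "payload = json.load(sys.stdin)\n"
        + source + "\n"
        "json.dump(transform(payload), sys.stdout)\n"
    )
    try:
        proc = subprocess.run(
            # -I runs Python in isolated mode (ignores user site-packages and PYTHON* variables).
            [sys.executable, "-I", "-c", wrapper],
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "result": None, "stderr": ""}
    if proc.returncode != 0:
        return {"status": "error", "result": None, "stderr": proc.stderr}
    try:
        result = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Happens if the user code printed extra text to stdout.
        return {"status": "error", "result": None, "stderr": "output was not valid JSON"}
    return {"status": "ok", "result": result, "stderr": proc.stderr}
```

Returning a status dictionary instead of raising keeps failures in injected code traceable as part of the workflow run rather than crashing the executor.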

## Major Workspaces

- Project Workspace: create and switch project contexts
- Asset Workspace: upload, import, scan, probe, and browse assets
- Canvas Workspace: build and run workflows
- Explore Workspace: inspect raw assets and processed outputs
- Label Workspace: create and review annotation tasks
- Admin Workspace: manage users, workspaces, plugins, storage, and runtime settings

## V1 Output Types

- Standardized embodied dataset exports
- Customer delivery packages
- Validation and quality reports
- Annotation artifacts
- Training configuration packages for downstream training systems

## Non-Goals for V1

- Built-in training execution orchestration
- Real-time collaborative editing on the same canvas
- Public plugin marketplace
- Fully generalized MLOps lifecycle management
- Advanced distributed scheduling in the first deployment