EmboFlow Platform Overview
Positioning
EmboFlow is a browser-based embodied data engineering platform for ingesting raw assets, organizing dataset workflows on a visual canvas, processing and converting data, annotating and inspecting results, exporting normalized artifacts, and generating downstream training configurations.
The platform is designed around plugin-based extensibility, but the first version should deliver a stable built-in core before opening broader extension surfaces.
The current V1 implementation exposes that core through four first-class product objects:
- Project
- Asset
- Dataset
- WorkflowTemplate
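One way to picture how the four product objects relate is the minimal sketch below. It is illustrative only: every field name (`asset_id`, `source_uri`, `source_asset_ids`, `node_types`) and the choice to hang templates off a project are assumptions made for the example, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Asset:
    """A raw uploaded or imported item, stored as-is."""
    asset_id: str
    source_uri: str  # hypothetical: a local path or an object-storage URI


@dataclass
class Dataset:
    """A reusable dataset derived from one or more project assets."""
    dataset_id: str
    source_asset_ids: List[str]


@dataclass
class WorkflowTemplate:
    """A reusable starting point for a canvas workflow."""
    template_id: str
    node_types: List[str] = field(default_factory=list)


@dataclass
class Project:
    """Top-level context; ownership of templates here is an assumption."""
    project_id: str
    assets: Dict[str, Asset] = field(default_factory=dict)
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    templates: Dict[str, WorkflowTemplate] = field(default_factory=dict)

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.asset_id] = asset

    def derive_dataset(self, dataset_id: str, asset_ids: List[str]) -> Dataset:
        # A dataset may only reference assets that exist in this project.
        missing = [a for a in asset_ids if a not in self.assets]
        if missing:
            raise KeyError(f"unknown assets: {missing}")
        dataset = Dataset(dataset_id, list(asset_ids))
        self.datasets[dataset_id] = dataset
        return dataset
```

Keeping `Dataset` as a list of asset references, rather than a copy of the data, mirrors the idea that datasets are derived from project assets and can be re-derived when those assets change.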
Primary Users
- Individual engineers building embodied datasets
- Team operators managing collection, preprocessing, delivery, and annotation workflows
- Data engineering teams that need repeatable conversion and packaging pipelines
- Teams preparing datasets for external training systems
V1 Product Goal
Build a usable end-to-end platform that allows users to:
- Log into a personal or team workspace
- Create a project
- Configure project storage connections for local paths or object storage
- Upload or import raw embodied data assets
- Derive reusable datasets from project assets
- Auto-detect asset structure and generate preview summaries
- Start a workflow from a reusable template or compose one from a blank canvas
- Configure node parameters and inject code into processing nodes
- Execute workflows asynchronously and inspect logs and outputs
- Export normalized delivery packages, training datasets, or training config files
Supported Input Formats in V1
- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video folders and delivery-style directory packages
- Compressed archives containing the above
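As a rough illustration of how an asset probe might tell these formats apart, the sketch below classifies an asset from its relative file paths alone. The marker files and suffixes are heuristics assumed for this example (for instance `meta/info.json` for LeRobot, `dataset_info.json` plus `features.json` for TFDS-style RLDS directories, and `.bag`, `.db3`, or `.mcap` for Rosbag); the function name `guess_format` and the returned labels are hypothetical, and a real probe would also need to inspect file contents.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed suffix sets; adjust to the formats actually accepted.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
VIDEO_SUFFIXES = (".mp4", ".avi", ".mkv", ".mov")


def guess_format(relative_paths: Iterable[str]) -> str:
    """Return a best-effort format label from file names only."""
    paths = [PurePosixPath(p.lower()) for p in relative_paths]
    names = {p.name for p in paths}
    texts = [str(p) for p in paths]

    # A single archive file must be unpacked before its content is probed.
    if len(texts) == 1 and texts[0].endswith(ARCHIVE_SUFFIXES):
        return "archive"
    # Directory-level marker files are checked before per-file suffixes.
    if any(p.parts[-2:] == ("meta", "info.json") for p in paths):
        return "lerobot"
    if {"dataset_info.json", "features.json"} <= names:
        return "rlds"
    if any(t.endswith((".h5", ".hdf5")) for t in texts):
        return "hdf5"
    if any(t.endswith((".bag", ".db3", ".mcap")) for t in texts):
        return "rosbag"
    if any(t.endswith(VIDEO_SUFFIXES) for t in texts):
        return "video_folder"
    return "unknown"
```

Marker files are checked before suffixes because a LeRobot or RLDS directory can also contain video or other generic files that would otherwise match a later rule.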
Core Product Principles
- Raw assets are first-class objects
- Canonical semantic datasets are derived, not assumed
- Visualization can operate directly on raw assets
- Workflow execution is asynchronous and traceable
- Plugins are versioned and managed
- User-injected code is supported with strict runtime boundaries
- Training execution is out of scope for V1, but training handoff is in scope
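The "asynchronous and traceable" principle can be sketched with plain `asyncio`: submitting a workflow returns immediately, and each run keeps a status plus an ordered event log that can be inspected later. This is a simplified stand-in rather than the platform's executor; `RunRecord`, `execute`, `submit`, and the linear node list are all names and simplifications assumed for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Tuple

NodeFn = Callable[[dict], Awaitable[dict]]


@dataclass
class RunRecord:
    """Trace of a single workflow run (hypothetical shape)."""
    run_id: str
    status: str = "pending"
    events: List[str] = field(default_factory=list)
    output: dict = field(default_factory=dict)


async def execute(run: RunRecord, nodes: List[Tuple[str, NodeFn]], payload: dict) -> None:
    """Run nodes in order, logging each step and capturing failures."""
    run.status = "running"
    try:
        for name, fn in nodes:
            run.events.append(f"{name}: started")
            payload = await fn(payload)
            run.events.append(f"{name}: finished")
        run.output = payload
        run.status = "succeeded"
    except Exception as exc:
        # The failure is recorded on the run instead of being lost.
        run.events.append(f"failed: {exc}")
        run.status = "failed"


def submit(runs: Dict[str, RunRecord], run_id: str,
           nodes: List[Tuple[str, NodeFn]], payload: dict) -> asyncio.Task:
    """Register the run and schedule it; must be called from a running event loop."""
    run = RunRecord(run_id)
    runs[run_id] = run
    return asyncio.create_task(execute(run, nodes, payload))
```

Because the record is registered before the task starts, a caller can look up a run's status and events at any point, including while it is still pending or after it has failed.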
Major Workspaces
- Project Workspace: create and switch project contexts
- Asset Workspace: upload, import, scan, probe, browse
- Canvas Workspace: build and run workflows
- Explore Workspace: inspect raw assets and processed outputs
- Label Workspace: create and review annotation tasks
- Admin Workspace: users, workspaces, plugins, storage, runtime settings
V1 Output Types
- Standardized embodied dataset exports
- Customer delivery packages
- Validation and quality reports
- Annotation artifacts
- Training configuration packages for downstream training systems
Non-Goals for V1
- Built-in training execution orchestration
- Real-time collaborative editing on the same canvas
- Public plugin marketplace
- Fully generalized MLOps lifecycle management
- Advanced distributed scheduling in the first deployment