# EmboFlow Platform Overview

## Positioning

EmboFlow is a browser-based embodied data engineering platform for ingesting raw assets, organizing dataset workflows on a visual canvas, processing and converting data, annotating and inspecting results, exporting normalized artifacts, and generating downstream training configurations.

The platform is designed around plugin-based extensibility, but the first version should deliver a stable built-in core before opening broader extension surfaces.

The current V1 implementation exposes that core through four first-class product objects:

- `Project`
- `Asset`
- `Dataset`
- `WorkflowTemplate`

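
To make the relationship between these four objects concrete, here is a minimal sketch of how they might be modeled. Every field and method name below (`asset_id`, `source_uri`, `derive_dataset`, and so on) is a hypothetical illustration, not EmboFlow's actual schema; the only things taken from this document are the four object names and the idea that datasets are derived from project assets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """A raw input registered in a project (field names are hypothetical)."""
    asset_id: str
    project_id: str
    source_uri: str


@dataclass
class Dataset:
    """A reusable dataset derived from one or more project assets."""
    dataset_id: str
    project_id: str
    source_asset_ids: list[str]


@dataclass
class WorkflowTemplate:
    """A reusable workflow definition, kept here as an ordered list of node types."""
    template_id: str
    project_id: str
    node_types: list[str]


@dataclass
class Project:
    """Top-level container that owns assets, datasets, and templates."""
    project_id: str
    name: str
    assets: dict[str, Asset] = field(default_factory=dict)
    datasets: dict[str, Dataset] = field(default_factory=dict)
    templates: dict[str, WorkflowTemplate] = field(default_factory=dict)

    def add_asset(self, asset_id: str, source_uri: str) -> Asset:
        asset = Asset(asset_id, self.project_id, source_uri)
        self.assets[asset_id] = asset
        return asset

    def derive_dataset(self, dataset_id: str, asset_ids: list[str]) -> Dataset:
        # A dataset may only reference assets that already exist in this project.
        missing = [a for a in asset_ids if a not in self.assets]
        if missing:
            raise KeyError(f"unknown assets: {missing}")
        dataset = Dataset(dataset_id, self.project_id, list(asset_ids))
        self.datasets[dataset_id] = dataset
        return dataset
```

In this sketch a `Dataset` stores only references to its source assets, which keeps raw assets as independent records rather than folding them into the dataset.
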
## Primary Users

- Individual engineers building embodied datasets
- Team operators managing collection, preprocessing, delivery, and annotation workflows
- Data engineering teams that need repeatable conversion and packaging pipelines
- Teams preparing datasets for external training systems

## V1 Product Goal

Build a usable end-to-end platform that allows users to:

1. Log in to a personal or team workspace
2. Create a project
3. Configure project storage connections for local paths or object storage
4. Upload or import raw embodied data assets
5. Derive reusable datasets from project assets
6. Auto-detect asset structure and generate preview summaries
7. Start a workflow from a reusable template or compose one from a blank canvas
8. Configure node parameters and inject code into processing nodes
9. Execute workflows asynchronously and inspect logs and outputs
10. Export normalized delivery packages, training datasets, or training config files

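
Steps 7 to 9 above (compose a workflow, attach code to nodes, run it asynchronously, then inspect logs and outputs) can be sketched as follows. This is an illustration under stated assumptions, not EmboFlow's executor: `Node`, `WorkflowRun`, and `execute` are hypothetical names, a workflow is simplified to a linear list of nodes, and plain Python callables stand in for built-in or user-injected node code.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """One processing step; `fn` stands in for built-in or user-injected code."""
    name: str
    fn: Callable[[Any], Any]


@dataclass
class WorkflowRun:
    """Tracks the status, per-node log lines, and final output of one run."""
    status: str = "pending"
    logs: list = field(default_factory=list)
    output: Any = None


async def execute(nodes, payload, run):
    """Run nodes in order, appending one log line per node to the run record."""
    run.status = "running"
    try:
        for node in nodes:
            # Run the node in a worker thread so a slow step does not block the event loop.
            payload = await asyncio.to_thread(node.fn, payload)
            run.logs.append(f"{node.name}: ok")
        run.output = payload
        run.status = "succeeded"
    except Exception as exc:
        # Record which node failed so the run stays traceable.
        run.logs.append(f"{node.name}: failed ({exc})")
        run.status = "failed"
    return run
```

A run could then be started with `asyncio.run(execute(nodes, payload, WorkflowRun()))`, after which `status`, `logs`, and `output` on the returned record can be inspected. A real canvas workflow would be a graph rather than a list, and injected code would need the runtime boundaries described under the product principles, which this sketch does not attempt.
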
## Supported Input Formats in V1

- RLDS
- LeRobot v2/v3
- HDF5
- Rosbag
- Raw video folders and delivery-style directory packages
- Compressed archives containing the above

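
A probe over the formats listed above might start with simple file-name heuristics before opening any file. The sketch below is hypothetical: `guess_format` and its labels are invented for illustration, and the markers it checks (a `meta/info.json` file for LeRobot, `.tfrecord` shards for RLDS, `.bag`/`.db3`/`.mcap` for Rosbag, common video and archive suffixes) are assumptions about typical layouts, not EmboFlow's actual detection rules.

```python
from pathlib import PurePosixPath

# Assumed suffix sets; a real probe would likely be configurable.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def guess_format(paths):
    """Return a best-guess format label for a listing of relative file paths."""
    lowered = [p.lower() for p in paths]
    suffixes = {PurePosixPath(p).suffix for p in lowered}

    # A single archive file must be unpacked before its contents can be probed.
    if len(lowered) == 1 and lowered[0].endswith(ARCHIVE_SUFFIXES):
        return "archive"
    if any(p.endswith("meta/info.json") for p in lowered):
        return "lerobot"
    if any(".tfrecord" in p for p in lowered):
        return "rlds"
    if suffixes & {".h5", ".hdf5"}:
        return "hdf5"
    if suffixes & {".bag", ".db3", ".mcap"}:
        return "rosbag"
    if suffixes & VIDEO_SUFFIXES:
        return "raw_video"
    return "unknown"
```

Name-based guesses like these are cheap but can be wrong, so a fuller probe would confirm the result by reading file headers or metadata before generating a preview summary.
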
## Core Product Principles

- Raw assets are first-class objects
- Canonical semantic datasets are derived, not assumed
- Visualization can operate directly on raw assets
- Workflow execution is asynchronous and traceable
- Plugins are versioned and managed
- User-injected code is supported with strict runtime boundaries
- Training execution is out of scope for V1, but training handoff is in scope

## Major Workspaces

- Project Workspace: create and switch project contexts
- Asset Workspace: upload, import, scan, probe, browse
- Canvas Workspace: build and run workflows
- Explore Workspace: inspect raw assets and processed outputs
- Label Workspace: create and review annotation tasks
- Admin Workspace: users, workspaces, plugins, storage, runtime settings

## V1 Output Types

- Standardized embodied dataset exports
- Customer delivery packages
- Validation and quality reports
- Annotation artifacts
- Training configuration packages for downstream training systems

## Non-Goals for V1

- Built-in training execution orchestration
- Real-time collaborative editing on the same canvas
- Public plugin marketplace
- Fully generalized MLOps lifecycle management
- Advanced distributed scheduling in the first deployment