EmboFlow Platform Overview

Positioning

EmboFlow is a browser-based data engineering platform for embodied data. It lets users ingest raw assets, organize dataset workflows on a visual canvas, process and convert data, annotate and inspect results, export normalized artifacts, and generate configurations for downstream training.

The platform is designed around plugin-based extensibility, but the first version should deliver a stable built-in core before opening broader extension surfaces.

The current V1 implementation exposes that core through four first-class product objects:

Project
Asset
Dataset
WorkflowTemplate
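
The relationships among these four objects can be sketched as plain data types. This is a minimal Python sketch, assuming hypothetical field names (asset_id, uri, detected_format, source_asset_ids, nodes) that are not EmboFlow's actual schema. It encodes two points stated elsewhere in this overview: assets are first-class, and datasets are derived from project assets rather than assumed. Templates are kept outside the project because the overview calls them reusable; that placement is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch of the four V1 product objects; field names are illustrative.


@dataclass
class Asset:
    asset_id: str
    uri: str                              # local path or object-storage URI
    detected_format: str | None = None    # filled in later by structure probing


@dataclass
class Dataset:
    dataset_id: str
    source_asset_ids: list[str]           # derived from assets, never assumed


@dataclass
class WorkflowTemplate:
    template_id: str
    nodes: list[str]                      # ordered processing steps, e.g. ["probe", "convert"]


@dataclass
class Project:
    project_id: str
    assets: dict[str, Asset] = field(default_factory=dict)
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.asset_id] = asset

    def derive_dataset(self, dataset_id: str, asset_ids: list[str]) -> Dataset:
        # Refuse to derive a dataset from assets that do not belong to this project.
        missing = [a for a in asset_ids if a not in self.assets]
        if missing:
            raise KeyError(f"unknown assets: {missing}")
        dataset = Dataset(dataset_id, list(asset_ids))
        self.datasets[dataset_id] = dataset
        return dataset
```

For example, `Project("demo")` followed by `add_asset(Asset("a1", "/data/ep1.hdf5", "hdf5"))` and `derive_dataset("ds1", ["a1"])` yields a dataset whose lineage points back to asset `a1`.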

Primary Users

Individual engineers building embodied datasets
Team operators managing collection, preprocessing, delivery, and annotation workflows
Data engineering teams that need repeatable conversion and packaging pipelines
Teams preparing datasets for external training systems

V1 Product Goal

Build a usable end-to-end platform that allows users to:

Log in to a personal or team workspace
Create a project
Configure project storage connections for local paths or object storage
Upload or import raw embodied data assets
Derive reusable datasets from project assets
Auto-detect asset structure and generate preview summaries
Start a workflow from a reusable template or compose one from a blank canvas
Configure node parameters and inject code into processing nodes
Execute workflows asynchronously and inspect logs and outputs
Export normalized delivery packages, training datasets, or training config files
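
Asynchronous workflow execution with inspectable logs can be illustrated by treating the canvas as a directed acyclic graph of nodes. The following is a minimal Python sketch, assuming a hypothetical node signature (an async function that receives upstream outputs keyed by node name). It runs nodes in dependency order with the standard-library `graphlib.TopologicalSorter` and records a simple start/done log. It is an illustration of the idea, not EmboFlow's scheduler.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Awaitable, Callable

# Hypothetical node signature: receives upstream outputs keyed by node name.
NodeFn = Callable[[dict], Awaitable[object]]


async def run_workflow(
    nodes: dict[str, NodeFn], deps: dict[str, set[str]]
) -> tuple[dict[str, object], list[str]]:
    """Run nodes in dependency order; nodes whose inputs are ready run concurrently."""
    sorter = TopologicalSorter()
    for name in nodes:
        sorter.add(name, *deps.get(name, ()))
    sorter.prepare()  # raises graphlib.CycleError if the canvas contains a cycle

    outputs: dict[str, object] = {}
    log: list[str] = []

    async def run_one(name: str) -> None:
        inputs = {up: outputs[up] for up in deps.get(name, ())}
        log.append(f"start {name}")
        outputs[name] = await nodes[name](inputs)
        log.append(f"done {name}")

    while sorter.is_active():
        ready = sorter.get_ready()
        # Every node in `ready` has all of its inputs, so the batch runs concurrently.
        await asyncio.gather(*(run_one(n) for n in ready))
        sorter.done(*ready)
    return outputs, log
```

A three-node chain such as load → double → total, where `load` returns `[1, 2, 3]`, produces `outputs["total"] == 12` and a log that shows each node starting and finishing in dependency order. Running ready nodes batch by batch keeps the sketch short; a production scheduler would more likely start each node as soon as its own inputs finish.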

Supported Input Formats in V1

RLDS
LeRobot v2/v3
HDF5
Rosbag
Raw video folders and delivery-style directory packages
Compressed archives containing the above
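
Auto-detecting which of these formats an asset uses can start from file names alone. This is a minimal Python sketch, assuming simple heuristics chosen for illustration rather than EmboFlow's actual probe rules: a `meta/info.json` file for LeRobot, TFDS-style `dataset_info.json` plus `.tfrecord` shards for RLDS, `.hdf5`/`.h5` for HDF5, `.bag`, `.db3`, or `.mcap` for Rosbag, and a folder made up only of video files for raw video. The function name `detect_format` and its return labels are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical probe heuristics; suffixes and marker files are illustrative assumptions.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz")
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv", ".mov"}


def detect_format(paths: list[str]) -> str:
    """Guess the V1 input format of an asset from the relative paths it contains."""
    names = [p.lower() for p in paths]
    if len(names) == 1 and names[0].endswith(ARCHIVE_SUFFIXES):
        return "archive"  # unpack, then probe the extracted contents again
    if any(n.endswith("meta/info.json") for n in names):
        return "lerobot"  # checked early because LeRobot folders also contain videos
    if any(".tfrecord" in n for n in names) and any(
        n.endswith("dataset_info.json") for n in names
    ):
        return "rlds"
    if any(n.endswith((".hdf5", ".h5")) for n in names):
        return "hdf5"
    if any(n.endswith((".bag", ".db3", ".mcap")) for n in names):
        return "rosbag"
    if names and all(PurePosixPath(n).suffix in VIDEO_SUFFIXES for n in names):
        return "video_folder"
    return "unknown"  # e.g. a delivery-style package that needs manual mapping
```

The order of the checks matters: more specific markers are tested before generic ones, so a LeRobot dataset that also contains `.mp4` files is not misclassified as a raw video folder.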

Core Product Principles

Raw assets are first-class objects
Canonical semantic datasets are derived, not assumed
Visualization can operate directly on raw assets
Workflow execution is asynchronous and traceable
Plugins are versioned and managed
User-injected code is supported with strict runtime boundaries
Training execution is out of scope for V1, but training handoff is in scope
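
The principle that plugins are versioned and managed can be illustrated with a small registry keyed by plugin name and version. This is a minimal Python sketch, assuming a hypothetical `PluginRegistry` class, dotted `major.minor.patch` version strings, and a made-up plugin name `hdf5_to_lerobot`. A registered version is treated as immutable, and callers either pin an exact version or take the latest.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Version = tuple[int, int, int]


def parse_version(text: str) -> Version:
    # "1.10.0" -> (1, 10, 0); raises ValueError unless there are exactly three integer parts
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


@dataclass
class PluginRegistry:
    # Hypothetical structure: plugin name -> {version -> node implementation}
    _plugins: dict[str, dict[Version, Callable]] = field(default_factory=dict)

    def register(self, name: str, version: str, impl: Callable) -> None:
        versions = self._plugins.setdefault(name, {})
        v = parse_version(version)
        if v in versions:
            # A published version never changes; ship a new version instead.
            raise ValueError(f"{name} {version} is already registered")
        versions[v] = impl

    def resolve(self, name: str, version: str | None = None) -> Callable:
        versions = self._plugins[name]
        if version is None:
            return versions[max(versions)]  # latest by numeric version order
        return versions[parse_version(version)]
```

Parsing versions into integer tuples rather than comparing strings means `1.10.0` correctly sorts after `1.9.0`. Pinning an exact version in a saved workflow is one way to keep reruns traceable.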

Major Workspaces

Project Workspace: create and switch project contexts
Asset Workspace: upload, import, scan, probe, browse
Canvas Workspace: build and run workflows
Explore Workspace: inspect raw assets and processed outputs
Label Workspace: create and review annotation tasks
Admin Workspace: users, workspaces, plugins, storage, runtime settings

V1 Output Types

Standardized embodied dataset exports
Customer delivery packages
Validation and quality reports
Annotation artifacts
Training configuration packages for downstream training systems

Non-Goals for V1

Built-in training execution orchestration
Real-time collaborative editing on the same canvas
Public plugin marketplace
Fully generalized MLOps lifecycle management
Advanced distributed scheduling in the first deployment
