# EmboFlow Workflow Execution Model

## Goal

Define how EmboFlow represents, validates, executes, and observes canvas workflows.

The workflow system is the product core. The canvas is only the editing surface. The real system of record is the versioned workflow definition and its immutable run snapshots.
## Core Objects

- `WorkflowDefinition`: logical workflow identity under a project
- `WorkflowVersion`: immutable snapshot of nodes, edges, runtime defaults, and plugin references
- `NodeInstance`: concrete node on a workflow graph
- `WorkflowRun`: one execution of one workflow version
- `RunTask`: executable unit derived from a node during one run
- `Artifact`: managed output from a task or run
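The core objects above can be sketched as plain dataclasses. This is a minimal illustration of the object relationships, not the actual schema; field names beyond the ones defined above are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowVersion:
    """Immutable snapshot of nodes, edges, runtime defaults, and plugin refs."""
    workflow_id: str       # points back to the WorkflowDefinition
    version: int
    nodes: tuple = ()      # frozen NodeInstance snapshots
    edges: tuple = ()

@dataclass
class WorkflowRun:
    """One execution of one workflow version."""
    run_id: str
    workflow_id: str
    version: int
    status: str = "pending"

@dataclass
class RunTask:
    """Executable unit derived from a node during one run."""
    task_id: str
    run_id: str
    node_id: str
    status: str = "pending"
```

Freezing `WorkflowVersion` mirrors the immutability requirement: a run always points at a snapshot that cannot drift after execution starts.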
## Workflow Layers

Each workflow version contains three layers.

### Visual Layer

Used only by the editor:

- node positions
- collapsed state
- groups
- zoom defaults
- viewport metadata

### Logic Layer

Used for graph semantics:

- nodes
- edges
- input/output ports
- branch conditions
- merge semantics
- dependency graph

### Runtime Layer

Used for execution:

- node config values
- executor settings
- runtime resource limits
- retry policy
- code hooks
- cache policy

Visual changes must not change workflow semantics. Runtime changes must produce a new workflow version.
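One way to enforce the invariant above is to derive workflow identity from a digest over the logic and runtime layers only, so visual edits can never force a new version. A minimal sketch, assuming the version document is a dict keyed by layer name:

```python
import hashlib
import json

def semantic_digest(version_doc: dict) -> str:
    # Only the logic and runtime layers participate in workflow identity;
    # the visual layer (positions, groups, viewport) is deliberately excluded.
    relevant = {
        "logic": version_doc.get("logic", {}),
        "runtime": version_doc.get("runtime", {}),
    }
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Moving a node on the canvas changes only the `visual` key and leaves the digest unchanged; editing a retry policy changes `runtime` and yields a new digest, which signals that a new version must be published.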
## Node Categories

V1 node categories:

- `Source`
- `Transform`
- `Inspect`
- `Annotate`
- `Export`
- `Utility`

### V1 Built-In Node Families

- asset upload/import
- archive extract
- folder rename
- directory validation
- metadata validation
- video quality inspection
- dataset readers for RLDS, LeRobot, HDF5, and Rosbag
- canonical mapping nodes
- dataset writers and exporters
- training config export
- Python processing node
## Node Definition Contract

Each node definition must expose:

- `id`
- `name`
- `category`
- `version`
- `description`
- `inputSchema`
- `outputSchema`
- `configSchema`
- `uiSchema`
- `executorType`
- `runtimeDefaults`
- `permissions`
- `capabilities`
- `codeHookSpec`
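The contract can be checked mechanically at plugin registration time. A small sketch that reports which required fields a candidate definition fails to expose (the dict-based representation is an assumption):

```python
REQUIRED_NODE_FIELDS = {
    "id", "name", "category", "version", "description",
    "inputSchema", "outputSchema", "configSchema", "uiSchema",
    "executorType", "runtimeDefaults", "permissions",
    "capabilities", "codeHookSpec",
}

def missing_node_fields(definition: dict) -> set:
    """Return the contract fields a node definition fails to expose."""
    return REQUIRED_NODE_FIELDS - definition.keys()
```

Registration would reject any definition for which the returned set is non-empty.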
### Code Hook Spec

V1 supports user code hooks only on:

- `Transform`
- `Inspect`
- `Utility`

Hooks must use a constrained entrypoint instead of arbitrary script structure.

Example:

```python
def process(input_data, context):
    return input_data
```

This keeps serialization, logging, and runtime control predictable.
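The constrained entrypoint can be verified without executing user code. A sketch using the standard `ast` module to confirm a hook defines `process(input_data, context)` at module top level:

```python
import ast

def validate_hook(source: str) -> bool:
    """Statically check that a code hook defines the constrained
    entrypoint process(input_data, context) at module top level."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == "process":
            args = [a.arg for a in node.args.args]
            return args == ["input_data", "context"]
    return False
```

A check like this is what "code hooks pass static validation" in the validation stages below can mean in practice: malformed hooks are rejected before a run is ever created.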
## Data Flow Contract

Tasks should exchange managed references, not loose file paths.

V1 reference types:

- `assetRef`
- `datasetVersionRef`
- `artifactRef`
- `annotationTaskRef`
- `inlineConfig`

Executors may materialize files internally, but the platform-level contract must remain reference-based.
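A reference-based contract can be made concrete as a small tagged value type that rejects anything outside the closed set of V1 kinds. The class and field names here are illustrative assumptions:

```python
from dataclasses import dataclass

VALID_REF_KINDS = {
    "assetRef", "datasetVersionRef", "artifactRef",
    "annotationTaskRef", "inlineConfig",
}

@dataclass(frozen=True)
class ManagedRef:
    """A managed reference exchanged between tasks instead of a file path."""
    kind: str
    target_id: str

    def __post_init__(self):
        if self.kind not in VALID_REF_KINDS:
            raise ValueError(f"unknown reference kind: {self.kind}")
```

Because only these kinds cross task boundaries, an executor can materialize a reference to a local file internally while the platform contract stays path-free.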
## Validation Stages

Workflow execution must validate in this order:

1. workflow version exists
2. referenced plugins exist and are enabled
3. node schemas are valid
4. edge connections are schema-compatible
5. runtime configuration is complete
6. referenced assets and datasets are accessible
7. code hooks pass static validation
8. executor and scheduler requirements are satisfiable

Validation failure must block run creation.
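The ordered stages translate naturally into a short-circuiting pipeline: later stages may depend on earlier ones (schemas cannot be checked if the plugin is missing), so the first failure stops validation and blocks run creation. A sketch, assuming each stage is wired up as a named boolean check:

```python
def validate_for_run(checks) -> list:
    """Run the ordered validation stages in sequence.

    checks: ordered list of (stage_name, callable) pairs, where the
    callable returns True on success. Returns the names of failed
    stages; a non-empty result must block run creation.
    """
    failures = []
    for name, check in checks:
        if not check():
            failures.append(name)
            break  # later stages may depend on this one; stop here
    return failures
```

A caller would only create a `WorkflowRun` when the returned list is empty.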
## Run Lifecycle

When a user executes a workflow:

1. resolve workflow version
2. snapshot all runtime-relevant inputs
3. resolve plugin versions
4. freeze node config and code hooks
5. compile graph into a DAG
6. create `WorkflowRun`
7. create `RunTask` entries
8. enqueue ready tasks
9. collect outputs, logs, and task state
10. finalize run status and summary
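Step 5, compiling the graph into a DAG, is the point where cycles must be rejected and an execution order established. A standard way to do both at once is Kahn's algorithm; the sketch below takes node ids and `(upstream, downstream)` edge pairs:

```python
from collections import deque

def compile_dag(nodes, edges):
    """Topologically order nodes (Kahn's algorithm); raise on cycles."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for up, down in edges:
        indegree[down] += 1
        downstream[up].append(down)
    # Tasks with no unmet dependencies are immediately enqueueable (step 8).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in downstream[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order
```

The same indegree bookkeeping drives scheduling at run time: a task becomes ready exactly when its remaining dependency count reaches zero.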
## Run State Model

### WorkflowRun Status

- `pending`
- `queued`
- `running`
- `success`
- `failed`
- `cancelled`
- `partial_success`

### RunTask Status

- `pending`
- `queued`
- `running`
- `success`
- `failed`
- `cancelled`
- `skipped`

`partial_success` is used for workflows where non-blocking nodes fail but the run still produces valid outputs.
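The run status can be derived by folding task statuses. The aggregation rules below are one plausible reading of the state model, not a confirmed specification; in particular, the `blocking` flag (whether a task's failure fails the whole run) is the mechanism assumed to drive `partial_success`:

```python
def derive_run_status(task_statuses: dict, blocking: dict) -> str:
    """Fold RunTask statuses into a WorkflowRun status.

    task_statuses: task id -> RunTask status
    blocking: task id -> True if this task's failure should fail the run
    """
    statuses = set(task_statuses.values())
    if "cancelled" in statuses:
        return "cancelled"
    failed = [t for t, s in task_statuses.items() if s == "failed"]
    if any(blocking[t] for t in failed):
        return "failed"
    if failed:
        return "partial_success"  # only non-blocking nodes failed
    if statuses <= {"success", "skipped"}:
        return "success"
    return "running"
```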
## Retry And Failure Policy

Each node instance may define:

- retry count
- retry backoff policy
- fail-fast behavior
- continue-on-error behavior
- manual retry eligibility

V1 should support:

- `fail_fast`
- `continue_on_error`
- `retry_n_times`
- `manual_retry`
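`retry_n_times` with a backoff policy reduces to a small loop. A sketch with exponential backoff; the injectable `sleep` parameter is an assumption made here so the executor (or a test) can control timing:

```python
import time

def run_with_retry(task_fn, retries: int, base_delay: float = 1.0,
                   sleep=time.sleep):
    """Execute a task with retry_n_times semantics and exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return task_fn()
        except Exception:
            if attempt == retries:
                # Exhausted: whether this fails the run (fail_fast) or is
                # absorbed (continue_on_error) is the caller's policy.
                raise
            sleep(base_delay * (2 ** attempt))
```

`manual_retry` would then be a user-triggered re-enqueue of the failed `RunTask` rather than anything in this loop.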
## Cache Model

V1 should support node-level cache reuse.

Recommended cache key inputs:

- workflow version
- node id
- upstream reference summary
- config summary
- code hook digest
- plugin version
- executor version

Cache hit behavior:

- reuse output artifact refs
- reuse output summaries
- retain previous logs reference
- mark task as cache-resolved in metadata
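The recommended key inputs can be combined into a single stable digest; canonical JSON plus SHA-256 is one straightforward construction. A sketch:

```python
import hashlib
import json

def cache_key(workflow_version, node_id, upstream_summary, config_summary,
              hook_digest, plugin_version, executor_version) -> str:
    """Derive a node-level cache key from the recommended inputs.

    Any change to any input (new plugin version, edited hook, different
    upstream outputs) produces a new key, i.e. a cache miss.
    """
    payload = json.dumps({
        "workflow_version": workflow_version,
        "node_id": node_id,
        "upstream": upstream_summary,
        "config": config_summary,
        "hook": hook_digest,
        "plugin": plugin_version,
        "executor": executor_version,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Including the executor version in the key is what keeps a cached output from being silently reused after an executor upgrade changes behavior.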
## Execution Context

Each task receives a normalized execution context containing:

- workspace id
- project id
- workflow run id
- task id
- actor id
- node config
- code hook content
- input references
- storage context
- temp working directory
- runtime resource limits

This context must be available across Python, Docker, and HTTP executors.
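Because the same context must reach Python, Docker, and HTTP executors, it helps to define it once as a serializable value. A sketch; the field types and the wire-format method are assumptions:

```python
from dataclasses import asdict, dataclass, field

@dataclass(frozen=True)
class ExecutionContext:
    """Normalized context handed to every task, regardless of executor."""
    workspace_id: str
    project_id: str
    run_id: str
    task_id: str
    actor_id: str
    node_config: dict = field(default_factory=dict)
    code_hook: str = ""
    input_refs: tuple = ()
    storage_context: dict = field(default_factory=dict)
    temp_dir: str = ""
    resource_limits: dict = field(default_factory=dict)

    def to_wire(self) -> dict:
        """Flatten to a plain dict for Docker/HTTP executors that take JSON."""
        return asdict(self)
```

An in-process Python executor can consume the object directly, while the Docker and HTTP executors receive `to_wire()` output as JSON, keeping one source of truth for the contract.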
## Observability Requirements

Each task must emit:

- status transitions
- start time and finish time
- duration
- executor metadata
- resource request metadata
- stdout/stderr log stream
- structured task summary
- artifact refs

The UI must allow:

- graph-level run status
- node-level log inspection
- node-level artifact browsing
- task retry entrypoint
- direct navigation from a node to its preview output
## Canvas Interaction Rules

V1 editor behavior should enforce:

- port-level connection rules
- blocking of incompatible edges
- dirty-state detection
- explicit save before publish/run if the graph changed
- per-node validation badges
- runs from the latest saved version, never an unsaved draft
## Example V1 Pipelines

### Delivery Normalization

```text
Raw Folder Import
-> Archive Extract
-> Folder Rename
-> Directory Validation
-> Metadata Validation
-> Video Quality Check
-> Delivery Export
```

### Dataset Conversion

```text
Rosbag Reader
-> Canonical Mapping
-> Frame Filter
-> Metadata Normalize
-> LeRobot Writer
-> Training Config Export
```
## V1 Non-Goals

The V1 workflow engine does not need:

- loop semantics
- streaming execution
- unbounded dynamic fan-out
- event-driven triggers
- advanced distributed DAG partitioning

The V1 goal is a stable, observable DAG executor for data engineering workflows.