EmboFlow

EmboFlow is a browser/server (B/S) embodied-data workflow platform for raw asset ingestion, delivery normalization, dataset transformation, workflow execution, preview, and export.

Current V1 Features

  • Project-scoped workspace shell with a dedicated Projects page and active project selector in the header
  • Asset workspace that supports local asset registration, probe summaries, storage connection management, and dataset creation
  • Project-scoped custom node registry with Docker image and Dockerfile based node definitions
  • Workflow templates as first-class objects, including default project templates and creating project workflows from a template
  • Blank workflow creation and a large React Flow editor with drag-and-drop nodes, free canvas movement, edge validation, Docker-first node runtime presets, and Python code-hook injection
  • Workflow-level Save As Template so edited graphs can be promoted into reusable project templates
  • Mongo-backed run orchestration, worker execution, run history, task detail, logs, stdout/stderr, artifacts, cancel, retry, and task retry
  • Runtime shell-level switching between Chinese and English

Bootstrap

From the repository root:

make bootstrap

This installs workspace dependencies and runs scripts/install_hooks.sh so local commit and push guardrails are active.

Local Commands

Run the full repository test suite:

make test

Run the strict repository guardrails:

make guardrails

Start package-level development entrypoints:

make dev-api
make dev-web
make dev-worker

Local Deployment

Start MongoDB and MinIO:

make infra-up

Start the API, web app, and worker in separate terminals:

make serve-api
make serve-web
make serve-worker

The default local stack uses:

  • API: http://127.0.0.1:3001
  • Web: http://127.0.0.1:3000
  • Worker: Mongo polling loop with WORKER_POLL_INTERVAL_MS=1000

Local Data Validation

The local validation path currently used for embodied data testing is:

/Users/longtaowu/workspace/emboldata/data

You can register that directory from the Assets page or via POST /api/assets/register.

The workflow editor now supports workflow input bindings for both registered assets and project datasets. Dataset bindings are expanded into runnable asset ids during preflight and run creation, and run detail shows input sources, input assets, and input datasets separately. The editor now also persists per-node runtime config in workflow versions, including executor overrides, optional artifact title overrides, and Python code-hook source for inspect and transform style nodes.

The runtime web shell now exposes a visible 中文 / English language toggle. The core workspace shell and workflow authoring surface are translated through a lightweight i18n layer.

The shell now also exposes a dedicated Projects page plus an active project selector, so assets, datasets, workflow templates, workflows, and runs all switch together at the project boundary.

The Assets workspace now includes first-class storage connections and datasets. A dataset is distinct from a raw asset and binds project source assets to a selected local or object-storage-backed destination.

The shell now also exposes a dedicated Nodes page for project-scoped custom container nodes. Custom nodes can be registered from an existing Docker image or a self-contained Dockerfile, and each node declares whether it consumes a single asset set or multiple upstream asset sets plus what kind of output it produces.

The Workflows workspace now includes a template gallery. Projects can start from default or saved templates, or create a blank workflow directly.

The workflow editor center panel now uses a real draggable node canvas with zoom, pan, mini-map, dotted background, handle-based edge creation, persisted node positions, and localized validation feedback instead of a static list of node cards.

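
The dataset-binding expansion mentioned in this section can be pictured with a small sketch. It is not the EmboFlow implementation: the binding shape ({"kind", "id"}), the dataset-to-asset lookup table, and the names ExpandedInputs and expand_input_bindings are all hypothetical, chosen only to show how dataset bindings could be flattened into runnable asset ids while input sources, input assets, and input datasets stay separately visible.

```python
from dataclasses import dataclass, field


@dataclass
class ExpandedInputs:
    """Resolved workflow inputs, kept separate the way run detail reports them."""
    input_sources: list = field(default_factory=list)   # original bindings, in order
    input_assets: list = field(default_factory=list)    # runnable asset ids, deduplicated
    input_datasets: list = field(default_factory=list)  # dataset ids that were expanded


def expand_input_bindings(bindings, dataset_assets):
    """Expand asset and dataset bindings into a flat list of runnable asset ids.

    bindings: list of {"kind": "asset" | "dataset", "id": str} (hypothetical shape)
    dataset_assets: dict mapping dataset id -> list of source asset ids (hypothetical)
    """
    expanded = ExpandedInputs()
    seen = set()

    def add_asset(asset_id):
        # Keep first-seen order and drop repeats across bindings.
        if asset_id not in seen:
            seen.add(asset_id)
            expanded.input_assets.append(asset_id)

    for binding in bindings:
        kind, ref = binding["kind"], binding["id"]
        expanded.input_sources.append({"kind": kind, "id": ref})
        if kind == "asset":
            add_asset(ref)
        elif kind == "dataset":
            if ref not in dataset_assets:
                raise ValueError(f"unknown dataset binding: {ref}")
            expanded.input_datasets.append(ref)
            for asset_id in dataset_assets[ref]:
                add_asset(asset_id)
        else:
            raise ValueError(f"unsupported binding kind: {kind}")
    return expanded
```

Under these assumptions, binding asset a1 together with a dataset whose source assets are a1 and a2 yields input_assets of a1 and a2, with the dataset id recorded once in input_datasets.
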
The workflow editor right panel now also supports saving the current workflow draft as a reusable workflow template, in addition to editing per-node runtime settings and Python hooks. When a custom node is selected on the canvas, the right panel now also exposes its declared input contract, output contract, artifact type, and container source so the operator can confirm compatibility without leaving the editor.

The workflow editor now also exposes a workflow-level preflight panel. Saved workflow versions can be checked against the selected asset or dataset binding before execution, and run creation is blocked when the current version still has graph, executor, or input-binding errors.

The node library now supports both click-to-append and drag-and-drop placement into the canvas. When a node is inserted from the library, the editor now seeds its default runtime contract directly into the workflow draft, so custom Docker nodes keep their declared executor type and I/O contract without extra manual edits.

V1 connection rules block self-edges, duplicate edges, cycles, incoming edges into source nodes, outgoing edges from export nodes, and multiple upstream edges into ordinary nodes, while allowing multi-input set nodes such as union-assets, intersect-assets, and difference-assets plus any custom node whose runtime contract declares inputMode=multi_asset_set.

The Runs workspace now shows project-scoped run history, run-level aggregated summaries, cancel/retry controls, and run detail views with persisted task summaries, stdout/stderr sections, result previews, artifact links into Explore, plus explicit input-source visibility for both assets and datasets. Selected run tasks now expose the frozen node definition id, executor config snapshot, and code-hook metadata that were captured when the run was created.

Most built-in delivery nodes now default to executorType=docker.

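
The V1 connection rules described in this section can be expressed as a single edge check. The sketch below is a language-neutral illustration written in Python, not the editor's actual code: the node dictionary shape, the lowercase category values "source" and "export", and the function names are assumptions. Only the rule list, the three set-node names, and inputMode=multi_asset_set come from this README.

```python
MULTI_INPUT_BUILTINS = {"union-assets", "intersect-assets", "difference-assets"}


def accepts_multiple_inputs(node):
    """True for built-in set nodes and custom nodes declaring multi_asset_set."""
    contract = node.get("contract") or {}
    return (
        node["type"] in MULTI_INPUT_BUILTINS
        or contract.get("inputMode") == "multi_asset_set"
    )


def has_path(edges, start, goal):
    """Depth-first search: is `goal` reachable from `start` along existing edges?"""
    stack, visited = [start], set()
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        if current in visited:
            continue
        visited.add(current)
        stack.extend(target for source, target in edges if source == current)
    return False


def validate_edge(nodes, edges, source, target):
    """Return None if the new edge is allowed, else a short reason string.

    nodes: dict node id -> {"type": str, "category": str, "contract": dict or None}
    edges: list of (source id, target id) tuples already on the canvas
    (both shapes are hypothetical)
    """
    if source == target:
        return "self-edge"
    if (source, target) in edges:
        return "duplicate edge"
    if nodes[target]["category"] == "source":
        return "incoming edge into source node"
    if nodes[source]["category"] == "export":
        return "outgoing edge from export node"
    # Adding source -> target closes a loop if target already reaches source.
    if has_path(edges, target, source):
        return "cycle"
    has_upstream = any(existing_target == target for _, existing_target in edges)
    if has_upstream and not accepts_multiple_inputs(nodes[target]):
        return "multiple upstream edges into ordinary node"
    return None
```

Returning a reason string instead of a boolean keeps the check usable both for blocking the connection and for showing localized validation feedback. The order of the checks is a design choice of this sketch.
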
When a node uses executorType=docker and provides executorConfig.image, the worker runs a real local Docker container with mounted input.json / output.json exchange files plus read-only mounts for bound asset paths. If no image is configured, the executor falls back to the lightweight simulated behavior used by older demo tasks. The Docker runner now treats missing or null codeHookSpec values as “no hook configured”, so built-in Docker nodes and custom container nodes can share the same task envelope without crashing on optional hook fields.

Custom Docker nodes follow the same runtime contract. The container reads the task snapshot and execution context from EMBOFLOW_INPUT_PATH, writes {"result": ...} JSON to EMBOFLOW_OUTPUT_PATH, and if it declares an asset-set output contract it must return result.assetIds as a string array. Dockerfile-based custom nodes are built locally on first execution and then reused by tag.

The Nodes page and API now share the same validation rules, including required names, valid source kinds, a mandatory FROM instruction for Dockerfiles, and rejection of Source category nodes that incorrectly declare inputMode=multi_asset_set. The editor also renders the standard EmboFlow input and output envelope preview for custom nodes so users can align container code to the actual runtime JSON shape.

When a node uses the built-in Python path without a custom hook, source-asset now emits bound asset metadata from Mongo-backed asset records and validate-structure now performs a real directory validation pass against local source paths. On the current sample path /Users/longtaowu/workspace/emboldata/data, that validation reports valid=false, videoFileCount=407, and missing delivery files because the sample root is a mixed dataset collection rather than a delivery package.

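
The custom Docker node contract described in this section (read from EMBOFLOW_INPUT_PATH, write {"result": ...} to EMBOFLOW_OUTPUT_PATH, return result.assetIds as a string array for asset-set outputs) could be satisfied by a small script inside the container. The following is a minimal sketch under stated assumptions: the "context" and "assetIds" input keys and the function names are hypothetical, and the real input shape should be taken from the envelope preview in the editor.

```python
import json
import os


def select_asset_ids(task_input):
    """Placeholder node logic: pass through the asset ids found in the input.

    The "context" and "assetIds" keys are assumptions made for illustration;
    they are not confirmed by the README.
    """
    context = task_input.get("context") or {}
    return [str(asset_id) for asset_id in context.get("assetIds", [])]


def run_node(environ=None):
    """Read the task envelope, compute a result, and write the output envelope."""
    environ = os.environ if environ is None else environ
    input_path = environ["EMBOFLOW_INPUT_PATH"]
    output_path = environ["EMBOFLOW_OUTPUT_PATH"]

    with open(input_path, "r", encoding="utf-8") as handle:
        task_input = json.load(handle)

    # An asset-set output contract requires result.assetIds as a string array.
    output = {"result": {"assetIds": select_asset_ids(task_input)}}

    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(output, handle)
    return output
```

In a real image, the entrypoint would call run_node() with no arguments so that both paths come from the container environment. Passing a dictionary instead makes the same function easy to exercise locally without Docker.
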
The worker now also carries direct upstream task results into execution context so set-operation utility nodes can compute narrowed asset sets and pass those effective asset ids to downstream tasks.
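
As a rough illustration of how a set-operation node could narrow asset sets from direct upstream results, consider the sketch below. It is an assumption-laden example rather than the worker's code: the function name, the list-of-lists input, and the choice that difference-assets means "first upstream set minus all others" are not stated in this README.

```python
def combine_asset_sets(operation, upstream_results):
    """Compute the effective asset ids for a set-operation node.

    operation: "union-assets", "intersect-assets", or "difference-assets"
    upstream_results: list of asset-id lists, one per direct upstream task;
        treating the first list as the base set is an assumption of this sketch
    """
    if not upstream_results:
        return []
    first = upstream_results[0]
    others = [set(result) for result in upstream_results[1:]]

    if operation == "union-assets":
        candidates = [a for result in upstream_results for a in result]
    elif operation == "intersect-assets":
        candidates = [a for a in first if all(a in other for other in others)]
    elif operation == "difference-assets":
        candidates = [a for a in first if not any(a in other for other in others)]
    else:
        raise ValueError(f"unsupported set operation: {operation}")

    # Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(candidates))
```

With upstream results ["a", "b"] and ["b", "c"], this sketch gives ["a", "b", "c"] for union-assets, ["b"] for intersect-assets, and ["a"] for difference-assets. The returned list is what would be passed on as effective asset ids to downstream tasks.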

Repository Structure

  • apps/api contains the control-plane modules for workspaces, assets, workflows, runs, and artifacts.
  • apps/web contains the React shell, asset workspace, workflow editor surface, run detail view, and explore renderers.
  • apps/worker contains the Mongo-backed worker runtime, task runner, and executor contracts.
  • design/ contains the architecture and product design documents that must stay aligned with implementation.
  • docs/ contains workflow guidance and the executable implementation plan.

Developer Workflow

  1. Read the relevant design files under design/ before editing code.
  2. Implement code and update impacted docs in the same change set.
  3. Use English-only commit messages with a gitmoji prefix.
  4. Run make test and make guardrails before pushing changes.

For direct hook installation or reinstallation:

bash scripts/install_hooks.sh