# EmboFlow
EmboFlow is a browser/server (B/S) workflow platform for embodied data. It covers raw asset ingestion, delivery normalization, dataset transformation, workflow execution, preview, and export.
## Current V1 Features
- Project-scoped workspace shell with a dedicated Projects page and active project selector in the header
- Asset workspace that supports local asset registration, probe summaries, storage connection management, and dataset creation
- Project-scoped custom node registry with Docker image and Dockerfile based node definitions
- Workflow templates as first-class objects, including default project templates and the ability to create project workflows from a template
- Blank workflow creation and a large React Flow editor with drag-and-drop nodes, free canvas movement, edge validation, Docker-first node runtime presets, and Python code-hook injection
- Workflow-level `Save As Template` so edited graphs can be promoted into reusable project templates
- Mongo-backed run orchestration, worker execution, run history, task detail, logs, stdout/stderr, artifacts, cancel, retry, and task retry
- Chinese and English language switching at the runtime shell level
## Bootstrap
From the repository root:
```bash
make bootstrap
```
This installs workspace dependencies and runs `scripts/install_hooks.sh` so local commit and push guardrails are active.
## Local Commands
Run the full repository test suite:
```bash
make test
```
Run the strict repository guardrails:
```bash
make guardrails
```
Start package-level development entrypoints:
```bash
make dev-api
make dev-web
make dev-worker
```
## Local Deployment
Start MongoDB and MinIO:
```bash
make infra-up
```
Start the API, web app, and worker in separate terminals:
```bash
make serve-api
make serve-web
make serve-worker
```
The default local stack uses:
- API: `http://127.0.0.1:3001`
- Web: `http://127.0.0.1:3000`
- Worker: Mongo polling loop with `WORKER_POLL_INTERVAL_MS=1000`
### Local Data Validation
The local validation path currently used for embodied data testing is:
```text
/Users/longtaowu/workspace/emboldata/data
```
You can register that directory from the Assets page or via `POST /api/assets/register`.
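As a rough illustration of the API route, the sketch below builds a `POST /api/assets/register` request against the default local API address. The JSON field name `sourcePath` and the helper names are assumptions made for this example, not confirmed parts of the API; check the asset module in `apps/api` for the actual request shape.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:3001"  # default local API address listed in this README


def build_register_request(source_path: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/assets/register request."""
    # ASSUMPTION: "sourcePath" is a hypothetical field name used for illustration.
    body = json.dumps({"sourcePath": source_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/api/assets/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request; requires the API from `make serve-api` to be running."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_register_request("/Users/longtaowu/workspace/emboldata/data"))` would then attempt the registration against a running local API.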
The workflow editor currently requires selecting at least one registered asset before a run can be created.
The editor now also persists per-node runtime config in workflow versions, including executor overrides, optional artifact title overrides, and Python code-hook source for inspect and transform style nodes.
The runtime web shell now exposes a visible `中文 / English` language toggle. The core workspace shell and workflow authoring surface are translated through a lightweight i18n layer.
The shell now also exposes a dedicated Projects page plus an active project selector, so assets, datasets, workflow templates, workflows, and runs all switch together at the project boundary.
The Assets workspace now includes first-class storage connections and datasets. A dataset is distinct from a raw asset and binds project source assets to a selected local or object-storage-backed destination.
The shell now also exposes a dedicated Nodes page for project-scoped custom container nodes. Custom nodes can be registered from an existing Docker image or a self-contained Dockerfile. Each node declares whether it consumes a single asset set or multiple upstream asset sets, and what kind of output it produces.
The Workflows workspace now includes a template gallery. Projects can start from default or saved templates, or create a blank workflow directly.
The workflow editor center panel now uses a real draggable node canvas with zoom, pan, mini-map, dotted background, handle-based edge creation, persisted node positions, and localized validation feedback instead of a static list of node cards.
The workflow editor right panel now also supports saving the current workflow draft as a reusable workflow template, in addition to editing per-node runtime settings and Python hooks.
When a custom node is selected on the canvas, the right panel now also exposes its declared input contract, output contract, artifact type, and container source so the operator can confirm compatibility without leaving the editor.
The workflow editor now also exposes a workflow-level preflight panel. Saved workflow versions can be checked against the selected bound asset before execution, and run creation is blocked when the current version still has graph, executor, or asset-binding errors.
The node library now supports both click-to-append and drag-and-drop placement into the canvas. When a node is inserted from the library, the editor now seeds its default runtime contract directly into the workflow draft, so custom Docker nodes keep their declared executor type and I/O contract without extra manual edits. V1 connection rules block self-edges, duplicate edges, cycles, incoming edges into source nodes, outgoing edges from export nodes, and multiple upstream edges into ordinary nodes, while allowing multi-input set nodes such as `union-assets`, `intersect-assets`, and `difference-assets` plus any custom node whose runtime contract declares `inputMode=multi_asset_set`.
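The V1 connection rules above can be sketched as a single edge check. This is a minimal illustration, not the editor's actual implementation: the node dictionaries, the `category` and `type` keys, the `Export` category name, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Built-in multi-input set nodes named in this README.
MULTI_INPUT_BUILTINS = {"union-assets", "intersect-assets", "difference-assets"}


def accepts_multiple_inputs(node: dict) -> bool:
    """True for set nodes and custom nodes declaring inputMode=multi_asset_set."""
    return node["type"] in MULTI_INPUT_BUILTINS or node.get("inputMode") == "multi_asset_set"


def reaches(edges, start, goal) -> bool:
    """Depth-first search: is `goal` reachable from `start` along existing edges?"""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    stack, seen = [start], set()
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(adjacency[current])
    return False


def check_edge(nodes: dict, edges: list, src: str, dst: str):
    """Return a rejection reason, or None if the edge src -> dst may be added."""
    if src == dst:
        return "self-edge"
    if (src, dst) in edges:
        return "duplicate edge"
    # ASSUMPTION: hypothetical "category" values "Source" and "Export".
    if nodes[dst]["category"] == "Source":
        return "incoming edge into source node"
    if nodes[src]["category"] == "Export":
        return "outgoing edge from export node"
    has_upstream = any(existing_dst == dst for _, existing_dst in edges)
    if has_upstream and not accepts_multiple_inputs(nodes[dst]):
        return "multiple upstream edges into ordinary node"
    # Adding src -> dst closes a cycle if src is already reachable from dst.
    if reaches(edges, dst, src):
        return "cycle"
    return None
```

The cycle check runs last because it is the only rule that needs a graph traversal; the other rules are constant-time or a single scan of the edge list.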
The Runs workspace now shows project-scoped run history, run-level aggregated summaries, cancel/retry controls, and run detail views with persisted task summaries, stdout/stderr sections, result previews, and artifact links into Explore.
Selected run tasks now expose the frozen node definition id, executor config snapshot, and code-hook metadata that were captured when the run was created.
Most built-in delivery nodes now default to `executorType=docker`. When a node uses `executorType=docker` and provides `executorConfig.image`, the worker runs a real local Docker container with mounted `input.json` / `output.json` exchange files plus read-only mounts for bound asset paths. If no image is configured, the executor falls back to the lightweight simulated behavior used by older demo tasks.
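To make the container invocation concrete, the sketch below assembles a `docker run` argument list with an exchange directory for `input.json` / `output.json` and read-only mounts for bound asset paths. The container-side paths under `/emboflow/`, the function name, and the exact flag layout are assumptions for illustration; the worker in `apps/worker` may build its command differently.

```python
def build_docker_argv(image: str, work_dir: str, asset_paths: list) -> list:
    """Assemble a `docker run` command line for one task (illustrative only)."""
    argv = ["docker", "run", "--rm"]
    # Exchange directory holding input.json and output.json, mounted read-write.
    # ASSUMPTION: /emboflow/io is a hypothetical container-side location.
    argv += ["-v", f"{work_dir}:/emboflow/io"]
    argv += ["-e", "EMBOFLOW_INPUT_PATH=/emboflow/io/input.json"]
    argv += ["-e", "EMBOFLOW_OUTPUT_PATH=/emboflow/io/output.json"]
    # Bound asset paths are mounted read-only so the container cannot modify them.
    for index, host_path in enumerate(asset_paths):
        argv += ["-v", f"{host_path}:/emboflow/assets/{index}:ro"]
    argv.append(image)
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine with Docker installed.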
Custom Docker nodes follow the same runtime contract. The container reads the task snapshot and execution context from `EMBOFLOW_INPUT_PATH`, writes `{"result": ...}` JSON to `EMBOFLOW_OUTPUT_PATH`, and if it declares an asset-set output contract it must return `result.assetIds` as a string array. Dockerfile-based custom nodes are built locally on first execution and then reused by tag. The Nodes page and API now share the same validation rules, including required names, valid source kinds, a mandatory `FROM` instruction for Dockerfiles, and rejection of `Source` category nodes that incorrectly declare `inputMode=multi_asset_set`. The editor also renders the standard EmboFlow input and output envelope preview for custom nodes so users can align container code to the actual runtime JSON shape.
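A container entrypoint that follows this contract might look like the sketch below. The two environment variable names and the `result.assetIds` output key come from the description above; the input lookup `context.assetIds` is a hypothetical location chosen for the example, so consult the envelope preview in the editor for the real input shape.

```python
import json
import os


def run_node(input_path: str, output_path: str) -> dict:
    """Read the input envelope, pass asset ids through, write the output envelope."""
    with open(input_path, encoding="utf-8") as handle:
        payload = json.load(handle)
    # ASSUMPTION: "context.assetIds" is a hypothetical path into the input JSON.
    asset_ids = payload.get("context", {}).get("assetIds", [])
    # An asset-set output contract requires result.assetIds to be a string array.
    kept = [asset_id for asset_id in asset_ids if isinstance(asset_id, str)]
    envelope = {"result": {"assetIds": kept}}
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(envelope, handle)
    return envelope


def main() -> None:
    """Entry point for the container: paths come from the runtime environment."""
    run_node(os.environ["EMBOFLOW_INPUT_PATH"], os.environ["EMBOFLOW_OUTPUT_PATH"])
```

A real node would replace the pass-through step with its own filtering or transformation logic and have its image's entrypoint call `main()`.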
When a node uses the built-in Python path without a custom hook, `source-asset` now emits bound asset metadata from Mongo-backed asset records and `validate-structure` now performs a real directory validation pass against local source paths. On the current sample path `/Users/longtaowu/workspace/emboldata/data`, that validation reports `valid=false`, `videoFileCount=407`, and missing delivery files because the sample root is a mixed dataset collection rather than a delivery package.
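The kind of directory pass described here can be approximated as follows. The `valid` and `videoFileCount` keys mirror the fields mentioned above, while the video extension list, the required file name `meta.json`, and the `missingFiles` key are hypothetical placeholders; the actual `validate-structure` rules live in the worker.

```python
import os

# ASSUMPTION: illustrative extension list and required-file list, not the real rules.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}
REQUIRED_FILES = ("meta.json",)


def validate_structure(root: str) -> dict:
    """Count video files under `root` and report missing required files."""
    video_count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in VIDEO_EXTENSIONS:
                video_count += 1
    # Required files are checked at the top level of the delivery root only.
    missing = [name for name in REQUIRED_FILES
               if not os.path.isfile(os.path.join(root, name))]
    return {"valid": not missing, "videoFileCount": video_count, "missingFiles": missing}
```

Under these assumptions, a mixed dataset collection with many videos but no delivery files at its root would report `valid` as false, which matches the behavior described for the sample path.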
The worker now also carries direct upstream task results into execution context so set-operation utility nodes can compute narrowed asset sets and pass those effective asset ids to downstream tasks.
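One way to picture how a set-operation node narrows asset ids from its direct upstream results is sketched below. The function name, the list-of-lists input shape, and the choice that `difference-assets` subtracts every later input from the first are assumptions for illustration, not confirmed worker behavior.

```python
def combine_asset_sets(node_type: str, upstream: list) -> list:
    """Combine upstream asset-id lists; output keeps first-seen order without duplicates."""
    if not upstream:
        return []
    first, rest = upstream[0], upstream[1:]
    if node_type == "union-assets":
        merged = []
        for asset_ids in upstream:
            merged.extend(asset_ids)
        return list(dict.fromkeys(merged))  # de-duplicate, preserving order
    if node_type == "intersect-assets":
        common = set(first).intersection(*rest)
        return [asset_id for asset_id in dict.fromkeys(first) if asset_id in common]
    if node_type == "difference-assets":
        # ASSUMPTION: first input minus the union of all remaining inputs.
        removed = set().union(*rest)
        return [asset_id for asset_id in dict.fromkeys(first) if asset_id not in removed]
    raise ValueError(f"unsupported set node: {node_type}")
```

For example, with upstream results `[["a", "b", "c"], ["b", "c", "d"]]`, the three node types would yield `["a", "b", "c", "d"]`, `["b", "c"]`, and `["a"]` respectively, and those ids would be what downstream tasks receive.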
## Repository Structure
- `apps/api` contains the control-plane modules for workspaces, assets, workflows, runs, and artifacts.
- `apps/web` contains the React shell, asset workspace, workflow editor surface, run detail view, and explore renderers.
- `apps/worker` contains the Mongo-backed worker runtime, task runner, and executor contracts.
- `design/` contains the architecture and product design documents that must stay aligned with implementation.
- `docs/` contains workflow guidance and the executable implementation plan.
## Developer Workflow
1. Read the relevant design files under `design/` before editing code.
2. Implement code and update impacted docs in the same change set.
3. Use English-only commit messages with a gitmoji prefix.
4. Run `make test` and `make guardrails` before pushing changes.
For direct hook installation or reinstallation:
```bash
bash scripts/install_hooks.sh
```