longtao.wu/EmboFlow

eust-w f41816bbd9 🎉 feat: initialize foundation docs guardrails and workspace skeleton

2026-03-26 17:18:40 +08:00

4.1 KiB

EmboFlow System Architecture

Architecture Style

EmboFlow V1 is a browser/server platform built as:

Web frontend
Modular backend control plane
Independent worker runtime
MongoDB as the only database
Object storage abstraction over cloud object storage or MinIO
Local scheduler in V1, with a future migration path to Kubernetes and Volcano

The architecture should preserve clear service boundaries even if V1 is implemented as a modular monolith plus workers.

High-Level Layers

Frontend Layer

Asset workspace
Canvas workspace
Explore workspace
Label workspace
Admin workspace

Control Plane

Identity and authorization
Workspace and project management
Asset and dataset metadata
Workflow definition management
Plugin registry and activation
Run orchestration API
Artifact indexing

Execution Plane

Workflow DAG compilation
Task queue dispatch
Worker execution
Executor routing
Log and artifact collection

Storage Layer

MongoDB for metadata and run state
Object storage for files and large outputs
Temporary local working directories for execution

Core Domain Objects

User
Workspace
Project
Asset
Dataset
DatasetVersion
WorkflowDefinition
WorkflowVersion
WorkflowRun
RunTask
Artifact
AnnotationTask
Annotation
Plugin
StorageConnection

Raw Asset And Canonical Dataset Model

The platform must distinguish between:

Raw Asset View
Canonical Dataset View

Raw assets preserve source structure, file paths, metadata layout, and original naming. Canonical datasets provide a normalized semantic layer for workflow nodes and export logic.

Visualization may read raw assets directly. Conversion, orchestration, and export should primarily target canonical semantics.
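The split between the two views can be illustrated with a minimal Python sketch. It makes two assumptions for illustration only: a raw asset is a set of relative file paths with one directory per episode, and a hypothetical alias table maps source-specific file stems to canonical stream names. `RawAsset`, `CanonicalEpisode`, `STREAM_ALIASES`, and `to_canonical` are not EmboFlow names.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RawAsset:
    """Raw view: file paths and names are kept exactly as uploaded."""
    asset_id: str
    files: Tuple[str, ...]  # original relative paths, never rewritten


@dataclass
class CanonicalEpisode:
    """Canonical view: normalized stream names that workflow nodes can rely on."""
    episode_id: str
    streams: Dict[str, str] = field(default_factory=dict)  # canonical name -> raw path


# Hypothetical alias table: source-specific file stems -> canonical stream names.
STREAM_ALIASES = {
    "cam0": "camera.rgb",
    "rgb": "camera.rgb",
    "joint_states": "robot.joints",
}


def to_canonical(asset: RawAsset) -> List[CanonicalEpisode]:
    """Build canonical episodes, assuming one directory per episode in the raw layout."""
    episodes: Dict[str, CanonicalEpisode] = {}
    for raw_path in asset.files:
        path = PurePosixPath(raw_path)
        stream = STREAM_ALIASES.get(path.stem)
        if stream is None:
            continue  # unrecognized files remain visible only in the raw view
        episode_id = f"{asset.asset_id}/{path.parent.name}"
        episode = episodes.setdefault(episode_id, CanonicalEpisode(episode_id))
        episode.streams[stream] = raw_path  # reference the raw file; do not copy it
    return sorted(episodes.values(), key=lambda e: e.episode_id)
```

Canonical records only point back at raw paths. A visualization panel can therefore keep reading the original layout, while conversion and export nodes work against stable names such as `camera.rgb`.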

Workflow Model

Workflow definitions are versioned and contain:

Visual graph state
Logical node and edge graph
Runtime configuration
Plugin references

Workflow execution produces immutable workflow runs. A run snapshots:

Workflow version
Node configuration
Injected code
Executor settings
Input bindings
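One way to make a run snapshot immutable is to deep-copy every mutable input when the run is created, then expose the copies through read-only views. The sketch below assumes Python and uses hypothetical names (`WorkflowRunSnapshot`, `snapshot_run`). The content fingerprint is an illustrative extra for spotting identical runs; the design above does not specify it.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


def _freeze(value: Mapping[str, Any]) -> Mapping[str, Any]:
    # Deep-copy so later edits to the live workflow cannot leak into the run,
    # then wrap the copy in a read-only top-level view.
    return MappingProxyType(copy.deepcopy(dict(value)))


@dataclass(frozen=True)
class WorkflowRunSnapshot:
    workflow_version_id: str
    node_configs: Mapping[str, Any]
    injected_code: Mapping[str, str]
    executor_settings: Mapping[str, Any]
    input_bindings: Mapping[str, str]

    def fingerprint(self) -> str:
        """Stable SHA-256 over the snapshot contents (values must be JSON-serializable)."""
        payload = {
            "workflow_version_id": self.workflow_version_id,
            "node_configs": dict(self.node_configs),
            "injected_code": dict(self.injected_code),
            "executor_settings": dict(self.executor_settings),
            "input_bindings": dict(self.input_bindings),
        }
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


def snapshot_run(workflow_version_id, node_configs, injected_code, executor_settings, input_bindings):
    """Capture everything a run needs so it never depends on mutable workflow state."""
    return WorkflowRunSnapshot(
        workflow_version_id,
        _freeze(node_configs),
        _freeze(injected_code),
        _freeze(executor_settings),
        _freeze(input_bindings),
    )
```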

Runs compile into task DAGs.
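Compiling a run into a task DAG amounts to a topological sort of the logical node graph. The following minimal sketch uses Python's standard `graphlib` module (3.9+). It makes the simplifying assumption that each logical node becomes exactly one task, and `compile_task_dag` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


def compile_task_dag(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Compile a logical node graph into ordered batches of task ids.

    Every task in batch k depends only on tasks in batches 0..k-1, so tasks
    within one batch may be dispatched concurrently.
    """
    predecessors: Dict[str, Set[str]] = {node: set() for node in nodes}
    for src, dst in edges:
        if src not in predecessors or dst not in predecessors:
            raise ValueError(f"edge references unknown node: {src} -> {dst}")
        predecessors[dst].add(src)  # dst consumes an output of src

    sorter = TopologicalSorter(predecessors)
    try:
        sorter.prepare()
    except CycleError as exc:
        # CycleError.args[1] holds the nodes that form the cycle.
        raise ValueError(f"workflow graph has a cycle: {exc.args[1]}") from exc

    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For a diamond `load -> {resize, inspect} -> export`, the function yields three batches, and `resize` and `inspect` can run in parallel.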

Node And Plugin Model

Node Categories

Source
Transform
Inspect
Annotate
Export
Utility

Node Definition Contract

Each node definition includes:

Metadata
Input schema
Output schema
Config schema
UI schema
Executor type
Runtime limits
Optional code hook contract
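The contract can be expressed as a typed record. The minimal Python sketch below combines the node categories and the contract fields listed above. As simplifying assumptions, it reduces the config schema to a key-to-type map (a real system might use JSON Schema instead) and invents default runtime limits. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class NodeCategory(Enum):
    SOURCE = "source"
    TRANSFORM = "transform"
    INSPECT = "inspect"
    ANNOTATE = "annotate"
    EXPORT = "export"
    UTILITY = "utility"


class ExecutorType(Enum):
    PYTHON = "python"
    DOCKER = "docker"
    HTTP = "http"


@dataclass(frozen=True)
class RuntimeLimits:
    timeout_s: int = 600   # hypothetical defaults
    memory_mb: int = 2048


@dataclass(frozen=True)
class NodeDefinition:
    # Metadata
    node_type: str
    version: str
    category: NodeCategory
    # Port schemas: port name -> data type name
    input_schema: Dict[str, str]
    output_schema: Dict[str, str]
    # Config schema simplified to key -> expected Python type
    config_schema: Dict[str, type]
    ui_schema: Dict[str, Any] = field(default_factory=dict)
    executor: ExecutorType = ExecutorType.PYTHON
    limits: RuntimeLimits = field(default_factory=RuntimeLimits)
    code_hook: Optional[str] = None  # name of an optional user-code entry point

    def validate_config(self, config: Dict[str, Any]) -> List[str]:
        """Return human-readable problems; an empty list means the config is valid."""
        errors: List[str] = []
        for key, expected in self.config_schema.items():
            if key not in config:
                errors.append(f"missing config key: {key}")
            elif not isinstance(config[key], expected):
                got = type(config[key]).__name__
                errors.append(f"{key}: expected {expected.__name__}, got {got}")
        for key in sorted(config.keys() - self.config_schema.keys()):
            errors.append(f"unknown config key: {key}")
        return errors
```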

Plugin Types

Node plugins
Reader/writer plugins
Renderer plugins
Executor plugins
Integration plugins
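Plugin registration and activation (a control-plane responsibility) can be sketched as an in-memory registry. The activation rule used here is an assumption: at most one active plugin may provide a given capability within a plugin type. In practice the records might be persisted in the `plugins` collection. `PluginRecord` and `PluginRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class PluginType(Enum):
    NODE = "node"
    READER_WRITER = "reader_writer"
    RENDERER = "renderer"
    EXECUTOR = "executor"
    INTEGRATION = "integration"


@dataclass
class PluginRecord:
    plugin_id: str
    plugin_type: PluginType
    version: str
    provides: Tuple[str, ...]  # capabilities contributed, e.g. node types or executor names
    active: bool = False


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: Dict[str, PluginRecord] = {}

    def register(self, record: PluginRecord) -> None:
        if record.plugin_id in self._plugins:
            raise ValueError(f"plugin already registered: {record.plugin_id}")
        self._plugins[record.plugin_id] = record

    def activate(self, plugin_id: str) -> None:
        record = self._plugins[plugin_id]
        # Assumed policy: one active provider per capability within a plugin type.
        for other in self._plugins.values():
            if (
                other.active
                and other.plugin_id != plugin_id
                and other.plugin_type is record.plugin_type
                and set(other.provides) & set(record.provides)
            ):
                raise ValueError(f"{plugin_id} conflicts with active plugin {other.plugin_id}")
        record.active = True

    def deactivate(self, plugin_id: str) -> None:
        self._plugins[plugin_id].active = False

    def active_providers(self, plugin_type: PluginType) -> Dict[str, str]:
        """Map each capability of the given type to the plugin currently providing it."""
        return {
            capability: record.plugin_id
            for record in self._plugins.values()
            if record.active and record.plugin_type is plugin_type
            for capability in record.provides
        }
```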

Execution Architecture

Executors

Python executor
Docker executor
HTTP executor

V1 should prioritize the Python and Docker executors. The HTTP executor is useful for integrating external services.
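Executor routing can be sketched as a small common interface plus a router keyed by executor name. Only an in-process Python executor for trusted handlers is shown. Docker and HTTP executors are assumed to plug in by implementing the same `run` method. `TaskSpec`, `TaskExecutor`, `PythonExecutor`, and `ExecutorRouter` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Protocol


@dataclass
class TaskSpec:
    task_id: str
    node_type: str
    executor: str  # "python", "docker", or "http"
    config: Dict[str, Any] = field(default_factory=dict)
    inputs: Dict[str, Any] = field(default_factory=dict)


class TaskExecutor(Protocol):
    name: str

    def run(self, task: TaskSpec) -> Dict[str, Any]: ...


# A built-in node handler receives (config, inputs) and returns its outputs.
Handler = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]


class PythonExecutor:
    """Runs trusted, built-in node handlers inside the worker process."""

    name = "python"

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self._handlers = handlers

    def run(self, task: TaskSpec) -> Dict[str, Any]:
        handler = self._handlers[task.node_type]
        return handler(task.config, task.inputs)


class ExecutorRouter:
    """Routes each task to the executor named in its node definition."""

    def __init__(self, executors: Iterable[TaskExecutor]) -> None:
        self._by_name = {executor.name: executor for executor in executors}

    def dispatch(self, task: TaskSpec) -> Dict[str, Any]:
        executor = self._by_name.get(task.executor)
        if executor is None:
            raise LookupError(f"no executor registered for {task.executor!r}")
        return executor.run(task)
```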

Schedulers

Local scheduler in V1
Kubernetes scheduler later
Volcano scheduler later

Executors and schedulers are separate abstractions:

The executor defines how a node's logic runs
The scheduler defines where it runs and under which scheduling policy
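The separation can be made concrete by giving the scheduler only an opaque "run this task" callable supplied by the executor layer. The sketch below models a V1-style local scheduler as a bounded thread pool that processes the ordered task batches of a compiled DAG. Its assumed policy runs tasks within a batch concurrently and starts the next batch only after the current one finishes. `Scheduler`, `LocalScheduler`, and `RunTaskFn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Protocol

# "How" a task runs is supplied by the executor layer (for example a router over
# Python/Docker/HTTP executors). It receives a task id and returns a status string.
RunTaskFn = Callable[[str], str]


class Scheduler(Protocol):
    """Decides where and under what policy tasks run; knows nothing about executors."""

    def run_batches(self, batches: List[List[str]], run_task: RunTaskFn) -> Dict[str, str]: ...


class LocalScheduler:
    """Bounded thread pool on the local machine.

    Policy: tasks in one batch run concurrently (up to max_workers), and the
    next batch starts only after every task in the previous batch has finished.
    """

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run_batches(self, batches: List[List[str]], run_task: RunTaskFn) -> Dict[str, str]:
        results: Dict[str, str] = {}
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for batch in batches:
                # Consuming every result before moving on enforces the batch barrier;
                # pool.map yields results in input order.
                for task_id, status in zip(batch, pool.map(run_task, batch)):
                    results[task_id] = status
        return results
```

A Kubernetes or Volcano scheduler could implement the same `run_batches` method by submitting jobs to the cluster, and the executor callable would stay unchanged.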

Storage Architecture

MongoDB Collections

Recommended primary collections:

users
workspaces
projects
memberships
assets
asset_probe_reports
datasets
dataset_versions
workflow_definitions
workflow_definition_versions
workflow_runs
run_tasks
artifacts
annotation_tasks
annotations
plugins
storage_connections
audit_logs
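Index choices for these collections are not specified above, so the following plan is purely illustrative. Every field name is a hypothetical assumption, not taken from the EmboFlow codebase. The apply function relies only on pymongo's `Collection.create_index`, which accepts a list of `(field, direction)` pairs plus options such as `unique=True` and returns the index name. Any object that supports `db[name].create_index(...)` in the same way will also work.

```python
from typing import Any, Dict, List, Tuple

ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

IndexSpec = Tuple[List[Tuple[str, int]], Dict[str, Any]]

# Hypothetical index plan: uniqueness for identity/version keys, and
# (parent, time) compound indexes for the most common list queries.
INDEX_PLAN: Dict[str, List[IndexSpec]] = {
    "users": [([("email", ASCENDING)], {"unique": True})],
    "memberships": [([("workspace_id", ASCENDING), ("user_id", ASCENDING)], {"unique": True})],
    "projects": [([("workspace_id", ASCENDING), ("name", ASCENDING)], {"unique": True})],
    "assets": [([("project_id", ASCENDING), ("created_at", DESCENDING)], {})],
    "dataset_versions": [([("dataset_id", ASCENDING), ("version", DESCENDING)], {"unique": True})],
    "workflow_definition_versions": [
        ([("workflow_definition_id", ASCENDING), ("version", DESCENDING)], {"unique": True})
    ],
    "workflow_runs": [([("project_id", ASCENDING), ("created_at", DESCENDING)], {})],
    "run_tasks": [([("run_id", ASCENDING), ("status", ASCENDING)], {})],
    "artifacts": [([("run_id", ASCENDING), ("task_id", ASCENDING)], {})],
    "audit_logs": [([("workspace_id", ASCENDING), ("created_at", DESCENDING)], {})],
}


def ensure_indexes(db: Any) -> List[str]:
    """Apply INDEX_PLAN to a pymongo Database (or compatible object); return index names."""
    names: List[str] = []
    for collection, specs in INDEX_PLAN.items():
        for keys, options in specs:
            names.append(db[collection].create_index(keys, **options))
    return names
```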

Object Storage Content

Raw uploads
Imported archives
Normalized export packages
Training config packages
Preview resources
Logs and attachments
Large manifests and file indexes
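The object storage abstraction can be sketched as a small interface plus a key layout that groups content by kind. Both the `<workspace>/<project>/<kind>/<name>` layout and the names `ContentKind`, `object_key`, `ObjectStore`, and `InMemoryObjectStore` are assumptions. An S3-compatible or MinIO adapter would implement the same three methods with its client library. The in-memory class is only a test double.

```python
from enum import Enum
from typing import Dict, List, Protocol


class ContentKind(Enum):
    RAW_UPLOAD = "raw"
    IMPORTED_ARCHIVE = "archives"
    EXPORT_PACKAGE = "exports"
    TRAINING_CONFIG = "training-configs"
    PREVIEW = "previews"
    LOG = "logs"
    MANIFEST = "manifests"


def object_key(workspace_id: str, project_id: str, kind: ContentKind, name: str) -> str:
    """Build '<workspace>/<project>/<kind>/<name>'; nested names keep raw source paths."""
    # Reject absolute paths, empty segments, and '.'/'..' to keep keys inside their prefix.
    if any(segment in ("", ".", "..") for segment in name.split("/")):
        raise ValueError(f"invalid object name: {name!r}")
    return f"{workspace_id}/{project_id}/{kind.value}/{name}"


class ObjectStore(Protocol):
    """Backend-neutral interface used by the control plane and workers."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def list_keys(self, prefix: str) -> List[str]: ...


class InMemoryObjectStore:
    """Test double standing in for a real bucket."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))
```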

Security Model

User-injected code is low-trust code and must not run in web or API processes.

V1 runtime policy:

Built-in trusted nodes may use the Python executor
Plugin code should run in controlled runtimes
User-injected code should default to the Docker executor
Network access should be denied by default for user code
Input and output paths should be explicitly mounted
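The runtime policy can be sketched as two small functions: one chooses an executor from the code's origin, and one builds a locked-down `docker run` argument list for passing to `subprocess.run`. The flags used (`--rm`, `--read-only`, `--tmpfs`, `--cap-drop`, `--security-opt no-new-privileges`, `--memory`, `--cpus`, `-v`, `--workdir`, `--network none`) are standard Docker CLI options. Two things are assumptions: routing plugin code to Docker (the policy only says "controlled runtimes"), and the `/workspace` mount layout and resource defaults. Both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Assumed mapping from code origin to executor; using Docker for plugins is an assumption.
EXECUTOR_BY_ORIGIN = {"builtin": "python", "plugin": "docker", "user": "docker"}


def choose_executor(origin: str) -> str:
    try:
        return EXECUTOR_BY_ORIGIN[origin]
    except KeyError:
        raise ValueError(f"unknown code origin: {origin!r}") from None


def docker_run_argv(
    image: str,
    input_dir: str,
    output_dir: str,
    command: Sequence[str],
    memory: str = "2g",
    cpus: str = "1",
    allow_network: bool = False,
) -> List[str]:
    """Build a `docker run` argument list for low-trust code."""
    for path in (input_dir, output_dir):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"mount source must be an absolute path: {path}")
    argv = [
        "docker", "run", "--rm",
        "--read-only",                               # immutable root filesystem
        "--tmpfs", "/tmp",                           # scratch space discarded with the container
        "--cap-drop", "ALL",                         # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", memory,
        "--cpus", cpus,
        "-v", f"{input_dir}:/workspace/input:ro",    # inputs mounted read-only
        "-v", f"{output_dir}:/workspace/output:rw",  # only the output dir is writable
        "--workdir", "/workspace",
    ]
    if not allow_network:
        argv += ["--network", "none"]                # deny network access by default
    argv += [image, *command]
    return argv
```

The temporary local working directories listed under the storage layer are natural mount sources here. For example, a worker might create one with `tempfile.mkdtemp()` per task and remove it after collecting outputs.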

Deployment Direction

V1 deployment target is a single public server using containerized application services. The architecture must still preserve future migration to multi-node environments.
