# EmboFlow System Architecture

## Architecture Style

EmboFlow V1 is a browser/server platform built as:

- Web frontend
- Modular backend control plane
- Independent worker runtime
- MongoDB as the only database
- Object storage abstraction over cloud object storage or MinIO
- Local scheduler in V1 with future migration path to Kubernetes and Volcano

The architecture should preserve clear service boundaries even if V1 is implemented as a modular monolith plus workers.

## High-Level Layers

### Frontend Layer

- Asset workspace
- Canvas workspace
- Explore workspace
- Label workspace
- Admin workspace

### Control Plane

- Identity and authorization
- Workspace and project management
- Asset and dataset metadata
- Workflow definition management
- Plugin registry and activation
- Run orchestration API
- Artifact indexing

### Execution Plane

- Workflow DAG compilation
- Task queue dispatch
- Worker execution
- Executor routing
- Log and artifact collection

### Storage Layer

- MongoDB for metadata and run state
- Object storage for files and large outputs
- Temporary local working directories for execution

## Core Domain Objects

- User
- Workspace
- Project
- Asset
- Dataset
- DatasetVersion
- WorkflowDefinition
- WorkflowVersion
- WorkflowRun
- RunTask
- Artifact
- AnnotationTask
- Annotation
- Plugin
- StorageConnection

## Raw Asset And Canonical Dataset Model

The platform must distinguish between:

- Raw Asset View
- Canonical Dataset View

Raw assets preserve source structure, file paths, metadata layout, and original naming.

Canonical datasets provide a normalized semantic layer for workflow nodes and export logic.

Visualization may read raw assets directly. Conversion, orchestration, and export should primarily target canonical semantics.

## Workflow Model

Workflow definitions are versioned and contain:

- Visual graph state
- Logical node and edge graph
- Runtime configuration
- Plugin references

Workflow execution produces immutable workflow runs.
A run snapshots:

- Workflow version
- Node configuration
- Injected code
- Executor settings
- Input bindings

Runs compile into task DAGs.

## Node And Plugin Model

### Node Categories

- Source
- Transform
- Inspect
- Annotate
- Export
- Utility

### Node Definition Contract

Each node definition includes:

- Metadata
- Input schema
- Output schema
- Config schema
- UI schema
- Executor type
- Runtime limits
- Optional code hook contract

### Plugin Types

- Node plugins
- Reader/writer plugins
- Renderer plugins
- Executor plugins
- Integration plugins

## Execution Architecture

### Executors

- Python executor
- Docker executor
- HTTP executor

V1 should prioritize Python and Docker. HTTP executor is useful for integrating external services.

### Schedulers

- Local scheduler in V1
- Kubernetes scheduler later
- Volcano scheduler later

Executors and schedulers are separate abstractions:

- Executor defines how logic runs
- Scheduler defines where and under what scheduling policy it runs

## Storage Architecture

### MongoDB Collections

Recommended primary collections:

- users
- workspaces
- projects
- memberships
- assets
- asset_probe_reports
- datasets
- dataset_versions
- workflow_definitions
- workflow_definition_versions
- workflow_runs
- run_tasks
- artifacts
- annotation_tasks
- annotations
- plugins
- storage_connections
- audit_logs

### Object Storage Content

- Raw uploads
- Imported archives
- Normalized export packages
- Training config packages
- Preview resources
- Logs and attachments
- Large manifests and file indexes

## Security Model

User-injected code is low-trust code and must not run in web or API processes.
V1 runtime policy:

- Built-in trusted nodes may use Python executor
- Plugin code should run in controlled runtimes
- User-injected code should default to Docker executor
- Network access should be denied by default for user code
- Input and output paths should be explicitly mounted

## Deployment Direction

V1 deployment target is a single public server using containerized application services. The architecture must still preserve future migration to multi-node environments.
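
As an illustration of the V1 runtime policy listed under Security Model, the following is a minimal Python sketch of how a control plane might map the origin of a piece of code to an executor, a network setting, and an explicit mount list. All names here (`CodeOrigin`, `Mount`, `RuntimePolicy`, `resolve_policy`) are hypothetical and not part of EmboFlow. The sketch also makes assumptions the policy does not state: it treats Docker as the "controlled runtime" for plugin code, denies network access for plugins as well as user code, allows it for built-in nodes, and mounts inputs read-only.

```python
from dataclasses import dataclass
from enum import Enum


class CodeOrigin(Enum):
    BUILTIN = "builtin"  # built-in trusted node
    PLUGIN = "plugin"    # plugin-provided code
    USER = "user"        # user-injected, low-trust code


class ExecutorType(Enum):
    PYTHON = "python"
    DOCKER = "docker"
    HTTP = "http"


@dataclass(frozen=True)
class Mount:
    path: str
    read_only: bool


@dataclass(frozen=True)
class RuntimePolicy:
    executor: ExecutorType
    network_allowed: bool
    mounts: tuple


def resolve_policy(origin, inputs, outputs):
    """Return the runtime policy for code of the given origin.

    Only the listed input and output paths are mounted; nothing is
    mounted implicitly. Read-only inputs are an assumption of this sketch.
    """
    mounts = tuple(Mount(p, read_only=True) for p in inputs) + tuple(
        Mount(p, read_only=False) for p in outputs
    )
    if origin is CodeOrigin.BUILTIN:
        # Built-in trusted nodes may use the Python executor.
        # Allowing network access here is an assumption.
        return RuntimePolicy(ExecutorType.PYTHON, True, mounts)
    # User-injected code defaults to Docker with network denied.
    # Treating plugin code the same way is an assumption.
    return RuntimePolicy(ExecutorType.DOCKER, False, mounts)


if __name__ == "__main__":
    policy = resolve_policy(CodeOrigin.USER, ["/data/in"], ["/data/out"])
    print(policy.executor.value, policy.network_allowed, policy.mounts)
```

Keeping the decision in one pure function means the API process only computes a policy and never runs the code itself; a worker would then translate the result into executor-specific settings.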