
Orchestration — graphids/orchestrate/

Status: implemented | Last refactor: 2026-04-15 (pipeline route deleted — one route, ablation presets own their own run_dir)

A training run is a jsonnet preset rendered to a dict, validated, and then fed through build → train → evaluate. There is no planner and no cross-stage driver. Multi-stage chains are built in bash by submitting each preset as its own job, with SBATCH_DEP=afterok:<jid> linking each submission to the previous one.
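A minimal sketch of that bash chaining. The `chain` function here only prints the submission commands so the dependency logic is visible; in real use, `scripts/run` is assumed to print the SLURM job id of the job it submits (as `sbatch --parsable` does), and `<jid>` would be that captured id.

```shell
#!/usr/bin/env bash
# Sketch: turn an ordered list of presets into a chain of submissions linked
# with afterok dependencies. Commands are printed, not executed, so the
# chaining logic can be shown without a SLURM cluster.
chain() {
  local dep="" preset jid
  for preset in "$@"; do
    echo "SBATCH_DEP=${dep} scripts/run ${preset}"
    # Real usage would capture the job id printed by scripts/run:
    #   jid=$(SBATCH_DEP="$dep" scripts/run "$preset")
    jid="<jid>"
    dep="afterok:${jid}"
  done
}

chain configs/ablations/stage1.jsonnet configs/ablations/stage2.jsonnet
```

Each stage only starts once the previous job exits successfully; a failed stage leaves the rest of the chain pending and eventually cancelled by SLURM.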

Layout

| Module | Role |
| --- | --- |
| `config.py` | `ResolvedConfig`, `InstantiatedRun`: the boundary types. |
| `instantiate.py` | `build_run(rendered)`: class_path resolution plus signature-filtered kwargs, callback/logger wiring. |
| `stage.py` | `build(resolved)`, `train(artifacts, resolved)`, `evaluate(artifacts, resolved)`. |
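The signature-filtered kwargs pattern in `instantiate.py` can be sketched as follows. The names (`TrainerCfg`, `filtered_kwargs`) are illustrative, not the real implementation; the point is that extra preset keys are dropped rather than rejected.

```python
from dataclasses import dataclass
import inspect

@dataclass
class TrainerCfg:
    """Stand-in for a class resolved from a class_path in the preset."""
    max_epochs: int = 1
    accelerator: str = "cpu"

def filtered_kwargs(cls, rendered: dict) -> dict:
    """Keep only the keys that cls's constructor actually accepts."""
    params = inspect.signature(cls).parameters
    return {k: v for k, v in rendered.items() if k in params}

# "run_dir" is a preset-level key, not a constructor kwarg; it is filtered out.
rendered = {"max_epochs": 5, "accelerator": "gpu", "run_dir": "/tmp/x"}
trainer = TrainerCfg(**filtered_kwargs(TrainerCfg, rendered))
```

This lets one rendered dict feed several constructors without each class needing to tolerate unknown keys.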

Execution flow

fit | test  (cli/training.py)
|
+-- render(config_path, tla=...)              [config/jsonnet.py]
+-- apply_overrides(rendered, --set ...)      [cli/app.py]
+-- ResolvedConfig.from_rendered(rendered)    [orchestrate/config.py]
|     -> validate_config(...)                 [config/schemas.py]
|     -> run_dir = trainer.default_root_dir
+-- build(resolved)                           [stage.py]
|     -> gc + torch.cuda reset
|     -> build_run(rendered, validated)       [instantiate.py]
+-- train(artifacts, resolved, resume_from)   [stage.py]
|     -> wire_file_exporters(run_dir)
|     -> trainer.fit(...)
|     -> touch .train_complete
+-- evaluate(artifacts, resolved)             [stage.py]
      -> trainer.test(...)
      -> touch .test_complete + save predictions
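The marker-file steps above can be sketched as two dumb primitives. This is a simplified stand-in, not the real `stage.py`: the trainer calls are elided, and the guard in `evaluate` is one plausible use of the `.train_complete` marker, labeled here as an assumption.

```python
import tempfile
from pathlib import Path

def train(run_dir: Path) -> None:
    """Stand-in for stage.train: trainer.fit(...) would run here."""
    (run_dir / ".train_complete").touch()

def evaluate(run_dir: Path) -> None:
    """Stand-in for stage.evaluate: trainer.test(...) would run here."""
    # Assumed guard: refuse to evaluate before training has completed.
    if not (run_dir / ".train_complete").exists():
        raise RuntimeError("train stage has not completed")
    (run_dir / ".test_complete").touch()

run_dir = Path(tempfile.mkdtemp())
train(run_dir)
evaluate(run_dir)
```

Because completion is a file on disk, downstream bash jobs chained with afterok can also check the markers without importing any Python.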

Key decisions

| Decision | Rationale |
| --- | --- |
| Jsonnet preset owns `run_dir` | Every `configs/ablations/*.jsonnet` computes `run_dir` from `(lake_root, dataset, seed)` via `_paths.libsonnet`. Path logic lives next to the config, not in a Python planner. |
| No in-process multi-stage driver | A bash loop over `scripts/run <preset>` with `afterok` deps does this without a parallel Python declaration of the chain. |
| `build` / `train` / `evaluate` are dumb primitives | No `ResolvedConfig` knowledge, no cache knowledge. The same primitives serve both `fit` and `test`. |
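The `run_dir` computation in `_paths.libsonnet` might look roughly like this. This is a hypothetical sketch, only the `(lake_root, dataset, seed)` inputs come from the doc; the function name and path layout are illustrative.

```jsonnet
// Illustrative sketch: run_dir is a pure function of the preset inputs,
// so path logic lives next to the config rather than in a Python planner.
{
  run_dir(lake_root, dataset, seed)::
    '%s/runs/%s/seed%d' % [lake_root, dataset, seed],
}
```

Because the path is derived inside the preset, resubmitting the same preset always lands in the same `run_dir`, which is what lets the `.train_complete` / `.test_complete` markers act as resume points.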