# Orchestration — `graphids/orchestrate/`
Status: implemented | Last refactor: 2026-04-15 (pipeline route deleted — one route, ablation presets own their own run_dir)
A training run is a jsonnet preset rendered to a dict, validated, then
fed through `build → train → evaluate`. There is no planner and no
cross-stage driver. Multi-stage chains are built in bash by submitting
each preset with `SBATCH_DEP=afterok:<jid>` pointing at the previous
stage's job.
## Layout
| Module | Role |
|---|---|
| `config.py` | `ResolvedConfig`, `InstantiatedRun` — boundary types. |
| `instantiate.py` | `build_run(rendered)` — `class_path` + signature-filtered kwargs, callback/logger wiring. |
| `stage.py` | `build(resolved)`, `train(artifacts, resolved)`, `evaluate(artifacts, resolved)`. |
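A minimal sketch of the `class_path` + signature-filtered kwargs technique behind `build_run`. The helper name and the `Fraction` example are illustrative only, and the real `build_run` also wires callbacks and loggers; this shows just the filtering step:

```python
import importlib
import inspect

def instantiate(class_path: str, kwargs: dict):
    """Import `pkg.mod.Class` and call it with only the kwargs it accepts."""
    module_name, _, cls_name = class_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), cls_name)
    accepted = inspect.signature(cls).parameters
    filtered = {k: v for k, v in kwargs.items() if k in accepted}
    return cls(**filtered)

# Extra keys in the rendered config are silently dropped:
# "seed" is not in Fraction's signature, so only numerator/denominator pass.
frac = instantiate("fractions.Fraction",
                   {"numerator": 3, "denominator": 4, "seed": 0})
```

Filtering by signature lets one rendered dict feed several constructors without each class having to tolerate unknown keys.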
## Execution flow
```text
fit | test (cli/training.py)
 |
 +-- render(config_path, tla=...)             [config/jsonnet.py]
 +-- apply_overrides(rendered, --set ...)     [cli/app.py]
 +-- ResolvedConfig.from_rendered(rendered)   [orchestrate/config.py]
 |      -> validate_config(...)               [config/schemas.py]
 |      -> run_dir = trainer.default_root_dir
 +-- build(resolved)                          [stage.py]
 |      -> gc + torch.cuda reset
 |      -> build_run(rendered, validated)     [instantiate.py]
 +-- train(artifacts, resolved, resume_from)  [stage.py]
 |      -> wire_file_exporters(run_dir)
 |      -> trainer.fit(...)
 |      -> touch .train_complete
 +-- evaluate(artifacts, resolved)            [stage.py]
        -> trainer.test(...)
        -> touch .test_complete + save predictions
```
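The `.train_complete` / `.test_complete` markers follow the touch-on-success idiom: the file is created only after the stage finishes without raising. Whether the real stages also skip when the marker already exists is not specified here; the sketch below (hypothetical `run_stage` helper) shows the idiom those markers enable:

```python
import tempfile
from pathlib import Path

def run_stage(run_dir: Path, marker: str, fn) -> None:
    """Run fn at most once per run_dir: the marker file records prior success."""
    sentinel = run_dir / marker
    if sentinel.exists():
        return            # stage already completed in an earlier run
    fn()
    sentinel.touch()      # reached only if fn() did not raise

# Demo: the second call is a no-op because .train_complete already exists.
run_dir = Path(tempfile.mkdtemp())
calls = []
run_stage(run_dir, ".train_complete", lambda: calls.append("train"))
run_stage(run_dir, ".train_complete", lambda: calls.append("train"))
```

Because the marker lives inside `run_dir`, deleting the run directory resets the stage, and no external state store is needed.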
## Key decisions
| Decision | Rationale |
|---|---|
| Jsonnet preset owns `run_dir` | Every `configs/ablations/*.jsonnet` computes `run_dir` from `(lake_root, dataset, seed)` via `_paths.libsonnet`. Path logic lives next to the config, not in a Python planner. |
| No in-process multi-stage driver | A bash loop over `scripts/run <preset>` with `afterok` deps does this without a parallel Python declaration. |
| `build` / `train` / `evaluate` are dumb primitives | No `ResolvedConfig` knowledge, no cache knowledge. Same primitives used by `fit` and `test`. |