Skip to content

Module Responsibilities

Experiment configs (configs/experiments/*.yml) define one launchable unit: dataset, stage, representation, resources, and the stage-specific payload under config.

Typed config (graphids/exp/config.py) validates YAML and builds a RunConfig. It owns the contract for ExperimentConfig, ResourceConfig, stage payloads, output paths, MLflow tags/params, and the representation drift check between top-level metadata and data.source.

Primitives (graphids/primitives*.py) are the public object factory surface for YAML specs. Data, model, loss, scaler, representation, ID encoding, and discovery primitives live here or are re-exported here.

Experiment runtime (graphids/exp/runtime.py) owns launch lifecycle and stage dispatch. It writes the manifest/events journal, creates the MLflow logger, builds config-driven objects, runs Lightning for fit/test, materializes caches for cache, and calls extraction/analyzer code for extract/analyze.

SLURM submit (graphids/exp/slurm.py, graphids/cli/exp.py) validates an experiment YAML, renders an sbatch script, and submits it. The compute node runs python -m graphids exp launch <yaml> after sourcing scripts/slurm/_preamble.sh.

Data sources and datamodules (graphids/core/data/) own raw CAN loading, representation selection, cache paths, materialization, metadata, and Lightning dataloaders. GraphDataModule(require_cache=True) fails fast when a training config expects a cache that is missing or incomplete.

Preprocessing (graphids/core/data/preprocessing/) turns raw rows into materialized graph views. Snapshot, snapshot-sequence, multi-scale, temporal, and entity representations are explicit. Snapshot-sequence materialization stores sequence metadata on graph/node/edge tensors.

Models (graphids/core/models/) own Lightning modules and metrics. The GAT now supports sequence-aware graph pooling through sequence_pool.

Callbacks (graphids/core/callbacks.py) hold graphids-specific Lightning policy such as Sha256ModelCheckpoint, tau-norm, and VRAM drift warnings.

MLflow (graphids/_mlflow.py) resolves the shared tracking URI, builds the Lightning MLflow logger, and starts/stops MLflow system-metrics monitoring for Lightning-created runs.

The live flow is:

configs/experiments/<run>.yml
    -> ExperimentConfig.from_yaml
    -> ExperimentConfig.build_run
    -> gx exp launch OR gx exp submit
    -> graphids.exp.runtime.launch_run
    -> run_stage: cache | fit | test | extract | analyze
    -> MLflow + .graphids/manifest.json + .graphids/events.jsonl

The old graphids/plan row renderer, gx run, gx plans submit, and graphids/orchestrate.py row dispatcher are historical and should not be used for new work.