Artifacts

Per-checkpoint artifact generation: embeddings, GAT attention weights, teacher↔student CKA, loss-landscape grids, fusion-policy traces. Driven by AnalysisConfig and dispatched directly through graphids.exp.runtime.run_stage to Analyzer(spec).run().

Distinct from graphids.analysis, which owns cross-run statistical comparison from the MLflow catalog (no torch, login-safe).

Layout

graphids/core/artifacts/
├── analyzer.py      orchestrates load → compute → save loop, writes manifest
├── _dispatch.py     ARTIFACTS table — each row is the only place compute + I/O meet
├── compute.py       pure compute fns + frozen result dataclasses (no fs)
├── io.py            every read (val data, teacher ckpt, fusion eval) + every write
└── __init__.py

The compute / I/O split is structural: every compute_* function takes pre-loaded models and pre-built val_data, returns a frozen dataclass, and never touches the filesystem. io.save_* consumes those dataclasses and writes; io.load_* reads. The dispatch table is the single seam between the two: adding an artifact means one new row in ARTIFACTS (with its _run_* glue fn), one new compute fn, and one new save fn, plus a toggle field on AnalysisConfig.
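
The split described above can be sketched with a toy artifact. Everything in this block is hypothetical and only illustrates the shape: NormResult, compute_norms, save_norms, and _run_norms are not part of graphids, and plain Python lists stand in for models and val_data.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class NormResult:
    """Hypothetical result type; frozen so the save side cannot mutate it."""
    names: tuple[str, ...]
    norms: tuple[float, ...]


def compute_norms(vectors: dict[str, list[float]]) -> NormResult:
    """Compute side: takes pre-loaded data, returns a result, never touches the fs."""
    names = tuple(sorted(vectors))
    norms = tuple(sum(x * x for x in vectors[n]) ** 0.5 for n in names)
    return NormResult(names, norms)


def save_norms(out: Path, result: NormResult) -> Path:
    """I/O side: serializes the frozen result and returns the written path."""
    path = out / "norms.json"
    path.write_text(json.dumps(dict(zip(result.names, result.norms))))
    return path


def _run_norms(out: Path, vectors: dict[str, list[float]]) -> Path:
    """Glue fn: the only place where compute and I/O meet."""
    return save_norms(out, compute_norms(vectors))


# Usage: compute is testable without a filesystem; the glue fn adds the write.
result = compute_norms({"b": [3.0, 4.0], "a": [0.0]})
with tempfile.TemporaryDirectory() as tmp:
    written = _run_norms(Path(tmp), {"b": [3.0, 4.0], "a": [0.0]})
    saved = json.loads(written.read_text())
```

Because compute_norms has no I/O, it can be unit-tested on in-memory inputs, which is the practical payoff of keeping the seam in one glue function.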

Adding an artifact

1. Add compute_X(...) -> XResult (frozen dataclass) to compute.py.
2. Add save_X(out, result) to io.py.
3. Add an Artifact("X", "x.npz", frozenset({...}), _run_X) row to _dispatch.ARTIFACTS and a _run_X glue fn that wires load → compute → save.
4. Add a toggle field on AnalysisConfig (default off, or default-on for the model types in applies_to via default_toggles_for).

expected_outputs(spec) and Analyzer.run() derive from ARTIFACTS automatically — no parallel declaration to update.
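
A minimal sketch of what a new row might look like, assuming an Artifact record with four positional fields. The field names name, output, and applies_to appear elsewhere on this page; the field name run, the ctx parameter, and the body of default_toggles_for are assumptions made for illustration and may differ from the real _dispatch.py.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    name: str                   # doubles as the toggle field name on the config
    output: str                 # output filename; may contain {model_type}
    applies_to: frozenset[str]  # model types the artifact is valid for
    run: Callable[..., None]    # glue fn (field name is an assumption)


def _run_embeddings(ctx) -> None:
    """Placeholder glue fn; the real one wires load -> compute -> save."""


def _run_x(ctx) -> None:
    """Placeholder glue fn for the new, hypothetical artifact 'x'."""


ARTIFACTS = (
    Artifact("embeddings", "embeddings.npz", frozenset({"vgae", "dgi", "gat"}), _run_embeddings),
    Artifact("x", "x.npz", frozenset({"gat"}), _run_x),  # the one new row
)


def default_toggles_for(model_type: str) -> dict[str, bool]:
    """Assumed behavior: an artifact defaults on when model_type is in applies_to."""
    return {a.name: model_type in a.applies_to for a in ARTIFACTS}


gat_defaults = default_toggles_for("gat")
vgae_defaults = default_toggles_for("vgae")
```

Anything that iterates ARTIFACTS picks up the new row without further edits, which is why no parallel declaration is needed.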

Reuse with training/eval

io.load_val_data goes through CANBusSource → state.get_or_build — the same path GraphDataModule.setup takes during training. val_fraction, scaler strategy, and cache digest live on the source dataclass; the analyzer picks up changes there automatically with no parallel declaration.

io.load_teacher and the student ckpt load in Analyzer.run both go through safe_load_checkpoint — the canonical "ckpt → module" registry. io.load_fusion_eval wraps the same FusionDataModule training/eval uses.

Manifest sidecar

Analyzer.run() writes analysis_manifest.json next to the artifacts: the rendered analysis identity, expected outputs (derived from expected_outputs(spec)), and which actually exist on disk after the run. Useful as provenance when an analyze run was submitted via SLURM and the output dir is the only artifact left.
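
As a rough illustration of the sidecar idea, the sketch below records what was expected and what is actually on disk. The function write_manifest and the JSON keys identity, expected, and produced are hypothetical; the real manifest schema is not shown on this page.

```python
import json
import tempfile
from pathlib import Path

MANIFEST_NAME = "analysis_manifest.json"


def write_manifest(out_dir: Path, identity: dict, expected: tuple[str, ...]) -> Path:
    """Write a sidecar listing expected outputs and those present after the run."""
    produced = [name for name in expected if (out_dir / name).exists()]
    manifest = {
        "identity": identity,          # rendered analysis identity (shape assumed)
        "expected": list(expected),    # what the spec asked for
        "produced": produced,          # what exists on disk
    }
    path = out_dir / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return path


# Usage: one expected artifact exists, the other was never written.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    (out_dir / "embeddings.npz").write_bytes(b"")  # stand-in for a real artifact
    path = write_manifest(out_dir, {"model_type": "gat"}, ("embeddings.npz", "cka.json"))
    manifest = json.loads(path.read_text())

missing = sorted(set(manifest["expected"]) - set(manifest["produced"]))
```

Comparing the two lists is enough to tell, from the output directory alone, which artifacts a run failed to produce.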

`graphids.core.artifacts`

artifacts

Per-checkpoint artifact generation.

ARTIFACTS `module-attribute`

ARTIFACTS: tuple[Artifact, ...] = (
    Artifact('embeddings', 'embeddings.npz', frozenset({'vgae', 'dgi', 'gat'}), _run_embeddings),
    Artifact('attention', 'attention_weights.npz', frozenset({'gat'}), _run_attention),
    Artifact('cka', 'cka.json', frozenset({'gat'}), _run_cka),
    Artifact('landscape', 'loss_landscape_{model_type}.parquet', frozenset({'vgae', 'dgi', 'gat'}), _run_landscape),
    Artifact('fusion_policy', 'dqn_policy.json', frozenset({'fusion'}), _run_fusion_policy),
)

MANIFEST_NAME `module-attribute`

MANIFEST_NAME = 'analysis_manifest.json'

Analyzer

Analyzer(spec: AnalysisConfig)

Generate analysis artifacts from a trained checkpoint.

Source code in graphids/core/artifacts/analyzer.py

def __init__(self, spec: AnalysisConfig):
    self.spec = spec
    if not Path(spec.ckpt_path).exists():
        raise FileNotFoundError(f"Checkpoint not found: {spec.ckpt_path}")
    if spec.cka and not Path(spec.cka_teacher_ckpt).exists():
        raise FileNotFoundError(f"Teacher checkpoint not found: {spec.cka_teacher_ckpt}")

expected_outputs

expected_outputs(spec: AnalysisConfig) -> tuple[str, ...]

Source code in graphids/core/artifacts/analyzer.py

def expected_outputs(spec: AnalysisConfig) -> tuple[str, ...]:
    out: list[str] = []
    for a in ARTIFACTS:
        if getattr(spec, a.name):
            out.append(a.output.format(model_type=spec.model_type))
    return tuple(out)
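
A small usage sketch of the function above, runnable without graphids. The two-field Artifact and the SimpleNamespace spec are stand-ins for the real Artifact and AnalysisConfig; only the attributes that expected_outputs reads are modeled.

```python
from types import SimpleNamespace
from typing import NamedTuple


class Artifact(NamedTuple):
    """Stand-in with only the two fields expected_outputs reads."""
    name: str
    output: str


ARTIFACTS = (
    Artifact("embeddings", "embeddings.npz"),
    Artifact("cka", "cka.json"),
    Artifact("landscape", "loss_landscape_{model_type}.parquet"),
)


def expected_outputs(spec) -> tuple[str, ...]:
    """Same logic as the source above: keep toggled-on rows, fill in model_type."""
    return tuple(
        a.output.format(model_type=spec.model_type)
        for a in ARTIFACTS
        if getattr(spec, a.name)
    )


# A fake spec with one toggle off; cka.json is therefore not expected.
spec = SimpleNamespace(model_type="gat", embeddings=True, cka=False, landscape=True)
outputs = expected_outputs(spec)
```

As written in the source, the toggle on the spec is the only filter here; applies_to is not consulted by this function, so any model-type gating presumably happens through the toggle defaults.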

analyzer

Per-checkpoint artifact generation.

compute

Pure compute primitives — no filesystem, no MLflow, no logging side-effects.

Each compute_* returns a frozen dataclass (or plain dict, for CKA's single layer→score mapping) that io.save_* knows how to serialize. The analyzer wraps the whole batch in eval_mode, so no compute function re-enters it.
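
The real eval_mode is not shown on this page, so the following is only a guess at the pattern: a context manager that switches modules to evaluation mode once, around the whole batch, and restores their previous mode afterwards. FakeModule is a stand-in exposing the training, train(mode), and eval() members of torch.nn.Module so the sketch runs without torch; whether the real helper also disables gradients is unknown.

```python
from contextlib import contextmanager


class FakeModule:
    """Stand-in for torch.nn.Module: only the train/eval switch is modeled."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "FakeModule":
        self.training = mode
        return self

    def eval(self) -> "FakeModule":
        return self.train(False)


@contextmanager
def eval_mode(*modules):
    """Hypothetical sketch: put modules in eval mode, restore prior mode on exit."""
    previous = [m.training for m in modules]
    for m in modules:
        m.eval()
    try:
        yield
    finally:
        # Restore even if a compute function raises.
        for m, was_training in zip(modules, previous):
            m.train(was_training)


student, teacher = FakeModule(), FakeModule()
teacher.eval()  # already in eval mode before the analyzer runs

with eval_mode(student, teacher):
    inside = (student.training, teacher.training)  # compute_* calls would go here

after = (student.training, teacher.training)
```

Entering the context once in the orchestrator keeps the compute functions free of mode handling, consistent with the no-side-effects rule stated above.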