Core: Models¶
Model families used as ablation rows. All inherit from
GraphModuleBase (base.py), which owns the VRAM probe
(compute_budget) plus the _store_init_kwargs /
_build_id_encoder mixins.
- autoencoder/: VGAE family (unsupervised reconstruction). Stage 1 of the KD chain.
- supervised/: GAT family (supervised classification). Stage 2.
- fusion/: fusion modules dispatching on the fusion_method TLA over the method libsonnets. Stage 3.
- id_encoding/: categorical-ID encoders (embedding tables with reserved UNK at index 0).
graphids.core.models¶
models ¶
Core model families and shared model base classes.
BanditFusionModule ¶
BanditFusionModule(state_dim: int = 18, alpha_steps: int = 21, ucb_alpha: float = 1.0, lambda_reg: float = 1.0, hidden_dim: int = 128, num_layers: int = 3, backbone_lr: float = 0.001, backbone_retrain_freq: int = 50, backbone_epochs: int = 5, buffer_size: int = 100000, batch_size: int = 128, decision_threshold: float = 0.5, reward_kwargs: dict | None = None)
Bases: RLFusionBase
Neural-LinUCB: backbone + per-arm ridge with Sherman-Morrison online updates, and a frequency-gated backbone refit.
Source code in graphids/core/models/fusion/bandit.py
GraphModuleBase ¶
Bases: _ModelBase
Shared base for VGAE, GAT, DGI — lazy setup, threshold metrics.
Subclasses must implement _build() using self.hparams.
compute_budget ¶
Probe-once VRAM budget cached on the model.
Source code in graphids/core/models/base.py
configure_optimizers ¶
Adam over all params using self.hparams.lr / weight_decay.
Source code in graphids/core/models/base.py
on_validation_epoch_end ¶
prepare_from_datamodule ¶
Lazy-build with DM-supplied sizes, then capture test-set names.
Source code in graphids/core/models/base.py
autoencoder ¶
Autoencoder/self-supervised model family exports.
DGI ¶
DGI(conv_type: str = 'gatv2', hidden_dims: list[int] | None = None, latent_dim: int | None = None, heads: int | None = None, embedding_dim: int | None = None, dropout: float = 0.15, edge_dim: int = 11, proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, batch_norm: bool = True, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.001, weight_decay: float = 0.0001, scale: str = 'small', model_type: ModelType = 'dgi', dataset: str = '', seed: int = 42, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: ScoreBasedDetectorMixin
Collapsed DGI — arch + trainer-bridge in one nn.Module.
No loss_fn kwarg: the contrastive MI loss is intrinsic to the
architecture (built into the discriminator).
Source code in graphids/core/models/autoencoder/dgi.py
dgi_loss ¶
Contrastive MI loss: maximize real node–summary agreement.
Source code in graphids/core/models/autoencoder/dgi.py
discriminate ¶
encode ¶
Encode nodes to latent embeddings (same contract as VGAE minus KL).
Source code in graphids/core/models/autoencoder/dgi.py
extract_features ¶
Per-graph fusion features as named tensors (symmetric to VGAE/GAT).
- pos_stats[N, 3]: anomaly, pos_mean, pos_spread (discriminator-derived)
- conf[N, 1]: 1 / (1 + anomaly)
- z_stats[N, 4]: z_mean, z_std, z_max, z_min (latent-pooled)
Source code in graphids/core/models/autoencoder/dgi.py
on_test_setup ¶
Fit SVDD center from training-normal graphs at test-start. Always re-fits (no idempotence flag — center isn't persisted in state_dict; see Cardinal jid 8772115 for ckpt-ordering rationale).
Source code in graphids/core/models/autoencoder/dgi.py
score ¶
OCGIN score: L2 distance from SVDD centroid in pooled-latent space.
Source code in graphids/core/models/autoencoder/dgi.py
VGAE ¶
VGAE(*, loss_fn: Module | None = None, conv_type: str = 'gatv2', hidden_dims: list[int] | None = None, latent_dim: int | None = None, heads: int = 4, embedding_dim: int = 32, dropout: float = 0.1, edge_dim: int = 11, proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, batch_norm: bool = True, mlp_hidden: int | None = None, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.003, weight_decay: float = 0.0001, mask_rate: float = 0.15, score_recon_weight: float = 1.0, score_mahal_weight: float = 1.0, score_kl_weight: float = 1.0, scale: str = 'small', model_type: ModelType = 'vgae', dataset: str = '', seed: int = 42, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: ScoreBasedDetectorMixin
Collapsed VGAE — arch + trainer-bridge in one nn.Module.
Loss selection is decoupled: loss_fn is an nn.Module built
from experiment config.
Anomaly score = max-σ over four components (masked recon mean,
masked recon max, TAM affinity, Rayleigh quotient). Calibration
buffers are filled by on_test_setup at test-start.
Source code in graphids/core/models/autoencoder/vgae.py
encode ¶
Returns (z, kl_per_node, mu).
Source code in graphids/core/models/autoencoder/vgae.py
extract_features ¶
Per-graph fusion features as named tensors.
- errors[N, 3]: recon, mahal, kl (anomaly evidence)
- conf[N, 1]: 1 / (1 + recon)
- z_stats[N, 4]: z_mean, z_std, z_max, z_min
- spike[N, 1]: recon_max (per-graph max masked-node MSE)
- affinity[N, 1]: TAM per-graph mean affinity
- rq[N, 1]: Rayleigh quotient (input-space spectral smoothness)
Source code in graphids/core/models/autoencoder/vgae.py
on_test_setup ¶
Fit z-norm calibration buffers from benign val if not already populated. Idempotent: skips if a calibrated ckpt was reloaded.
Source code in graphids/core/models/autoencoder/vgae.py
score ¶
Per-graph anomaly score: max-σ over (recon, recon_max, TAM affinity, RQ) in the calibrated z-norm space.
Source code in graphids/core/models/autoencoder/vgae.py
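The max-σ scoring described above can be sketched in plain Python. This is a minimal illustration, not the module's code: it assumes each component is already oriented so that larger means more anomalous, that calibration stores a mean and population standard deviation per component from benign validation graphs, and that a zero spread falls back to 1.0. The helper names fit_calibration and max_sigma_score are hypothetical.

```python
import statistics

# Component names follow the score() description; the dict layout is an assumption.
COMPONENTS = ("recon", "recon_max", "affinity", "rq")


def fit_calibration(benign_rows):
    """Fit (mean, std) per component from benign validation graphs."""
    calib = {}
    for name in COMPONENTS:
        values = [row[name] for row in benign_rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values) or 1.0  # assumed guard against zero spread
        calib[name] = (mean, std)
    return calib


def max_sigma_score(row, calib):
    """Largest z-normalized deviation across the four components."""
    return max((row[name] - calib[name][0]) / calib[name][1] for name in COMPONENTS)


benign = [
    {"recon": 1.0, "recon_max": 2.0, "affinity": 0.0, "rq": 5.0},
    {"recon": 3.0, "recon_max": 6.0, "affinity": 0.0, "rq": 7.0},
]
calib = fit_calibration(benign)
query = {"recon": 4.0, "recon_max": 10.0, "affinity": 0.5, "rq": 6.0}
query_score = max_sigma_score(query, calib)
```

Taking the maximum means a graph is flagged as soon as any single component deviates strongly from its benign distribution, even if the others look normal.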
dgi ¶
Deep Graph Infomax — collapsed arch + trainer-bridge.
Maximizes mutual information between node embeddings and a graph-level summary via a bilinear discriminator. Uses the same encoder backbone as VGAE (InputEncoder + conv stack) for fair ablation comparison.
Anomaly scoring at test time: OCGIN-style L2 distance between the pooled node embedding of a query graph and the centroid of training-normal pooled embeddings (Zhao & Akoglu 2021, arxiv:2103.04494).
Reference: Veličković et al., "Deep Graph Infomax" (ICLR 2019).
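The OCGIN-style scoring rule above can be sketched as follows. This is an illustration under stated assumptions: plain lists stand in for tensors, mean pooling is assumed as the readout (the text only says "pooled"), and the helper names mean_pool, fit_center, and ocgin_score are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_center(training_normal_graphs):
    """Centroid of the pooled embeddings of training-normal graphs."""
    return mean_pool([mean_pool(g) for g in training_normal_graphs])


def ocgin_score(graph_node_embeddings, center):
    """L2 distance between a graph's pooled embedding and the centroid."""
    return math.dist(mean_pool(graph_node_embeddings), center)


train = [
    [[0.0, 0.0], [2.0, 0.0]],  # pools to [1, 0]
    [[1.0, 2.0], [1.0, 2.0]],  # pools to [1, 2]
]
center = fit_center(train)      # [1, 1]
far = ocgin_score([[4.0, 5.0]], center)
near = ocgin_score(train[0], center)
```

Graphs whose pooled embedding lies far from the centroid receive a larger score and are treated as more anomalous.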
DGI ¶
DGI(conv_type: str = 'gatv2', hidden_dims: list[int] | None = None, latent_dim: int | None = None, heads: int | None = None, embedding_dim: int | None = None, dropout: float = 0.15, edge_dim: int = 11, proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, batch_norm: bool = True, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.001, weight_decay: float = 0.0001, scale: str = 'small', model_type: ModelType = 'dgi', dataset: str = '', seed: int = 42, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: ScoreBasedDetectorMixin
Collapsed DGI — arch + trainer-bridge in one nn.Module.
No loss_fn kwarg: the contrastive MI loss is intrinsic to the
architecture (built into the discriminator).
Source code in graphids/core/models/autoencoder/dgi.py
dgi_loss ¶
Contrastive MI loss: maximize real node–summary agreement.
Source code in graphids/core/models/autoencoder/dgi.py
discriminate ¶
encode ¶
Encode nodes to latent embeddings (same contract as VGAE minus KL).
Source code in graphids/core/models/autoencoder/dgi.py
extract_features ¶
Per-graph fusion features as named tensors (symmetric to VGAE/GAT).
- pos_stats[N, 3]: anomaly, pos_mean, pos_spread (discriminator-derived)
- conf[N, 1]: 1 / (1 + anomaly)
- z_stats[N, 4]: z_mean, z_std, z_max, z_min (latent-pooled)
Source code in graphids/core/models/autoencoder/dgi.py
on_test_setup ¶
Fit SVDD center from training-normal graphs at test-start. Always re-fits (no idempotence flag — center isn't persisted in state_dict; see Cardinal jid 8772115 for ckpt-ordering rationale).
Source code in graphids/core/models/autoencoder/dgi.py
score ¶
OCGIN score: L2 distance from SVDD centroid in pooled-latent space.
Source code in graphids/core/models/autoencoder/dgi.py
vgae ¶
Variational graph autoencoder — collapsed arch + trainer-bridge.
The single VGAE class is both the architecture (encoder /
decoder / aux heads / mask token / score-norm calibration buffers) and
the trainer-bridge (training_step/validation_step/test_step,
score primitives, fusion-feature extractor). No wrapper module — see
~/plans/graphids-collapse-model-modules.md Phase 1.
Encoder maps node features to q(z|x) = N(mu, σ²); decoder
reconstructs continuous features from the reparameterized z.
Mask-recon training (15% random node masking) commits the encoder to
"predict v from neighborhood" rather than "echo v back".
VGAE ¶
VGAE(*, loss_fn: Module | None = None, conv_type: str = 'gatv2', hidden_dims: list[int] | None = None, latent_dim: int | None = None, heads: int = 4, embedding_dim: int = 32, dropout: float = 0.1, edge_dim: int = 11, proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, batch_norm: bool = True, mlp_hidden: int | None = None, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.003, weight_decay: float = 0.0001, mask_rate: float = 0.15, score_recon_weight: float = 1.0, score_mahal_weight: float = 1.0, score_kl_weight: float = 1.0, scale: str = 'small', model_type: ModelType = 'vgae', dataset: str = '', seed: int = 42, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: ScoreBasedDetectorMixin
Collapsed VGAE — arch + trainer-bridge in one nn.Module.
Loss selection is decoupled: loss_fn is an nn.Module built
from experiment config.
Anomaly score = max-σ over four components (masked recon mean,
masked recon max, TAM affinity, Rayleigh quotient). Calibration
buffers are filled by on_test_setup at test-start.
Source code in graphids/core/models/autoencoder/vgae.py
encode ¶
Returns (z, kl_per_node, mu).
Source code in graphids/core/models/autoencoder/vgae.py
extract_features ¶
Per-graph fusion features as named tensors.
- errors[N, 3]: recon, mahal, kl (anomaly evidence)
- conf[N, 1]: 1 / (1 + recon)
- z_stats[N, 4]: z_mean, z_std, z_max, z_min
- spike[N, 1]: recon_max (per-graph max masked-node MSE)
- affinity[N, 1]: TAM per-graph mean affinity
- rq[N, 1]: Rayleigh quotient (input-space spectral smoothness)
Source code in graphids/core/models/autoencoder/vgae.py
on_test_setup ¶
Fit z-norm calibration buffers from benign val if not already populated. Idempotent: skips if a calibrated ckpt was reloaded.
Source code in graphids/core/models/autoencoder/vgae.py
score ¶
Per-graph anomaly score: max-σ over (recon, recon_max, TAM affinity, RQ) in the calibrated z-norm space.
Source code in graphids/core/models/autoencoder/vgae.py
base ¶
Shared model infrastructure — base classes, utilities, contracts.
Graph family:
- GraphModuleBase — base for VGAE, GAT, DGI
- try_compile — safe torch.compile with conv-type gating
- eval_mode — context manager that restores training state
Shared:
- _ModelBase(pl.LightningModule) — mixin shared by GraphModuleBase +
FusionModuleBase. Lightning provides self.device, self.log,
self.log_dict, self.hparams, self.trainer, etc.
- safe_load_checkpoint — checkpoint loading via class_path registry
- strip_orig_mod_prefix — drop _orig_mod. keys from state_dicts
produced under torch.compile
GraphModuleBase ¶
Bases: _ModelBase
Shared base for VGAE, GAT, DGI — lazy setup, threshold metrics.
Subclasses must implement _build() using self.hparams.
compute_budget ¶
Probe-once VRAM budget cached on the model.
Source code in graphids/core/models/base.py
configure_optimizers ¶
Adam over all params using self.hparams.lr / weight_decay.
Source code in graphids/core/models/base.py
on_validation_epoch_end ¶
prepare_from_datamodule ¶
Lazy-build with DM-supplied sizes, then capture test-set names.
Source code in graphids/core/models/base.py
ScoreBasedDetectorMixin ¶
Bases: GraphModuleBase
Mix-in for graph models that emit per-graph anomaly scores.
Source code in graphids/core/models/base.py
eval_mode ¶
Context manager: set model.eval(), restore original training state on exit.
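A context manager with this behavior might look like the sketch below. The body is an assumption based only on the one-line description. _ToyModule is a hypothetical stand-in that mimics the training / train() / eval() surface of torch.nn.Module so the example runs without PyTorch.

```python
from contextlib import contextmanager


@contextmanager
def eval_mode(model):
    """Switch to eval for the duration of the block, then restore the prior mode."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        # Restore whatever mode the model was in, even if the block raised.
        model.train(was_training)


class _ToyModule:
    """Hypothetical stand-in exposing the nn.Module train/eval surface."""

    def __init__(self):
        self.training = True

    def train(self, mode=True):
        self.training = mode
        return self

    def eval(self):
        return self.train(False)


model = _ToyModule()
with eval_mode(model):
    inside = model.training
after = model.training
```

Restoring the recorded state, rather than calling train() unconditionally, keeps a model that was already in eval mode from being flipped back into training.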
safe_load_checkpoint ¶
Load a checkpoint, dispatching on the class_path saved at write time.
model_type is used only to know which loss_fn to rebuild for VGAE/GAT
(loss is excluded from hyperparameters). Class lookup uses the
self-describing class_path injected by _ModelBase.on_save_checkpoint.
Source code in graphids/core/models/base.py
strip_orig_mod_prefix ¶
Drop _orig_mod. prefix injected by torch.compile's OptimizedModule.
_orig_mod. can appear mid-key (e.g. model._orig_mod.encoder.weight)
when compile wraps an inner submodule; replace handles every position.
Source code in graphids/core/models/base.py
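The key rewrite described above can be sketched with str.replace, which removes the marker wherever it occurs in the key. This is an illustrative version, not the module's source, and the example state_dict is made up.

```python
def strip_orig_mod_prefix(state_dict):
    """Return a copy of state_dict with every '_orig_mod.' segment removed from keys."""
    return {key.replace("_orig_mod.", ""): value for key, value in state_dict.items()}


compiled = {
    "_orig_mod.encoder.weight": 1,       # whole model was compiled
    "model._orig_mod.encoder.bias": 2,   # an inner submodule was compiled
    "head.weight": 3,                    # never compiled, left untouched
}
cleaned = strip_orig_mod_prefix(compiled)
```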
try_compile ¶
Attempt torch.compile; fall back to eager on inductor failure.
Skips compile entirely for conv types that use to_dense_batch()
(e.g. GPS) — the Tensor.item() call causes graph breaks, repeated
recompilation, and eventual CUDA illegal memory access.
Source code in graphids/core/models/base.py
fusion ¶
Fusion policy model family exports.
BanditFusionModule ¶
BanditFusionModule(state_dim: int = 18, alpha_steps: int = 21, ucb_alpha: float = 1.0, lambda_reg: float = 1.0, hidden_dim: int = 128, num_layers: int = 3, backbone_lr: float = 0.001, backbone_retrain_freq: int = 50, backbone_epochs: int = 5, buffer_size: int = 100000, batch_size: int = 128, decision_threshold: float = 0.5, reward_kwargs: dict | None = None)
Bases: RLFusionBase
Neural-LinUCB: backbone + per-arm ridge with Sherman-Morrison online updates, and a frequency-gated backbone refit.
Source code in graphids/core/models/fusion/bandit.py
MLPFusionModule ¶
MLPFusionModule(state_dim: int = 18, hidden_dims: tuple[int, ...] = (64, 32), lr: float = 0.001, decision_threshold: float = 0.5)
Bases: FusionModuleBase
Same features as DQN, trained with BCE instead of RL.
Source code in graphids/core/models/fusion/mlp.py
MoEFusionModule ¶
MoEFusionModule(state_dim: int = 18, num_experts: int = 3, expert_hidden: tuple[int, ...] = (64, 32), gate_hidden: tuple[int, ...] = (32,), lr: float = 0.001, decision_threshold: float = 0.5, aux_weight: float = 0.01)
Bases: FusionModuleBase
Dense soft-gated mixture of K identical experts over the flat feature vector.
Specialization is emergent: experts share architecture and input;
only the gate's softmax over per-sample logits selects how their
outputs combine. If gate entropy stays at log(K) (uniform) on
a fitted run, the features carry no routable signal — see
diagnostics + escalation table in the design doc.
Source code in graphids/core/models/fusion/moe.py
RLFusionBase ¶
Bases: FusionModuleBase
torchrl replay buffer + unified act/learn flow.
Subclass implements:
- _compute_loss(sample) -> Tensor — scalar loss from a buffer
sample. DQN delegates to a torchrl DQNLoss; Bandit computes
MSE inline. The optimizer scope (self._optimizer) is whatever
params the subclass actually trains — it does NOT have to match
a single loss_module.
Subclass sets in __init__:
- self._optimizer — optimizer over the trainable params.
Hooks:
- _score_actions(td, training) — write td['action'].
- _after_act(actions, obs, rewards) — online update.
- _should_learn() — gate the optim step (default: every step).
- _after_optim_step() — post-step (DQN target sync).
- _after_learn() — post-batch (Bandit ridge reset).
- _extra_metrics() — extra log fields.
Source code in graphids/core/models/fusion/base.py
select_action_batch ¶
Returns (actions[N], alphas[N], normalized_features_td[N]).
Source code in graphids/core/models/fusion/base.py
WeightedAvgModule ¶
Bases: FusionModuleBase
alpha = sigmoid(w); score = (1-alpha)·vgae_anom + alpha·gat_attack, where vgae_anom = 1 - vgae_conf and gat_attack = gat/probs[:,1].
Source code in graphids/core/models/fusion/weighted_avg.py
flatten_features ¶
Concatenate every leaf tensor along the last dim. Stable order: sorted nested-key path so the same TD always yields the same layout.
Only tuple-keyed (model-namespaced) leaves are concatenated. Top-level
str leaves are reserved for metadata (labels, attack_type);
they pass through the TD untouched and reach test_step via
td.get(...) instead of being treated as features.
Source code in graphids/core/models/fusion/base.py
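The ordering rule can be illustrated with nested dicts standing in for a TensorDict. This is a sketch under that assumption: leaves are lists of per-graph rows, nested (model-namespaced) leaves are concatenated in sorted key-path order, and top-level leaves are skipped as metadata. The helper nested_leaves is hypothetical.

```python
def nested_leaves(td, prefix=()):
    """Yield (key_path, leaf) pairs from a nested dict."""
    for key, value in td.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from nested_leaves(value, path)
        else:
            yield path, value


def flatten_features(td):
    """Concatenate nested leaves row by row, in sorted key-path order."""
    leaves = sorted(
        ((path, leaf) for path, leaf in nested_leaves(td) if len(path) > 1),
        key=lambda item: item[0],
    )
    num_rows = len(leaves[0][1])
    return [[x for _, leaf in leaves for x in leaf[i]] for i in range(num_rows)]


td = {
    "vgae": {"errors": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], "conf": [[0.5], [0.25]]},
    "gat": {"probs": [[0.9, 0.1], [0.2, 0.8]]},
    "labels": [0, 1],  # top-level metadata, not a feature
}
flat = flatten_features(td)
```

Because the paths are sorted, the column layout depends only on which keys are present, not on the order in which they were inserted.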
bandit ¶
Neural-LinUCB contextual bandit (Xu et al., ICLR 2022).
Backbone is gradient-trained (MSE between θ_a·z(s) and stored reward); the per-arm θ is updated analytically via Sherman-Morrison ridge. No torchrl LossModule — would be a vestigial wrapper here since θ is not gradient-trained and there's no target net.
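The analytic per-arm update can be sketched in plain Python. This is an illustration rather than the module's code: it assumes each arm keeps A⁻¹ and b with A initialized to lambda_reg·I, that θ = A⁻¹b, and that the arm score takes the standard LinUCB form θ·z + ucb_alpha·sqrt(zᵀA⁻¹z). The class name RidgeArm and its methods are hypothetical.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class RidgeArm:
    """One arm's ridge regressor, stored as A^{-1} and b."""

    def __init__(self, dim, lambda_reg=1.0):
        # A starts as lambda_reg * I, so A^{-1} = I / lambda_reg.
        self.A_inv = [[(1.0 / lambda_reg) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, z, reward):
        # Sherman-Morrison rank-1 update:
        # (A + z z^T)^{-1} = A^{-1} - (A^{-1} z)(A^{-1} z)^T / (1 + z^T A^{-1} z)
        Az = matvec(self.A_inv, z)
        denom = 1.0 + dot(z, Az)
        for i in range(len(z)):
            for j in range(len(z)):
                self.A_inv[i][j] -= Az[i] * Az[j] / denom
        self.b = [bi + reward * zi for bi, zi in zip(self.b, z)]

    def theta(self):
        return matvec(self.A_inv, self.b)

    def ucb(self, z, ucb_alpha=1.0):
        # Mean estimate plus an exploration bonus that shrinks as z is observed.
        bonus = math.sqrt(max(dot(z, matvec(self.A_inv, z)), 0.0))
        return dot(self.theta(), z) + ucb_alpha * bonus


arm = RidgeArm(dim=2, lambda_reg=1.0)
arm.update([1.0, 0.0], reward=1.0)
first_ucb = arm.ucb([1.0, 0.0])
arm.update([1.0, 1.0], reward=0.0)
```

Each update costs O(d²) instead of the O(d³) of a fresh matrix inverse, which is what makes per-step online updates cheap between backbone refits.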
BanditFusionModule ¶
BanditFusionModule(state_dim: int = 18, alpha_steps: int = 21, ucb_alpha: float = 1.0, lambda_reg: float = 1.0, hidden_dim: int = 128, num_layers: int = 3, backbone_lr: float = 0.001, backbone_retrain_freq: int = 50, backbone_epochs: int = 5, buffer_size: int = 100000, batch_size: int = 128, decision_threshold: float = 0.5, reward_kwargs: dict | None = None)
Bases: RLFusionBase
Neural-LinUCB: backbone + per-arm ridge with Sherman-Morrison online updates, and a frequency-gated backbone refit.
Source code in graphids/core/models/fusion/bandit.py
base ¶
Fusion model bases.
All fusion modules consume a feature TensorDict from the new
extraction pipeline, not a flat state vector. Modules that need a
flat input (Q-network for DQN/Bandit, MLP) call flatten_features(td)
to concatenate every leaf tensor along the feature dim.
- FusionModuleBase: predict / training_step / validation_step / test_step. Branches on automatic_optimization: the supervised path (MLP, WeightedAvg) implements forward_scores(td) -> probs; the RL path comes from RLFusionBase.
- RLFusionBase: torchrl replay buffer + act → reward → push → learn. Subclasses provide _compute_loss (DQN delegates to a torchrl DQNLoss; Bandit computes MSE inline) plus the hooks listed under RLFusionBase.
RLFusionBase ¶
Bases: FusionModuleBase
torchrl replay buffer + unified act/learn flow.
Subclass implements:
- _compute_loss(sample) -> Tensor — scalar loss from a buffer
sample. DQN delegates to a torchrl DQNLoss; Bandit computes
MSE inline. The optimizer scope (self._optimizer) is whatever
params the subclass actually trains — it does NOT have to match
a single loss_module.
Subclass sets in __init__:
- self._optimizer — optimizer over the trainable params.
Hooks:
- _score_actions(td, training) — write td['action'].
- _after_act(actions, obs, rewards) — online update.
- _should_learn() — gate the optim step (default: every step).
- _after_optim_step() — post-step (DQN target sync).
- _after_learn() — post-batch (Bandit ridge reset).
- _extra_metrics() — extra log fields.
Source code in graphids/core/models/fusion/base.py
select_action_batch ¶
Returns (actions[N], alphas[N], normalized_features_td[N]).
Source code in graphids/core/models/fusion/base.py
build_mlp_body ¶
[Linear → LayerNorm → ReLU → Dropout(0.2)] x N.
Source code in graphids/core/models/fusion/base.py
flatten_features ¶
Concatenate every leaf tensor along the last dim. Stable order: sorted nested-key path so the same TD always yields the same layout.
Only tuple-keyed (model-namespaced) leaves are concatenated. Top-level
str leaves are reserved for metadata (labels, attack_type);
they pass through the TD untouched and reach test_step via
td.get(...) instead of being treated as features.
Source code in graphids/core/models/fusion/base.py
dqn ¶
DQN fusion: torchrl DQNLoss + EGreedyModule over QValueActor.
Subclasses RLFusionBase and contributes only the DQN-specific math:
the Q-actor + epsilon-greedy explorer, the DQNLoss (with double_dqn
toggle and delay_value target net), and SoftUpdate Polyak sync.
gamma=0 because each graph is an independent context.
mlp ¶
Supervised MLP baseline: binary classification from flattened fusion features.
MLPFusionModule ¶
MLPFusionModule(state_dim: int = 18, hidden_dims: tuple[int, ...] = (64, 32), lr: float = 0.001, decision_threshold: float = 0.5)
Bases: FusionModuleBase
Same features as DQN, trained with BCE instead of RL.
Source code in graphids/core/models/fusion/mlp.py
moe ¶
MoE+BCE per-sample gated fusion: K experts with softmax router, dense soft-gated.
Implements the canonical Jacobs & Jordan (1991) "Adaptive Mixtures of Local
Experts" formulation: every sample passes through every expert; the gate
emits per-sample weights w(x) ∈ Δ^{K-1}; final prediction is the convex
combination Σᵢ wᵢ(x) · sigmoid(hᵢ(x)). Trained end-to-end with BCE on
the mixed score — no per-expert supervision, no auxiliary losses in v0.
Why dense soft-gated and not sparse top-k: sparse routing's value is
conditional compute at scale (Switch Transformer, Mixtral). At K=3 with
18-dim features the FLOPs argument is moot, and soft blending is the
hypothesis we want to test. Design rationale, variant survey, and
escalation paths: docs/drafts/moe-fusion-design.md.
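The dense soft-gated combination Σᵢ wᵢ(x) · sigmoid(hᵢ(x)) and the gate-entropy diagnostic can be written out for a single sample. This is a sketch with scalar logits in place of the gate and expert networks. The function names are illustrative.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def moe_predict(gate_logits, expert_logits):
    """Convex combination of per-expert probabilities for one sample."""
    weights = softmax(gate_logits)
    return sum(w * sigmoid(h) for w, h in zip(weights, expert_logits))


def gate_entropy(gate_logits):
    """Shannon entropy of the gate weights; equals log(K) when uniform."""
    return -sum(w * math.log(w) for w in softmax(gate_logits) if w > 0.0)


uniform_pred = moe_predict([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
uniform_entropy = gate_entropy([0.0, 0.0, 0.0])
peaked_entropy = gate_entropy([10.0, 0.0, 0.0])
```

Because the weights sum to one and each expert output lies in (0, 1), the mixed score is itself a valid probability and can be passed to BCE directly.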
MoEFusionModule ¶
MoEFusionModule(state_dim: int = 18, num_experts: int = 3, expert_hidden: tuple[int, ...] = (64, 32), gate_hidden: tuple[int, ...] = (32,), lr: float = 0.001, decision_threshold: float = 0.5, aux_weight: float = 0.01)
Bases: FusionModuleBase
Dense soft-gated mixture of K identical experts over the flat feature vector.
Specialization is emergent: experts share architecture and input;
only the gate's softmax over per-sample logits selects how their
outputs combine. If gate entropy stays at log(K) (uniform) on
a fitted run, the features carry no routable signal — see
diagnostics + escalation table in the design doc.
Source code in graphids/core/models/fusion/moe.py
reward ¶
Fusion reward calculator.
This module now exposes one reward primitive. The old mode switch and alternate reward class were deleted to keep the model layer honest: one calculator, one contract.
FusionRewardCalculator ¶
FusionRewardCalculator(*, vgae_weights: list[float] | tuple[float, ...], correct: float, incorrect: float, confidence_weight: float, combined_conf_weight: float, disagreement_penalty: float, overconf_penalty: float, balance_weight: float)
Bases: Module
Vectorized fusion reward over a feature TensorDict.
Required nested keys:
- ("vgae", "errors")[N, 3]: recon, mahal, kl
- ("vgae", "conf")[N, 1]
- ("gat", "probs")[N, 2]
- ("gat", "conf")[N, 1]
Other keys (z_stats, emb_stats, …) are ignored — they're consumed by the supervised/Q-network paths after flattening, not by the reward.
Source code in graphids/core/models/fusion/reward.py
compute ¶
compute(td: TensorDict, preds: Tensor, labels: Tensor, alphas: Tensor) -> tuple[torch.Tensor, dict[str, torch.Tensor]]
Vectorized reward. Returns (total[N], components).
components is a per-term breakdown of what each shaping term
contributed to total per graph (mutually-exclusive correct/wrong
terms zero out on the inactive branch). sum(components.values()) ==
total by construction. Used by callers to log per-component means
per epoch — diagnostic for which term the policy is exploiting.
Source code in graphids/core/models/fusion/reward.py
derive_scores ¶
Return (anomaly_scores[N], gat_probs_pos[N]).
anomaly was previously (errors @ weights).clamp(0, 1), which
saturated to 1.0 for nearly every sample because typical weighted
error magnitudes are O(1)–O(10), well above the clamp ceiling. That
broke the RL fusion path: at α≈0.5 the blended score
α·gat_prob + (1−α)·anomaly was ≥0.5 everywhere → predict-attack
on every sample → MCC≈0 even though AUROC was perfect (the ranking
was right, but every benign score sat at or above the decision
threshold). The clamp is now replaced with the Möbius transform
x/(1+x): bounded in [0, 1) on non-negative errors (recon, mahal, kl
are all non-negative), strictly monotonic, parameter-free, and
rank-preserving. This matches the sigmoidal compression weighted_avg
already uses on recon_mean.
Source code in graphids/core/models/fusion/reward.py
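The saturation problem and its fix can be reproduced with a few numbers. The error magnitudes and the benign GAT probability of 0.05 below are made-up illustrative values, and the helper names are hypothetical.

```python
def clamp01(x):
    return min(max(x, 0.0), 1.0)


def mobius(x):
    """x / (1 + x): maps [0, inf) into [0, 1) and is strictly increasing."""
    return x / (1.0 + x)


def blended(alpha, gat_prob, anomaly):
    return alpha * gat_prob + (1.0 - alpha) * anomaly


weighted_errors = [1.0, 3.0, 9.0]
clamped = [clamp01(x) for x in weighted_errors]    # every value hits the ceiling
squashed = [mobius(x) for x in weighted_errors]    # ordering and spacing survive

# A benign-looking graph: low GAT attack probability, weighted error of 1.0.
benign_clamped = blended(0.5, 0.05, clamp01(1.0))
benign_squashed = blended(0.5, 0.05, mobius(1.0))
```

With the clamp, the benign example lands above a 0.5 threshold purely because the anomaly term is pinned at 1.0. With x/(1+x) it stays well below.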
normalize ¶
Clamp confidence keys to [0, 1]. Returns a shallow-cloned TD.
Source code in graphids/core/models/fusion/reward.py
weighted_avg ¶
Simplest fusion baseline: learns a single scalar alpha blending vgae anomaly + gat attack prob.
score = (1-alpha) * vgae_anom + alpha * gat_attack, where:
- vgae_anom = 1 - vgae_conf = recon_mean/(1+recon_mean) (high = anomalous)
- gat_attack = gat/probs[:,1] (high = attack)
If this matches DQN's F1, the RL approach is unjustified.
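The blend can be checked numerically for a single graph. In this sketch w is the learned scalar, and recon_mean and gat_attack are scalars standing in for the per-graph tensors. The function name is illustrative.

```python
import math


def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))


def weighted_avg_score(w, recon_mean, gat_attack):
    alpha = sigmoid(w)
    vgae_conf = 1.0 / (1.0 + recon_mean)
    vgae_anom = 1.0 - vgae_conf  # equals recon_mean / (1 + recon_mean)
    return (1.0 - alpha) * vgae_anom + alpha * gat_attack


# w = 0 gives alpha = 0.5, an even blend of the two detectors.
even_blend = weighted_avg_score(0.0, recon_mean=3.0, gat_attack=0.25)
# A large positive w pushes alpha toward 1, so the GAT term dominates.
gat_heavy = weighted_avg_score(10.0, recon_mean=3.0, gat_attack=0.25)
```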
WeightedAvgModule ¶
Bases: FusionModuleBase
alpha = sigmoid(w); score = (1-alpha)·vgae_anom + alpha·gat_attack, where vgae_anom = 1 - vgae_conf and gat_attack = gat/probs[:,1].
Source code in graphids/core/models/fusion/weighted_avg.py
id_encoding ¶
Pluggable identity-encoding strategies for graph nodes.
An IdEncoder maps a node_id LongTensor to per-node embedding
vectors. Subclasses implement different strategies (lookup table,
k-probe hash, ...) behind a uniform interface so VGAE / GAT / DGI do
not know which strategy is in use.
Research basis: ~/plans/oov-embedding-handling.md.
HashIdEncoder ¶
Bases: IdEncoder
Source code in graphids/core/models/id_encoding/hash_embedding.py
from_vocab_size classmethod ¶
from_vocab_size(num_ids: int, *, embedding_dim: int, k: int = 2, seed: int = 42, num_buckets_factor: int = 4, num_buckets: int | None = None) -> HashIdEncoder
Build from a datamodule-injected num_ids.
Default bucket count: next_pow2(num_buckets_factor · num_ids),
minimum 8. Per plan: Yan 2021 / Coleman 2023 use 2–4× vocab size
as a sweet spot between collision rate and parameter count.
num_buckets can be passed explicitly to override.
Source code in graphids/core/models/id_encoding/hash_embedding.py
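The default bucket-count rule can be sketched as below. It assumes next_pow2 returns the smallest power of two greater than or equal to its argument (the source does not say whether an exact power of two is kept or doubled). The helper default_num_buckets is hypothetical.

```python
def next_pow2(n):
    """Smallest power of two >= n (assumed rounding convention)."""
    return 1 << (max(n, 1) - 1).bit_length()


def default_num_buckets(num_ids, num_buckets_factor=4, num_buckets=None):
    """Explicit num_buckets wins; otherwise round factor * num_ids up, floor of 8."""
    if num_buckets is not None:
        return num_buckets
    return max(8, next_pow2(num_buckets_factor * num_ids))


can_scale = default_num_buckets(100)   # 4 * 100 = 400 rounds up to 512
tiny_vocab = default_num_buckets(1)    # 4 rounds to 4, lifted to the minimum of 8
```

Under these assumptions a vocabulary of about 100 ids yields 512 buckets with the default factor of 4.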
IdEncoder ¶
Bases: Module
Maps per-node identities to per-node embedding vectors.
Subclasses:
- LookupIdEncoder: dense nn.Embedding over a shared vocab,
with optional stochastic UNK-drop (Stage 3 ablation).
- HashIdEncoder (Stage 2 primary): k-probe
hash embedding per Yan et al. 2021 (CIKM).
build_encoder ¶
Resolve a dotted class_path and call from_vocab_size.
num_ids is data-dependent (populated by datamodule.setup), so
encoder construction stays at model-build time.
Source code in graphids/core/models/id_encoding/base.py
base ¶
Base class for pluggable identity encoders.
Contract (duck-typed, matching the rest of the codebase):
- forward(node_id: LongTensor) -> Tensor of shape (N, out_dim).
- out_dim: int attribute set in __init__.
- All stateful policy (vocab size, hash seeds, UNK-drop rate) lives on the encoder instance; InputEncoder holds one and does not branch on its type.
IdEncoder ¶
Bases: Module
Maps per-node identities to per-node embedding vectors.
Subclasses:
- LookupIdEncoder: dense nn.Embedding over a shared vocab,
with optional stochastic UNK-drop (Stage 3 ablation).
- HashIdEncoder (Stage 2 primary): k-probe
hash embedding per Yan et al. 2021 (CIKM).
build_encoder ¶
Resolve a dotted class_path and call from_vocab_size.
num_ids is data-dependent (populated by datamodule.setup), so
encoder construction stays at model-build time.
Source code in graphids/core/models/id_encoding/base.py
config ¶
Explicit ID-encoding configs and factories.
hash_embedding ¶
k-probe hash embedding — primary Stage-2 treatment.
Every id (seen or unseen) deterministically maps to k rows of a
bucketed embedding table by k decorrelated hash functions; the
per-probe vectors are summed. Because any id hits trained buckets by
construction, no special OOV slot is needed.
Shape follows Coleman et al. 2023 Unified Embedding (NeurIPS
Spotlight): one shared table, k probes, sum combiner — minimum
parameters, clean theoretical analysis. Yan et al. 2021 Binary Code
Hash Embedding (CIKM) uses the same k-probe idea with separate tables
per hash; at CAN scale (~100 ids, B=512) the shared table has the
same expressive power at half the parameters.
Hash: bucket_i(id) = (id * KNUTH + offset_i) mod num_buckets, where
KNUTH = 2654435761 (golden-ratio-derived Knuth multiplier) and the
k offsets are deterministic functions of the seed constructor
arg. The multiplier is odd, hence coprime to every power-of-two
num_buckets (the default bucket count), and coprime to any other
num_buckets >= 2 outside a specific pathological case. Knuth's value
is well-studied for integer-id hashing at tiny scale.
Research basis: ~/plans/oov-embedding-handling.md (Stage 2).
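The k-probe lookup can be sketched with a list of rows standing in for the embedding table. The hash formula is the one given above. The offset schedule in probe_offsets is a hypothetical choice, since the source only says the offsets are deterministic functions of the seed.

```python
import random

KNUTH = 2654435761  # multiplier from the hash formula above


def probe_offsets(seed, k):
    """Hypothetical offset schedule: k values drawn from a seeded generator."""
    rng = random.Random(seed)
    return [rng.randrange(1 << 32) for _ in range(k)]


def bucket_indices(node_id, offsets, num_buckets):
    """bucket_i(id) = (id * KNUTH + offset_i) mod num_buckets, one index per probe."""
    return [(node_id * KNUTH + off) % num_buckets for off in offsets]


def embed(node_id, table, offsets):
    """Sum the k probed rows of a shared table (sum combiner)."""
    rows = [table[b] for b in bucket_indices(node_id, offsets, len(table))]
    return [sum(col) for col in zip(*rows)]


table = [[float(i), 10.0 * i] for i in range(8)]   # 8 buckets, dim 2
seen = embed(3, table, offsets=[0, 7])             # probes buckets 3 and 2
unseen = embed(10**6, table, probe_offsets(42, 2)) # an id never seen in training
```

An unseen id goes through exactly the same path as a seen one, so it always lands on rows that exist in the table, which is why no dedicated OOV slot is required.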
HashIdEncoder ¶
Bases: IdEncoder
Source code in graphids/core/models/id_encoding/hash_embedding.py
from_vocab_size classmethod ¶
from_vocab_size(num_ids: int, *, embedding_dim: int, k: int = 2, seed: int = 42, num_buckets_factor: int = 4, num_buckets: int | None = None) -> HashIdEncoder
Build from a datamodule-injected num_ids.
Default bucket count: next_pow2(num_buckets_factor · num_ids),
minimum 8. Per plan: Yan 2021 / Coleman 2023 use 2–4× vocab size
as a sweet spot between collision rate and parameter count.
num_buckets can be passed explicitly to override.
Source code in graphids/core/models/id_encoding/hash_embedding.py
lookup ¶
Dense lookup embedding with optional stochastic UNK-drop.
Default (p_unk_drop=0.0) reproduces the pre-refactor nn.Embedding
behavior byte-for-byte so existing single-vocab runs are a no-op change.
p_unk_drop > 0.0 implements the Stage 3 ablation arm from
~/plans/oov-embedding-handling.md: during training, each node_id is
remapped to UNK_INDEX with probability p, so the OOV row
receives gradient and attack-introduced IDs at inference land in a
trained slot instead of init noise.
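The stochastic UNK-drop remapping can be sketched on a plain list of ids. UNK_INDEX = 0 follows the reserved-UNK convention stated for id_encoding. The function name unk_drop and the explicit rng argument are assumptions made to keep the example deterministic and torch-free.

```python
import random

UNK_INDEX = 0  # reserved UNK row, per the id_encoding description


def unk_drop(node_ids, p_unk_drop, training, rng):
    """During training, remap each id to UNK_INDEX with probability p_unk_drop."""
    if not training or p_unk_drop <= 0.0:
        return list(node_ids)  # default path: plain lookup, ids untouched
    return [UNK_INDEX if rng.random() < p_unk_drop else i for i in node_ids]


rng = random.Random(0)
ids = [5, 7, 9]
unchanged = unk_drop(ids, 0.0, training=True, rng=rng)
eval_time = unk_drop(ids, 0.5, training=False, rng=rng)
all_dropped = unk_drop(ids, 1.0, training=True, rng=rng)
mixed = unk_drop(ids, 0.5, training=True, rng=rng)
```

Applying the remap only in training mode means inference sees the true ids, while the UNK row has still received gradient from the dropped samples.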
supervised ¶
Supervised graph model family exports.
GAT ¶
GAT(*, loss_fn: Module | None = None, hidden: int | None = None, layers: int | None = None, heads: int | None = None, dropout: float = 0.2, fc_layers: int = 3, embedding_dim: int = 16, conv_type: str = 'gatv2', edge_dim: int = 11, pool_aggrs: list[str] | None = None, sequence_pool: Literal['auto', 'flat', 'mean', 'attention', 'gru'] = 'auto', proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.001, weight_decay: float = 0.0001, scale: str = 'small', model_type: ModelType = 'gat', dataset: str = '', seed: int = 42, variational: bool = True, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: GraphModuleBase
Collapsed GAT — arch + trainer-bridge in one nn.Module.
Loss selection is decoupled: loss_fn is an nn.Module built
from the experiment config. When the block resolves to a
graphids.core.losses.distillation.SoftLabelDistillation,
training automatically becomes a KD run, with no branching here.
scale selects per-axis hyperparam presets from _SCALES;
explicit hidden / layers / heads kwargs (non-None)
override the preset.
Source code in graphids/core/models/supervised/gat.py
extract_features ¶
Per-graph fusion features as named tensors.
- probs[N, 2]: prob_0, prob_1
- conf[N, 1]: 1 - entropy / log(2)
- emb_stats[N, 4]: emb_mean, emb_std, emb_max, emb_min
Source code in graphids/core/models/supervised/gat.py
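The conf feature, 1 - entropy / log(2), can be computed for one graph as in the sketch below. The function name is illustrative, and the natural-log entropy is an assumption consistent with dividing by log(2).

```python
import math


def gat_confidence(prob_0, prob_1):
    """1 - H(p) / log(2): 0 for a coin flip, 1 for a fully certain prediction."""
    entropy = -sum(p * math.log(p) for p in (prob_0, prob_1) if p > 0.0)
    return 1.0 - entropy / math.log(2)


coin_flip = gat_confidence(0.5, 0.5)
certain = gat_confidence(1.0, 0.0)
leaning = gat_confidence(0.9, 0.1)
```

Dividing by log(2), the maximum entropy of a two-class distribution, normalizes the feature to the [0, 1] range.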
gat ¶
GAT supervised classifier — collapsed arch + trainer-bridge.
The single GAT class is both the architecture (InputEncoder +
conv stack + JK + pool + FC head) and the trainer-bridge
(training_step/validation_step/test_step, fusion-feature
extractor). No wrapper module — see
~/plans/graphids-collapse-model-modules.md Phase 3.
Supports GATConv, GATv2Conv (the default per the constructor signature, conv_type='gatv2'), and TransformerConv via conv_type. TransformerConv natively uses edge_attr, enabling the 11-D edge features (frequency, temporal intervals, bidirectionality, degree products) that GATConv ignores.
GAT ¶
GAT(*, loss_fn: Module | None = None, hidden: int | None = None, layers: int | None = None, heads: int | None = None, dropout: float = 0.2, fc_layers: int = 3, embedding_dim: int = 16, conv_type: str = 'gatv2', edge_dim: int = 11, pool_aggrs: list[str] | None = None, sequence_pool: Literal['auto', 'flat', 'mean', 'attention', 'gru'] = 'auto', proj_dim: int = 0, gradient_checkpointing: bool = True, compile_model: bool = False, id_encoder_cfg: IdEncodingCfg | None = None, id_encoder_class_path: str = 'graphids.core.models.id_encoding.LookupIdEncoder', id_encoder_kwargs: dict | None = None, lr: float = 0.001, weight_decay: float = 0.0001, scale: str = 'small', model_type: ModelType = 'gat', dataset: str = '', seed: int = 42, variational: bool = True, num_ids: int = 0, in_channels: int = 0, num_classes: int = 2)
Bases: GraphModuleBase
Collapsed GAT — arch + trainer-bridge in one nn.Module.
Loss selection is decoupled: loss_fn is an nn.Module built
from the experiment config. When the block resolves to a
graphids.core.losses.distillation.SoftLabelDistillation,
training automatically becomes a KD run, with no branching here.
scale selects per-axis hyperparam presets from _SCALES;
explicit hidden / layers / heads kwargs (non-None)
override the preset.
Source code in graphids/core/models/supervised/gat.py
extract_features ¶
Per-graph fusion features as named tensors.
- probs[N, 2]: prob_0, prob_1
- conf[N, 1]: 1 - entropy / log(2)
- emb_stats[N, 4]: emb_mean, emb_std, emb_max, emb_min