Topology

Import-time coherence check for the jsonnet config tree (every model family has a libsonnet, every fusion method has a method libsonnet, every stage has a .jsonnet), plus path helpers and the dataset catalog loader. Failures raise at package load, not at sbatch time.

graphids.config.topology

Config-tree validation + path/catalog helpers.

Import-time check that the jsonnet tree (model families, fusion methods, stage files) is coherent with the static axes, plus a small set of path helpers and the dataset catalog loader.

cache_dir

cache_dir(lake_root: str, dataset: str) -> Path

Path to the preprocessed tensor cache. Pinned to graphids.config.constants.PREPROCESSING_VERSION, so a version bump forces a rebuild without deleting the old tree.

Source code in graphids/config/topology.py
def cache_dir(lake_root: str, dataset: str) -> Path:
    """Path to preprocessed tensor cache. Pinned to
    :data:`graphids.config.constants.PREPROCESSING_VERSION` so a bump
    of the version forces rebuild without deleting the old tree.
    """
    return Path(lake_root) / "cache" / f"v{PREPROCESSING_VERSION}" / dataset
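To illustrate the version pin, here is a standalone sketch: the constant value below is hypothetical (the real one lives in graphids.config.constants), but it shows how bumping it redirects the cache path to a sibling tree instead of overwriting the old one.

```python
from pathlib import Path

# Hypothetical value for illustration; the real constant is
# graphids.config.constants.PREPROCESSING_VERSION.
PREPROCESSING_VERSION = 3

def cache_dir(lake_root: str, dataset: str) -> Path:
    # Each preprocessing version gets its own subtree, so a bump
    # redirects reads/writes without touching the older cache.
    return Path(lake_root) / "cache" / f"v{PREPROCESSING_VERSION}" / dataset

old = cache_dir("/lake", "cic-ids2017")   # /lake/cache/v3/cic-ids2017
PREPROCESSING_VERSION = 4                  # simulate a version bump
new = cache_dir("/lake", "cic-ids2017")   # /lake/cache/v4/cic-ids2017
```

Old and new caches sit side by side under `{lake_root}/cache/`, so a rollback is just reverting the constant.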

data_dir

data_dir(lake_root: str, dataset: str) -> Path

Path to raw CSVs for a dataset: {lake_root}/raw/{dataset}.

Source code in graphids/config/topology.py
def data_dir(lake_root: str, dataset: str) -> Path:
    """Path to raw CSVs for a dataset: ``{lake_root}/raw/{dataset}``."""
    return Path(lake_root) / "raw" / dataset

dataset_names

dataset_names() -> list[str]

Public dataset names — entries starting with _ are internal placeholders (test fixtures, retired datasets) and are excluded.

Source code in graphids/config/topology.py
def dataset_names() -> list[str]:
    """Public dataset names — entries starting with ``_`` are internal
    placeholders (test fixtures, retired datasets) and excluded.
    """
    return [k for k in load_catalog() if not k.startswith("_")]
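A small sketch of the underscore filter, using made-up catalog keys in place of the real load_catalog() result:

```python
# Hypothetical catalog keys for illustration; in the real module these
# come from load_catalog().
catalog = {
    "unsw-nb15": {},       # public dataset
    "_tiny_fixture": {},   # internal placeholder, excluded
}

public = [k for k in catalog if not k.startswith("_")]
```

Here `public` contains only `"unsw-nb15"`; the leading-underscore convention keeps fixtures out of sweep enumerations without a separate allow-list.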

load_catalog cached

load_catalog() -> dict[str, dict[str, Any]]

Flat view of configs/datasets/dataset_registry.json: {dataset_name: {name, domain, **entry}}. Cached once per process; domains collapse into a domain field on each entry.

Source code in graphids/config/topology.py
@lru_cache(maxsize=1)
def load_catalog() -> dict[str, dict[str, Any]]:
    """Flat view of ``configs/datasets/dataset_registry.json`` —
    ``{dataset_name: {name, domain, **entry}}``. Cached once per
    process; domains collapse into a ``domain`` field on each entry.
    """
    if not DATASET_REGISTRY_PATH.is_file():
        raise FileNotFoundError(f"Dataset registry missing: {DATASET_REGISTRY_PATH}")
    registry = json.loads(DATASET_REGISTRY_PATH.read_text())
    return {
        name: {"name": name, "domain": domain, **entry}
        for domain, datasets in registry.items()
        if isinstance(datasets, dict)
        for name, entry in datasets.items()
    }
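The flattening step can be seen with a toy registry (domain, dataset name, and fields are hypothetical): nested domain → dataset entries collapse into one flat dict, non-dict top-level keys are skipped, and each entry gains name and domain fields.

```python
# Hypothetical registry contents mimicking dataset_registry.json.
registry = {
    "netflow": {
        "unsw-nb15": {"rows": 2_500_000},
    },
    "_schema": "v1",  # not a dict of datasets, so it is filtered out
}

# Same comprehension shape as load_catalog(): flatten and annotate.
catalog = {
    name: {"name": name, "domain": domain, **entry}
    for domain, datasets in registry.items()
    if isinstance(datasets, dict)
    for name, entry in datasets.items()
}
```

The result keys are dataset names, so callers never need to know which domain a dataset came from — it travels along as the domain field.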