Data Architecture

This is the current GraphIDS data layout after the preprocessing refactor.

1. Raw storage

Source of truth: immutable CAN/CPS rows.

Typical fields:

vehicle_id
timestamp
arb_id
payload
attack
attack_type
provenance fields from the source
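
As a rough illustration of the typical fields above, a raw row could be modelled as a frozen dataclass. This is only a sketch: the class name `RawCanRow`, the field types, the timestamp unit, and the single `source` field standing in for the provenance fields are all assumptions, not the actual definitions in `can_bus.py` or `_base.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen mirrors the "immutable rows" rule
class RawCanRow:
    """Hypothetical shape of one raw CAN/CPS row (types are assumptions)."""

    vehicle_id: str
    timestamp: float                   # assumed unit: seconds
    arb_id: int                        # CAN arbitration ID
    payload: bytes                     # raw data bytes of the frame
    attack: bool                       # whether the row is labelled as an attack
    attack_type: Optional[str] = None  # attack label, if any
    source: Optional[str] = None       # stand-in for the provenance fields


row = RawCanRow(
    vehicle_id="veh_a",
    timestamp=0.001,
    arb_id=0x1A0,
    payload=bytes.fromhex("00ff10"),
    attack=False,
)
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which is one simple way to keep raw rows as the source of truth.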

Code surface:

graphids/core/data/datasets/can_bus.py
graphids/core/data/datasets/_base.py

2. Representations

The primary public representation kinds are:

snapshot
snapshot_sequence
multi_scale
temporal
entity

Representation configs live in:

graphids/core/data/preprocessing/representations.py

They bridge to:

view configs
segment configs
temporal stream specs
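
The five representation kinds listed above could be captured as an enumeration so that an unknown kind fails early. The names `RepresentationKind` and `parse_kind` are hypothetical and used only for illustration; the real config classes in `representations.py` may look different.

```python
from enum import Enum


class RepresentationKind(str, Enum):
    """The public representation kinds named in this section."""

    SNAPSHOT = "snapshot"
    SNAPSHOT_SEQUENCE = "snapshot_sequence"
    MULTI_SCALE = "multi_scale"
    TEMPORAL = "temporal"
    ENTITY = "entity"


def parse_kind(name: str) -> RepresentationKind:
    """Turn a config string into a kind, listing valid options on failure."""
    try:
        return RepresentationKind(name)
    except ValueError:
        valid = ", ".join(kind.value for kind in RepresentationKind)
        raise ValueError(
            f"unknown representation kind {name!r}; expected one of: {valid}"
        ) from None


kind = parse_kind("snapshot_sequence")
```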

3. Materialized views

Training-facing materializations are derived from raw storage through the selected representation.

Examples:

snapshot graphs
snapshot sequences
multi-scale views
temporal streams
entity-centric views
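
To make "derived from raw storage" concrete, here is a minimal sketch of one way a snapshot materialization could work: slide a fixed window over the arbitration IDs of the raw rows and count transitions between consecutive IDs as graph edges. This is a common construction for CAN graphs, offered as an assumption; it is not necessarily what `views.py` or `pyg.py` do, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterator, Sequence


def sliding_windows(
    items: Sequence[int], window_size: int, stride: int
) -> Iterator[Sequence[int]]:
    """Yield full windows only; a trailing partial window is dropped."""
    for start in range(0, len(items) - window_size + 1, stride):
        yield items[start:start + window_size]


def snapshot_edges(arb_ids: Sequence[int]) -> Counter:
    """Count directed transitions between consecutive arbitration IDs."""
    return Counter(zip(arb_ids, arb_ids[1:]))


# Toy stream of arbitration IDs, already ordered by timestamp.
arb_ids = [0x10, 0x20, 0x10, 0x20, 0x30, 0x10]
snapshots = [
    snapshot_edges(window)
    for window in sliding_windows(arb_ids, window_size=4, stride=2)
]
```

Each element of `snapshots` maps a `(source_id, target_id)` pair to its transition count within that window.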

Code surface:

graphids/core/data/preprocessing/views.py
graphids/core/data/preprocessing/segments.py
graphids/core/data/preprocessing/materialization.py
graphids/core/data/preprocessing/pyg.py
graphids/core/data/preprocessing/temporal.py

4. Discovery and hypotheses

This layer stores signal profiles and provisional canonical mappings. It is where hidden-DBC cross-vehicle alignment lives.

Typical records:

raw signal profile tables
canonical hypotheses
confidence
evidence
provenance
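
A canonical hypothesis record, as listed above, might look like the following sketch. The class name, the field types, the `[0, 1]` confidence range, and the `best_per_signal` helper are assumptions made for illustration; the actual records live in the discovery modules.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class CanonicalHypothesis:
    """Hypothetical record: a provisional mapping from a raw signal to a canonical name."""

    vehicle_id: str
    arb_id: int
    canonical_name: str
    confidence: float                      # assumed to lie in [0, 1]
    evidence: List[str] = field(default_factory=list)
    provenance: str = ""

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def best_per_signal(
    hypotheses: Iterable[CanonicalHypothesis],
) -> Dict[Tuple[str, int], CanonicalHypothesis]:
    """Keep the highest-confidence hypothesis for each (vehicle_id, arb_id)."""
    best: Dict[Tuple[str, int], CanonicalHypothesis] = {}
    for hyp in hypotheses:
        key = (hyp.vehicle_id, hyp.arb_id)
        if key not in best or hyp.confidence > best[key].confidence:
            best[key] = hyp
    return best


candidates = [
    CanonicalHypothesis("veh_a", 0x1A0, "wheel_speed", 0.6, ["range match"]),
    CanonicalHypothesis("veh_a", 0x1A0, "engine_rpm", 0.8, ["periodicity"]),
]
chosen = best_per_signal(candidates)
```

Keeping every candidate alongside its evidence, and selecting only at read time, fits the "provisional" nature of these mappings.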

Code surface:

graphids/core/data/discovery/hypotheses.py
graphids/core/data/discovery/canonical.py
graphids/core/data/discovery/layout.py

5. Selection rule

The primary user-facing control surface is now:

representation_cfg

Window sizes and strides are resolved from the representation config at the pipeline boundary, where an explicit segment config is derived before materialization.
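
The resolution step could be sketched as follows. `SegmentConfig`, `resolve_segment_config`, and the `window_size` / `stride` keys are hypothetical names chosen for this example; the real config schema is defined in `representations.py` and `segments.py` and may differ.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SegmentConfig:
    """Hypothetical explicit segment config handed to materialization."""

    window_size: int
    stride: int


def resolve_segment_config(representation_cfg: Mapping[str, Any]) -> SegmentConfig:
    """Derive window size and stride once, at the pipeline boundary."""
    try:
        window_size = int(representation_cfg["window_size"])
        stride = int(representation_cfg["stride"])
    except KeyError as exc:
        raise ValueError(f"representation_cfg is missing {exc.args[0]!r}") from exc
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    return SegmentConfig(window_size=window_size, stride=stride)


representation_cfg = {"kind": "snapshot", "window_size": 100, "stride": 50}
segment_cfg = resolve_segment_config(representation_cfg)
```

Resolving these values in one place means downstream materialization code only ever sees an explicit, validated segment config.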
