Pipeline Data Flow
This page is a compatibility note, not the primary architecture spec.
The current canonical overview lives in:
Legacy training details that still matter:
- raw CAN rows are normalized, parsed, and cached before graph materialization
- graph materialization receives an explicit segment config derived from
  representation_cfg at the pipeline boundary
- the runtime loader still uses budget-aware batching for variable-size graphs
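As a rough illustration of the last point, budget-aware batching can be sketched as a greedy packer that closes a batch once adding the next graph would exceed a node or edge budget. This is a minimal sketch, not the project's loader: the `GraphSize` tuple, the `budget_batches` function, and the `max_nodes` / `max_edges` parameters are all hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical per-graph summary: (graph_id, num_nodes, num_edges).
GraphSize = Tuple[str, int, int]


def budget_batches(
    graphs: Iterable[GraphSize],
    max_nodes: int,
    max_edges: int,
) -> Iterator[List[GraphSize]]:
    """Greedily pack graphs into batches under a node and edge budget.

    A graph that exceeds the budget on its own is still emitted,
    alone in its batch, so no input is silently dropped.
    """
    batch: List[GraphSize] = []
    nodes = 0
    edges = 0
    for graph in graphs:
        _, n, e = graph
        # Close the current batch if this graph would overflow either budget.
        if batch and (nodes + n > max_nodes or edges + e > max_edges):
            yield batch
            batch, nodes, edges = [], 0, 0
        batch.append(graph)
        nodes += n
        edges += e
    if batch:
        yield batch


if __name__ == "__main__":
    sizes = [("a", 40, 100), ("b", 50, 120), ("c", 30, 60), ("d", 200, 500)]
    for group in budget_batches(sizes, max_nodes=100, max_edges=250):
        print([graph_id for graph_id, _, _ in group])
```

Unlike a fixed batch size, the number of graphs per batch varies with graph size, so memory use stays roughly bounded even when individual graphs differ widely. Whether the actual loader sorts, buckets, or shuffles graphs before packing is not stated here.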
What changed:
- representation config is now the primary user-facing surface
- raw storage, materialized views, and discovery/hypothesis data are split
- snapshot, temporal, multi-scale, sequence, and entity are explicit representations