Architecture family

Attention / Graph-Transformer

Self-attention transformers for atoms: neighborhoods, and lately whole systems, are attended over rather than message-passed.

Attention-based potentials replace fixed message-passing aggregation with learned self-attention: each atom weights its neighbors (and, in the all-to-all variants, every other atom in the system) by content rather than by a hand-chosen edge function. The lineage borrows directly from the transformer architectures of language and vision, adapted to operate on invariant scalar features for throughput rather than to enforce equivariance by construction.

The headline references are <i>EScAIP</i> (Efficiently Scaled Attention Interatomic Potential) by Qu et al. and its all-to-all node-attention follow-up, <i>AllScAIP</i>, which adds a global attention component to recover long-range accuracy. This family overlaps with the Vanilla / Direct-Prediction cluster, since both forgo hard equivariance for speed, and with Message-Passing, since attention is a learned generalization of neighbor aggregation. We list it separately because its scaling behavior and its all-to-all long-range mechanism are distinctive enough to track on their own.
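To make the mechanism above concrete, here is a minimal, dependency-free Python sketch of one self-attention update over invariant per-atom features, with an optional cutoff that switches between neighborhood attention and all-to-all attention. It is an illustrative toy, not the EScAIP or AllScAIP architecture. The single head, the single layer, the random fixed weights, the `-r_ij` distance bias added to each score, the summed per-atom energy readout, and every function name (`attention_layer`, `energy`, `random_params`) are hypothetical choices made here for illustration. Because the layer only ever sees scalar features and interatomic distances, its output is rotation-invariant without any equivariant machinery.

```python
import math
import random

# Hypothetical toy model for illustration; not the EScAIP/AllScAIP architecture.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(w, v):
    # w is a list of rows, so this computes w @ v
    return [dot(row, v) for row in w]


def attention_layer(features, positions, wq, wk, wv, cutoff=None):
    """Single-head self-attention update over invariant per-atom features.

    cutoff=None -> all-to-all: each atom attends to every other atom.
    cutoff=r_c  -> local: each atom attends only to neighbors within r_c.
    """
    head_dim = len(wq)
    queries = [matvec(wq, h) for h in features]
    keys = [matvec(wk, h) for h in features]
    values = [matvec(wv, h) for h in features]  # wv is square, so values match the feature size
    updated = []
    for i, (h_i, p_i) in enumerate(zip(features, positions)):
        partners = [
            j for j, p_j in enumerate(positions)
            if j != i and (cutoff is None or math.dist(p_i, p_j) <= cutoff)
        ]
        if not partners:
            # Nothing to attend to (e.g. an isolated atom under a cutoff): keep features as-is.
            updated.append(list(h_i))
            continue
        # Content score plus an assumed -r_ij distance bias; both are rotation-invariant.
        scores = [
            dot(queries[i], keys[j]) / math.sqrt(head_dim) - math.dist(p_i, positions[j])
            for j in partners
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        message = [
            sum(w * values[j][c] for w, j in zip(weights, partners)) / total
            for c in range(len(h_i))
        ]
        # Residual update: new feature = old feature + attention-weighted message.
        updated.append([x + m for x, m in zip(h_i, message)])
    return updated


def energy(features, positions, params, cutoff=None):
    """Toy energy: one attention layer, then a linear per-atom readout, summed over atoms."""
    wq, wk, wv, w_out = params
    h = attention_layer(features, positions, wq, wk, wv, cutoff)
    return sum(dot(w_out, h_i) for h_i in h)


def random_params(feat_dim, head_dim, seed=0):
    """Random fixed weights standing in for trained parameters (hypothetical)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    w_out = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
    return mat(head_dim, feat_dim), mat(head_dim, feat_dim), mat(feat_dim, feat_dim), w_out


if __name__ == "__main__":
    feats = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2], [0.3, 0.3, 1.0], [0.9, 0.1, 0.0]]
    # Three atoms close together plus one far-away atom at (5, 5, 5).
    pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.2, 0.0], [5.0, 5.0, 5.0]]
    params = random_params(feat_dim=3, head_dim=2)
    print("local, r_c = 2.0:", energy(feats, pos, params, cutoff=2.0))
    print("all-to-all:      ", energy(feats, pos, params))
```

Supplying `cutoff` restricts each atom to its local sphere, so the isolated fourth atom in the example receives no update at all; leaving it as `None` lets every atom attend to every other atom, which is how a global component can pick up long-range interactions, at a naive cost that grows quadratically with the number of atoms.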