Architectures

A short field guide to the families of models in the catalog.

A working vocabulary for reading the rest of the Almanac, not a tutorial. Representational similarity tracks training data more reliably than architecture (per the similarity map), but architecture still shapes what a model can learn. The five families below are the ones the catalog uses.

Attention / Graph-Transformer

Self-attention transformers for atoms — neighborhoods, and lately whole systems, are attended over rather than message-passed.

Attention-based potentials replace fixed message-passing aggregation with learned self-attention: each atom weighs its neighbors (and, in the all-to-all variants, every other atom in the system) by content rather than by a hand-chosen edge function. The lineage borrows directly from the transformer architectures of language and vision, adapted to operate on invariant scalar features for throughput rather than enforcing equivariance by construction. The headline references are Qu et al., EScAIP (Efficiently Scaled Attention Interatomic Potential), and the all-to-all node-attention follow-up AllScAIP, which adds a global attention component to recover long-range accuracy. This family overlaps with the Vanilla / Direct-Prediction cluster, since both forgo hard equivariance for speed, and with Message-Passing, since attention is a learned generalization of neighbor aggregation. We list it separately because its scaling behavior and the all-to-all long-range mechanism are distinctive enough to track on their own.
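
To make "weighs its neighbors by content" concrete, here is a minimal single-head attention update for one atom, using only invariant inputs (per-atom feature vectors and interatomic distances). It is a sketch under stated assumptions, not the EScAIP or AllScAIP layer: the features double as query, key, and value, and the distance penalty `dist_scale` is a hypothetical bias term. Setting the cutoff to infinity turns the neighborhood into the whole system, which is the all-to-all case.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(i, feats, pos, cutoff=math.inf, dist_scale=1.0):
    """Single-head attention update for atom i (illustrative sketch).

    feats: per-atom invariant feature vectors, reused here as query, key and value;
           a real layer would apply separate learned projections.
    pos:   3D positions; only distances enter, so the update is rotation-invariant.
    cutoff: neighborhood radius; math.inf makes every other atom a neighbor (all-to-all).
    dist_scale: weight of a hypothetical distance penalty added to the attention score.
    """
    d = len(feats[i])
    nbrs = [j for j in range(len(feats))
            if j != i and math.dist(pos[i], pos[j]) < cutoff]
    if not nbrs:
        return list(feats[i])
    # Content score (scaled dot product) minus a distance bias.
    scores = [sum(a * b for a, b in zip(feats[i], feats[j])) / math.sqrt(d)
              - dist_scale * math.dist(pos[i], pos[j]) for j in nbrs]
    weights = softmax(scores)
    # Attention-weighted sum of neighbor features, added residually.
    return [feats[i][k] + sum(w * feats[j][k] for w, j in zip(weights, nbrs))
            for k in range(d)]


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
local = attend(0, feats, pos, cutoff=2.0)   # only atom 1 lies within the cutoff
global_ = attend(0, feats, pos)             # atoms 1 and 2 both contribute
```

Dropping the cutoff lets the distant atom 2 contribute a small, content- and distance-weighted share; in a trained model the learned projections decide how much.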

Models in this family

AllScAIP	2026	—

Equivariant Models

Models that respect the symmetries of three-dimensional space by construction.

Equivariant networks build SO(3) rotational symmetry directly into their layers, propagating tensor representations of various angular-momentum orders rather than relying on data augmentation to teach the model that rotated molecules are still molecules. The promise is precise: predictions transform correctly under rotation by construction, not by training. Canonical examples are Batzner et al., NequIP and Liao et al., EquiformerV2. The tradeoff is cost: the operations that preserve equivariance are more expensive than their unconstrained counterparts, and large models in this family demand significant GPU memory.
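
What "transform correctly under rotation" means can be checked numerically. The sketch below builds the simplest equivariant quantity, a vector (angular-momentum order 1) formed from relative positions weighted by an invariant function of distance, and verifies that f(Rx) equals R f(x). The Gaussian weight and the function names are illustrative assumptions; layers in NequIP or EquiformerV2 carry higher orders and learned weights, but they satisfy the same identity.

```python
import math

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(rot, v):
    return [sum(rot[i][k] * v[k] for k in range(3)) for i in range(3)]

def vector_feature(i, pos, width=1.0):
    """Vector (l = 1) output for atom i: relative positions r_ij weighted by exp(-(|r_ij|/width)^2).

    The weight depends only on distance, so rotating every position by R
    rotates this output by the same R (SO(3) equivariance).
    """
    out = [0.0, 0.0, 0.0]
    for j, p in enumerate(pos):
        if j != i:
            r = [p[k] - pos[i][k] for k in range(3)]
            w = math.exp(-(math.hypot(*r) / width) ** 2)
            out = [out[k] + w * r[k] for k in range(3)]
    return out


pos = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.3), (-0.4, 0.9, 0.5)]
R = matmul(rot_z(0.7), rot_x(-1.1))
lhs = vector_feature(0, [apply(R, p) for p in pos])   # f(R x)
rhs = apply(R, vector_feature(0, pos))                # R f(x)
```

The two results agree to floating-point precision for any rotation, with no augmentation involved; that is the guarantee the architecture buys.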

Models in this family

PaiNN	2021	20.1M
SCN	2022	168M
TorchMD-Net	2022	—
EquiformerV2	2023	31M–153M
eSCN	2023	200M
TensorNet	2023	0.8M
eSEN	2025	6.3M–50.7M
UMA	2025	150M–1.4B

Atomic Cluster Expansion

A specific equivariant formalism, expanding atomic environments into a body-ordered basis.

ACE-based models are technically equivariant but represent a distinct lineage with their own basis functions, body-order construction, and training conventions. The original formulation is Drautz, Atomic Cluster Expansion (2019); the modern message-passing variant is Kovács et al., MACE. The catalog's ACE cluster on the Map (NequIP, Allegro, MACE) reflects this lineage; readers will see overlap with the Equivariant family above, and that overlap is real. We list ACE separately because the embedding clusters do, and because in practice the basis choice matters more than the equivariant label suggests.
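
The practical point of the body-ordered basis is that multiplying neighbor-summed "atomic base" functions reproduces a sum over neighbor tuples, so higher body orders are reached without enumerating tuples. A minimal sketch, keeping only radial (l = 0) functions for brevity: the Gaussian radial basis and all function names are illustrative assumptions, and real ACE and MACE bases also carry spherical harmonics and couple them into rotation-invariant or equivariant combinations.

```python
import math
from itertools import product

def radial(r, n, width=0.5):
    """Hypothetical Gaussian radial function R_n centred at r = n * width."""
    return math.exp(-((r - n * width) / width) ** 2)

def atomic_base(i, pos, n_max=3):
    """A_{i,n} = sum_j R_n(r_ij): one pass over the neighbors of atom i."""
    A = [0.0] * n_max
    for j, p in enumerate(pos):
        if j != i:
            r = math.dist(pos[i], p)
            for n in range(n_max):
                A[n] += radial(r, n)
    return A

def explicit_pair_sum(i, pos, n1, n2):
    """The same 3-body quantity as a double sum over neighbor pairs (j, k), O(N^2).

    The sum runs over all (j, k), including j == k; those self-interaction
    terms are also contained in a product of atomic bases.
    """
    nbrs = [j for j in range(len(pos)) if j != i]
    return sum(radial(math.dist(pos[i], pos[j]), n1) * radial(math.dist(pos[i], pos[k]), n2)
               for j, k in product(nbrs, nbrs))


pos = [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (0.0, 1.3, 0.0), (0.6, 0.6, 0.8)]
A = atomic_base(0, pos)
fast = A[1] * A[2]                      # product basis: O(1) once A is built
slow = explicit_pair_sum(0, pos, 1, 2)  # same value, O(N^2) in neighbors
```

The same identity extends to higher products (body order 4 and up), which is where the savings over explicit tuple enumeration become large.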

Models in this family

MACE-OFF23	2023	4.7M
MACE-MP-0	2024	3.8M–5.7M
MACE-POLAR-1	2026	—

Message-Passing Neural Networks

The earlier and more general family — atoms exchange messages with neighbors over multiple rounds.

Message-passing networks treat atoms as graph nodes and propagate information along bonds (or near-neighbor edges) over several rounds of aggregation. The earliest broadly adopted example is Schütt et al., SchNet; later refinements added directional information (DimeNet, GemNet) and equivariant message channels (PaiNN, TorchMD-Net, GemNet-OC), yielding a continuum from this family to the equivariant successors above. The headline reference for the modern direction is Gasteiger et al., GemNet. Cheap, well understood, and the workhorse for everything from QM9 to large-scale catalysis.
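
A minimal sketch of several rounds of aggregation, assuming scalar per-atom features and a fixed cosine-cutoff filter in place of SchNet's learned filter network; all function names are hypothetical. It shows why the number of rounds matters: information reaches atoms beyond the cutoff only by hopping through intermediate neighbors.

```python
import math

def cosine_cutoff(r, rc):
    """Filter weight that decays smoothly to zero at the cutoff radius rc."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0

def build_edges(pos, rc):
    """Directed neighbor list: (i, j, r_ij) for every ordered pair closer than rc."""
    edges = []
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i != j:
                r = math.dist(pos[i], pos[j])
                if r < rc:
                    edges.append((i, j, r))
    return edges

def message_pass(h, edges, rc):
    """One round: each atom adds filter-weighted messages from its neighbors."""
    new_h = list(h)
    for i, j, r in edges:
        new_h[i] += cosine_cutoff(r, rc) * h[j]   # message from j to i
    return new_h


pos = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # atoms 0 and 2 are 2.0 apart,
rc = 1.5                                                   # so they are not direct neighbors
edges = build_edges(pos, rc)
h0 = [0.0, 0.0, 1.0]              # a signal that starts on atom 2
h1 = message_pass(h0, edges, rc)  # after one round, atom 0 has not seen it
h2 = message_pass(h1, edges, rc)  # after two rounds, it arrives via atom 1
```

With k rounds the receptive field grows to roughly k times the cutoff, which is how message-passing models see beyond a single neighbor shell.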

Models in this family

SchNet	2018	9.1M
DimeNet++	2020	1.8M
GemNet	2022	32M–168M
CHGNet	2023	0.4M
MatterSim	2024	1M–5M

Vanilla / Direct-Prediction

Architectures that don't impose equivariance as a hard constraint.

Direct-prediction models trade the equivariance guarantee for raw throughput. The headline example is the Orb family, which uses an equigrad training-time regularizer to encourage rotational consistency without enforcing it in the architecture. See Neumann et al., Orb and the v3 follow-up Rhodes et al., Orb-v3. What's gained is wall-clock speed (often 2–4× over equivariant siblings on the same hardware); what's risked is sporadic violations of physical consistency that can surface at long simulation timescales. Worth the tradeoff for many screening-scale workloads, less defensible for production MD.
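
The risk named above, a model that is only approximately rotation-invariant, can be measured directly: rotate the input by a small angle and see how much the predicted energy moves. An exactly invariant energy has zero derivative with respect to the rotation angle. The sketch below estimates that derivative by central finite differences for two hypothetical toy energies. It illustrates the kind of quantity a rotational-consistency regularizer would penalize; it is not a reproduction of Orb's equigrad term, whose exact form is not restated here.

```python
import math

def rotate(pos, axis, theta):
    """Rotate every position by angle theta about a coordinate axis (0 = x, 1 = y, 2 = z)."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = (axis + 1) % 3, (axis + 2) % 3   # the two coordinates that mix (right-handed)
    out = []
    for p in pos:
        q = list(p)
        q[a], q[b] = c * p[a] - s * p[b], s * p[a] + c * p[b]
        out.append(q)
    return out

def pair_energy(pos):
    """Invariant toy energy: depends only on interatomic distances."""
    return sum(1.0 / math.dist(p, q) for i, p in enumerate(pos) for q in pos[i + 1:])

def biased_energy(pos):
    """Non-invariant toy energy: the pair term plus a term tied to the lab-frame x axis."""
    return pair_energy(pos) + 0.1 * sum(p[0] ** 2 for p in pos)

def rotational_derivative(energy, pos, axis, h=1e-4):
    """Central-difference estimate of dE/dtheta at theta = 0 for rotation about `axis`."""
    return (energy(rotate(pos, axis, h)) - energy(rotate(pos, axis, -h))) / (2 * h)

def rotational_inconsistency(energy, pos):
    """Squared norm of the rotational derivative over the three axes; zero if E is invariant."""
    return sum(rotational_derivative(energy, pos, axis) ** 2 for axis in range(3))


pos = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (0.2, 1.1, 0.4)]
invariant_score = rotational_inconsistency(pair_energy, pos)   # floating-point noise only
biased_score = rotational_inconsistency(biased_energy, pos)    # clearly nonzero
```

A training-time regularizer would add a weighted version of such a quantity to the loss and would typically compute the derivative with automatic differentiation rather than finite differences.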

Models in this family

ANI	2020	—
Orb-v2	2024	25M
Orb-v3	2025	25.5M

Earlier Generations

Architectures from before the current scale era — included for historical orientation.

Some catalog entries come from a period when MLIPs were small, single-purpose, and trained on one chemistry domain at a time. SchNet, DimeNet++, GemNet-T, and ANI-2x serve as historical anchors. Their inclusion in the catalog is partly archival and partly practical: they remain the most widely cited baselines, and recent papers continue to benchmark against them. This capsule overlaps with the Message-Passing family above, and that overlap is intentional: the lineage matters, even when the architecture name is the same.

Models in this family

— no catalog entries yet —