A short field guide to the families of models in the catalog.
A working vocabulary for reading the rest of the Almanac, not a tutorial. Representational similarity
tracks training data more reliably than it tracks architecture (per the
similarity map), but architecture still shapes what a model
can learn. The five families below are the ones the catalog uses; a closing
note covers older architectures kept for historical orientation.
Attention / Graph-Transformer
Self-attention transformers for atoms — neighborhoods, and lately whole systems, are attended over rather than message-passed.
Attention-based potentials replace fixed message-passing aggregation with learned self-attention: each atom weighs its neighbors — and, in the all-to-all variants, every other atom in the system — by content rather than by a hand-chosen edge function. The lineage borrows directly from the transformer architectures of language and vision, adapted to operate on invariant scalar features for throughput rather than enforcing equivariance by construction. The headline references are Qu et al., EScAIP (Efficiently Scaled Attention Interatomic Potential) and the all-to-all node-attention follow-up AllScAIP, which adds a global attention component to recover long-range accuracy. This family overlaps with the Vanilla / Direct-Prediction cluster — both forgo hard equivariance for speed — and with Message-Passing, since attention is a learned generalisation of neighbor aggregation. We list it separately because the scaling behaviour and the all-to-all long-range mechanism are distinctive enough to track on their own.
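To make "weighs its neighbors by content" concrete, here is a minimal NumPy sketch of one residual attention update over invariant per-atom features. The random projection matrices stand in for learned weights, and the `cutoff` switch (local neighborhood when set, all-to-all when `None`) is an illustrative device of ours. This is not the EScAIP or AllScAIP layer, which is considerably more elaborate; it only shows that distances decide who may attend, while the weights themselves come from feature content.

```python
import numpy as np


def masked_softmax(scores, mask):
    # Softmax over each row, restricted to entries where mask is True.
    # Rows with no True entries (isolated atoms) get all-zero weights.
    row_max = np.where(mask, scores, -np.inf).max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    e = np.exp(np.where(mask, scores - row_max, -np.inf))
    denom = e.sum(axis=1, keepdims=True)
    return np.divide(e, denom, out=np.zeros_like(e), where=denom > 0)


def attention_update(h, pos, cutoff=None, seed=0):
    """One residual self-attention update over atoms (illustrative sketch).

    h:      (N, d) invariant per-atom features
    pos:    (N, 3) Cartesian positions, used only to decide who may attend
    cutoff: radius for local attention; None means all-to-all attention
    """
    n, d = h.shape
    rng = np.random.default_rng(seed)
    # Hypothetical random projections standing in for learned weights.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    # Attention scores come from feature content, not a fixed edge function.
    scores = q @ k.T / np.sqrt(d)
    mask = ~np.eye(n, dtype=bool)  # an atom does not attend to itself here
    if cutoff is not None:
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        mask &= dist < cutoff
    weights = masked_softmax(scores, mask)
    return h + weights @ v, weights
```

With a cutoff, an atom far from everything receives no update; with `cutoff=None`, every atom spreads its attention over all others, which is the sense in which the all-to-all variants recover long-range information.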
Equivariant
Models that respect the symmetries of three-dimensional space by construction.
Equivariant networks build SO(3) rotational symmetry directly into their layers, propagating tensor representations of various angular-momentum orders rather than relying on data augmentation to teach the model that rotated molecules are still molecules. The promise is precise: predictions transform correctly under rotation by construction, not by training. Canonical examples are Batzner et al., NequIP and Liao et al., EquiformerV2. The tradeoff is cost: the operations that preserve equivariance are more expensive than their unconstrained counterparts, and large models in this family demand significant GPU memory.
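A toy check of what "by construction, not by training" means: a per-atom vector built only from relative displacements, weighted by a function of distance, rotates exactly with its input, while a function that treats Cartesian components independently does not. This sketch assumes vector-valued (l = 1) outputs only; NequIP and EquiformerV2 carry higher angular-momentum features and couple them with tensor products, which the example does not attempt. The function names are hypothetical.

```python
import numpy as np


def random_rotation(seed=0):
    # Random proper rotation: QR of a Gaussian matrix, with column signs
    # fixed by R's diagonal, then one column flipped if needed so det = +1.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def equivariant_vectors(pos, cutoff=3.0):
    # Per-atom vectors built only from relative displacements x_j - x_i,
    # weighted by a function of distance: equivariant by construction.
    disp = pos[None, :, :] - pos[:, None, :]  # disp[i, j] = x_j - x_i
    dist = np.linalg.norm(disp, axis=-1)
    weight = np.where((dist > 0) & (dist < cutoff), np.exp(-dist), 0.0)
    return (weight[..., None] * disp).sum(axis=1)


def unconstrained_vectors(pos):
    # Squares each Cartesian component independently, so nothing ties the
    # output to the input's orientation.
    return pos ** 2
```

Rotating the positions (rows multiplied by R transposed) rotates the first function's output by the same R for any input; the second function passes that check only by accident.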
ACE
A specific equivariant formalism, expanding atomic environments into a body-ordered basis.
ACE-based models are technically equivariant but represent a distinct lineage with their own basis functions, body-order construction, and training conventions. The original formulation is Drautz, Atomic Cluster Expansion (2019); the modern message-passing variant is Kovács et al., MACE. The catalog's ACE cluster on the Map (NequIP, Allegro, MACE) reflects this lineage; readers will see overlap with the Equivariant family above, and that overlap is real. We list ACE separately because the embedding clusters do, and because, in practice, the basis choice matters more than the equivariant label suggests.
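The body-ordered construction can be illustrated at low order. The sketch below projects a neighborhood onto a one-particle basis A (neighbor sums of radial functions times l = 0 or l = 1 angular parts), then forms body-order-3 invariants as products of two A's with the m index contracted. The radial functions are hypothetical Gaussians with a cosine cutoff, and the truncation to l ≤ 1 and body order 3 is ours; production ACE and MACE bases go much further. The point is that a product of two neighbor sums equals an explicit double sum over neighbor pairs (including j = k self-terms), so higher body order does not require looping over pairs.

```python
import numpy as np


def radial_basis(r, n_max=3, cutoff=4.0):
    # Hypothetical radial functions R_n(r): Gaussians times a cosine cutoff.
    centers = np.linspace(0.5, cutoff, n_max)
    fc = np.where(r < cutoff, 0.5 * (np.cos(np.pi * r / cutoff) + 1.0), 0.0)
    return np.exp(-(r[..., None] - centers) ** 2) * fc[..., None]  # (..., n_max)


def atomic_base(center, neighbors, n_max=3, cutoff=4.0):
    # A_{n,l,m} = sum_j R_n(r_ij) Y_lm(rhat_ij), kept to l = 0 and l = 1.
    # Real l = 1 harmonics are proportional to the unit-vector components,
    # so the constant prefactor is dropped.
    d = neighbors - center                     # (J, 3)
    r = np.linalg.norm(d, axis=1)
    rhat = d / r[:, None]
    R = radial_basis(r, n_max, cutoff)         # (J, n_max)
    A0 = R.sum(axis=0)                         # (n_max,)    l = 0
    A1 = R.T @ rhat                            # (n_max, 3)  l = 1
    return A0, A1


def invariants(A0, A1):
    # Body order 2: A0 itself (center plus one neighbor per term).
    # Body order 3: products of two A's, with m contracted for invariance.
    b2 = A0
    b3_l0 = np.outer(A0, A0)
    b3_l1 = A1 @ A1.T                          # sum_m A_{n,1,m} A_{n',1,m}
    return b2, b3_l0, b3_l1
```

Each contracted product costs one sum over neighbors followed by a small matrix product, yet it equals the pairwise sum of R_n(r_j) R_n'(r_k) cos(theta_jk) over all neighbor pairs.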
Message-Passing
The earlier and more general family: atoms exchange messages with neighbors over multiple rounds.
Message-passing networks treat atoms as graph nodes and propagate information along bonds (or near-neighbor edges) over several rounds of aggregation. The earliest broadly adopted example is Schütt et al., SchNet; later refinements added directional information (DimeNet, GemNet, GemNet-OC) and equivariant message channels (PaiNN, TorchMD-Net), yielding a continuum from this family to the equivariant successors above. The headline reference for the modern direction is Gasteiger et al., GemNet. Cheap, well-understood, and the workhorse for everything from QM9 to large-scale catalysis.
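The "several rounds" point has a simple consequence: each round widens an atom's receptive field by one cutoff shell. A minimal sketch, assuming a hypothetical fixed filter w(r) = exp(-r) in place of the learned, distance-dependent filters, feature updates, and nonlinearities of real models such as SchNet:

```python
import numpy as np


def neighbor_graph(pos, cutoff):
    # Edges between atoms closer than `cutoff` (no self-edges), plus distances.
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    adj = (dist < cutoff) & ~np.eye(len(pos), dtype=bool)
    return adj, dist


def message_passing(h, pos, cutoff=1.5, rounds=3):
    # Each round: h_i <- h_i + sum over neighbors j of w(r_ij) * h_j.
    # w(r) = exp(-r) is a hypothetical fixed filter; real models learn it.
    adj, dist = neighbor_graph(pos, cutoff)
    w = np.where(adj, np.exp(-dist), 0.0)
    for _ in range(rounds):
        h = h + w @ h
    return h
```

On a six-atom chain spaced 1.0 apart with a cutoff of 1.5, a signal placed on the first atom reaches exactly two further atoms after two rounds and the end of the chain after five.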
Vanilla / Direct-Prediction
Architectures that don't impose equivariance as a hard constraint.
Direct-prediction models trade the equivariance guarantee for raw throughput. The headline example is the Orb family; its v3 release adds an equigrad training-time regularizer that encourages rotational consistency without enforcing it in the architecture. See Neumann et al., Orb and the v3 follow-up Rhodes et al., Orb-v3. What's gained is wall-clock speed (often 2–4× over equivariant siblings on the same hardware); what's risked is sporadic violations of physical consistency that can surface at long simulation timescales. Worth the tradeoff for many screening-scale workloads, less defensible for production MD.
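To illustrate what a soft rotational-consistency term can measure, here is a minimal sketch that penalizes how much a model's energy changes under infinitesimal rotations, estimated by central finite differences about the x, y, and z axes. It is in the spirit of the equigrad regularizer mentioned above, not a reproduction of it: the finite-difference formulation, the toy energy functions, and the step size are all our assumptions, and how such a term would be weighted in a training loss is left open.

```python
import numpy as np


def rotation_about(axis, angle):
    # Rodrigues' formula: rotation matrix for `angle` radians about `axis`.
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def rotational_sensitivity(energy_fn, pos, eps=1e-4):
    # Sum of squared energy derivatives along infinitesimal rotations about
    # the centroid (central differences). Near zero for an invariant energy.
    centered = pos - pos.mean(axis=0)
    penalty = 0.0
    for axis in np.eye(3):
        e_plus = energy_fn(centered @ rotation_about(axis, eps).T)
        e_minus = energy_fn(centered @ rotation_about(axis, -eps).T)
        penalty += ((e_plus - e_minus) / (2.0 * eps)) ** 2
    return penalty


def invariant_energy(pos):
    # Toy pair energy: sum of exp(-r) over pairs, depends only on distances.
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    iu = np.triu_indices(len(pos), k=1)
    return np.exp(-d[iu]).sum()


def direction_biased_energy(pos):
    # Same toy energy plus a term on raw x coordinates: not rotation-invariant.
    return invariant_energy(pos) + 0.1 * (pos[:, 0] ** 2).sum()
```

Because the penalty is added during training rather than built into the layers, a model can drive it toward zero on the training distribution while still violating rotational consistency elsewhere, which is the risk the paragraph above describes.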
Architectures from before the current scale era — included for historical orientation.
Some catalog entries come from a period when MLIPs were small, single-purpose, and trained on one chemistry domain at a time. SchNet, DimeNet++, GemNet-T, and ANI-2x are listed here as historical anchors. Their inclusion in the catalog is partly archival and partly practical: they remain the most widely cited baselines, and recent papers continue to benchmark against them. This capsule overlaps with the Message-Passing family above, and that overlap is intentional: lineage matters, so the same architecture (SchNet, for example) can appear in both places.