Garden's Almanac of Matter Models

Datasets

The structure libraries from which these models descend.

The lineage of an MLIP is shaped at least as much by what it was trained on as by its architecture. Two equivariant transformers with nearly identical layer counts can produce wildly different relaxation paths if one was raised on QM9 and the other on OMat24. The clusters on the similarity map track the training dataset more reliably than they track model family and, on most days, more reliably than the architecture diagrams in the original papers.
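One way to make "tracks the dataset more reliably than the family" concrete is cluster purity: the fraction of models whose label matches the majority label of their cluster. The sketch below is illustrative only. The function name `cluster_purity` and the six toy models are hypothetical and are not taken from the catalog or from any measurement; the sketch also does not claim to be how the similarity map itself is built.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Fraction of items whose label equals the majority label of their cluster."""
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        by_cluster[cid].append(label)
    # For each cluster, count how many members carry its most common label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical toy data (made up for illustration, not measured):
# six models, the cluster each lands in, its training dataset, and its family.
clusters = [0, 0, 0, 1, 1, 1]
datasets = ["OMat24", "OMat24", "OMat24", "QM9", "QM9", "QM9"]
families = ["transformer", "gnn", "transformer", "gnn", "transformer", "gnn"]

dataset_purity = cluster_purity(clusters, datasets)
family_purity = cluster_purity(clusters, families)
print(f"purity by dataset: {dataset_purity:.2f}, by family: {family_purity:.2f}")
```

A higher purity for dataset labels than for family labels is what the claim above would look like in numbers. Purity is a crude measure; a chance-corrected score such as the adjusted Rand index would be a reasonable substitute.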

This section is a working reference, not a comprehensive review of materials and molecular chemistry datasets. The entries cover the libraries actually referenced by the catalog.

| Name | Domain | Size | DFT level | Year |
| --- | --- | --- | --- | --- |
| QM9 | molecules | ~134,000 small organic molecules | B3LYP/6-31G(2df,p) | 2014 |
| MD17 / rMD17 | molecules | ~3.6M MD frames across 10 small organic molecules | PBE+TS / CCSD(T) for select molecules | 2017 |
| ANI-1x / ANI-2x | molecules | ~5M (ANI-1x) and ~9M (ANI-2x) conformations of small organics | ωB97X / 6-31G* | 2018 |
| MP-2018+ | materials | ~150k DFT-relaxed inorganic crystals (MP-2018 snapshot, growing) | PBE; PBE+U for selected TM oxides; VASP | 2018 |
| OC20 | catalysis | ~1.3M relaxations, ~265M structure-energy pairs along the relaxation paths | RPBE; VASP | 2020 |
| OC22 | catalysis | ~62k relaxations on oxide surfaces | PBE+U; VASP | 2022 |
| MPTrj | materials | ~1.6M VASP relaxation frames extracted from Materials Project | PBE; PBE+U for select TMs; VASP | 2023 |
| SPICE | molecules | ~1.1M conformations | ωB97M-D3(BJ) / def2-TZVPPD | 2023 |
| OMat24 | materials | ~118M structures | PBE; PBE+U for 3d transition metals | 2024 |
| sAlex | materials | ~4.2M inorganic structures, subsampled from the Alexandria database | PBE; some PBE+U; VASP | 2024 |
| MAD | mixed | ~95M frames spanning crystals, surfaces, molecules, and disordered configurations | PBE / r²SCAN (mixed-fidelity) | 2025 |
| OMol25 | molecules | ~110M molecular frames | ωB97X-D / def2-TZVPD | 2025 |

The table is sorted oldest first, so it reads as a timeline of what models had to learn from.
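For readers who want to query the table rather than read it, here is a minimal sketch that encodes the name, domain, and year columns as records and rebuilds the timeline per domain. The `Dataset` type and the `timeline` and `by_domain` helpers are hypothetical names chosen for this example; only the values come from the table above.

```python
from collections import defaultdict
from typing import NamedTuple


class Dataset(NamedTuple):
    name: str
    domain: str
    year: int


# Names, domains, and years copied from the table above, in table order.
DATASETS = [
    Dataset("QM9", "molecules", 2014),
    Dataset("MD17 / rMD17", "molecules", 2017),
    Dataset("ANI-1x / ANI-2x", "molecules", 2018),
    Dataset("MP-2018+", "materials", 2018),
    Dataset("OC20", "catalysis", 2020),
    Dataset("OC22", "catalysis", 2022),
    Dataset("MPTrj", "materials", 2023),
    Dataset("SPICE", "molecules", 2023),
    Dataset("OMat24", "materials", 2024),
    Dataset("sAlex", "materials", 2024),
    Dataset("MAD", "mixed", 2025),
    Dataset("OMol25", "molecules", 2025),
]


def timeline(datasets):
    """Return datasets oldest first; sorted() is stable, so ties keep table order."""
    return sorted(datasets, key=lambda d: d.year)


def by_domain(datasets):
    """Group dataset names by domain, each group in timeline order."""
    groups = defaultdict(list)
    for d in timeline(datasets):
        groups[d.domain].append(d.name)
    return dict(groups)


for domain, names in by_domain(DATASETS).items():
    print(f"{domain}: {', '.join(names)}")
```

Grouping by domain makes it easy to see, for instance, which libraries a materials model could have drawn on in a given year.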