Commit 5286464

OC20 multi-ads dataset (#1626)
* multi_ads placeholder docs
* add oc20-mads download link
1 parent 2a54426 commit 5286464

2 files changed

Lines changed: 22 additions & 0 deletions

docs/_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ parts:
   - file: catalysts/datasets/summary
     sections:
     - file: catalysts/datasets/oc20
+    - file: catalysts/datasets/oc20mads
     - file: catalysts/datasets/oc22
     - file: catalysts/datasets/oc20dense
     - file: catalysts/datasets/oc20neb
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@

# Open Catalyst 2020 Multi-Adsorbate (mAds) Dataset

## Overview

The OC20-mAds dataset is a training set that extends the original OC20 dataset to cover multi-adsorbate and coverage effects on catalyst surfaces. Adsorbates are randomly sampled from the list of OC20 adsorbates, with at most 5 adsorbates per surface. For a small fraction of the dataset, all adsorbates on a surface may be identical. OC20-mAds was introduced in the [UMA paper](https://arxiv.org/pdf/2506.23971).

## File Contents and Download

| Splits | Size | MD5 checksum (download link) |
| --- | --- | --- |
| Train | 21,804,758 | [6435960ba5ad1a7c949bd2f2b51825bc](https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20mAds/oc20_multiads_train.tar.gz) |
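
Since the table lists an MD5 checksum next to the download link, it may be worth checking the archive after downloading. The following is a minimal sketch using only the Python standard library; the helper names `md5sum` and `verify_archive` are hypothetical, and the local file name is assumed to match the last component of the download URL.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed for the Train split in the table above.
EXPECTED_MD5 = "6435960ba5ad1a7c949bd2f2b51825bc"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5sum(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive was saved under its original name in the
    # current working directory.
    archive = Path("oc20_multiads_train.tar.gz")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of archive size; the 1 MiB chunk size is an arbitrary choice.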

The following metadata can be accessed in the respective `atoms.info` entry:

- `bulk_id`: Bulk identifier.
- `millers`: 3-tuple of integers giving the Miller indices of the surface.
- `shift`: Shift along the c-direction used to determine the cutoff for the surface (the c-direction follows pymatgen's nomenclature).
- `top`: Boolean indicating whether the chosen surface was at the top or bottom of the originally enumerated surface.
- `adsorbates`: List of the sampled adsorbates and their respective placements.
- `sid`: Unique system identifier.
- `fid`: Frame index along the relaxation/AIMD trajectory.
- `results_path`: Internal results location.
- `fmax`: Maximum per-atom force.
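
As an illustration of how the metadata keys listed above might be read, here is a small sketch that works on any `atoms.info`-style mapping. It makes several assumptions: the helper names are hypothetical, the example dictionary is a made-up stand-in rather than real dataset content, and the force threshold is an arbitrary value whose units are not specified here. How structures are stored inside the archive and loaded into `atoms` objects is not covered.

```python
# Keys documented for `atoms.info` in the list above.
METADATA_KEYS = (
    "bulk_id", "millers", "shift", "top", "adsorbates",
    "sid", "fid", "results_path", "fmax",
)


def extract_metadata(info):
    """Collect the documented keys from an `atoms.info`-style mapping.

    Keys absent from `info` map to None instead of raising KeyError.
    """
    return {key: info.get(key) for key in METADATA_KEYS}


def below_force_threshold(info, fmax_threshold=0.05):
    """Return True if `fmax` is present and at or below the threshold.

    The 0.05 default is an illustrative assumption, not a dataset convention.
    """
    fmax = info.get("fmax")
    return fmax is not None and fmax <= fmax_threshold


# Hypothetical stand-in for `atoms.info`; every value here is made up.
example_info = {
    "bulk_id": "bulk-0",
    "millers": (1, 1, 1),
    "shift": 0.25,
    "top": True,
    "adsorbates": [],
    "sid": "sid-0",
    "fid": 3,
    "fmax": 0.02,
}

meta = extract_metadata(example_info)
print(meta["millers"], meta["results_path"], below_force_threshold(example_info))
```

With a loaded structure, the same helpers would presumably be called as `extract_metadata(atoms.info)`. Using `.get` keeps the sketch tolerant of entries where a key such as `results_path` is missing.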
