Commit 07b41d7

QET FP dataset with README.md added (#743)
* Adding README.md for PyG TensorNet
* added missing TrajectoryObserver
* fix ruff
* improve unit tests
* convert float into Tensor
* Corrected expected values
* Corrected expected values again
* update torch<=2.9.1
* Convert PyG TensorNet Embedding and Interaction blocks into pure Torch
* Pure Pytorch TensorNet for MLIPs is added
* Fix united tests
* Refactor the PyG TensorNet components to ensure compatibility with pure PyTorch.
* Cleanup MGLDataset
* Improve the handling of stress unit in PESCalculator
* cleanup test_ase_pyg.py
* update PESCalculator unit tests
* Improve logging in PESCalculator
* fix linting tests
* fixed linting and united tests
* include_ref_charge keyword is added in MGLDataset
* Update Relaxations and Simulations using the QET Universal Potential.ipynb
  Signed-off-by: Tsz Wai Ko <47970742+kenko911@users.noreply.github.com>
* fix ruff
* avoid crashing in MatCalc united tests by adding calc_charge and compute_charge in PyG Potential and PESCalculator
* QET training module added
* fix united tests for QET training
* make sure predicted and target charges into the same dimension during the training
* added backed the self.charge_weight
* change torch.vstack into torch.hstack for reference charges
* fixed the unit test
* fix the predict_structure function in QET model
* example notebook for QET potential training
* added QET FP dataset with README
* pre-commit auto-fixes
* Clarify units for QET FP dataset in README
  Added units of measurement for length, energy, force, stress, and charge.
  Signed-off-by: Tsz Wai Ko <47970742+kenko911@users.noreply.github.com>

---------

Signed-off-by: Tsz Wai Ko <47970742+kenko911@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 966672b commit 07b41d7

2 files changed: +38 -0 lines changed

82.9 MB
Binary file not shown.
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
## MatQ Dataset

The **MatQ dataset** comprises both **near-equilibrium** and **out-of-equilibrium** structures designed to provide broad coverage of the potential energy surface (PES) of crystalline materials.

### Structure Generation

The initial pool of **6,652,874 near-equilibrium structures** was generated from Materials Project crystals by applying strains of **±2%, ±4%, and ±6%** along all crystallographic directions.
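As a minimal sketch of the straining step, the snippet below scales every lattice vector of a cell by the six strain values quoted above. It assumes that "along all crystallographic directions" means a uniform strain applied to all three lattice vectors with fractional coordinates left unchanged; the 4 Å cubic cell and the helper names `strain_lattice` and `cell_volume` are hypothetical and are not taken from the dataset's generation scripts.

```python
# Strain magnitudes quoted in the README: ±2%, ±4%, ±6%.
STRAINS = [-0.06, -0.04, -0.02, 0.02, 0.04, 0.06]


def strain_lattice(lattice, strain):
    """Scale every lattice vector by (1 + strain).

    Assumption: the strain is uniform along all three lattice vectors.
    """
    return [[(1.0 + strain) * x for x in vec] for vec in lattice]


def cell_volume(lattice):
    """Volume of the cell spanned by three lattice vectors (in Å^3)."""
    a, b, c = lattice
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(ai * ci for ai, ci in zip(a, cross)))


# Hypothetical 4 Å cubic cell, used only for illustration.
cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

# One strained lattice per strain value, keyed by the strain.
strained = {s: strain_lattice(cubic, s) for s in STRAINS}

for s in STRAINS:
    print(f"strain {s:+.2f}: volume {cell_volume(strained[s]):.3f} Å^3")
```

Under this reading each parent crystal yields six strained copies, and the cell volume scales as (1 + strain)³.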

The **out-of-equilibrium structures** were obtained from **1000 K and 3000 K NVT/NPT molecular dynamics (MD) trajectories** in the **OMat24 validation set**.

To reduce redundancy while maintaining coverage of the configuration space, the **DIRECT (DImensionality-Reduced Encoded Clusters with sTratified sampling)** method was applied. This approach selects representative structures using dimensionality reduction and stratified sampling, enabling efficient exploration of the configuration space while minimizing overlap.

Using DIRECT sampling, **60,000 structures** were selected from each of the near-equilibrium and out-of-equilibrium pools (120,000 in total). This significantly reduces the computational cost of generating high-quality reference data while preserving the diversity of structural configurations.
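The idea behind stratified selection can be illustrated with a toy example. This is not the DIRECT implementation: it assumes the structures have already been encoded and reduced to 2D feature vectors, and it swaps the clustering step for simple grid binning. The function name `stratified_select`, the cell size, and the synthetic points are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_select(points, cell_size, per_cell=1):
    """Pick up to `per_cell` representatives from every occupied grid cell.

    `points` are low-dimensional feature vectors (assumed to be already
    dimensionality-reduced). Grid cells stand in for the clusters here.
    """
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(x // cell_size) for x in point)
        cells[key].append(idx)

    selected = []
    for key in sorted(cells):
        members = cells[key]
        dim = len(points[members[0]])
        centroid = [
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        ]
        # Rank members by squared distance to the centroid of their cell.
        ranked = sorted(
            members,
            key=lambda i: sum((points[i][d] - centroid[d]) ** 2 for d in range(dim)),
        )
        selected.extend(ranked[:per_cell])
    return sorted(selected)


rng = random.Random(0)
# A dense blob of near-duplicate configurations plus three rare outliers.
dense = [(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(500)]
sparse = [(3.0, 3.0), (-3.0, 2.0), (2.5, -3.5)]
points = dense + sparse

chosen = stratified_select(points, cell_size=1.0)
print(f"kept {len(chosen)} of {len(points)} points")
```

The 500 near-duplicates collapse to a handful of representatives while every outlier survives, which is the redundancy-versus-coverage trade-off described above.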

### DFT Calculations

All structures were evaluated using **spin-polarized density functional theory (DFT)** calculations with the **Vienna Ab initio Simulation Package (VASP)**.

The calculations used the **Perdew–Burke–Ernzerhof (PBE)** generalized gradient approximation (GGA) to describe exchange–correlation interactions.

Input files were generated using the **`MatPESStaticSet`** input set implemented in **pymatgen**, which has been carefully benchmarked to ensure well-converged potential energy surface properties.

The main computational parameters are:

- **Plane-wave energy cutoff:** 680 eV
- **k-point spacing:** 0.35 Å⁻¹
- **Electronic convergence criterion:** 1×10⁻⁵ eV
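To give a feel for what a 0.35 Å⁻¹ k-point spacing implies, the sketch below estimates the k-point grid for a cell. It assumes the spacing follows VASP's `KSPACING` convention, where the number of divisions along reciprocal vector i is max(1, ceil(2π|b_i| / spacing)) and b_i is defined without the 2π factor. The README does not confirm that mapping, and the orthorhombic cell and helper names are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def reciprocal_lengths(lattice):
    """Lengths |b_i| of the reciprocal lattice vectors in Å^-1 (no 2π factor)."""
    a1, a2, a3 = lattice
    volume = abs(sum(x * y for x, y in zip(a1, cross(a2, a3))))
    recips = (cross(a2, a3), cross(a3, a1), cross(a1, a2))
    return tuple(math.sqrt(sum(x * x for x in b)) / volume for b in recips)


def kpoint_grid(lattice, spacing=0.35):
    """Grid divisions per reciprocal direction for a given spacing in Å^-1.

    Assumption: the KSPACING-style ceiling rule described in the lead-in.
    """
    return tuple(
        max(1, math.ceil(2.0 * math.pi * b / spacing))
        for b in reciprocal_lengths(lattice)
    )


# Hypothetical orthorhombic cell with edges 3, 4, and 10 Å.
cell = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 10.0]]
grid = kpoint_grid(cell)
print(grid)  # (6, 5, 2)
```

Short lattice vectors get denser sampling than long ones, so the grid adapts to each cell instead of being fixed across the dataset.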

Atomic charges were computed using the **DDEC6 charge partitioning scheme** implemented in **Chargemol (version 09_26_2017)** from the DFT charge densities.

### Final Dataset

The final **MatQ dataset contains 114,445 structures**, after excluding:

- unconverged DFT calculations
- structures with extremely large force components (**|F_x|, |F_y|, or |F_z| > 50 eV/Å**)

The units of length, energy, force, stress, and charge are Å, eV, eV/Å, eV/ų, and e, respectively.
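The two exclusion criteria above can be sketched as a simple filter. The record layout used here (a dict with `id`, `converged`, and `forces` keys) is hypothetical and does not describe the actual file format of the dataset; forces are taken to be in eV/Å, as stated above.

```python
FORCE_LIMIT = 50.0  # eV/Å, the threshold quoted in the README


def keep_structure(record, force_limit=FORCE_LIMIT):
    """Return True if a record passes both exclusion criteria.

    `record` is a hypothetical dict with a boolean 'converged' flag and a
    'forces' list holding one [Fx, Fy, Fz] triple per atom.
    """
    if not record["converged"]:
        return False
    # Reject the structure if any single Cartesian component is too large.
    return all(
        abs(component) <= force_limit
        for force in record["forces"]
        for component in force
    )


records = [
    {"id": "ok", "converged": True, "forces": [[0.1, -0.2, 0.0], [3.0, 1.5, -2.0]]},
    {"id": "unconverged", "converged": False, "forces": [[0.0, 0.0, 0.0]]},
    {"id": "large-force", "converged": True, "forces": [[0.0, 0.0, -61.3]]},
]

kept = [r["id"] for r in records if keep_structure(r)]
print(kept)  # ['ok']
```

The check is applied per Cartesian component, not to the force magnitude, which matches the per-component threshold in the list above.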
