This repository contains the supplementary code to "Calibration, validation, and evaluation of machine learning thermobarometers in metamorphic petrology: an application to biotite and outlook for future strategy" by Hartmeier et al. (2025).
The data and figures in the manuscript can be reproduced using the Jupyter notebooks in this repository. The most important results and their corresponding notebooks are listed here.
- Compositional variation of natural biotite (Figure 1)
- Compositional variation of natural biotite in P-T space (Figure 3)
- Compositional variation of biotite generated using PEM in P-T space (Figure 4)
- Model trained on natural data (used in Tables 2 and 3, and Figures 5, 6, 7, 8, 9 and 10):
  - S1 Feature Selection (Figures S1.1-S1.2):
  - S2 Architecture and Hyperparameter Tuning (Figures S2.1-S2.3):
  - Data:
    - Biotite data from mineral assemblage sequences, based on the database of Pattison and Forshaw (2025, in review).
      - Biotite data set: Used to calibrate the thermobarometer.
      - Biotite data set, reduced to analyses with measured Na: Used in feature engineering.
      - Biotite data set, latest version of Pattison and Forshaw (v2025-February): Used to test whether recent updates to the database have resulted in a significant change in model performance.
    - K-fold data: CSVs of the 5-fold training and validation splits used during cross-validation.
      - K-fold biotite data: Used to calibrate the thermobarometer.
      - K-fold biotite data, latest version of Pattison and Forshaw (v2025-February): Used to test whether recent updates to the database have resulted in a significant change in model performance.
    - Training logs: log files of training and validation performance for all models calibrated.
    - Saved models: trained models saved in TensorFlow's SavedModel format.
- Model trained on PEM data (used for the transfer learning):
  - S2 Architecture and Hyperparameter Tuning (Figure S2.4):
  - Data:
    - Training logs: log files of training and validation performance for all models calibrated.
    - Saved models: trained models saved in TensorFlow's SavedModel format.
    - Data sets generated using phase equilibrium modelling are available upon request.
- Training of the single-crystal biotite thermobarometer (the "final" model evaluated and applied in the paper):
  - Models trained using transfer learning (k models for cross-validation):
    - Transfer learning model using the prior model trained with ds55 and the solution models of White et al. (2007)
    - Transfer learning model using the prior model trained with ds55 and the solution models of Tajcmanova et al. (2009)
    - Transfer learning model using the prior model trained with ds62 and the solution models of White et al. (2014)
  - S2 Architecture and Hyperparameter Tuning (Figure S2.5):
  - Data:
    - Training logs: log files of training and validation performance for all models calibrated.
    - Saved models: trained models saved in TensorFlow's SavedModel format.
- K-fold cross-validation using models M1, M2a/b, and M3a-c (Table 1):
  - Model validation RMSE (Figure 5) and RMSE for different P-/T-bins (Figure 6)
  - Additional model validation: tests whether training models equivalent to M2a and M3c on an updated version of the Pattison and Forshaw database (v2025-February) rather than the v2024-February version has a significant effect on model performance.
- Validation using metapelitic sequences:
  - Validation using sequences (Figures 7 and 8)
  - Data: validation dataset of metapelitic sequences
- Validation using Monte-Carlo error propagation:
  - Validation using MC error propagation (Figures 9 and 10); see the sketch below this list for the general approach.
  - Performance evaluation on the test dataset (Figure 12)
  - Comparison with Ti-in-biotite thermometry (Figure 13)
  - Data:
    - Test dataset
    - Compositional maps (HDF5 format) are available upon request.
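For orientation, the general idea behind the Monte-Carlo error propagation can be sketched as follows: an input composition is perturbed many times with Gaussian noise representing analytical uncertainty, and the spread of the resulting P-T predictions is evaluated. This is only an illustrative sketch; the model directory, feature order, 1-sigma values, and output layout are hypothetical placeholders and do not reproduce the exact procedure in the notebooks.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch of Monte-Carlo error propagation through a trained model.
# The directory name, feature order, and 1-sigma uncertainties are hypothetical.
model = tf.keras.models.load_model("saved_models/example_model")

composition = np.array([2.75, 1.30, 0.45, 1.20, 0.05, 0.15])  # one biotite analysis (placeholder values)
sigma = np.array([0.02, 0.02, 0.01, 0.02, 0.01, 0.01])        # assumed analytical 1-sigma uncertainties

rng = np.random.default_rng(seed=42)
n_draws = 1000

# Perturb the analysis n_draws times and predict for every draw.
perturbed = rng.normal(loc=composition, scale=sigma, size=(n_draws, composition.size))
predictions = model.predict(perturbed, verbose=0)  # assuming the model outputs [P, T]

p_mean, t_mean = predictions.mean(axis=0)
p_std, t_std = predictions.std(axis=0)
print(f"P = {p_mean:.2f} +/- {p_std:.2f}, T = {t_mean:.1f} +/- {t_std:.1f}")
```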
As the initial parameterisation and gradient descent optimisation are stochastic processes, the training of a neural network is not fully reproducible.
Therefore, it is not recommended to re-run the scripts used to train the models, as this would overwrite the original calibration of the neural networks used in the work presented here. These scripts serve solely to document the training procedure and can be copied as templates for fitting new neural networks.
To experiment with the models calibrated here, load them from the provided saved_models directories.
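As an illustration, one of the provided models can be loaded and queried roughly as follows. This is a minimal sketch: the model directory name and the example input array are placeholders, and the number and order of input features must match the feature set the chosen model was trained with.

```python
import numpy as np
import tensorflow as tf  # pinned to 2.15.x, see the dependency notes below

# Load one of the provided trained models (the directory name is a placeholder).
model = tf.keras.models.load_model("saved_models/example_model")
model.summary()

# Predict for a single analysis; the feature values below are made up and must
# be replaced by a composition expressed in the model's input features.
x = np.array([[2.75, 1.30, 0.45, 1.20, 0.05, 0.15]])
print(model.predict(x, verbose=0))
```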
The code in this repository depends on Keras 2 and will no longer work with Keras 3.x or later. TensorFlow is therefore pinned to v2.15.0, the final release before the launch of Keras 3.0. This pin introduces some important hard dependencies of its own:
- NumPy < 2.0
- Python < 3.12 (tested with 3.11, recommended)
The full dependencies are specified in the pyproject.toml and poetry.lock files. Check out Poetry and use the lock file to reproduce exactly the virtual environment used to generate all results presented here.
CAUTION: The Poetry installation of TensorFlow seems to fail silently: poetry install runs without error, but TensorFlow is not installed and cannot be imported. As a workaround, TensorFlow was installed separately using pip:

    source .venv/bin/activate
    pip install tensorflow==2.15

followed by (to install the ml_tb project locally):

    poetry install
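After installation, it is worth checking that the intended TensorFlow and Keras versions are active; a quick sanity check along these lines can be run inside the activated virtual environment:

```python
import keras
import tensorflow as tf

# The code base expects TensorFlow 2.15.x, which still bundles Keras 2.
print("TensorFlow:", tf.__version__)   # expected: 2.15.x
print("Keras:", keras.__version__)     # expected: 2.15.x (Keras 2, not Keras 3)
assert tf.__version__.startswith("2.15"), "unexpected TensorFlow version"
```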