MLX-vs-PyTorch

This repository contains benchmarks for comparing two popular artificial intelligence frameworks that work on Apple Silicon devices: MLX and PyTorch.

The idea behind this simple project is to enable a wise choice when starting an AI project on an Apple computer.

We ran five benchmarks several times each to emulate day-to-day usage. For more information about them, please refer to the section Details about each benchmark.

  1. Training a transformers language model (lm_train.py).
  2. Training/fine-tuning BERT (bert_fine_tune.py).
  3. Inference using OpenAI's whisper model (whisper_inference.py).
  4. Language model inference using TinyLLama (llm_inference.py).
  5. A synthetic benchmark that moves data between CPU and GPU for matrix multiplication (switch_test.py).

Results

We executed each test for ten iterations, except for the language model training and BERT training benchmarks, which we ran for only three iterations because of the extra time they take.

The tables below show the average time over the iterations we ran. For the median execution times of each benchmark, refer to raw_results.txt.

M1 Pro (10-core CPU, 16-core GPU, 32 GB RAM)

| Benchmark                              | PyTorch time (s) | MLX time (s) |
|----------------------------------------|------------------|--------------|
| Training a transformer language model  | 1806.63          | 1157.00      |
| Training BERT                          | 751.02           | 718.35       |
| Whisper inference                      | 31.99            | 8.50         |
| TinyLLama inference                    | 59.27            | 33.38        |
| CPU/GPU switch                         | 349.72           | 270.15       |

M1 Max (10-core CPU, 32-core GPU, 64 GB RAM)

| Benchmark                              | PyTorch time (s) | MLX time (s) |
|----------------------------------------|------------------|--------------|
| Training a transformer language model  | 1106.75          | 752.25       |
| Training BERT                          | 793.67           | 499.34       |
| Whisper inference                      | 21.28            | 6.95         |
| TinyLLama inference                    | 50.98            | 20.61        |
| CPU/GPU switch                         | 251.71           | 214.57       |

M3 Max (16-core CPU, 40-core GPU, 48 GB RAM)

| Benchmark                              | PyTorch time (s) | MLX time (s) |
|----------------------------------------|------------------|--------------|
| Training a transformer language model  | 912.52           | 426.00       |
| Training BERT                          | 550.29           | 408.45       |
| Whisper inference                      | 17.90            | 4.85         |
| TinyLLama inference                    | 36.18            | 15.41        |
| CPU/GPU switch                         | 146.35           | 140.51       |

How to run the benchmarks

First, make sure you have Git LFS installed, then install the Python dependencies and run the configuration scripts that download and convert the models:

pip3 install -r requirements.txt
cd pytorch_models
./configure.sh
cd .. 
cd mlx_models
./configure.sh

Every Python file in the root folder represents a different benchmark. All of them require two arguments: the number of times to run the benchmark and the framework. If you'd like to run, for example, the TinyLLama inference benchmark ten times using PyTorch, execute:

python3 llm_inference.py --framework pytorch --iter 10

When the command finishes, it prints the average and median times of the ten iterations to the terminal.

Additional settings

The lm_train.py benchmark needs the PYTORCH_MPS_HIGH_WATERMARK_RATIO environment variable set to zero when used with PyTorch.
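
For example, to run that benchmark three times with PyTorch:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 python3 lm_train.py --framework pytorch --iter 3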

The whisper_inference benchmark only works with the latest commit from the PyTorch repository, so you need to build PyTorch from source to run this benchmark.

Details about each benchmark

Training a transformers language model

For this benchmark, we copied the model from MLX's TransformerLM example. For the PyTorch version, we utilized the closest functions available to properly replicate the model in another framework. The dataset utilized is the PTB corpus. For more information about the model size, epochs and other hyperparameters, refer to lm_train.py.
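
As a rough illustration only, a model along these lines can be assembled in PyTorch from the built-in transformer layers. The sizes below are placeholders rather than the hyperparameters used in the benchmark, and the class name is made up for this sketch:

import torch
import torch.nn as nn

class SketchTransformerLM(nn.Module):
    """Decoder-style language model built from PyTorch's stock layers."""
    def __init__(self, vocab_size, dims=512, num_heads=8, num_layers=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dims)
        layer = nn.TransformerEncoderLayer(d_model=dims, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.out_proj = nn.Linear(dims, vocab_size)

    def forward(self, tokens):
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        x = self.encoder(self.embedding(tokens), mask=mask)
        return self.out_proj(x)

tokens = torch.randint(0, 10000, (4, 64))                # a dummy batch of token IDs
logits = SketchTransformerLM(vocab_size=10000)(tokens)   # shape: (4, 64, 10000)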

Training/fine-tuning BERT

We utilized the model presented in Conneau et al., using BERT-tiny for the respective BERT blocks. It classifies pairs of sentences as having a contradiction, entailment, or neutral relation. The model was implemented in pure PyTorch and pure MLX, respectively. We do not initialize it with any pre-trained weights, so the benchmark can be seen as pure training. The training data comes from the NLI dataset.

The only adaptation in this case was that we used the PyTorch DataLoader for the MLX model too, as it was compatible with the tokenizer library. Even though the data loader creates a PyTorch tensor for each input, we can convert it to a NumPy array without extra copies, so this setup did not harm the MLX results.
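
A minimal sketch of that hand-off, with an illustrative batch shape:

import mlx.core as mx
import torch

batch = torch.randint(0, 30522, (8, 128))  # token IDs produced by the PyTorch DataLoader
np_batch = batch.numpy()                   # shares memory with the CPU tensor, no extra copy
mx_batch = mx.array(np_batch)              # build the MLX array fed to the MLX model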

Whisper inference

For the PyTorch setting, we used the HuggingFace transformers library to download and run the tiny Whisper model. For the MLX benchmark, we used the MLX examples tools to download tiny Whisper and convert it to the MLX format, using float32 as the inner data type to match PyTorch's (see mlx_models/configure.sh). The MLX inference code leverages the mlx_whisper library.
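
For reference, a minimal sketch of Whisper-tiny inference through the transformers pipeline on the MPS backend (the audio file name is illustrative):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",
    device="mps",
)
print(asr("sample.wav")["text"])  # transcription of the audio file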

TinyLLama inference

For PyTorch, we downloaded the TinyLlama-1.1B-Chat-v1.0 model from the HuggingFace repository (see pytorch_models/configure.sh) and used the transformers library to load and run the model.

For MLX, we converted the model to the MLX format using the MLX examples tools, with float32 as the data type to match PyTorch's. We used the execution script from the MLX examples repository with several adaptations to handle proper prompt formatting and execution constraints.
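
A rough sketch of the PyTorch side using the standard transformers API (the prompt and generation settings are illustrative; see llm_inference.py for the code the benchmark actually runs):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("mps")

messages = [{"role": "user", "content": "Explain unified memory in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("mps")
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))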

CPU/GPU switch

In this benchmark, we perform matrix multiplications in a loop. First, we multiply matrices on the CPU, then we multiply the resulting matrices on the GPU. Lastly, we reuse the GPU results as the input for the next iteration's CPU multiplication.

The idea behind this benchmark is to assess how effectively each framework moves data between execution units.
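
A minimal sketch of this pattern in PyTorch (the matrix size, iteration count, and normalization step are placeholders; see switch_test.py for the real benchmark):

import torch

a = torch.randn(2048, 2048)
b = torch.randn(2048, 2048)
for _ in range(100):
    c = a @ b                      # multiply on the CPU
    d = c.to("mps") @ c.to("mps")  # move the result to the GPU and multiply there
    a = (d / d.abs().max()).cpu()  # GPU result (scaled to stay finite) feeds the next CPU step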
