update cpu benchmark
jiqing-feng committed Sep 12, 2024
1 parent e959717 commit 6eff85c
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions docs/source/non_cuda_backends.mdx
@@ -24,17 +24,17 @@ Thank you for your support!

### Intel

-The following performance data is collected from Intel 4th Gen Xeon (SPR) platform. The tables show latency and memory compared with different data types of [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
+The following performance data is collected from Intel 4th Gen Xeon (SPR) platform. The tables show speed-up and memory compared with different data types of [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

For inference:
-| | BF16 | INT8 | NF4 | FP4 |
-| latency | 1.0x | 0.62x | 2.43x | 0.02x |
-| memory | 13.1G | 7.2G | 5.7G | 4.6G |
+| | BF16 | INT8 | NF4 | FP4 |
+| speed-up | 1.0x | 0.6x | 2.3x | 0.03x |
+| memory | 13.1G | 7.6G | 5.0G | 4.6G |

For fine-tune:
-| | BF16 | INT8 | NF4 | FP4 |
-| time-to-train | 1.0x | 0.36x | 0.07x | 0.07x |
-| memory | 40G | 8.6G | 5.2G | 5.2G |
+| | AMP BF16 | INT8 | NF4 | FP4 |
+| speed-up | 1.0x | 0.38x | 0.07x | 0.07x |
+| memory | 40G | 9G | 6.6G | 6.6G |
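
To help read the updated tables, here is a minimal sketch that converts the new numbers into memory-reduction and relative-time ratios against the BF16 column. It rests on two assumptions that the commit does not state: that "speed-up" means throughput relative to the baseline (so a value below 1.0x is slower than BF16), and that "G" means gigabytes. The names `inference`, `fine_tune`, and `summarize` are hypothetical and exist only for this illustration.

```python
# Values copied from the updated tables in this commit.
# Assumption: "G" is gigabytes and "speed-up" is throughput relative to baseline.
inference = {
    "BF16": {"speed_up": 1.0, "memory_gb": 13.1},
    "INT8": {"speed_up": 0.6, "memory_gb": 7.6},
    "NF4": {"speed_up": 2.3, "memory_gb": 5.0},
    "FP4": {"speed_up": 0.03, "memory_gb": 4.6},
}
fine_tune = {
    "AMP BF16": {"speed_up": 1.0, "memory_gb": 40.0},
    "INT8": {"speed_up": 0.38, "memory_gb": 9.0},
    "NF4": {"speed_up": 0.07, "memory_gb": 6.6},
    "FP4": {"speed_up": 0.07, "memory_gb": 6.6},
}


def summarize(table, baseline):
    """Return {dtype: (memory_reduction, relative_time)} against the baseline column.

    memory_reduction > 1 means the data type uses less memory than the baseline.
    relative_time > 1 means the data type takes longer than the baseline.
    """
    base = table[baseline]
    summary = {}
    for dtype, row in table.items():
        memory_reduction = base["memory_gb"] / row["memory_gb"]
        relative_time = base["speed_up"] / row["speed_up"]
        summary[dtype] = (round(memory_reduction, 2), round(relative_time, 2))
    return summary


if __name__ == "__main__":
    for name, table, baseline in (
        ("inference", inference, "BF16"),
        ("fine-tune", fine_tune, "AMP BF16"),
    ):
        print(name)
        for dtype, (mem, time) in summarize(table, baseline).items():
            print(f"  {dtype}: {mem}x less memory, {time}x the baseline time")
```

Under these assumptions, NF4 is the only quantized type in the inference table that is both smaller and faster than BF16, while in the fine-tune table every quantized type trades a large memory saving for a longer run.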


### AMD
