[CI-Examples] Add Candle ML framework example #1938
Conversation
This is e.g. required by the gemm-common Rust crate, see `gemm-common/src/cache.rs`. Without this file, the crate logic incorrectly calculates shared-cpu count as zero and leads to a division-by-zero exception. Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
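For illustration, the failing pattern looks roughly like this — a minimal Rust sketch in the spirit of gemm-common's cache detection, not its actual code (the function name is made up, and the specific sysfs path is only a representative example of the kind of file the commit adds):

use std::fs;

// Parse a sysfs shared_cpu_list string such as "0-3,8-11" into a CPU count.
fn shared_cpu_count(list: &str) -> usize {
    list.trim()
        .split(',')
        .filter(|s| !s.is_empty())
        .map(|range| match range.split_once('-') {
            Some((lo, hi)) => {
                let lo: usize = lo.parse().unwrap_or(0);
                let hi: usize = hi.parse().unwrap_or(0);
                hi.saturating_sub(lo) + 1
            }
            None => 1,
        })
        .sum()
}

fn main() {
    // Without the emulated sysfs file, the read fails, the string is empty,
    // and the computed shared-CPU count is zero...
    let list = fs::read_to_string("/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list")
        .unwrap_or_default();
    let cpus = shared_cpu_count(&list);
    let cache_per_cpu = 1_048_576 / cpus; // ...so this integer division panics.
    println!("cache per CPU: {cache_per_cpu} bytes");
}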
a discussion (no related file):
#1937 is a prerequisite. Blocking.
a discussion (no related file):
We need to decide where to put this example (and whether we want it at all). Most probably it should go to the separate Examples repo, but I'm not sure.
a discussion (no related file):
I have two examples and am not sure both are needed. If we decide to keep only one, I would prefer the Quantized LLaMA one, because it is much more complex and can be used for benchmarking.
CI-Examples/candle/Makefile
line 25 at r1 (raw file):
mkdir -p $(SRCDIR) && cd $(SRCDIR) && \
	cargo new candle_matmul && cd candle_matmul && \
	cargo add --git https://github.com/huggingface/candle.git candle-core && \
I hard-coded all URLs and SHA256 hashes for now; I'm not sure it's worth making them variables.
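If we did make them variables, it could look something like this (a sketch only; the variable names, target name, and placeholder values are hypothetical, not taken from this Makefile):

# Hypothetical: overridable URL/hash variables instead of hard-coded values
CANDLE_GIT   ?= https://github.com/huggingface/candle.git
MODEL_URL    ?= <model download URL>  # placeholder
MODEL_SHA256 ?= <expected sha256>     # placeholder

llama-model.bin:
	wget -O $@ "$(MODEL_URL)"
	echo "$(MODEL_SHA256)  $@" | sha256sum --check -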
CI-Examples/candle/candle_quantized.manifest.template
line 8 at r1 (raw file):
loader.log_level = "{{ log_level }}"
loader.env.LD_LIBRARY_PATH = "/lib:{{ arch_libdir }}"
Must add RAYON_NUM_THREADS as a passthrough envvar, so that users can change the number of threads to run.
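In manifest syntax this is the standard Gramine passthrough declaration:

loader.env.RAYON_NUM_THREADS = { passthrough = true }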
CI-Examples/candle/candle_quantized.manifest.template
line 25 at r1 (raw file):
sgx.edmm_enable = {{ 'true' if env.get('EDMM', '0') == '1' else 'false' }}
sgx.max_threads = {{ '1' if env.get('EDMM', '0') == '1' else '256' }}
sgx.enclave_size = "16G"
Need to bump to "32G". The original workload takes up to 5.5GB, and the enclave with 16GB (minus the ASLR adjustments, minus Gramine's internal state) may error out with ENOMEM.
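I.e., the template line would become:

sgx.enclave_size = "32G"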
Force-pushed from 18d0dbb to eb802a4.
a discussion (no related file):
Quick benchmark results for candle_quantized (LLaMA2 7B), collected on a powerful SPR machine with 2 NUMA nodes and 72 physical cores (i.e. 36 physical cores per node). The workload runs with 36 threads pinned to the 36 physical cores of NUMA node 0. SGX PRM (basically EPC) is configured with 32GB on each NUMA node.
- Original workload:
~/gramine/CI-Examples/candle$ RAYON_NUM_THREADS=36 numactl --cpunodebind=0 --membind=0 \
./candle_quantized --model llama-2-7b.ggmlv3.q4_0.bin --tokenizer tokenizer.json --sample-len 200
avx: true, neon: false, simd128: false, f16c: true
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 291 tensors (3.79GB) in 3.46s
params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
model built
...
6 prompt tokens processed: 9.50 token/s
199 tokens generated: 6.07 token/s
- gramine-direct:
~/gramine/CI-Examples/candle$ RAYON_NUM_THREADS=36 numactl --cpunodebind=0 --membind=0 \
gramine-direct ./candle_quantized
avx: true, neon: false, simd128: false, f16c: true
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 291 tensors (3.79GB) in 3.17s
params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
model built
...
6 prompt tokens processed: 3.01 token/s
199 tokens generated: 1.56 token/s
- gramine-sgx, no EDMM:
~/gramine/CI-Examples/candle$ RAYON_NUM_THREADS=36 numactl --cpunodebind=0 --membind=0 \
gramine-sgx ./candle_quantized
avx: true, neon: false, simd128: false, f16c: true
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 291 tensors (3.79GB) in 27.87s
params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
model built
...
6 prompt tokens processed: 2.36 token/s
199 tokens generated: 6.83 token/s
- gramine-sgx, with EDMM:
~/gramine/CI-Examples/candle$ RAYON_NUM_THREADS=36 numactl --cpunodebind=0 --membind=0 \
gramine-sgx ./candle_quantized
avx: true, neon: false, simd128: false, f16c: true
temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded 291 tensors (3.79GB) in 43.01s
params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
model built
...
6 prompt tokens processed: 0.07 token/s
199 tokens generated: 5.87 token/s
To be honest, I don't know how to interpret these.
CI-Examples/candle/candle_quantized.manifest.template
line 8 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Must add RAYON_NUM_THREADS as a passthrough envvar, so that users can change the number of threads to run.
Done
CI-Examples/candle/candle_quantized.manifest.template
line 25 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Need to bump to "32G". The original workload takes up to 5.5GB, and the enclave with 16GB (minus the ASLR adjustments, minus Gramine's internal state) may error out with ENOMEM.
Done
Candle is a minimalist ML framework for Rust with a focus on performance and ease of use. This commit adds two examples with Candle: simple matrix multiplication (to quickly test functionality) and Quantized LLaMA (to test performance). Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
Force-pushed from eb802a4 to e52efcd.
a discussion (no related file):
Maybe it would be better to include it in our examples repo (instead of CI-Examples), so that it is tested before each release (instead of on every CI run)?
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
We need to decide where to put this example (and if we want this example at all). Most probably it should go to the separate Examples repo? Don't know.
I'm against it, I think it's not popular enough to justify the burden of maintaining it (but I'll be happy to change my mind if you prove that it's actually popular). Maybe Examples repo would be better, assuming someone wants to maintain it there.
One reason is that it's a good Rust example and we don't have one in Examples. One way to make something popular is to use it, especially if we like it :-)
We already have an example in Rust: https://github.com/gramineproject/gramine/tree/master/CI-Examples/rust.
Force-pushed from 6639193 to 7e44993.
Closing this PR. The fix to the Gramine LibOS was merged in this repo, and the Candle example itself was moved to another repo: gramineproject/examples#104
Description of the changes
Candle is a minimalist ML framework for Rust with a focus on performance and ease of use. This PR adds two examples with Candle: simple matrix multiplication (to quickly test functionality) and Quantized LLaMA (to test performance).
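For context, the matmul example is conceptually as small as this (a sketch using the candle-core API; not necessarily the exact code in this PR):

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // Multiply two random 1024x1024 matrices on the CPU backend.
    let a = Tensor::randn(0f32, 1.0, (1024, 1024), &device)?;
    let b = Tensor::randn(0f32, 1.0, (1024, 1024), &device)?;
    let c = a.matmul(&b)?;
    println!("result shape: {:?}", c.shape());
    Ok(())
}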
How to test this PR?
Follow the README.