
Download model from Object Storage #69

Open
nitin302 opened this issue Feb 6, 2025 · 8 comments
Labels
feature request New feature or request

Comments

@nitin302

nitin302 commented Feb 6, 2025

Need a solution to download a model from AWS storage, Azure Blob Storage or MinIO.

Creating this issue as requested here - #67 (comment)

@ApostaC added the feature request label Feb 6, 2025
@ApostaC
Collaborator

ApostaC commented Feb 6, 2025

Thanks for helping us organize the feature requests! Will work on this soon!

@xqe2011

xqe2011 commented Feb 7, 2025

Looking forward to this feature too. I am using the RunAI Model Streamer to load models directly from S3 now.

@noa-neria

noa-neria commented Feb 9, 2025

Hi from the RunAI team,

Happy to confirm that you can load models directly from object storage in the production stack by adding the necessary flags and credentials to your configuration file.

Using the following configuration file, we deployed vLLM with the RunAI Model Streamer to read the model from S3.
Any S3-compatible object storage, such as GCS, MinIO, etc., is also supported with additional flags, as explained here

servingEngineSpec:
  modelSpec:
  - name: "llama3"
    repository: "vllm/vllm-openai"
    tag: "v0.7.1"
    modelURL: "s3://core-llm/Llama-3-8b/"
    replicaCount: 1
    env:
    - name: AWS_ACCESS_KEY_ID
      value: "Your_key_here"
    - name: AWS_SECRET_ACCESS_KEY
      value: "Your_secret_here"
    - name: RUNAI_STREAMER_MEMORY_LIMIT
      value: "8388608000"

    pvcStorage: "5Gi"

    requestCPU: 1
    requestMemory: "30Gi"
    requestGPU: 1

    vllmConfig:
      extraArgs: ["--load-format", "runai_streamer"]
    hf_token: Your_token_here

RUNAI_STREAMER_MEMORY_LIMIT is an optional memory limit in bytes. If not specified, the RunAI streamer will allocate CPU memory equal to the total size of the model weights.
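
For sizing reference, a minimal sketch of the same env entry with the arithmetic spelled out (the figures in the comments are illustrative assumptions, not recommendations):

env:
# Cap the streamer's CPU staging buffer: 8388608000 bytes = 8000 MiB (~7.8 GiB).
# Without a limit the streamer may buffer up to the full weight size
# (roughly 16 GB for an 8B-parameter model in BF16, at 2 bytes per parameter),
# so requestMemory must leave room for it.
- name: RUNAI_STREAMER_MEMORY_LIMIT
  value: "8388608000"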

The RunAI streamer is an open source project integrated into vLLM.

The streamer provides direct, fast streaming of model weights from Safetensors files (from either a file system or object storage), saturating the storage bandwidth with parallel reads. Benchmarks can be found here
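
As a sketch of tuning those parallel reads through the chart (assuming extraArgs is passed straight through to vLLM; the concurrency value of 16 is an illustrative assumption):

vllmConfig:
  extraArgs:
  - "--load-format"
  - "runai_streamer"
  # Number of concurrent read threads used by the streamer, set via
  # vLLM's --model-loader-extra-config; 16 is an illustrative value.
  - "--model-loader-extra-config"
  - '{"concurrency": 16}'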

@xqe2011
Copy link

xqe2011 commented Feb 9, 2025

@noa-neria Thank you. I have tested this method, but the RunAI Streamer has some limitations when loading from S3. For example, it can't load models that need --trust-remote-code (e.g. DeepSeek-R1-Qwen-32B). Additionally, this method bypasses the Linux file cache, so every load transfers the weights from the remote again, which takes more time. We eventually fell back to the vLLM charts.

@noa-neria

@xqe2011 we appreciate your feedback!

The --trust-remote-code flag is fully supported (it is unrelated to the streamer) and can be passed in extraArgs.

ModelScope support has not yet been added to the RunAI loader, and we are working to fix it.
However, this is not related to downloading from object storage. Like the default loader in vLLM, the RunAI loader should download from the repository when the path is not local or S3, but currently we only download from HuggingFace.
That is not a problem if all the model files are already in object storage.

You have also mentioned the performance issue when the model is distributed across several devices and nodes.
In that case each vLLM process reads the full model from object storage, which is not efficient.
We are now working to support loading in sharded mode, which will be most efficient since the entire model will be loaded only once.

@xqe2011

xqe2011 commented Feb 11, 2025

@noa-neria Well, I know why... We are using the fs backend of MinIO, so the s3:// URL must end with /. If it doesn't, the ListObjectsV2 API will not return the files.

@noa-neria

@xqe2011 we have an issue with the configuration for S3-compatible storage such as MinIO. It will be fixed soon; as a workaround, pass the endpoint URL via two environment variables:
AWS_ENDPOINT_URL=endpoint_url RUNAI_STREAMER_S3_ENDPOINT=endpoint_url

A URL that ends with / is supported
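
For reference, a minimal sketch of that workaround in the configuration file above (the MinIO endpoint URL is a hypothetical placeholder for your own deployment):

env:
- name: AWS_ACCESS_KEY_ID
  value: "Your_key_here"
- name: AWS_SECRET_ACCESS_KEY
  value: "Your_secret_here"
# Workaround for S3-compatible stores such as MinIO: point both the
# AWS SDK and the streamer at the same custom endpoint.
- name: AWS_ENDPOINT_URL
  value: "http://minio.example.svc:9000"   # placeholder endpoint
- name: RUNAI_STREAMER_S3_ENDPOINT
  value: "http://minio.example.svc:9000"   # placeholder endpoint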

@gaocegege
Collaborator

That’s awesome! It not only supports object storage but also speeds things up. Maybe we should add some info about it in the tutorial.
