
phi3.5-vision fails on CPU #1146

Open

suyash-narain opened this issue Dec 13, 2024 · 7 comments

Comments

@suyash-narain

Hi,

I am using a Linux aarch64 device with ORT and onnxruntime-genai v0.5.2.

I am executing the phi3.5-vision model on CPU following the steps here: https://onnxruntime.ai/docs/genai/tutorials/phi3-v.html#run-on-cpu

The program gets killed with an OOM error. My device has 16 GB of memory. I can easily run phi3.5-mini models on my device, but phi3.5-vision fails due to an OOM kill.

My error log is as follows:

python3 phi3-v.py -m /tmp/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/ -p cpu
Loading model...
Model loaded
Image Path (comma separated; leave empty if no image): car.jpg
Using image: car.jpg
Prompt: describe image
Processing images and prompt...
Generating response...
Killed

Are there any specific image formats the model takes in?

I have faced this 'killed' issue with ORT before. With ORT, I have to set the flag enable_cpu_mem_arena to False.

How do I do the same using the provided Python script https://github.com/microsoft/onnxruntime-genai/blob/rel-0.5.2/examples/python/phi3v.py?

Does ORT GenAI also have such flags while executing generator models?

@kunal-vaishnavi
Contributor

Are there any specific image formats the model takes in?

The images can be of any format. Here's how images are loaded.

std::unique_ptr<Images> LoadImages(const std::span<const char* const>& image_paths) {
  for (const char* image_path : image_paths) {
    if (!fs::path(image_path).exists()) {
      throw std::runtime_error("Image path does not exist: " + std::string(image_path));
    }
  }
  auto [images, num_images] = ort_extensions::LoadRawData<const char* const*, ort_extensions::ImageRawData>(
      image_paths.data(), image_paths.data() + image_paths.size());
  return std::make_unique<Images>(std::move(images), num_images);
}

The LoadRawData method is defined here and can accept any format.
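On the Python side, the phi3v.py example script referenced in this thread loads images in a similar way. A minimal sketch of that flow (based on the rel-0.5.2 example; exact calls and the prompt template may differ between releases):

import onnxruntime_genai as og

model = og.Model("/tmp/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/")

# Image decoding is delegated to onnxruntime-extensions, so any common
# format (JPEG, PNG, BMP, ...) should be accepted.
images = og.Images.open("car.jpg")

# The multimodal processor combines the image tensors with the text prompt.
processor = model.create_multimodal_processor()
prompt = "<|user|>\n<|image_1|>\ndescribe image<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=images)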

I have faced this 'killed' issue with ORT before. With ORT, I have to set the flag enable_cpu_mem_arena to False.

How do I do the same using the provided Python script https://github.com/microsoft/onnxruntime-genai/blob/rel-0.5.2/examples/python/phi3v.py?

Does ORT GenAI also have such flags while executing generator models?

You can find more information about that here.

@suyash-narain
Author

@kunal-vaishnavi

I added the options as below:

            "session_options": {
                "log_id": "onnxruntime-genai",
                "provider_options": [],
                "enable_cpu_mem_arena": "False",
            },

but I get the error:

RuntimeError: Error encountered while parsing 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json' JSON Error: Unknown value: enable_cpu_mem_arena

Is this correct?

@kunal-vaishnavi
Contributor

Can you try false instead of "False"? I will update the documentation in the linked answer.

@suyash-narain
Author

Hi @kunal-vaishnavi,

I tried with false instead of "False", and I got the same error. It doesn't seem to recognise enable_cpu_mem_arena.

genai_config.json:

"model": {
    "bos_token_id": 1,
    "context_length": 131072,
    "decoder": {
        "session_options": {
            "log_id": "onnxruntime-genai",
            "provider_options": [],
            "enable_cpu_mem_arena": "false",
        },

error:

$python3 phi3-v.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/ -p cpu
Loading model...
Traceback (most recent call last):
  File "/home/root/phi3_vision/phi3-v.py", line 141, in <module>
    run(args)
  File "/home/root/phi3_vision/phi3-v.py", line 32, in run
    config = og.Config(args.model_path)
RuntimeError: Error encountered while parsing 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4//genai_config.json' JSON Error: Unknown value: enable_cpu_mem_arena at line 9 index 48

@kunal-vaishnavi
Contributor

It has to be without the quotes around the value.

"enable_cpu_mem_arena": false

The above linked PR should help make these JSON mismatch errors clearer in the future.
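With that fix applied, the decoder block from the config quoted above would look like this (only the relevant fields shown):

"decoder": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": [],
        "enable_cpu_mem_arena": false
    },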

@suyash-narain
Author

@kunal-vaishnavi
Thanks, I could add the flag to my genai_config.json, but I am still getting the OOM kill error whenever I try to execute this model.
I don't get this issue when I execute phi3.5-mini, but on executing the phi3.5-vision tutorial steps, the OOM killer kicks in and kills the process at the response generation step.

Do you have any suggestions to overcome this?

@kunal-vaishnavi
Contributor

1. You can measure memory usage in the example script with memory_profiler or with nvidia-smi (for this CPU-only case, see the psutil sketch after this list).

# Monitor the GPU memory usage
def monitor_gpu_memory():
    global peak_gpu_memory
    while not stop_monitoring:
        result = subprocess.run(['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'], capture_output=True, text=True)
        memory_usage = result.stdout.splitlines()
        if len(memory_usage) >= 1:
            gpu_memory = [float(line) for line in memory_usage]
            current_peak = round(max(gpu_memory) / 1024, 2)
            with peak_memory_lock:
                peak_gpu_memory = max(current_peak, peak_gpu_memory)
        else:
            print("No GPU Memory Info Found")
        time.sleep(0.1)

This will tell you where the error occurs in your inference.

2. You can try turning on logging within ONNX Runtime GenAI.
og.set_log_options(enabled=True, model_input_values=True, model_output_values=True, ansi_tags=True)

This will tell you which stage within ONNX Runtime GenAI causes the error.

3. Since you don't get this issue with Phi-3.5 mini, you can isolate the memory usage of just the text decoder by running Phi-3.5 mini with the above profiling and logging steps. Then the remaining memory usage that you see when running Phi-3.5 vision will be coming from the vision and embedding ONNX models.

If the memory usage is significantly more than when running with PyTorch, then there may be an issue that needs to be investigated. If the memory usage is close, you can try resizing the image so that the model doesn't run out of memory (see the sketch below) or use a machine with more RAM.
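Since the failure here is on CPU, a resident-memory variant of the GPU monitor in step 1 can be used instead. A minimal sketch assuming psutil is available (the thread wiring mirrors the nvidia-smi snippet above):

import threading
import time

import psutil

stop_monitoring = False
peak_cpu_memory = 0.0
peak_memory_lock = threading.Lock()

# Monitor the resident set size (RSS) of this process, in GB
def monitor_cpu_memory():
    global peak_cpu_memory
    process = psutil.Process()
    while not stop_monitoring:
        current_gb = round(process.memory_info().rss / 1024**3, 2)
        with peak_memory_lock:
            peak_cpu_memory = max(current_gb, peak_cpu_memory)
        time.sleep(0.1)

monitor_thread = threading.Thread(target=monitor_cpu_memory, daemon=True)
monitor_thread.start()
# ... run the generation steps from phi3v.py here ...
stop_monitoring = True
monitor_thread.join()
print(f"Peak CPU memory: {peak_cpu_memory} GB")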
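And if image size turns out to be the culprit, a resize sketch with Pillow (the 1344-pixel cap is illustrative, not a documented limit of the model):

from PIL import Image

MAX_SIDE = 1344  # illustrative cap, not a documented model limit

# Downscale in place while preserving aspect ratio; thumbnail() is a
# no-op if the image already fits within the bounding box.
img = Image.open("car.jpg")
img.thumbnail((MAX_SIDE, MAX_SIDE))
img.save("car_resized.jpg")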

RyanUnderhill added a commit that referenced this issue Dec 18, 2024
This changes the JSON parsing to use a std::variant so there is just a single OnValue handler instead of separate OnString/OnNumber/OnBool/OnNull handlers.

Previously a mismatched type would say

`JSON Error: Unknown value: name at line 3 index 19`

or it would say

`JSON Error: Unknown value: name`

if the name was known but the type of its value was wrong (example:
#1146).

Now it'll give a much better error message, showing first the full path
of the field being parsed, and then saying exactly how the types
mismatch:

`JSON Error: model:type - Expected a number but saw a string at line 3
index 19`