
phi3.5-vision fails on CPU #1146

Open

suyash-narain opened this issue Dec 13, 2024 · 7 comments

Comments

@suyash-narain

Hi,

I am using a Linux aarch64 device with ORT and onnxruntime-genai v0.5.2.

I am executing the phi3.5-vision model on CPU following the steps here: https://onnxruntime.ai/docs/genai/tutorials/phi3-v.html#run-on-cpu

The program gets killed with an OOM error. My device has 16 GB of memory. I can easily run phi3.5-mini models on my device, but phi3.5-vision fails due to an OOM kill.

My error log is as follows:

python3 phi3-v.py -m /tmp/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/ -p cpu
Loading model...
Model loaded
Image Path (comma separated; leave empty if no image): car.jpg
Using image: car.jpg
Prompt: describe image
Processing images and prompt...
Generating response...
Killed

Are there any specific image formats the model takes in?

I have faced this 'killed' issue with ORT before. With ORT, I have to set the flag enable_cpu_mem_arena to False.

How do I do the same using the provided Python script https://github.com/microsoft/onnxruntime-genai/blob/rel-0.5.2/examples/python/phi3v.py?

Does ORT GenAI also have such flags while executing generator models?

@kunal-vaishnavi
Contributor

Are there any specific image formats the model takes in?

The images can be of any format. Here's how images are loaded.

std::unique_ptr<Images> LoadImages(const std::span<const char* const>& image_paths) {
  for (const char* image_path : image_paths) {
    if (!fs::path(image_path).exists()) {
      throw std::runtime_error("Image path does not exist: " + std::string(image_path));
    }
  }
  auto [images, num_images] = ort_extensions::LoadRawData<const char* const*, ort_extensions::ImageRawData>(
      image_paths.data(), image_paths.data() + image_paths.size());
  return std::make_unique<Images>(std::move(images), num_images);
}

The LoadRawData method is defined here and can accept any format.
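On the Python side, the phi3v.py example script referenced in this thread loads images in a similar way. A minimal sketch of that flow (based on the rel-0.5.2 example; exact calls and the prompt template may differ between releases):

import onnxruntime_genai as og

model = og.Model("/tmp/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/")

# Image decoding is delegated to onnxruntime-extensions, so any common
# format (JPEG, PNG, BMP, ...) should be accepted.
images = og.Images.open("car.jpg")

# The multimodal processor combines the image tensors with the text prompt.
processor = model.create_multimodal_processor()
prompt = "<|user|>\n<|image_1|>\ndescribe image<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=images)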

I have faced this 'killed' issue with ORT before. With ORT, I have to set the flag enable_cpu_mem_arena to False.

How do I do the same using the provided Python script https://github.com/microsoft/onnxruntime-genai/blob/rel-0.5.2/examples/python/phi3v.py?

Does ORT GenAI also have such flags while executing generator models?

You can find more information about that here.

@suyash-narain
Author

@kunal-vaishnavi

I added the options as below:

            "session_options": {
                "log_id": "onnxruntime-genai",
                "provider_options": [],
                "enable_cpu_mem_arena": "False",
            },

but I get the error:

RuntimeError: Error encountered while parsing 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/genai_config.json' JSON Error: Unknown value: enable_cpu_mem_arena

Is this correct?

@kunal-vaishnavi
Contributor

Can you try false instead of "False"? I will update the documentation in the linked answer.

@suyash-narain
Author

Hi @kunal-vaishnavi,

I tried with false instead of "False", and I got the same error. It doesn't seem to recognise enable_cpu_mem_arena.

genai_config.json:

"model": {
    "bos_token_id": 1,
    "context_length": 131072,
    "decoder": {
        "session_options": {
            "log_id": "onnxruntime-genai",
            "provider_options": [],
            "enable_cpu_mem_arena": "false",
        },

error:

$python3 phi3-v.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/ -p cpu
Loading model...
Traceback (most recent call last):
  File "/home/root/phi3_vision/phi3-v.py", line 141, in <module>
    run(args)
  File "/home/root/phi3_vision/phi3-v.py", line 32, in run
    config = og.Config(args.model_path)
RuntimeError: Error encountered while parsing 'cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4//genai_config.json' JSON Error: Unknown value: enable_cpu_mem_arena at line 9 index 48

@kunal-vaishnavi
Contributor

It has to be without the quotes around the value.

"enable_cpu_mem_arena": false

The above linked PR should help make these JSON mismatch errors clearer in the future.
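With that fix applied, the decoder block from the config quoted above would look like this (only the relevant fields shown):

"decoder": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": [],
        "enable_cpu_mem_arena": false
    },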

@suyash-narain
Author

@kunal-vaishnavi
Thanks, I could add the flag to my genai_config.json, but I am still getting the OOM kill error whenever I try to execute this model.
I don't get this issue when I execute phi3.5-mini, but on executing the phi3.5-vision tutorial steps, the OOM killer kicks in and kills the process at the response generation step.

Do you have any suggestions to overcome this?

@kunal-vaishnavi
Contributor

1. You can measure memory usage in the example script with memory_profiler or with nvidia-smi (for this CPU-only case, see the psutil sketch after this list).

# Monitor the GPU memory usage
def monitor_gpu_memory():
    global peak_gpu_memory
    while not stop_monitoring:
        result = subprocess.run(['nvidia-smi', '--query-gpu=memory.used', '--format=csv,noheader,nounits'], capture_output=True, text=True)
        memory_usage = result.stdout.splitlines()
        if len(memory_usage) >= 1:
            gpu_memory = [float(line) for line in memory_usage]
            current_peak = round(max(gpu_memory) / 1024, 2)
            with peak_memory_lock:
                peak_gpu_memory = max(current_peak, peak_gpu_memory)
        else:
            print("No GPU Memory Info Found")
        time.sleep(0.1)

This will tell you where the error occurs in your inference.

2. You can try turning on logging within ONNX Runtime GenAI.
og.set_log_options(enabled=True, model_input_values=True, model_output_values=True, ansi_tags=True)

This will tell you which stage within ONNX Runtime GenAI causes the error.

3. Since you don't get this issue with Phi-3.5 mini, you can isolate the memory usage of just the text decoder by running Phi-3.5 mini with the above profiling and logging steps. Then the remaining memory usage that you see when running Phi-3.5 vision will be coming from the vision and embedding ONNX models.

If the memory usage is significantly more than when running with PyTorch, then there may be an issue that needs to be investigated. If the memory usage is close, you can try resizing the image so that the model doesn't run out of memory (see the sketch below) or use a machine with more RAM.
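Since the failure here is on CPU, a resident-memory variant of the GPU monitor in step 1 can be used instead. A minimal sketch assuming psutil is available (the thread wiring mirrors the nvidia-smi snippet above):

import threading
import time

import psutil

stop_monitoring = False
peak_cpu_memory = 0.0
peak_memory_lock = threading.Lock()

# Monitor the resident set size (RSS) of this process, in GB
def monitor_cpu_memory():
    global peak_cpu_memory
    process = psutil.Process()
    while not stop_monitoring:
        current_gb = round(process.memory_info().rss / 1024**3, 2)
        with peak_memory_lock:
            peak_cpu_memory = max(current_gb, peak_cpu_memory)
        time.sleep(0.1)

monitor_thread = threading.Thread(target=monitor_cpu_memory, daemon=True)
monitor_thread.start()
# ... run the generation steps from phi3v.py here ...
stop_monitoring = True
monitor_thread.join()
print(f"Peak CPU memory: {peak_cpu_memory} GB")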
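And if image size turns out to be the culprit, a resize sketch with Pillow (the 1344-pixel cap is illustrative, not a documented limit of the model):

from PIL import Image

MAX_SIDE = 1344  # illustrative cap, not a documented model limit

# Downscale in place while preserving aspect ratio; thumbnail() is a
# no-op if the image already fits within the bounding box.
img = Image.open("car.jpg")
img.thumbnail((MAX_SIDE, MAX_SIDE))
img.save("car_resized.jpg")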

RyanUnderhill added a commit that referenced this issue Dec 18, 2024
This changes the JSON parsing to use a std::variant so there is just a single OnValue handler instead of separate OnString/OnNumber/OnBool/OnNull handlers.

Previously a mismatched type would say

`JSON Error: Unknown value: name at line 3 index 19`

or it would say

`JSON Error: Unknown value: name`

if the name was known but the type of its value was wrong (example:
#1146).

Now it'll give a much better error message, showing first the full path
of the field being parsed, and then saying exactly how the types
mismatch:

`JSON Error: model:type - Expected a number but saw a string at line 3
index 19`