How to choose which Vulkan device to run on?

Hi, I'm on macOS 15.4 and I built sd.cpp with Vulkan support. I have a 16-inch MBP with a built-in 5500M GPU and an external 6800 XT GPU, so there are 2 Vulkan devices when sd.cpp runs. I noticed that when running, sd.cpp picks the 5500M and ignores the 6800 XT. How do I select which Vulkan device the model runs on?

<img width="279" alt="Image" src="https://github.com/user-attachments/assets/47741e21-3cdb-4e93-9164-2989f7c28ef1" />

```
cmake -B build -DGGML_METAL=OFF -DSD_VULKAN=ON \
-DVulkan_INCLUDE_DIR=/usr/local/Cellar/molten-vk/1.2.11/include \
-DVulkan_LIBRARY=/usr/local/Cellar/molten-vk/1.2.11/lib/libMoltenVK.dylib \
-DOpenMP_ROOT=$(brew --prefix)/opt/libomp \
-DVulkan_GLSLC_EXECUTABLE=$(brew --prefix)/opt/shaderc/bin/glslc \
-DVulkan_GLSLANG_VALIDATOR_EXECUTABLE=$(brew --prefix)/opt/glslang/bin/glslangValidator \
-DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
-DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
-DOpenMP_C_LIB_NAMES="libomp" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib"

cmake --build build --config Release -j 8
```

```
15:32:58 ~/Dev/stable-diffusion.cpp master                                                                                                                      
./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v 
Option: 
    n_threads:         8
    mode:              txt2img
    model_path:        ../sd-models/sd-v1-4.ckpt
    wtype:             unspecified
    clip_l_path:       
    clip_g_path:       
    t5xxl_path:        
    diffusion_model_path:   
    vae_path:          
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       output.png
    init_img:          
    mask_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:false
    strength(control): 0.90
    prompt:            a cat
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         7.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler_a
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:174  - Using Vulkan backend
ggml_vulkan: WARNING: Instance extension VK_KHR_portability_enumeration not found.
[mvk-info] MoltenVK version 1.2.12, supporting Vulkan version 1.2.309.
	The following 115 Vulkan extensions are supported:
	VK_KHR_16bit_storage v1
	VK_KHR_8bit_storage v1
	VK_KHR_bind_memory2 v1
	VK_KHR_buffer_device_address v1
	VK_KHR_calibrated_timestamps v1
	VK_KHR_copy_commands2 v1
	VK_KHR_create_renderpass2 v1
	VK_KHR_dedicated_allocation v3
	VK_KHR_deferred_host_operations v4
	VK_KHR_depth_stencil_resolve v1
	VK_KHR_descriptor_update_template v1
	VK_KHR_device_group v4
	VK_KHR_device_group_creation v1
	VK_KHR_driver_properties v1
	VK_KHR_dynamic_rendering v1
	VK_KHR_external_fence v1
	VK_KHR_external_fence_capabilities v1
	VK_KHR_external_memory v1
	VK_KHR_external_memory_capabilities v1
	VK_KHR_external_semaphore v1
	VK_KHR_external_semaphore_capabilities v1
	VK_KHR_fragment_shader_barycentric v1
	VK_KHR_format_feature_flags2 v2
	VK_KHR_get_memory_requirements2 v1
	VK_KHR_get_physical_device_properties2 v2
	VK_KHR_get_surface_capabilities2 v1
	VK_KHR_imageless_framebuffer v1
	VK_KHR_image_format_list v1
	VK_KHR_incremental_present v2
	VK_KHR_maintenance1 v2
	VK_KHR_maintenance2 v1
	VK_KHR_maintenance3 v1
	VK_KHR_map_memory2 v1
	VK_KHR_multiview v1
	VK_KHR_portability_subset v1
	VK_KHR_push_descriptor v2
	VK_KHR_relaxed_block_layout v1
	VK_KHR_sampler_mirror_clamp_to_edge v3
	VK_KHR_sampler_ycbcr_conversion v14
	VK_KHR_separate_depth_stencil_layouts v1
	VK_KHR_shader_draw_parameters v1
	VK_KHR_shader_float_controls v4
	VK_KHR_shader_float16_int8 v1
	VK_KHR_shader_integer_dot_product v1
	VK_KHR_shader_non_semantic_info v1
	VK_KHR_shader_subgroup_extended_types v1
	VK_KHR_shader_terminate_invocation v1
	VK_KHR_spirv_1_4 v1
	VK_KHR_storage_buffer_storage_class v1
	VK_KHR_surface v25
	VK_KHR_swapchain v70
	VK_KHR_swapchain_mutable_format v1
	VK_KHR_synchronization2 v1
	VK_KHR_timeline_semaphore v2
	VK_KHR_uniform_buffer_standard_layout v1
	VK_KHR_variable_pointers v1
	VK_KHR_vertex_attribute_divisor v1
	VK_KHR_zero_initialize_workgroup_memory v1
	VK_EXT_4444_formats v1
	VK_EXT_buffer_device_address v2
	VK_EXT_calibrated_timestamps v2
	VK_EXT_debug_marker v4
	VK_EXT_debug_report v10
	VK_EXT_debug_utils v2
	VK_EXT_descriptor_indexing v2
	VK_EXT_depth_clip_control v1
	VK_EXT_extended_dynamic_state v1
	VK_EXT_extended_dynamic_state2 v1
	VK_EXT_extended_dynamic_state3 v2
	VK_EXT_external_memory_host v1
	VK_EXT_external_memory_metal v1
	VK_EXT_fragment_shader_interlock v1
	VK_EXT_hdr_metadata v3
	VK_EXT_headless_surface v1
	VK_EXT_host_image_copy v1
	VK_EXT_host_query_reset v1
	VK_EXT_image_2d_view_of_3d v1
	VK_EXT_image_robustness v1
	VK_EXT_inline_uniform_block v1
	VK_EXT_layer_settings v2
	VK_EXT_memory_budget v1
	VK_EXT_metal_objects v2
	VK_EXT_metal_surface v1
	VK_EXT_pipeline_creation_cache_control v3
	VK_EXT_pipeline_creation_feedback v1
	VK_EXT_post_depth_coverage v1
	VK_EXT_private_data v1
	VK_EXT_robustness2 v1
	VK_EXT_sample_locations v1
	VK_EXT_scalar_block_layout v1
	VK_EXT_separate_stencil_usage v1
	VK_EXT_shader_atomic_float v1
	VK_EXT_shader_demote_to_helper_invocation v1
	VK_EXT_shader_stencil_export v1
	VK_EXT_shader_subgroup_ballot v1
	VK_EXT_shader_subgroup_vote v1
	VK_EXT_shader_viewport_index_layer v1
	VK_EXT_subgroup_size_control v2
	VK_EXT_surface_maintenance1 v1
	VK_EXT_swapchain_colorspace v5
	VK_EXT_swapchain_maintenance1 v1
	VK_EXT_texel_buffer_alignment v1
	VK_EXT_texture_compression_astc_hdr v1
	VK_EXT_tooling_info v1
	VK_EXT_vertex_attribute_divisor v3
	VK_AMD_gpu_shader_half_float v2
	VK_AMD_negative_viewport_height v1
	VK_AMD_shader_image_load_store_lod v1
	VK_AMD_shader_trinary_minmax v1
	VK_IMG_format_pvrtc v1
	VK_INTEL_shader_integer_functions2 v1
	VK_GOOGLE_display_timing v1
	VK_MVK_macos_surface v3
	VK_MVK_moltenvk v37
	VK_NV_fragment_shader_barycentric v1
[mvk-info] GPU device:
	model: AMD Radeon RX 6800 XT
	type: Discrete
	vendorID: 0x1002
	deviceID: 0x73bf
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 16368 MB
	GPU memory used: 0 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 2
[mvk-info] GPU device:
	model: AMD Radeon Pro 5500M
	type: Discrete
	vendorID: 0x1002
	deviceID: 0x7340
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 8176 MB
	GPU memory used: 0 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 2
[mvk-info] GPU device:
	model: Intel(R) UHD Graphics 630
	type: Integrated
	vendorID: 0x8086
	deviceID: 0x3e9b
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 1536 MB
	GPU memory used: 8 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 1
[mvk-info] Created VkInstance for Vulkan version 1.2.309, as requested by app, with the following 0 Vulkan extensions enabled:
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon RX 6800 XT with the following 3 Vulkan extensions enabled:
	VK_KHR_16bit_storage v1
	VK_KHR_shader_float16_int8 v1
	VK_EXT_subgroup_size_control v2
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon Pro 5500M with the following 3 Vulkan extensions enabled:
	VK_KHR_16bit_storage v1
	VK_KHR_shader_float16_int8 v1
	VK_EXT_subgroup_size_control v2
[INFO ] stable-diffusion.cpp:197  - loading model from '../sd-models/sd-v1-4.ckpt'
[INFO ] model.cpp:911  - load ../sd-models/sd-v1-4.ckpt using checkpoint format
[DEBUG] model.cpp:1445 - init from '../sd-models/sd-v1-4.ckpt'
ZIP 0, name = archive/data.pkl, dir = archive/ 
[INFO ] stable-diffusion.cpp:244  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f32
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f32
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1178 - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1178 - unet params backend buffer size =  2155.33 MB(VRAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1178 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:419  - loading weights
[DEBUG] model.cpp:1727 - loading tensors from ../sd-models/sd-v1-4.ckpt
  |==================================================| 1131/1131 - 1000.00it/s
[INFO ] stable-diffusion.cpp:518  - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from '../sd-models/sd-v1-4.ckpt' completed, taking 11.57s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:600  - finished loaded file
[DEBUG] stable-diffusion.cpp:1548 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1241 - prompt after extract and remove lora: "a cat"
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:357  - parse 'a cat' to [['a cat', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485  - computing condition graph completed, taking 639 ms
[DEBUG] conditioner.hpp:357  - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485  - computing condition graph completed, taking 41 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 682 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808  - Sample
[DEBUG] ggml_extend.hpp:1129 - unet compute buffer size: 559.90 MB(VRAM)
  |==================================================| 20/20 - 2.04s/it
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1486 - generating 1 latent images completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1489 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1129 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1090 - computing vae [mode: DECODE] graph completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1499 - latent 1 decoded, taking 5.21s
[INFO ] stable-diffusion.cpp:1503 - decode_first_stage completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1628 - txt2img completed in 47.92s
save result PNG image to 'output.png'
```
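One thing that might be worth trying, as a sketch rather than a confirmed fix: upstream ggml's Vulkan backend reads a `GGML_VK_VISIBLE_DEVICES` environment variable (a comma-separated list of device indices). Whether the ggml version vendored in this sd.cpp build honors it is an assumption. The helper below (`vk_device_index` is a hypothetical name) pulls the index for a named GPU out of the `ggml_vulkan: N = ...` lines in the log above. It also assumes the printed index matches the Vulkan enumeration index the variable expects. That looks true here, since MoltenVK lists the 6800 XT first, but it may not hold on every system.

```shell
#!/bin/sh
# Print the ggml_vulkan index of the first device whose name contains $1.
# Reads a verbose sd log on stdin and matches lines shaped like:
#   ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | ...
vk_device_index() {
    awk -v want="$1" '
        /^ggml_vulkan: [0-9]+ = / && index($0, want) {
            print $2   # second whitespace-separated field is the index
            exit
        }'
}

# Device lines copied (abridged) from the log above.
log='ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1'

idx=$(printf '%s\n' "$log" | vk_device_index "RX 6800 XT")
echo "eGPU index: $idx"

# Assumption: this build's ggml Vulkan backend reads GGML_VK_VISIBLE_DEVICES.
# Uncomment to run with only that device exposed:
# GGML_VK_VISIBLE_DEVICES="$idx" ./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v
```

If the variable is honored, the `ggml_vulkan: Found ... Vulkan devices:` line should report a single device, so sd would have nothing else to choose from.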

![Image](https://github.com/user-attachments/assets/d9b4c39c-84e9-4754-a48b-a59b5186b932)

How to choose which Vulkan device to run on? #650
