How to choose which Vulkan device to run on? #650

Closed
thecannabisapp opened this issue Apr 6, 2025 · 2 comments

thecannabisapp commented Apr 6, 2025

Hi, I'm on macOS 15.4 and I built sd.cpp with Vulkan support. I have a 16-inch MBP with a built-in 5500M GPU and an external 6800XT GPU, so sd.cpp sees two Vulkan devices when it runs. I noticed that sd.cpp picks the 5500M and ignores the 6800XT. How do I select which Vulkan device the model runs on?

cmake -B build -DGGML_METAL=OFF -DSD_VULKAN=ON \
-DVulkan_INCLUDE_DIR=/usr/local/Cellar/molten-vk/1.2.11/include \
-DVulkan_LIBRARY=/usr/local/Cellar/molten-vk/1.2.11/lib/libMoltenVK.dylib \
-DOpenMP_ROOT=$(brew --prefix)/opt/libomp \
-DVulkan_GLSLC_EXECUTABLE=$(brew --prefix)/opt/shaderc/bin/glslc \
-DVulkan_GLSLANG_VALIDATOR_EXECUTABLE=$(brew --prefix)/opt/glslang/bin/glslangValidator \
-DOpenMP_C_FLAGS=-fopenmp=lomp \
-DOpenMP_CXX_FLAGS=-fopenmp=lomp \
-DOpenMP_C_LIB_NAMES="libomp" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib" \
-DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include"

cmake --build build --config Release -j 8
15:32:58 ~/Dev/stable-diffusion.cpp master
./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v 
Option: 
    n_threads:         8
    mode:              txt2img
    model_path:        ../sd-models/sd-v1-4.ckpt
    wtype:             unspecified
    clip_l_path:       
    clip_g_path:       
    t5xxl_path:        
    diffusion_model_path:   
    vae_path:          
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       output.png
    init_img:          
    mask_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    diffusion flash attention:false
    strength(control): 0.90
    prompt:            a cat
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         7.00
    slg_scale:         0.00
    guidance:          3.50
    eta:               0.00
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler_a
    schedule:          default
    sample_steps:      20
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:174  - Using Vulkan backend
ggml_vulkan: WARNING: Instance extension VK_KHR_portability_enumeration not found.
[mvk-info] MoltenVK version 1.2.12, supporting Vulkan version 1.2.309.
	The following 115 Vulkan extensions are supported:
	VK_KHR_16bit_storage v1
	VK_KHR_8bit_storage v1
	VK_KHR_bind_memory2 v1
	VK_KHR_buffer_device_address v1
	VK_KHR_calibrated_timestamps v1
	VK_KHR_copy_commands2 v1
	VK_KHR_create_renderpass2 v1
	VK_KHR_dedicated_allocation v3
	VK_KHR_deferred_host_operations v4
	VK_KHR_depth_stencil_resolve v1
	VK_KHR_descriptor_update_template v1
	VK_KHR_device_group v4
	VK_KHR_device_group_creation v1
	VK_KHR_driver_properties v1
	VK_KHR_dynamic_rendering v1
	VK_KHR_external_fence v1
	VK_KHR_external_fence_capabilities v1
	VK_KHR_external_memory v1
	VK_KHR_external_memory_capabilities v1
	VK_KHR_external_semaphore v1
	VK_KHR_external_semaphore_capabilities v1
	VK_KHR_fragment_shader_barycentric v1
	VK_KHR_format_feature_flags2 v2
	VK_KHR_get_memory_requirements2 v1
	VK_KHR_get_physical_device_properties2 v2
	VK_KHR_get_surface_capabilities2 v1
	VK_KHR_imageless_framebuffer v1
	VK_KHR_image_format_list v1
	VK_KHR_incremental_present v2
	VK_KHR_maintenance1 v2
	VK_KHR_maintenance2 v1
	VK_KHR_maintenance3 v1
	VK_KHR_map_memory2 v1
	VK_KHR_multiview v1
	VK_KHR_portability_subset v1
	VK_KHR_push_descriptor v2
	VK_KHR_relaxed_block_layout v1
	VK_KHR_sampler_mirror_clamp_to_edge v3
	VK_KHR_sampler_ycbcr_conversion v14
	VK_KHR_separate_depth_stencil_layouts v1
	VK_KHR_shader_draw_parameters v1
	VK_KHR_shader_float_controls v4
	VK_KHR_shader_float16_int8 v1
	VK_KHR_shader_integer_dot_product v1
	VK_KHR_shader_non_semantic_info v1
	VK_KHR_shader_subgroup_extended_types v1
	VK_KHR_shader_terminate_invocation v1
	VK_KHR_spirv_1_4 v1
	VK_KHR_storage_buffer_storage_class v1
	VK_KHR_surface v25
	VK_KHR_swapchain v70
	VK_KHR_swapchain_mutable_format v1
	VK_KHR_synchronization2 v1
	VK_KHR_timeline_semaphore v2
	VK_KHR_uniform_buffer_standard_layout v1
	VK_KHR_variable_pointers v1
	VK_KHR_vertex_attribute_divisor v1
	VK_KHR_zero_initialize_workgroup_memory v1
	VK_EXT_4444_formats v1
	VK_EXT_buffer_device_address v2
	VK_EXT_calibrated_timestamps v2
	VK_EXT_debug_marker v4
	VK_EXT_debug_report v10
	VK_EXT_debug_utils v2
	VK_EXT_descriptor_indexing v2
	VK_EXT_depth_clip_control v1
	VK_EXT_extended_dynamic_state v1
	VK_EXT_extended_dynamic_state2 v1
	VK_EXT_extended_dynamic_state3 v2
	VK_EXT_external_memory_host v1
	VK_EXT_external_memory_metal v1
	VK_EXT_fragment_shader_interlock v1
	VK_EXT_hdr_metadata v3
	VK_EXT_headless_surface v1
	VK_EXT_host_image_copy v1
	VK_EXT_host_query_reset v1
	VK_EXT_image_2d_view_of_3d v1
	VK_EXT_image_robustness v1
	VK_EXT_inline_uniform_block v1
	VK_EXT_layer_settings v2
	VK_EXT_memory_budget v1
	VK_EXT_metal_objects v2
	VK_EXT_metal_surface v1
	VK_EXT_pipeline_creation_cache_control v3
	VK_EXT_pipeline_creation_feedback v1
	VK_EXT_post_depth_coverage v1
	VK_EXT_private_data v1
	VK_EXT_robustness2 v1
	VK_EXT_sample_locations v1
	VK_EXT_scalar_block_layout v1
	VK_EXT_separate_stencil_usage v1
	VK_EXT_shader_atomic_float v1
	VK_EXT_shader_demote_to_helper_invocation v1
	VK_EXT_shader_stencil_export v1
	VK_EXT_shader_subgroup_ballot v1
	VK_EXT_shader_subgroup_vote v1
	VK_EXT_shader_viewport_index_layer v1
	VK_EXT_subgroup_size_control v2
	VK_EXT_surface_maintenance1 v1
	VK_EXT_swapchain_colorspace v5
	VK_EXT_swapchain_maintenance1 v1
	VK_EXT_texel_buffer_alignment v1
	VK_EXT_texture_compression_astc_hdr v1
	VK_EXT_tooling_info v1
	VK_EXT_vertex_attribute_divisor v3
	VK_AMD_gpu_shader_half_float v2
	VK_AMD_negative_viewport_height v1
	VK_AMD_shader_image_load_store_lod v1
	VK_AMD_shader_trinary_minmax v1
	VK_IMG_format_pvrtc v1
	VK_INTEL_shader_integer_functions2 v1
	VK_GOOGLE_display_timing v1
	VK_MVK_macos_surface v3
	VK_MVK_moltenvk v37
	VK_NV_fragment_shader_barycentric v1
[mvk-info] GPU device:
	model: AMD Radeon RX 6800 XT
	type: Discrete
	vendorID: 0x1002
	deviceID: 0x73bf
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 16368 MB
	GPU memory used: 0 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 2
[mvk-info] GPU device:
	model: AMD Radeon Pro 5500M
	type: Discrete
	vendorID: 0x1002
	deviceID: 0x7340
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 8176 MB
	GPU memory used: 0 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 2
[mvk-info] GPU device:
	model: Intel(R) UHD Graphics 630
	type: Integrated
	vendorID: 0x8086
	deviceID: 0x3e9b
	pipelineCacheUUID: 83510E0F-0F03-0200-0000-000100000000
	GPU memory available: 1536 MB
	GPU memory used: 8 MB
	Metal Shading Language 3.2
	supports the following GPU Features:
		GPU Family Metal 3
		GPU Family Mac 2
		Read-Write Texture Tier 1
[mvk-info] Created VkInstance for Vulkan version 1.2.309, as requested by app, with the following 0 Vulkan extensions enabled:
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon RX 6800 XT with the following 3 Vulkan extensions enabled:
	VK_KHR_16bit_storage v1
	VK_KHR_shader_float16_int8 v1
	VK_EXT_subgroup_size_control v2
[mvk-info] Vulkan semaphores using MTLEvent.
[mvk-info] Descriptor sets binding resources using Metal3 argument buffers.
[mvk-info] Created VkDevice to run on GPU AMD Radeon Pro 5500M with the following 3 Vulkan extensions enabled:
	VK_KHR_16bit_storage v1
	VK_KHR_shader_float16_int8 v1
	VK_EXT_subgroup_size_control v2
[INFO ] stable-diffusion.cpp:197  - loading model from '../sd-models/sd-v1-4.ckpt'
[INFO ] model.cpp:911  - load ../sd-models/sd-v1-4.ckpt using checkpoint format
[DEBUG] model.cpp:1445 - init from '../sd-models/sd-v1-4.ckpt'
ZIP 0, name = archive/data.pkl, dir = archive/ 
[INFO ] stable-diffusion.cpp:244  - Version: SD 1.x 
[INFO ] stable-diffusion.cpp:277  - Weight type:                 f32
[INFO ] stable-diffusion.cpp:278  - Conditioner weight type:     f32
[INFO ] stable-diffusion.cpp:279  - Diffusion model weight type: f32
[INFO ] stable-diffusion.cpp:280  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:282  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1178 - clip params backend buffer size =  469.44 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1178 - unet params backend buffer size =  2155.33 MB(VRAM) (686 tensors)
[DEBUG] ggml_extend.hpp:1178 - vae params backend buffer size =  94.47 MB(VRAM) (140 tensors)
[DEBUG] stable-diffusion.cpp:419  - loading weights
[DEBUG] model.cpp:1727 - loading tensors from ../sd-models/sd-v1-4.ckpt
  |==================================================| 1131/1131 - 1000.00it/s
[INFO ] stable-diffusion.cpp:518  - total params memory size = 2719.24MB (VRAM 2719.24MB, RAM 0.00MB): clip 469.44MB(VRAM), unet 2155.33MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:522  - loading model from '../sd-models/sd-v1-4.ckpt' completed, taking 11.57s
[INFO ] stable-diffusion.cpp:556  - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:600  - finished loaded file
[DEBUG] stable-diffusion.cpp:1548 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1241 - prompt after extract and remove lora: "a cat"
[INFO ] stable-diffusion.cpp:690  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1246 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:357  - parse 'a cat' to [['a cat', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485  - computing condition graph completed, taking 639 ms
[DEBUG] conditioner.hpp:357  - parse '' to [['', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] ggml_extend.hpp:1129 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] conditioner.hpp:485  - computing condition graph completed, taking 41 ms
[INFO ] stable-diffusion.cpp:1379 - get_learned_condition completed, taking 682 ms
[INFO ] stable-diffusion.cpp:1402 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1439 - generating image: 1/1 - seed 42
[DEBUG] stable-diffusion.cpp:808  - Sample
[DEBUG] ggml_extend.hpp:1129 - unet compute buffer size: 559.90 MB(VRAM)
  |==================================================| 20/20 - 2.04s/it
[INFO ] stable-diffusion.cpp:1478 - sampling completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1486 - generating 1 latent images completed, taking 42.03s
[INFO ] stable-diffusion.cpp:1489 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1129 - vae compute buffer size: 1664.00 MB(VRAM)
[DEBUG] stable-diffusion.cpp:1090 - computing vae [mode: DECODE] graph completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1499 - latent 1 decoded, taking 5.21s
[INFO ] stable-diffusion.cpp:1503 - decode_first_stage completed, taking 5.21s
[INFO ] stable-diffusion.cpp:1628 - txt2img completed in 47.92s
save result PNG image to 'output.png'

stduhpf (Contributor) commented Apr 6, 2025

That's why I opened this PR: #629

Until it gets merged, you can set the GGML_VK_VISIBLE_DEVICES environment variable to hide the unwanted devices from sd.cpp:

For example, to only use the 6800XT on your system:

export GGML_VK_VISIBLE_DEVICES=0
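
If you would rather not hard-code the index, here is a rough sketch of looking it up by device name from the "ggml_vulkan: N = ..." lines that sd prints with -v. The vk_index_for helper is hypothetical (it is not part of sd.cpp or ggml), and the sketch assumes the index printed by ggml_vulkan is the same one GGML_VK_VISIBLE_DEVICES expects, which appears to hold for the log in this issue but may not hold on every system.

```shell
#!/bin/sh
# Device listing copied from the verbose log in the question.
devices='ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon Pro 5500M (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none'

# vk_index_for is a hypothetical helper, not part of sd.cpp or ggml.
# It reads a device listing on stdin and prints the index of the first
# device whose name begins with the given words.
# Assumption: the name contains no "/" or other sed metacharacters.
vk_index_for() {
    sed -n "s/^ggml_vulkan: \([0-9][0-9]*\) = $1 .*/\1/p" | head -n 1
}

idx=$(printf '%s\n' "$devices" | vk_index_for "AMD Radeon RX 6800 XT")

# Export so that every later command in this shell sees only that device.
export GGML_VK_VISIBLE_DEVICES="$idx"
echo "GGML_VK_VISIBLE_DEVICES=$GGML_VK_VISIBLE_DEVICES"

# Then run sd as before, for example:
#   ./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v
```

To limit the setting to a single run instead of the whole shell session, the usual shell form of prefixing the command with the assignment should also work: GGML_VK_VISIBLE_DEVICES=0 ./build/bin/sd -m ../sd-models/sd-v1-4.ckpt -p "a cat" -v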

thecannabisapp (Author) commented

@stduhpf Amazing, thank you!
