I'm trying to use moondream3-preview, but because it uses flex decoding, it only runs on CUDA devices. I'm on a MacBook Pro, which only supports MPS.
There seems to be a flag in the code called use_flex_decoding, but I can't set it meaningfully from my own code. For example, setting
moondream.use_flex_decoding = False
has no effect: the model still fails when it tries to call the CUDA-only create_block_mask.
If I instead patch the model code itself, changing self.use_flex_decoding = True to False in MoondreamModel, Moondream 3 Preview seems to work fine on my Mac.
I wonder if use_flex_decoding could simply be a parameter on the module constructor so that the bad value doesn't have a chance to take effect anywhere before I can change it.
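To make the request concrete, here is a rough sketch of what I have in mind. It is a toy stand-in, not the real Moondream code: ToyModel, its device argument, and _create_block_mask are hypothetical names I made up for illustration. I'm also assuming the flag gets read during construction, which would explain why setting it afterwards is too late. The idea is that the constructor takes use_flex_decoding and falls back to a device-based default only when the caller doesn't pass it.

```python
from typing import Optional


class ToyModel:
    """Hypothetical stand-in for MoondreamModel; not the real class."""

    def __init__(self, device: str = "cpu",
                 use_flex_decoding: Optional[bool] = None):
        # If the caller does not choose, enable flex decoding only on CUDA.
        if use_flex_decoding is None:
            use_flex_decoding = device.startswith("cuda")
        self.device = device
        self.use_flex_decoding = use_flex_decoding
        self.block_mask = None
        # Assumption: the flag is consumed here, during construction,
        # so changing the attribute later would not undo this step.
        if self.use_flex_decoding:
            self.block_mask = self._create_block_mask()

    def _create_block_mask(self) -> str:
        # Placeholder for the CUDA-only create_block_mask call.
        if not self.device.startswith("cuda"):
            raise RuntimeError("create_block_mask requires CUDA")
        return "block-mask"


# On a Mac the flag can be turned off before anything CUDA-only runs.
model = ToyModel(device="mps", use_flex_decoding=False)
print(model.use_flex_decoding, model.block_mask)  # prints: False None
```

With a default like this, ToyModel(device="mps") would also work without passing the flag at all, while CUDA users keep the current behavior.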