
feat(z-image): add sigma schedules, aspect ratios, saveinfo, shift, MCF sampler;#353

Open
terribilissimo wants to merge 1 commit into filipstrand:main from terribilissimo:feature/some-enhancements-to-mflux

Conversation

@terribilissimo
Contributor

@terribilissimo terribilissimo commented Feb 15, 2026

Note for Filip: Feel free to pick what you like from this PR and discard the rest. It's perfectly good to discard this PR and make another with just the things you want to integrate (if any).

Update: This PR now also includes aspect-ratio-preserving dimension scaling for img2img workflows.

When a reference image is provided via --image-path, you can specify output dimensions (or just one of them) as scale factors:

# Scale both dimensions by 1.2× (aspect ratio preserved)
mflux-generate-z-image-turbo \
  --image-path photo.jpg --image-strength 0.3 \
  --height 1.2x --steps 9 --seed 42
 
# Absolute width, height auto-computed from aspect ratio
mflux-generate-z-image-turbo \
  --image-path photo.jpg --image-strength 0.3 \
  --width 800 --steps 9 --seed 42

Changes (in dimension_resolver.py):

When only one dimension has a non-unity ScaleFactor (e.g. --height 1.2x) and the other is at auto/default, the scale is propagated to both dimensions — preserving the reference image's aspect ratio.
When one dimension is absolute pixels and the other is unspecified, the missing dimension is computed from the reference image's aspect ratio.
Also enabled supports_dimension_scale_factor=True for mflux-generate-z-image (was already enabled for turbo).
All values snap to multiples of 16. Behavior is unchanged when no reference image is provided.
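
The rules above can be sketched as follows. This is a minimal illustration, not the code in dimension_resolver.py: the names `parse_dim`, `resolve` and `snap16` are hypothetical, and rounding to the nearest multiple of 16 (rather than flooring) is an assumption.

```python
from typing import Optional, Tuple, Union

# int = absolute pixels, float = scale factor ("1.2x"), None = unspecified
Dim = Union[int, float, None]


def snap16(value: float) -> int:
    """Round to the nearest multiple of 16 (assumed rounding mode), never below 16."""
    return max(16, int(round(value / 16)) * 16)


def parse_dim(text: Optional[str]) -> Dim:
    """'1.2x' -> 1.2 (scale factor), '800' -> 800 (pixels), None -> None."""
    if text is None:
        return None
    if text.lower().endswith("x"):
        return float(text[:-1])
    return int(text)


def resolve(width: Dim, height: Dim, ref_w: int, ref_h: int) -> Tuple[int, int]:
    """Resolve output size against a reference image of ref_w x ref_h pixels."""
    # One absolute dimension, other unspecified: derive it from the aspect ratio.
    if isinstance(width, int) and height is None:
        return snap16(width), snap16(width * ref_h / ref_w)
    if isinstance(height, int) and width is None:
        return snap16(height * ref_w / ref_h), snap16(height)

    # Only scale factors (or nothing) left to fill in: propagate the given factor.
    if width is None and height is None:
        width = height = 1.0
    elif width is None:
        width = height
    elif height is None:
        height = width

    w_px = width * ref_w if isinstance(width, float) else width
    h_px = height * ref_h if isinstance(height, float) else height
    return snap16(w_px), snap16(h_px)


# Example with a hypothetical 1200x800 reference image.
print(resolve(None, parse_dim("1.2x"), 1200, 800))  # scale propagated to both
print(resolve(parse_dim("800"), None, 1200, 800))   # height from aspect ratio
```

When both dimensions are given explicitly, the sketch resolves each one independently, which mirrors the "behavior is unchanged" case.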

  • --shift: override automatic sigma shift (mu) for both schedulers

  • --mcf-max-change: MCF sampler that clamps per-step latent changes

  • --cosine / --karras / --exponential: alternative sigma schedules

  • --aspect: aspect ratio presets with auto-dimension computation

  • --saveinfo: descriptive filenames encoding generation parameters

  • All features apply to mflux-generate-z-image-turbo and mflux-generate-z-image

  • NOTE: when no fork-specific flags are used, behavior is identical to upstream

Summary (about the rest)

This PR adds several quality-of-life enhancements to the Z-Image and Z-Image Turbo
pipelines, ported from our zima.py CUDA/PyTorch scripts. All features are
opt-in — when none of the new flags are used, behavior is identical to upstream.

New CLI Flags

Noise Schedule Control

  • --shift <float> — Override the automatic sigma shift (mu) value.
    By default mu is computed from image dimensions. Higher values push the
    noise schedule towards higher noise levels.
  • --cosine — Smooth S-curve sigma schedule (more steps at high/low noise)
  • --karras — Karras schedule with rho=7 (concentrates steps on fine details)
  • --exponential — Log-spaced sigmas between sigma_max and sigma_min
    (Only one of the three can be used at a time; mutually exclusive group)
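
For readers unfamiliar with the three schedules, here is a rough sketch of what each one computes. It uses the standard library instead of MLX so it runs anywhere, and the function names, the `sigma_min`/`sigma_max` parameters and the exact cosine form are assumptions for illustration. The PR's `_generate_base_sigmas()` may differ in detail.

```python
import math
from typing import List


def linspace(start: float, stop: float, n: int) -> List[float]:
    """n evenly spaced values from start to stop inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def cosine_sigmas(n: int, sigma_min: float, sigma_max: float) -> List[float]:
    """Half-cosine S-curve: flat near both ends, so steps cluster at high and low noise."""
    span = sigma_max - sigma_min
    return [sigma_min + span * 0.5 * (1.0 + math.cos(math.pi * t)) for t in linspace(0.0, 1.0, n)]


def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> List[float]:
    """Karras-style schedule: interpolate linearly in sigma**(1/rho), then raise back to rho."""
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in linspace(0.0, 1.0, n)]


def exponential_sigmas(n: int, sigma_min: float, sigma_max: float) -> List[float]:
    """Log-spaced sigmas between sigma_max and sigma_min (sigma_min must be > 0)."""
    return [math.exp(v) for v in linspace(math.log(sigma_max), math.log(sigma_min), n)]


for name, fn in [("cosine", cosine_sigmas), ("karras", karras_sigmas), ("exponential", exponential_sigmas)]:
    print(name, [round(s, 4) for s in fn(9, 0.02, 1.0)])
```

All three start at `sigma_max` and end at `sigma_min`; they differ only in how the intermediate steps are distributed.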

Sampling

  • --mcf-max-change <float> — MCF (Mean Change Factor) sampler that clamps
    per-step latent changes to prevent sudden jumps. Typical: 0.05–0.50, min/max 0.01/1.00.
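
The description does not spell out the clamping rule, so the following is only one plausible reading: an Euler step whose update is rescaled when its mean absolute size exceeds `max_change` times the mean absolute latent value. The function name and the choice of "mean absolute value" as the measure are hypothetical, not taken from the PR.

```python
from typing import List


def clamped_euler_step(latents: List[float], velocity: List[float], dt: float, max_change: float) -> List[float]:
    """One Euler step x + dt * v, with the whole update scaled down if it is too large.

    Hypothetical rule: mean(|update|) may not exceed max_change * mean(|latents|).
    """
    delta = [dt * v for v in velocity]
    mean_delta = sum(abs(d) for d in delta) / len(delta)
    mean_latent = sum(abs(x) for x in latents) / len(latents)
    limit = max_change * mean_latent
    if mean_delta > limit:
        # Uniform rescale keeps the update direction and only shortens it.
        scale = limit / mean_delta
        delta = [d * scale for d in delta]
    return [x + d for x, d in zip(latents, delta)]


x = [1.0, -1.0, 2.0, -2.0]
v = [4.0, 4.0, 4.0, 4.0]
print(clamped_euler_step(x, v, dt=-0.5, max_change=0.1))  # update is clamped
print(clamped_euler_step(x, v, dt=-0.1, max_change=0.5))  # update passes through
```

Under this reading, a value of 1.00 rarely clamps anything, while values near 0.05 force many small steps.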

Convenience

  • --aspect <preset> — Aspect ratio presets (1:1, 4:3, 3:4, 3:2, 2:3, 16:9,
    9:16, 18:9, 9:18, 21:9, 9:21). If combined with only --width or --height,
    the missing dimension is auto-computed and rounded to multiples of 16.
  • --saveinfo — Save images with descriptive filenames encoding
    timestamp, seed, steps, LoRA, scheduler, and sigma schedule info
    (convenient, you won't have to look at the EXIF or json for quick reproducibility).
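
As an illustration of the --aspect behavior, the sketch below computes the missing dimension from a preset and rounds it to a multiple of 16. The function name is hypothetical, rounding to the nearest multiple is an assumption, and requiring exactly one of width/height is a simplification of this sketch rather than documented behavior.

```python
from typing import Optional, Tuple

# Presets listed in the PR description.
ASPECT_PRESETS = ("1:1", "4:3", "3:4", "3:2", "2:3", "16:9", "9:16", "18:9", "9:18", "21:9", "9:21")


def snap16(value: float) -> int:
    """Round to the nearest multiple of 16 (assumed rounding mode), never below 16."""
    return max(16, int(round(value / 16)) * 16)


def dims_from_aspect(preset: str, width: Optional[int] = None, height: Optional[int] = None) -> Tuple[int, int]:
    """Fill in the missing dimension for a 'W:H' preset."""
    if preset not in ASPECT_PRESETS:
        raise ValueError(f"unknown aspect preset: {preset}")
    if (width is None) == (height is None):
        raise ValueError("this sketch expects exactly one of width/height")
    aw, ah = (int(part) for part in preset.split(":"))
    if height is None:
        return width, snap16(width * ah / aw)
    return snap16(height * aw / ah), height


print(dims_from_aspect("16:9", width=1024))
print(dims_from_aspect("21:9", height=512))
```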

Modified Files

  • src/mflux/cli/parser/parsers.py: New args, aspect ratio dict, sigma_schedule resolution; removed --metadata
  • src/mflux/models/common/config/config.py: sigma_schedule property
  • src/mflux/models/common/schedulers/linear_scheduler.py: _generate_base_sigmas()
  • src/mflux/models/common/schedulers/flow_match_euler_discrete_scheduler.py: same
  • src/mflux/models/z_image/variants/z_image.py: New params in generate_image()
  • src/mflux/models/z_image/cli/z_image_turbo_generate.py: Passthrough + saveinfo
  • src/mflux/models/z_image/cli/z_image_generate.py: Same
  • All CLI entry points (flux, flux2, qwen, z_image, fibo): Removed export_json_metadata=args.metadata

Testing

All features tested on Apple Silicon (M-series) with both mflux-generate-z-image-turbo
and mflux-generate-z-image. Baseline output is pixel-identical to upstream when
no new flags are used. See TESTING-SCRIPT.txt for the full test matrix (18 tests).

Checklist

  • No new dependencies added
  • All sigma schedule math uses pure MLX (mx.array, mx.cos, mx.linspace)
  • Backward-compatible — default behavior unchanged
  • Passes existing tests
  • Both Z-Image and Z-Image Turbo entry points updated

@filipstrand
Owner

@terribilissimo Thanks for the PR! I'll have a look at this tomorrow

@terribilissimo
Contributor Author

terribilissimo commented Feb 15, 2026 via email

@azrahello
Contributor

A while back there was talk about integrating alternative schedulers/samplers via an external package — that's more or less what I tried doing with mflux-schedulers, a fork of @anthonywu's work. My goal was mainly to explore different aesthetics rather than performance gains, bringing over some samplers/schedulers from ComfyUI to try them on mflux.
What I found is that in mflux the sampler and scheduler are essentially the same thing — there's no separation like in ComfyUI. I ended up implementing cosine, karras, exponential, sqrt, beta, DDIM, STORK, and ER-SDE as external schedulers. I ran quite a few comparison tests, and while each schedule does produce a different look, honestly none of the results made me think "this is clearly better" — there was no obvious winner that stood out immediately.
I also noticed that the results are very prompt-dependent. A schedule that looks great on one prompt might be worse on another, which means you'd essentially need to test every possible combination for each prompt to find the best one... that felt like a huge amount of work for marginal gains.
On top of that, many of these schedulers rely on arbitrary values (rho, shift, eta, etc.) that can significantly affect the output, and without proper documentation explaining what each parameter does and what ranges make sense, it felt like navigating blind. Tweaking values without understanding their impact just adds layers of complexity that make the whole thing harder to manage.
So I stepped back from it, figuring linear is the best default and anything more just adds complications that only engineers care about 😄 — simplicity wins.
Disclaimer: I never really shared this work because I wasn't confident enough in it. I relied heavily on AI tools to develop most of it, so the code might well be buggy, poorly implemented, or just plain wrong in places. I'm sharing it here mostly as a data point for the discussion, not as a polished solution.

[Attached comparison images: scheduler_test_beta, scheduler_test_cosine, scheduler_test_ddim, scheduler_test_er_sde_beta, scheduler_test_exponential, scheduler_test_karras, scheduler_test_linear_default, scheduler_test_scaled_linear, scheduler_test_sqrt, scheduler_test_stork2]

@terribilissimo
Contributor Author

I also noticed that the results are very prompt-dependent. A schedule that looks great on one prompt might be worse on another, which means you'd essentially need to test every possible combination for each prompt to find the best one...

Yes, I can confirm that.

@filipstrand
Owner

filipstrand commented Feb 16, 2026

So I stepped back from it, figuring linear is the best default and anything more just adds complications that only engineers care about 😄 — simplicity wins.

You are making a solid point. This is my overall feeling as well (I personally never switch the scheduler), but I also see how bringing in more optionality is nice for more advanced users.

At the same time, it is also now much more trivial to just have an agent build this in 5 minutes on a local install of the project and have whatever tweaks one wants (but, of course, tedious to maintain over time).

Another thing I found tricky when doing (semi)automatic model porting via coding agents is that they often want to go in and tweak the "magic numbers" in the schedulers to match the reference implementations more closely. With more options, I could see how agents could get confused more easily. It is easy to get lost in these smaller details and in how numbers should be set across models etc. (Since doing more "agentic ports", I have started to lose track of what we actually have now and whether these magic numbers are general or model-specific. E.g. I couldn't really defend our current implementation here, except that it seemed to "look good" when I last checked.)

    @staticmethod
    def _compute_empirical_mu(image_seq_len: int, num_steps: int) -> float:
        a1, b1 = 8.73809524e-05, 1.89833333
        a2, b2 = 0.00016927, 0.45666666
        if image_seq_len > 4300:
            return float(a2 * image_seq_len + b2)
        m_200 = a2 * image_seq_len + b2
        m_10 = a1 * image_seq_len + b1
        a = (m_200 - m_10) / 190.0
        b = m_200 - 200.0 * a
        return float(a * num_steps + b)
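
One way to read the function above: each (a, b) pair defines a line in image_seq_len. For sequences of at most 4300 tokens, the result interpolates linearly in num_steps between the first line (reached at 10 steps) and the second line (reached at 200 steps); above 4300 tokens only the second line is used and num_steps is ignored. The standalone copy below checks that reading. The `shift_sigma` helper is an assumption: it uses the exponential time-shift form common in flow-matching schedulers and is not confirmed to be what mflux applies.

```python
import math


def compute_empirical_mu(image_seq_len: int, num_steps: int) -> float:
    """Standalone copy of the snippet above, for experimentation."""
    a1, b1 = 8.73809524e-05, 1.89833333
    a2, b2 = 0.00016927, 0.45666666
    if image_seq_len > 4300:
        return float(a2 * image_seq_len + b2)
    m_200 = a2 * image_seq_len + b2  # value the interpolation reaches at 200 steps
    m_10 = a1 * image_seq_len + b1   # value the interpolation reaches at 10 steps
    a = (m_200 - m_10) / 190.0
    b = m_200 - 200.0 * a
    return float(a * num_steps + b)


def shift_sigma(sigma: float, mu: float) -> float:
    """Assumed exponential time shift for 0 < sigma <= 1; mu = 0 leaves sigma unchanged."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


seq_len = 4096  # arbitrary example value
for steps in (9, 10, 50, 200):
    mu = compute_empirical_mu(seq_len, steps)
    print(steps, round(mu, 4), round(shift_sigma(0.5, mu), 4))
```

Under the assumed shift formula, a larger mu maps every intermediate sigma to a higher value, which matches the --shift description ("higher values push the noise schedule towards higher noise levels").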

I'm favouring simplicity over complexity - both in terms of the public CLI/API and for internal stuff. This project is still maintained as a side/hobby project with limited resources. Probably a separate package like what @azrahello and @anthonywu did is still the best approach.

Regarding the other CLI flags, I think the aspect ratio flag seems very interesting, in that it would auto-calculate dimensions from partial information, like one would intuitively expect. On this topic more generally, I think the current CLI is due for a refresh pretty soon. I have some ideas for how it could look but have not gotten around to a writeup.

What would be super helpful to me is if we could start some kind of new discussion (e.g. in a new issue - edit: I added this now #357) of how a modern CLI should work and look. Then we could discuss ideas there, pros and cons of each design choice etc., because I think once we nail that, implementation is trivial (just give it to Claude Code, Cursor or Codex and it will build that for us). I'm thinking of things like naming and behaviour and what really feels intuitive from a user POV:

I'll outline a few things off the top of my head, very unstructured, sorry :)

  • instead of mflux-generate-flux2 we could instead imagine mflux generate --model bfl/flux2klein .. etc
  • image_paths maybe should just be "--image" or "--images" ("paths" implicit) since now edit models can generally accept multiple inputs
  • still separate mflux generate and mflux edit?? Upcoming "omni models" seem to blur the distinction between edit and generate, but there might still be value in splitting these? (or not?) (we have different techniques like img2img and 'real editing' - both accept an image as input but don't really perform the same operation under the hood, and both are very valuable, but maybe the former is more on the "generate" side).
  • --height --width supporting the aspect ratios as you suggested @terribilissimo, but also make this work given that we can also say --height 1x if we have an --image-path specified etc.
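
To make the ideas in this list concrete, here is a rough argparse sketch of a single `mflux` entry point with `generate` and `edit` subcommands. Everything in it is hypothetical: the command layout, the `--image`/`--images` alias and the string-typed `--width`/`--height` (so values like `1x` are accepted) are illustrations of the proposal, not an existing mflux interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical unified CLI: `mflux generate ...` and `mflux edit ...`."""
    parser = argparse.ArgumentParser(prog="mflux")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("generate", "edit"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--model", required=True, help="e.g. bfl/flux2klein")
        cmd.add_argument("--prompt", required=True)
        # "paths" is implicit; one or more input images are accepted.
        cmd.add_argument("--image", "--images", dest="images", nargs="+", default=[])
        # Kept as strings so both "800" and "1.2x" can be resolved later.
        cmd.add_argument("--width", default=None)
        cmd.add_argument("--height", default=None)
    return parser


args = build_parser().parse_args(
    ["generate", "--model", "bfl/flux2klein", "--prompt", "a cat",
     "--image", "a.png", "b.png", "--height", "1x"]
)
print(args.command, args.model, args.images, args.width, args.height)
```

Sharing one argument set across both subcommands keeps the generate/edit split cheap to revisit if the distinction turns out not to be worth keeping.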

There are many combinations here, and I feel like this job can be better done if we take a holistic approach to the CLI and redesign it from scratch now that we know so much more about what the project is compared to how the CLI just evolved naturally when we only had one model.

Earlier, this kind of "design work" would have felt very daunting to even start tackling (mostly because it was so hard to even know what we supported in the first place), but now with agents, I think this is a lot easier to get started with. In the end, though, we humans have to agree on what makes sense to have, and therein lies the real work.

@filipstrand
Owner

I added this issue now #357

@terribilissimo
Contributor Author

terribilissimo commented Feb 16, 2026

  • still separate mflux generate and mflux edit?? Upcoming "omni models" seem to blur the distinction between edit and generate, but there might still be value in splitting these? (or not?) (we have different techniques like img2img and 'real editing' - both accept an image as input but doesn't really perform the same operation under the hood, and both are very valuable, but maybe the former is more on the "generate" side).

That's the point of my other PR (I swear, concocted before reading your comment! :] ). Such capability is already there, it seemed wasteful not to leverage it. And it's fun to play with!

--height --width supporting the aspect ratios as you suggested @terribilissimo, but also make this work given that we can also say --height 1x if we have an --image-path specified etc.

That's a very good idea indeed.

You are making a solid point. This is my overall feeling as well (I personally never switch the scheduler), but I also see how bringing in more optionality is nice for more advanced users.

I understand, and maybe those additional schedulers are an unnecessary redundancy (I seldom try different ones too, tbh). The ability to meddle with the shift, however, is quite useful (at least to me), e.g. many times I succeeded in making a scene less coarse or less "plasticky" by just varying the shift. But it's up to you to decide.

In any case, thanks for reviewing my PR! :)

…CF; remove --metadata

- --shift: override automatic sigma shift (mu) for both schedulers
- --mcf-max-change: MCF sampler that clamps per-step latent changes
- --cosine / --karras / --exponential: alternative sigma schedules
- --aspect: aspect ratio presets with auto-dimension computation
- --saveinfo: descriptive filenames encoding generation parameters
- Remove --metadata JSON sidecar flag (EXIF metadata always embedded)
- Remove export_json_metadata from all CLI entry points
- All features apply to mflux-generate-z-image-turbo and mflux-generate-z-image
- When no fork-specific flags are used, behavior is identical to upstream
@terribilissimo terribilissimo force-pushed the feature/some-enhancements-to-mflux branch from e630a6d to 17b08f8 Compare February 17, 2026 02:26
@filipstrand
Owner

That's the point of my other PR (I swear, concocted before reading your comment! :] ). Such capability is already there, it seemed wasteful not to leverage it. And it's fun to play with!

Ah, totally forgot that I had overlooked this for Flux2 🤦‍♂️, of course we should have them there too :) I'll merge your other PR right away! Thanks very much for noticing and fixing this.
