Currently, it prints out something like this when generating an image:
Building the Clip transformer.
Building the autoencoder.
Building the unet.
Timestep 0/5
Timestep 1/5
Timestep 2/5
Timestep 3/5
Timestep 4/5
Generating the final image for sample 1/1.
It would be nice if it printed this instead:
Building the Clip transformer.
Building the autoencoder.
Building the unet.
Timestep 0/5 | 10 seconds
Timestep 1/5 | 9 seconds
Timestep 2/5 | 10 seconds
Timestep 3/5 | 11 seconds
Timestep 4/5 | 10 seconds
Generating the final image for sample 1/1.
Total Elapsed Time: 52 seconds
That would make it easier to compare different settings, such as different schedulers, different SD versions, or CPU vs. GPU, and to see how much of a speed difference each one makes.
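For what it's worth, here is a rough sketch of how the per-step timing could be done. I have not looked at how the sampling loop is actually structured, so `timed_steps`, `format_step`, and `run_step` are hypothetical names for illustration only; `run_step` stands in for whatever one denoising step does. It only relies on `time.perf_counter` from the standard library.

```python
import time


def format_step(index, total, seconds):
    # Mirrors the requested log format, rounded to whole seconds.
    return f"Timestep {index}/{total} | {seconds:.0f} seconds"


def timed_steps(total, run_step, clock=time.perf_counter, emit=print):
    """Call run_step(i) for each step, report how long each one took,
    and return the total elapsed time in seconds.

    run_step is a placeholder (assumption) for the real per-timestep work.
    clock and emit are injectable so the helper can be tested without waiting.
    """
    start = clock()
    for i in range(total):
        step_start = clock()
        run_step(i)
        emit(format_step(i, total, clock() - step_start))
    return clock() - start


if __name__ == "__main__":
    # Stand-in workload: sleep briefly instead of running a real model.
    elapsed = timed_steps(5, lambda i: time.sleep(0.01))
    print(f"Total Elapsed Time: {elapsed:.0f} seconds")
```

Measuring with `time.perf_counter` rather than `time.time` avoids jumps if the system clock is adjusted mid-run. Note that this total covers only the timestep loop, so the model-building and final image steps would need to sit inside the same timer to be counted.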