* update svd docs
* fix example doc string
* update return type hints/docs
* update type hints
* Fix typos in pipeline_stable_video_diffusion.py
* make style && make fix-copies
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <[email protected]>
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <[email protected]>
* update based on suggestion
---------
Co-authored-by: M. Tolga Cangöz <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
    image (`PIL.Image.Image` or `List[PIL.Image.Image]` or `torch.FloatTensor`):
-        Image or images to guide image generation. If you provide a tensor, the expected value range is between `[0,1]`.
+        Image(s) to guide image generation. If you provide a tensor, the expected value range is between `[0,1]`.
    height (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
        The height in pixels of the generated image.
    width (`int`, *optional*, defaults to `self.unet.config.sample_size * self.vae_scale_factor`):
        The width in pixels of the generated image.
    num_frames (`int`, *optional*):
-        The number of video frames to generate. Defaults to 14 for `stable-video-diffusion-img2vid` and to 25 for `stable-video-diffusion-img2vid-xt`
+        The number of video frames to generate. Defaults to `self.unet.config.num_frames`
+        (14 for `stable-video-diffusion-img2vid` and to 25 for `stable-video-diffusion-img2vid-xt`).
    num_inference_steps (`int`, *optional*, defaults to 25):
-        The number of denoising steps. More denoising steps usually lead to a higher quality image at the
+        The number of denoising steps. More denoising steps usually lead to a higher quality video at the
        expense of slower inference. This parameter is modulated by `strength`.
    min_guidance_scale (`float`, *optional*, defaults to 1.0):
        The minimum guidance scale. Used for the classifier free guidance with first frame.
@@ -351,29 +376,29 @@ def __call__(
        Frames per second. The rate at which the generated images shall be exported to a video after generation.
        Note that Stable Diffusion Video's UNet was micro-conditioned on fps-1 during training.
    motion_bucket_id (`int`, *optional*, defaults to 127):
-        The motion bucket ID. Used as conditioning for the generation. The higher the number the more motion will be in the video.
+        Used for conditioning the amount of motion for the generation. The higher the number the more motion
+        will be in the video.
    noise_aug_strength (`float`, *optional*, defaults to 0.02):
        The amount of noise added to the init image, the higher it is the less the video will look like the init image. Increase it for more motion.
    decode_chunk_size (`int`, *optional*):
-        The number of frames to decode at a time. The higher the chunk size, the higher the temporal consistency
-        between frames, but also the higher the memory consumption. By default, the decoder will decode all frames at once
-        for maximal quality. Reduce `decode_chunk_size` to reduce memory usage.
+        The number of frames to decode at a time. Higher chunk size leads to better temporal consistency at the
+        expense of more memory usage. By default, the decoder decodes all frames at once for maximal
+        quality. For lower memory usage, reduce `decode_chunk_size`.
    num_videos_per_prompt (`int`, *optional*, defaults to 1):
-        The number of images to generate per prompt.
+        The number of videos to generate per prompt.
    generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
        A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
        generation deterministic.
    latents (`torch.FloatTensor`, *optional*):
-        Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
+        Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for video
        generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
        tensor is generated by sampling using the supplied random `generator`.
    output_type (`str`, *optional*, defaults to `"pil"`):
-        The output format of the generated image. Choose between `PIL.Image` or `np.array`.
+        The output format of the generated image. Choose between `pil`, `np` or `pt`.
    callback_on_step_end (`Callable`, *optional*):
-        A function that calls at the end of each denoising steps during the inference. The function is called
-        with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int,
-        callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by
-        `callback_on_step_end_tensor_inputs`.
+        A function that is called at the end of each denoising step during inference. The function is called
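For context, here is a minimal usage sketch that exercises the parameters documented above. The checkpoint id and the `load_image`/`export_to_video` helpers come from the diffusers library; the input image path and seed are placeholders, not values from this commit.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

# Load the SVD-XT checkpoint in half precision (assumes a CUDA GPU is available).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# "conditioning.png" is a placeholder path; SVD expects a 1024x576 conditioning image.
image = load_image("conditioning.png").resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(
    image,
    num_frames=25,            # matches the default for the -xt checkpoint
    decode_chunk_size=8,      # smaller chunks lower memory use, see docstring above
    motion_bucket_id=127,     # higher values -> more motion in the video
    noise_aug_strength=0.02,  # higher values -> less resemblance to the init image
    generator=generator,
).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```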
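And a sketch of `callback_on_step_end`, based on the signature quoted in the removed docstring lines (`callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)`). The `log_step` name is hypothetical, and returning the `callback_kwargs` dict is assumed to be required so the pipeline can read back any tensors listed in `callback_on_step_end_tensor_inputs`.

```python
# Hypothetical step-end callback: logs progress and passes tensors through unchanged.
def log_step(pipeline, step, timestep, callback_kwargs):
    print(f"finished denoising step {step} (timestep {timestep})")
    return callback_kwargs  # must return the dict so the pipeline can consume it

frames = pipe(
    image,
    callback_on_step_end=log_step,
    callback_on_step_end_tensor_inputs=["latents"],  # tensors exposed via callback_kwargs
).frames[0]
```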