Show your support! You can try HunyuanVideo free with some of our custom spice here. Supporting LeapFusion enables us to do more open source releases like this in the future!
Training code can be found here.
First, download the HunyuanVideo weights as explained here and get the image2video LoRA weights from here. Then run the following command to encode an image (for example, input_image.png):
python encode_image.py --vae hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_tiling --image ./input_image.png
Then you can generate a video with something like:
python generate.py --fp8 --video_size 320 512 --infer_steps 30 --save_path ./samples/ --output_type both --dit mp_rank_00_model_states.pt --attn_mode sdpa --split_attn --vae hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 --text_encoder1 llava_llama3_fp16.safetensors --text_encoder2 clip_l.safetensors --lora_multiplier 1.0 --lora_weight img2vid.safetensors --video_length 129 --prompt "" --seed 123
If you leave the prompt blank, the model will infer based on the image alone. If you change the prompt, make sure to describe some baseline details about the image too, or you might get bad results.
Note: The current model is trained at 512x320 because our research budget is quite small. If you have some spare compute and would like to help train a higher-resolution checkpoint, please reach out!
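If you want to script the two steps, the commands above can be assembled as argument lists. This is a minimal sketch, not part of the repository: the helper names (`encode_cmd`, `generate_cmd`, `run`) are hypothetical, and every flag and path is copied verbatim from the commands above. By default it only prints the command, so you can check it before running anything.

```python
import shlex
import subprocess

# Paths copied from the commands above; adjust them to your local layout.
VAE = "hunyuan-video-t2v-720p/vae/pytorch_model.pt"


def encode_cmd(image):
    """Argument list for the image-encoding step (hypothetical helper)."""
    return [
        "python", "encode_image.py",
        "--vae", VAE,
        "--vae_chunk_size", "32",
        "--vae_tiling",
        "--image", image,
    ]


def generate_cmd(prompt="", seed=123, video_length=129, save_path="./samples/"):
    """Argument list for the generation step (hypothetical helper)."""
    return [
        "python", "generate.py", "--fp8",
        "--video_size", "320", "512",
        "--infer_steps", "30",
        "--save_path", save_path,
        "--output_type", "both",
        "--dit", "mp_rank_00_model_states.pt",
        "--attn_mode", "sdpa", "--split_attn",
        "--vae", VAE,
        "--vae_chunk_size", "32",
        "--vae_spatial_tile_sample_min_size", "128",
        "--text_encoder1", "llava_llama3_fp16.safetensors",
        "--text_encoder2", "clip_l.safetensors",
        "--lora_multiplier", "1.0",
        "--lora_weight", "img2vid.safetensors",
        "--video_length", str(video_length),
        "--prompt", prompt,  # an empty prompt lets the model rely on the image alone
        "--seed", str(seed),
    ]


def run(cmd, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(encode_cmd("./input_image.png"))
    run(generate_cmd(prompt=""))
```

Passing the prompt as a single list element avoids shell-quoting problems when the prompt contains spaces or quotes.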
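Because the model is trained at a single resolution, it may help to crop the input image to the same aspect ratio before encoding it. The following is a rough sketch under that assumption (the README does not state that cropping is required). The function name `center_crop_box` is hypothetical, and it assumes 512 is the width and 320 the height. It only computes the crop rectangle; apply it with whatever image tool you prefer.

```python
def center_crop_box(width, height, target_w=512, target_h=320):
    """Return (left, top, right, bottom) of the largest centered crop
    whose aspect ratio matches target_w:target_h.

    Assumption: 512 is the width and 320 the height of the trained resolution.
    """
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    # A 1920x1080 frame loses 96 pixels on each side.
    print(center_crop_box(1920, 1080))
```

After cropping, the image can be scaled down to 512x320 so that it matches the resolution the checkpoint was trained on.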
Sample videos: icremyuv.mp4, dog.mp4, bubbles.mp4, meme2.mp4
Much of the code is based on musubi-tuner. Code under the hunyuan_model directory is modified from HunyuanVideo and follows their license.
Other code is under the Apache License 2.0. Some code is copied and modified from musubi-tuner, k-diffusion and Diffusers.