This repository contains the submission code for the Tsinghua Machine Learning course assignment "Finetuning a Stable Diffusion" for style generation.
In this homework, I finetuned Stable Diffusion on the Vintage Artwork Styles dataset. Following the convention in Stable Diffusion research, I used v1.4 as the base model, which can be downloaded from HuggingFace. Place the weights in the .\model folder; the final structure should be model/stable-diffusion-v1-4/remaining-model-files.
The full code for running the notebooks is provided together with the report. To reproduce everything, however, some datasets and model weights need to be downloaded. Download the 4 folders from this link and place them all in the same directory as the code to ensure smooth execution. The folders are the training split (sampled_dataset_with_resized_and_random_cropped_images/), the validation split (validation_sampled_dataset_with_resized_and_random_cropped_images/), the diffusers source code (diffusers/), and the LoRA weight checkpoints for Stable Diffusion (lora_result_vintage/).
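Before running the notebooks, it can help to confirm that the folders described above are where the code expects them. The following is a minimal sketch of such a check, not part of the submitted code. The folder names come from the description above; the helper name missing_dirs and the choice of the current directory as the root are assumptions made for illustration.

```python
from pathlib import Path

# Folder names are taken from the text above. Treating the current
# directory as the project root is an assumption.
EXPECTED_DIRS = [
    "model/stable-diffusion-v1-4",
    "sampled_dataset_with_resized_and_random_cropped_images",
    "validation_sampled_dataset_with_resized_and_random_cropped_images",
    "diffusers",
    "lora_result_vintage",
]


def missing_dirs(root="."):
    """Return the expected folders that are not present under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are in place.")
```

Running the script from the directory that holds the notebooks lists any folder that still needs to be downloaded or moved.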
Install diffusers from source with pip install git+https://github.com/huggingface/diffusers
The Vintage Artwork Style dataset contains 60k captioned images for text-to-image training, consisting of vintage pulp, sci-fi, and pinup artworks from the 20th century. It consists mostly of magazine covers, book covers, and old cartoons. Each image has a short and a long caption. The long captions were generated with florence-2-large-ft and then shortened with llama3-8b. One important thing to note is that the images are provided as URL links rather than as PIL Images, as in some HuggingFace datasets, so the images must be downloaded manually from those links. Image resolution is not guaranteed to be 512 x 512 (the resolution of the SDv1.4 training images).
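Since the images are only available as URLs, they have to be fetched one by one, and some links may fail. The sketch below shows one way this could be done with the standard library. It is an illustration under assumptions, not the code from the notebook: the function names, the zero-padded file naming, and the .img suffix are hypothetical, and the raw bytes are saved without decoding so that format conversion can happen in a later step.

```python
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10):
    """Download the raw bytes behind `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(urls, out_dir, fetch=fetch_bytes):
    """Save each URL's content under `out_dir`, skipping links that fail.

    Returns a dict mapping each successfully downloaded URL to its file path.
    `fetch` is injectable so the loop can be tried without network access.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for i, url in enumerate(urls):
        try:
            data = fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(f"skipping {url}: {exc}")
            continue
        path = out_dir / f"{i:06d}.img"  # hypothetical naming scheme
        path.write_bytes(data)
        saved[url] = path
    return saved
```

Keeping the index in the file name makes it possible to match each saved file back to its caption row even when some downloads are skipped.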
Because the file format and image size of the original dataset do not match what SD1.4 expects, the notebook downloaddataset.ipynb handles dataset formatting. It performs dataset splitting, image download from the online source, and reformatting (resizing, random cropping, and file format change). For a detailed explanation, please refer to the notebook's markdown cells.
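The resize and random-crop step can be sketched as follows. This is a minimal illustration assuming that the shorter side is scaled to 512 pixels before a 512 x 512 window is cropped at a random position; the notebook may use different settings. The function names are hypothetical, and resize_and_random_crop expects a PIL Image object that the caller has already opened.

```python
import random


def resize_dims(width, height, target=512):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    scale = target / min(width, height)
    # max() guards against rounding a side down to target - 1.
    return max(target, round(width * scale)), max(target, round(height * scale))


def random_crop_box(width, height, target=512, rng=random):
    """Pick a random target x target box (left, top, right, bottom)."""
    left = rng.randint(0, width - target)
    top = rng.randint(0, height - target)
    return (left, top, left + target, top + target)


def resize_and_random_crop(image, target=512, rng=random):
    """Return a target x target RGB crop of a PIL image (assumed input type)."""
    new_w, new_h = resize_dims(*image.size, target=target)
    image = image.convert("RGB").resize((new_w, new_h))
    return image.crop(random_crop_box(new_w, new_h, target, rng))
```

Passing a seeded random.Random instance as rng makes the crops reproducible across runs, which is useful when the training and validation folders need to be regenerated.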
The finetuning run, inference, and train/validation evaluation results are all included in diffusion_ft.ipynb. For a detailed explanation and the results, please refer to the notebook's markdown cells.