-
I'm glad pivotal tuning is ready to test in the universal embeddings branch, but there is little clear documentation on how to use this feature or how to caption for it. Do I need any special conditions in the caption? And are additional embeddings identical to pivotal tuning? For example, one of my captions is "Ryan Reynolds from Selfless as Damian wearing a black suit with red shoes, Damian standing in front of a bus stop reading a newspaper". My intention is to train Ryan Reynolds as himself, and Damian as a character from Selfless with his signature appearance.

Based on the wiki, the base embedding is blank by default, and I think I should keep it blank because I want to train a new embedding, so that part is clear. My confusion starts here: when I put in a custom placeholder (trigger word), let's say […]. And what does "initial embedding text" do? Will it change the […]? And if I have thousands of concepts to train, this would be a hassle, because I would have to insert the additional embeddings one by one, with no preset to load.

My captioning structure is [Character name][Appearance][Activity], [The Background], [additional non-descriptive caption]. So if I use Keep tag count (3), the trainer will keep the first three tags in place and shuffle the rest, which avoids truncation. If pivotal tuning could be trained based on the caption order (which would focus learning on the character, since the character name is always at the front of the caption/tags), that would be good. A sketch of the shuffling behavior I mean is below.
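To make the "Keep tag count (3)" point concrete, here is a minimal sketch of that shuffling behavior, assuming comma-separated tags; the function name and example caption are illustrative only, not OneTrainer's actual implementation:

```python
import random

# Minimal sketch of the "Keep tag count" behavior: split the caption into
# comma-separated tags, hold the first N tags in place, shuffle the rest.
# Illustration only -- this is NOT OneTrainer's actual code.
def shuffle_caption(caption: str, keep_tag_count: int = 3) -> str:
    tags = [t.strip() for t in caption.split(",")]
    kept, rest = tags[:keep_tag_count], tags[keep_tag_count:]
    random.shuffle(rest)            # only the trailing tags move
    return ", ".join(kept + rest)   # the leading tags keep their order

caption = ("Damian, wearing a black suit with red shoes, "
           "standing in front of a bus stop, city street, daytime, photo")
print(shuffle_caption(caption))
```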
-
I am not sure why you want to use the additional embedding for this, instead of just a standard finetune, but the initial implementation is in no way going to be useful as you will have thousands of additional text embeddings to manage.

Why not start with one concept first and see how it goes? As far as I know, this is uncharted territory, there is not much guidance to give. Nero can give input on how their implementation works, but I do not think anyone has seen how far this can go.

I started the additional embedding page on the wiki, check it out (https://github.com/Nerogar/OneTrainer/wiki/Additional-Embeddings).
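If you did eventually want to script thousands of embedding definitions rather than enter them one by one in the UI, something like the sketch below could generate them in bulk. Note that every field name here ("placeholder", "initial_embedding_text", "token_count") is a hypothetical assumption for illustration; you would have to map them onto whatever format OneTrainer's config actually uses:

```python
import json

# Hypothetical sketch: generating many additional-embedding definitions at
# once. The field names are ASSUMPTIONS, not OneTrainer's real config schema.
concepts = ["damian", "ryan_reynolds"]  # extend to your full concept list

embeddings = [
    {
        "placeholder": f"<{name}>",      # trigger word used in captions
        "initial_embedding_text": name,  # text the new embedding starts from
        "token_count": 2,
    }
    for name in concepts
]

with open("additional_embeddings.json", "w") as f:
    json.dump(embeddings, f, indent=2)
```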