Was the pretrained model trained with utterance-level or speaker-level embeddings?

I noticed that in https://github.com/PlayVoice/whisper-vits-svc/blob/bigvgan-mix-v2/prepare/preprocess_train.py#L9 the default value of IndexBySinger is False, which means utterance-level embeddings are used for training. Was the pretrained model trained with the same configuration?
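
To make sure I'm using the terms correctly, here is a minimal sketch of what I understand the two options to mean. This is not code from the repository: the function names, the `index_by_singer` argument, and the assumption that the speaker-level embedding is a plain average of the utterance embeddings are all mine, for illustration only.

```python
# Hypothetical illustration only; not code from whisper-vits-svc.
from statistics import fmean


def speaker_level_embedding(utterance_embeddings):
    """Average a speaker's per-utterance embeddings into one vector (assumed)."""
    return [fmean(dim) for dim in zip(*utterance_embeddings)]


def select_embedding(utterance_embeddings, utt_index, index_by_singer):
    """Pick the conditioning vector for one training utterance.

    index_by_singer=False -> that utterance's own embedding (utterance-level)
    index_by_singer=True  -> the speaker's averaged embedding (speaker-level)
    """
    if index_by_singer:
        return speaker_level_embedding(utterance_embeddings)
    return utterance_embeddings[utt_index]


# Toy 2-dimensional embeddings for one speaker with two utterances.
embs = [[1.0, 2.0], [3.0, 4.0]]
utt_level = select_embedding(embs, 0, index_by_singer=False)
spk_level = select_embedding(embs, 0, index_by_singer=True)
```

If my understanding is wrong and the flag does something different, please correct me.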