-
Hi @cxrodgers,

Glad you found part of your answer! For the grid search:

**Data Augmentation**

Data augmentation will directly help account for variations. I would keep the rotation angle constant (±180° for a bird's-eye view, otherwise ±15°), estimate and fix the scale min/max (how zoomed in/out you expect keypoints to be relative to your ground-truth labels), and leave random flip at None (we haven't seen much improvement from this parameter). The other augmentation parameters (uniform noise, Gaussian noise, contrast, and brightness) you can include in your search to see what works best for your data.

**Optimization**

The title of this section makes it sound like we are optimizing for how fast the network trains, but these parameters don't just affect training speed. I would keep the number of epochs, Stop Training on Plateau, and the number of plateau patience epochs constant (we never reach the epoch limit if we stop on plateau). In the grid search I would target the initial learning rate and batch size (although a larger batch size may be memory constrained). Online mining could help, but it's a toss-up whether it's critical to training.

**Model**

For the UNet backbone, focus the search on filters and filters rate (possibly also max stride), but keep the rest constant. The filters and filters rate determine how much learning capacity your network has; if you have too much capacity and not enough complexity in your data, this can lead to overfitting. I would recommend sticking to the UNet architecture. As for the values to use, I would vary slightly from the preset values. (The presets are the values we found to be most useful in our own grid search.)

Thanks,
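If it helps, here is a minimal sketch of how such a sweep could be scripted. This is not an official SLEAP utility: `baseline_profile.json` and `labels.pkg.slp` are placeholder names, and the dotted config paths assume the usual training profile layout (`optimization.*`, `model.backbone.unet.*`), which is worth verifying against a profile exported from your own GUI session:

```python
import copy
import itertools
import json
import subprocess
from pathlib import Path

# Placeholder: a baseline training profile exported from the SLEAP GUI.
BASELINE = Path("baseline_profile.json")

# Vary only the parameters suggested above; keep everything else constant.
grid = {
    "optimization.initial_learning_rate": [1e-4, 5e-4],
    "optimization.batch_size": [4, 8],
    "model.backbone.unet.filters": [16, 32],
    "model.backbone.unet.filters_rate": [1.5, 2.0],
}

def set_key(cfg: dict, dotted: str, value) -> None:
    """Set a nested config value addressed by a dotted key path."""
    *parents, leaf = dotted.split(".")
    for key in parents:
        cfg = cfg[key]
    cfg[leaf] = value

base_cfg = json.loads(BASELINE.read_text())
names, choices = zip(*grid.items())

for i, combo in enumerate(itertools.product(*choices)):
    cfg = copy.deepcopy(base_cfg)
    for name, value in zip(names, combo):
        set_key(cfg, name, value)
    run_dir = Path(f"sweep/run_{i:03d}")
    run_dir.mkdir(parents=True, exist_ok=True)
    profile = run_dir / "profile.json"
    profile.write_text(json.dumps(cfg, indent=2))
    # Train each variant on the same self-contained labels package.
    subprocess.run(["sleap-train", str(profile), "labels.pkg.slp"], check=True)
```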
-
You could also export a SLEAP package to have your videos embedded in the file, instead of referenced elsewhere.
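If you'd rather do that export outside the GUI, here's a minimal sketch using sleap-io, assuming your installed version's `save_file` supports the `embed` option (worth checking in its docs first):

```python
import sleap_io as sio

# Load labels that currently reference external video files.
labels = sio.load_file("labels.v001.slp")

# Write a self-contained package with the labeled frames' images embedded.
# Assumption: embed="user" embeds only user-labeled frames in your sleap-io version.
sio.save_file(labels, "labels.v001.pkg.slp", embed="user")
```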
-
Chiming in with another tip: you can also use sleap-io for generating splits with embedded images:

```python
import sleap_io as sio

# Load source labels.
labels = sio.load_file("labels.v001.slp")

# Make splits and export with embedded images.
labels.make_training_splits(n_train=0.8, n_val=0.1, n_test=0.1, save_dir="split1", seed=42)

# Splits will be saved as self-contained SLP package files with images and labels.
labels_train = sio.load_file("split1/train.pkg.slp")
labels_val = sio.load_file("split1/val.pkg.slp")
labels_test = sio.load_file("split1/test.pkg.slp")
```

The docs also have examples of how to build the labels via the API as well.
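As a quick follow-up (my own sanity check, not from the docs): each split package should load standalone, without the original videos present.

```python
# Quick sanity check: each split package loads on its own.
for name in ("train", "val", "test"):
    split = sio.load_file(f"split1/{name}.pkg.slp")
    print(name, len(split), "labeled frames")
```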
-
Hello, I have a large dataset of thousands of similar videos, and I'd like to determine the best way to analyze them all, ideally without labeling frames from every single one.

I'm facing an immediate technical issue: the SLEAP GUI is slow to respond when I load that many videos. I think it's because they're all located on a remote server, and fetching all that metadata and decoding frames is intrinsically slow. I also want to use my own custom code to select frames to label, rather than the built-in options.

So I'm wondering if it's possible to create an SLP file for labeling and/or training that doesn't link to the videos, but instead contains the actual labeled frames themselves. Would this be a "labels file", a "training package", or something else? I understand some custom coding/hacking will be necessary; it would just be really helpful if you could point me in the right direction to get started. I'd like to create this file programmatically, and then use the GUI solely for the user labeling (the GUI works great for this and I don't want to reinvent it!).
This would also address a long-term concern, which is I'd like to train a ton of different SLEAP models with different hyperparameters on this training package, in order to select the best hyperparameters. Like a huge grid search. The goal is to find hyperparameters that best generalize to new videos (so that I can eventually stop labeling new videos, and just use my trained model, even when the video conditions change slightly). Is there any guidance on the most useful parameters to vary in such a grid search?
thanks for any tips!
EDIT
I think I figured it out! I followed these instructions to create my own labels file: #1534
And then I put my frames into individual image files and referenced them with `sleap.Video.from_image_filenames`. I was worried it wouldn't work since the aspect ratios differ, but this seems to be fine.
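In case it's useful to others, here's roughly what that looks like end to end. This is my own sketch, not the exact code from #1534; the skeleton nodes, file paths, and coordinates are placeholders:

```python
import sleap
from sleap.instance import Point

# Placeholder skeleton with two nodes; swap in your real skeleton.
skeleton = sleap.Skeleton()
skeleton.add_node("head")
skeleton.add_node("tail")

# One video object backed by individual image files.
video = sleap.Video.from_image_filenames(["frames/f000.png", "frames/f001.png"])

# Frame 0 is left empty to label in the GUI; frame 1 gets a pre-made instance.
lf0 = sleap.LabeledFrame(video=video, frame_idx=0, instances=[])
inst = sleap.Instance(
    skeleton=skeleton,
    points={"head": Point(x=10, y=20), "tail": Point(x=30, y=40)},
)
lf1 = sleap.LabeledFrame(video=video, frame_idx=1, instances=[inst])

labels = sleap.Labels(labeled_frames=[lf0, lf1])
sleap.Labels.save_file(labels, "my_labels.slp")
```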
I would still appreciate any guidance about what parameters are most useful to vary in a grid search.