Hey there, so I put together this Kaggle notebook to handle annotating images for a YOLO segmentation model, focusing on clothing classes like full_sleeve, half_sleeve, and pants_full. The goal is to pull from a big 100k-image dataset, filter for single-person shots, annotate at least 30 images (I scaled to 100 here), validate them, visualize them like the demo examples, and zip it all up under 1GB. I'll walk you through the code step by step, like I'm explaining it over coffee - no jargon overload, just what each part does and why. It's all Python, running in Kaggle's environment with tools like Ultralytics YOLO and the Segment Anything Model (SAM).
Okay, so first off, the code imports some standard libraries: numpy for arrays, pandas (barely used here), and os for file handling. It then lists the files under /kaggle/input/ so you can confirm your dataset is mounted (mine is a 'human100k' folder of photos).
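Here's roughly what that first cell looks like - a minimal sketch assuming the standard Kaggle boilerplate (the 'human100k' path matches my dataset; yours will differ):

```python
import os
import numpy as np
import pandas as pd  # imported, though barely used in this notebook

# Walk the read-only input dir to confirm the dataset is mounted
for dirname, _, filenames in os.walk('/kaggle/input'):
    print(dirname, '-', len(filenames), 'files')
```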
Then, it creates the main folder structure (sketched after this list):
- Root: '/kaggle/working/clothing-annotations' - this is where everything lives, persistent in Kaggle.
- Subset folder inside it for images and labels - keeps things organized for the small batch we're working with (100 images to start, but scalable).
- It makes images and labels subdirs under subset.
Why? YOLO needs this structure for training later - images paired with .txt labels by name.
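A minimal sketch of that setup (the folder names are the ones from my run):

```python
import os

ROOT = '/kaggle/working/clothing-annotations'
SUBSET = os.path.join(ROOT, 'subset')

# YOLO pairs images/foo.jpg with labels/foo.txt by filename,
# so the two subdirs sit side by side under subset/
for sub in ('images', 'labels'):
    os.makedirs(os.path.join(SUBSET, sub), exist_ok=True)
print('Folders ready under', SUBSET)
```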
Next, it writes a data.yaml file (sketch after the list):
- Defines the path, train/val (using same for now), number of classes (3), and names.
- This is crucial for YOLO to know what classes we're dealing with: 0=full_sleeve, 1=half_sleeve, 2=pants_full.
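The yaml cell, sketched with the paths from above (SUBSET comes from the setup cell):

```python
yaml_text = """\
path: /kaggle/working/clothing-annotations/subset
train: images
val: images   # same split for now; fine while we're only annotating
nc: 3
names: ['full_sleeve', 'half_sleeve', 'pants_full']
"""
with open(os.path.join(SUBSET, 'data.yaml'), 'w') as f:
    f.write(yaml_text)
```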
Finally, copies one sample image from input to subset/images as a test. Print statements confirm it's set up.
Now we get into picking images. Imports more stuff: random for sampling, shutil for copying, Ultralytics YOLO for detection, cv2 for image reading, Path for paths.
- Loads YOLOv8m.pt (medium model for better accuracy than nano, but not too slow on GPU).
- Lists all images from input dir.
- Defines a function has_single_person: Reads the image, runs YOLO inference, and checks for exactly one 'person' (class 0) with conf >0.5 (sketched just below).
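A hedged sketch of that filter (the 0.5 threshold and medium model are from the description above; my exact cell may differ in details):

```python
import cv2
from ultralytics import YOLO

detector = YOLO('yolov8m.pt')  # medium detector; class 0 is 'person' in COCO

def has_single_person(img_path, conf_thresh=0.5):
    """True only if YOLO finds exactly one confident person in the image."""
    img = cv2.imread(str(img_path))
    if img is None:              # unreadable file -> treat as non-qualifying
        return False
    result = detector(img, verbose=False)[0]
    persons = [b for b in result.boxes
               if int(b.cls) == 0 and float(b.conf) > conf_thresh]
    return len(persons) == 1
```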
Then the filtering (full sketch after this list):
- Samples 20k random images from 100k (to speed things up - full scan would take forever).
- Loops through them, adds to qualifying_images if single person.
- Caps at 200 qualifiers to have a good pool.
- If fewer than 100 qualify, warns and uses them all; otherwise randomly picks 100.
- Copies those 100 to subset/images.
Prints how many selected and where. This ensures images match the demo - one main person, no crowds.
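Sketched end to end (the glob pattern and fixed seed are my assumptions; the counts match the list above):

```python
import os
import random
import shutil
from pathlib import Path

random.seed(42)  # assumed seed, just for reproducibility
all_images = list(Path('/kaggle/input/human100k').glob('*.jpg'))
pool = random.sample(all_images, min(20_000, len(all_images)))

qualifying_images = []
for p in pool:
    if has_single_person(p):
        qualifying_images.append(p)
    if len(qualifying_images) >= 200:   # enough of a pool, stop scanning
        break

if len(qualifying_images) < 100:
    print(f'Warning: only {len(qualifying_images)} qualifiers - using all')
    selected_images = qualifying_images
else:
    selected_images = random.sample(qualifying_images, 100)

for p in selected_images:                       # SUBSET from the setup cell
    shutil.copy(p, os.path.join(SUBSET, 'images', p.name))
print(f'{len(selected_images)} images copied to subset/images')
```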
Installs Segment Anything from GitHub via pip. Downloads the big vit_h model weights.
Loads SAM: builds the vit_h model from the registry and wraps it in a predictor.
This sets up for semi-auto segmentation - SAM generates masks from box prompts.
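The setup cells, sketched (the checkpoint URL is the public vit_h release from the segment-anything repo):

```python
# These two run as shell cells in the notebook:
# !pip install -q git+https://github.com/facebookresearch/segment-anything.git
# !wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

import torch
from segment_anything import sam_model_registry, SamPredictor

device = 'cuda' if torch.cuda.is_available() else 'cpu'
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth').to(device)
predictor = SamPredictor(sam)
```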
Imports matplotlib for plotting, plus numpy again.
Creates labels dir if needed.
Defines mask_to_yolo: Converts binary mask to YOLO .txt format (contours to normalized polygons with class ID).
Defines auto_assign_sleeve: Checks upper mask height - if >25% image height, full_sleeve (0); else half (1). Heuristic based on sleeve length.
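Sketches of both helpers, matching the thresholds above (my exact cell may differ in small details):

```python
import cv2
import numpy as np

def mask_to_yolo(mask, class_id, img_w, img_h):
    """Binary mask -> one YOLO-seg line: 'cls x1 y1 x2 y2 ...' (normalized)."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pts = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(float)
    pts[:, 0] /= img_w   # normalize x to 0-1
    pts[:, 1] /= img_h   # normalize y to 0-1
    return f'{class_id} ' + ' '.join(f'{x:.6f} {y:.6f}' for x, y in pts)

def auto_assign_sleeve(upper_mask, img_h):
    """Tall upper-garment mask -> full_sleeve (0); short -> half_sleeve (1)."""
    rows = np.where(upper_mask)[0]          # row indices of mask pixels
    if rows.size == 0:
        return 1
    return 0 if (rows.max() - rows.min()) > 0.25 * img_h else 1
```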
Loops over the 100 selected_images (full sketch after this list):
- Loads image, converts to RGB.
- Embeds in SAM predictor.
- Defines boxes: Upper 40% for sleeves, lower 60% for pants.
- Predicts masks.
- Auto-assigns upper class.
- The lower mask is fixed as class 2 (pants_full).
- Converts masks to YOLO lines.
- Writes to .txt in labels dir.
Prints for each. This creates the actual annotations without manual drawing every time.
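Put together, the loop looks roughly like this (box fractions as listed above; predictor and the helpers come from the earlier cells):

```python
labels_dir = os.path.join(SUBSET, 'labels')
os.makedirs(labels_dir, exist_ok=True)

for img_path in sorted(Path(SUBSET, 'images').glob('*')):
    img = cv2.imread(str(img_path))
    h, w = img.shape[:2]
    predictor.set_image(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Box prompts: top 40% of the frame for sleeves, bottom 60% for pants
    prompts = [(np.array([0, 0, w, int(0.4 * h)]), None),  # upper: class decided below
               (np.array([0, int(0.4 * h), w, h]), 2)]     # lower: always pants_full

    lines = []
    for box, cls in prompts:
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        mask = masks[0]
        if cls is None:                       # upper garment: 0 or 1
            cls = auto_assign_sleeve(mask, h)
        line = mask_to_yolo(mask, cls, w, h)
        if line:
            lines.append(line)

    with open(os.path.join(labels_dir, img_path.stem + '.txt'), 'w') as f:
        f.write('\n'.join(lines))
    print('annotated', img_path.name)
```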
More imports, creates validation_visuals dir.
Class colors: red for 0, green for 1, blue for 2.
yolo_to_polygons: Parses the .txt back into pixel-space contours (denormalizes the coordinates).
check_area_sanity: Checks if upper area 20-40%, lower 30-50% of image - warns if not.
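Both validation helpers, sketched with the exact area bands above:

```python
import cv2
import numpy as np

def yolo_to_polygons(txt_path, img_w, img_h):
    """Parse a YOLO-seg .txt back into [(class_id, Nx2 pixel array), ...]."""
    polys = []
    with open(txt_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 7:       # need class id + at least 3 points
                continue
            pts = np.array(parts[1:], dtype=float).reshape(-1, 2)
            pts[:, 0] *= img_w       # denormalize back to pixels
            pts[:, 1] *= img_h
            polys.append((int(parts[0]), pts.astype(np.int32)))
    return polys

def check_area_sanity(polys, img_w, img_h):
    """Flag garments whose area fraction falls outside the expected band."""
    issues, total = [], img_w * img_h
    for cls, pts in polys:
        frac = cv2.contourArea(pts) / total
        if cls in (0, 1) and not 0.20 <= frac <= 0.40:
            issues.append(f'upper (class {cls}) area {frac:.0%} not in 20-40%')
        elif cls == 2 and not 0.30 <= frac <= 0.50:
            issues.append(f'pants area {frac:.0%} not in 30-50%')
    return issues
```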
Loop over the images (sketched after this list):
- Loads, gets polygons.
- Overlays the outlines on a copy, blends with the original at 50% transparency.
- Saves blended visual.
- Runs sanity check, prints issues or pass.
- (plt.show skipped so a batch run doesn't spam 100 figures.)
Prints where visuals saved. This quality-checks annotations match expected proportions.
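The loop, sketched (colors in OpenCV's BGR order per the mapping above; labels_dir and the helpers come from earlier cells):

```python
VIS_DIR = os.path.join(ROOT, 'validation_visuals')
os.makedirs(VIS_DIR, exist_ok=True)
COLORS = {0: (0, 0, 255), 1: (0, 255, 0), 2: (255, 0, 0)}  # BGR red/green/blue

for img_path in sorted(Path(SUBSET, 'images').glob('*')):
    img = cv2.imread(str(img_path))
    h, w = img.shape[:2]
    polys = yolo_to_polygons(os.path.join(labels_dir, img_path.stem + '.txt'), w, h)

    overlay = img.copy()
    for cls, pts in polys:
        cv2.polylines(overlay, [pts.reshape(-1, 1, 2)], True, COLORS[cls], 2)
    blended = cv2.addWeighted(img, 0.5, overlay, 0.5, 0)   # 50/50 blend
    cv2.imwrite(os.path.join(VIS_DIR, img_path.name), blended)

    issues = check_area_sanity(polys, w, h)
    print(img_path.name, '->', '; '.join(issues) if issues else 'pass')
print('Visuals saved in', VIS_DIR)
```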
This cell just re-defines auto_assign_sleeve - probably a leftover, and it's identical to the earlier version. Running it overrides the same function, so no harm done.
Creates labeled-images dir.
Recalibrated colors: cyan-ish for full_sleeve (255,200,0), magenta for half_sleeve (180,0,255), green-blue for pants_full (100,200,50) - BGR tuples, since OpenCV does the drawing - to match the demo vibes.
yolo_to_polygons same.
The loop (sketched after this list):
- Loads image, gets polygons.
- Copies image, draws thicker lines (4) with new colors.
- Saves as JPEG at 85% quality for size control.
Counts and prints. This makes visuals like the demo outlines - bold, colored per class.
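Sketched with the notebook's color tuples (BGR order, as noted above):

```python
LABELED_DIR = os.path.join(ROOT, 'labeled-images')
os.makedirs(LABELED_DIR, exist_ok=True)
DEMO_COLORS = {0: (255, 200, 0),    # cyan-ish for full_sleeve
               1: (180, 0, 255),    # magenta for half_sleeve
               2: (100, 200, 50)}   # green-blue for pants_full

count = 0
for img_path in sorted(Path(SUBSET, 'images').glob('*')):
    img = cv2.imread(str(img_path))
    h, w = img.shape[:2]
    polys = yolo_to_polygons(os.path.join(labels_dir, img_path.stem + '.txt'), w, h)

    canvas = img.copy()
    for cls, pts in polys:
        cv2.polylines(canvas, [pts.reshape(-1, 1, 2)], True, DEMO_COLORS[cls], 4)

    out_path = os.path.join(LABELED_DIR, img_path.stem + '.jpg')
    cv2.imwrite(out_path, canvas, [cv2.IMWRITE_JPEG_QUALITY, 85])  # 85% quality
    count += 1
print(count, 'labeled visuals in', LABELED_DIR)
```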
Imports zipfile.
Sets root and output zip name (annotated_dataset-3.zip).
Pre-zip: Walks dirs, sums sizes in MB, prints.
Zips with DEFLATED: Includes everything except validation_visuals.
Calculates zipped GB, prints, warns if >=1GB.
This packages the subset (images, labels, yaml, labeled-images) for download/submission, keeping under 1GB.
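And the packaging cell, sketched end to end (ROOT from the setup cell):

```python
import os
import zipfile

ZIP_PATH = '/kaggle/working/annotated_dataset-3.zip'

# Pre-zip size check in MB
total_mb = sum(os.path.getsize(os.path.join(d, f))
               for d, _, files in os.walk(ROOT) for f in files) / 1e6
print(f'Unzipped size: {total_mb:.1f} MB')

with zipfile.ZipFile(ZIP_PATH, 'w', zipfile.ZIP_DEFLATED) as zf:
    for d, _, files in os.walk(ROOT):
        if 'validation_visuals' in d:   # QC-only output stays out of the zip
            continue
        for f in files:
            full = os.path.join(d, f)
            zf.write(full, os.path.relpath(full, ROOT))

zip_gb = os.path.getsize(ZIP_PATH) / 1e9
print(f'Zipped size: {zip_gb:.2f} GB')
if zip_gb >= 1:
    print('Warning: archive is at or over the 1GB limit!')
```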