diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/.DS_Store b/docs/getting_started/focused_getting_starteds/Segmentation/.DS_Store new file mode 100644 index 0000000..97e9115 Binary files /dev/null and b/docs/getting_started/focused_getting_starteds/Segmentation/.DS_Store differ diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/assets/.DS_Store b/docs/getting_started/focused_getting_starteds/Segmentation/assets/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/docs/getting_started/focused_getting_starteds/Segmentation/assets/.DS_Store differ diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/index.md b/docs/getting_started/focused_getting_starteds/Segmentation/index.md new file mode 100644 index 0000000..13e07a5 --- /dev/null +++ b/docs/getting_started/focused_getting_starteds/Segmentation/index.md @@ -0,0 +1,37 @@ +# Getting Started with Segmentation in FiftyOne + +## Who this is for + +This page is for those new to FiftyOne looking to get started with segmentation workflows! +We will cover how to load, visualize, enrich, and evaluate segmentation datasets with FiftyOne. + +This tutorial is ideal for computer vision engineers and AI researchers working with instance and semantic segmentation tasks. +Some basic knowledge of Python and computer vision is assumed. + +## Assumed Knowledge + +We assume familiarity with common segmentation tasks (semantic and instance), dataset formats (e.g., COCO), and how masks or polygons are used in visual tasks. + +## Time to complete + +20–30 minutes + +## Required packages + +To follow along, you’ll need the following packages: + +```bash +pip install fiftyone opencv-python-headless pillow matplotlib +pip install torch torchvision +``` + +## Content + +### [Step 1: Loading Segmentation Datasets](./step1.ipynb) +Learn how to load semantic and instance segmentation datasets from FiftyOne’s zoo and from custom formats like COCO or segmentation masks. + +### [Step 2: Adding Instance Segmentations](./step2.ipynb) +Enrich your dataset by adding segmentation predictions using both built-in models (e.g., SAM2) and your own custom models, with polygon and bounding box support. + +### [Step 3: Segment Anything 2 (SAM2) in FiftyOne](./step3.ipynb) +Explore SAM 2’s groundbreaking capabilities for image and video segmentation. Use bounding boxes, keypoints, or zero prompts, and run video mask propagation from a single frame. diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/step1.ipynb b/docs/getting_started/focused_getting_starteds/Segmentation/step1.ipynb new file mode 100644 index 0000000..a00d512 --- /dev/null +++ b/docs/getting_started/focused_getting_starteds/Segmentation/step1.ipynb @@ -0,0 +1,170 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 1: Loading a Segmentation Dataset in FiftyOne" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this first step, we will explore how to load **segmentation datasets** into FiftyOne. Segmentation datasets may be of two types: **semantic segmentation** (pixel-wise class labels) and **instance segmentation** (individual object masks). \n", + "\n", + "FiftyOne makes it easy to load both types using its Dataset Zoo or from custom formats like COCO or FiftyOne format. Let's start by loading a common format instance segmentation dataset." 
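Before loading a full dataset, it may help to see how the two flavors are represented in FiftyOne: semantic labels use `fo.Segmentation` with a full-frame mask, while instance labels use `fo.Detection` objects whose masks live inside their bounding boxes. A minimal sketch (the image path is a placeholder):

```python
import numpy as np
import fiftyone as fo

sample = fo.Sample(filepath="/path/to/image.png")  # placeholder path

# Semantic segmentation: a single full-frame mask of integer class IDs
sample["gt_semantic"] = fo.Segmentation(
    mask=np.zeros((128, 128), dtype=np.uint8)
)

# Instance segmentation: one mask per object, attached to a detection
sample["gt_instances"] = fo.Detections(
    detections=[
        fo.Detection(
            label="object",
            bounding_box=[0.1, 0.1, 0.4, 0.4],  # [x, y, w, h] in relative coords
            mask=np.ones((64, 64), dtype=bool),  # mask within the box
        )
    ]
)
```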
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loading a Common Format Segmentation Dataset\n", + "Segmentation datasets are often provided in standard formats such as COCO, VOC, YOLO, KITTI, and FiftyOne format. FiftyOne supports direct ingestion of these datasets with just a few lines of code.\n", + "\n", + "Make sure your dataset follows the folder structure and file naming conventions required by the specific format (e.g., COCO JSON annotations or class mask folders for semantic segmentation)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone as fo\n", + "\n", + "# Create the dataset\n", + "name = \"my-dataset\"\n", + "dataset_dir = \"/path/to/segmentation-dataset\"\n", + "\n", + "# Create the dataset\n", + "dataset = fo.Dataset.from_dir(\n", + " dataset_dir=dataset_dir,\n", + " dataset_type=fo.types.COCODetectionDataset, # Change with your type\n", + " name=name,\n", + ")\n", + "\n", + "# View summary info about the dataset\n", + "print(dataset)\n", + "\n", + "# Print the first few samples in the dataset\n", + "print(dataset.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check out the docs for each format to find optional parameters you can pass for things like train/test split, subfolders, or label paths: https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/dataset_types.html" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# FiftyOne with a Coffee-Beans Dataset\n", + "We will walk through how to use [FiftyOne](https://voxel51.com/docs/fiftyone) to build better segmentation datasets and models. \n", + "\n", + "- Load your own dataset [into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html). For this example, we use a [Coffee-Beans Dataset](https://huggingface.co/datasets/pjramg/colombian_coffee) in COCO format.\n", + "- Use FiftyOne [in a notebook](https://voxel51.com/docs/fiftyone/environments/index.html#notebooks)\n", + "- Explore your segmentation dataset using [views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) and the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n", + "\n", + "*Note: For manually adding segmentations, refer to the `step_x` notebook.*" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone as fo\n", + "\n", + "dataset = fo.Dataset.from_dir(\n", + " dataset_type=fo.types.COCODetectionDataset,\n", + " dataset_dir=\"./colombian_coffee\",\n", + " data_path=\"images/default\",\n", + " labels_path=\"annotations/instances_default.json\",\n", + " label_types=\"segmentations\",\n", + " label_field=\"categories\",\n", + " name=\"coffee\",\n", + " include_id=True,\n", + " overwrite=True\n", + ")\n", + "\n", + "# View summary info about the dataset\n", + "print(dataset)\n", + "\n", + "# Print the first few samples in the dataset\n", + "print(dataset.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see our images have loaded in the App, but no segmentation masks are shown yet. Next, we’ll ensure annotations are properly loaded." 
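Before launching the App, it can be useful to confirm in Python that instance masks actually made it into the dataset. A quick sanity check might look like this (the field name below assumes the `label_field="categories"` used above, which yields a `categories_segmentations` field; adjust it to match your import):

```python
# How many instance masks were imported in total?
print(dataset.count("categories_segmentations.detections"))

# Which classes are present, and how often?
print(dataset.count_values("categories_segmentations.detections.label"))

# Peek at the labels on the first sample
print(dataset.first()["categories_segmentations"])
```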
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using the App\n", + "\n", + "With the FiftyOne App, you can visualize your samples and their segmentation masks in an interactive UI. Double-click any sample to enter the expanded view, where you can study individual samples with overlayed masks.\n", + "\n", + "The [view bar](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar) lets you filter and search your dataset to analyze specific classes or objects.\n", + "\n", + "You can seamlessly move between Python and the App. For example, create a filtered view using the `shuffle()` and `limit()` view stages in Python or directly in the App UI." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once your annotations are loaded correctly, you can confirm that your **segmentation masks** (not detections!) are present and visualized correctly. 🎉" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "session.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "OSS310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/step2.ipynb b/docs/getting_started/focused_getting_starteds/Segmentation/step2.ipynb new file mode 100644 index 0000000..327ed05 --- /dev/null +++ b/docs/getting_started/focused_getting_starteds/Segmentation/step2.ipynb @@ -0,0 +1,292 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Step 2: Adding Instance Segmentation to a FiftyOne Dataset\n", + "\n", + "We will explore how to enrich your dataset by adding **instance segmentation predictions**.\n", + "\n", + "In this notebook, we’ll cover:\n", + "- Using the FiftyOne Model Zoo to apply instance segmentation\n", + "- Integrating predictions from a custom model (e.g., a model deployed via Intel Geti)\n", + "\n", + "---\n", + "\n", + "## Using an Instance Segmentation Dataset\n", + "\n", + "For educational purposes, we will use an upgraded dataset with 100+ annotated unique images, which you can download from Google Drive.\n", + "\n", + "Download the dataset with this [link](https://drive.google.com/file/d/1aCr00sF2hjLw7hpq3yeXNUvC07TXdQsg/view?usp=sharing)\n", + "\n", + "Let’s kick things off by loading the **colombian_coffee-dataset_1600**:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc022a9c", + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone as fo\n", + "\n", + "dataset = fo.Dataset.from_dir(\n", + " dataset_type=fo.types.COCODetectionDataset,\n", + " dataset_dir=\"./colombian_coffee-dataset_1600\",\n", + " data_path=\"images/default\",\n", + " labels_path=\"annotations/instances_default.json\",\n", + " label_types=\"segmentations\",\n", + " label_field=\"categories\",\n", + " name=\"coffee\",\n", + " include_id=True,\n", + " overwrite=True\n", + ")\n", + "\n", + "view = dataset.shuffle()\n", + "session = fo.launch_app(dataset)" + ]
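Since the cell above creates a shuffled view but launches the full dataset, you may want to point the session at the view, or get a quick feel for the label distribution before adding predictions. An optional sketch (again assuming the `categories_segmentations` field name produced by the COCO import):

```python
# Browse the shuffled view instead of the full dataset
session.view = view

# Class distribution of the ground-truth instance masks
print(dataset.count_values("categories_segmentations.detections.label"))
```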
}, + { + "cell_type": "markdown", + "metadata": { + "vscode": { + "languageId": "bat" + } + }, + "source": [ + "----\n", + "## Loading predictions using SAM2\n", + "\n", + "With FiftyOne, you have tons of pretrained models at your disposal to use via the [FiftyOne Model Zoo](https://docs.voxel51.com/model_zoo/index.html) or using one of our [integrations](https://docs.voxel51.com/integrations/index.html) such as [HuggingFace](https://docs.voxel51.com/integrations/huggingface.html)! To get started using them, first load the model and pass it to the `apply_model()` function.\n", + "\n", + "Now apply Segment Anything [SAM2](https://voxel51.com/blog/sam-2-is-now-available-in-fiftyone/) from the FiftyOne Model Zoo.\n", + "\n", + "Install SAM2 following the instructions from this [repo](https://github.com/facebookresearch/sam2). You can also jump to the next step of this tutorial to learn more about how SAM2 works with FiftyOne." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16486845", + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone.zoo as foz\n", + "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-image-torch\")\n", + "\n", + "# Prompt with boxes\n", + "dataset.apply_model(\n", + " model,\n", + " label_field=\"sam2_predictions\",\n", + " prompt_field=\"categories_segmentations\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5addd16", + "metadata": {}, + "outputs": [], + "source": [ + "# Confirm predictions were added\n", + "sample = dataset.first()\n", + "print(sample['sam2_predictions'])" + ] + }, + { + "cell_type": "markdown", + "id": "fb51bda9", + "metadata": {}, + "source": [ + "## Loading predictions using a custom model (Intel Geti Example)\n", + "\n", + "Let’s now simulate the pipeline with a custom instance segmentation model. If you want to run the inference using the same example, please refer to this [example](https://github.com/paularamo/awesome-fiftyone/blob/main/getting-started-coffee/coffee_evaluation_MArskRCNN_ResNet50.ipynb).\n", + "\n", + "Assuming you’ve already set up inference with a model (e.g., via OpenVINO + Intel Geti SDK), you can run predictions like this:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generating instance segmentation masks from polygons and bounding boxes\n", + "\n", + "This function extracts instance segmentation masks from polygon annotations, combining **detection (bounding boxes)** and **segmentation (masks)** in the same instance using `fo.Detection`.\n", + "\n", + "1. **Load Image** – Reads and converts the image to RGB. \n", + "2. **Process Annotations** – Extracts polygon points, computes bounding boxes, and normalizes coordinates. \n", + "3. **Generate Masks** – Creates, crops, and resizes binary masks for each annotation. \n", + "4. **Save & Return** – Stores masks as temp files and returns `fo.Detection` objects, ensuring the bounding box and mask belong to the same instance. \n", + "\n", + "This enables accurate visualization and analysis in FiftyOne, preserving both object localization and shape details.\n", + "\n", + "Useful for visualizing or processing segmentation data in FiftyOne."
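The custom-model cells below call `deployment.infer(...)` without showing how `deployment` is created. As a rough sketch, assuming you exported a deployment folder from the Intel Geti platform (the folder path and device are placeholders; see the Geti SDK docs for details), the setup might look like:

```python
from geti_sdk.deployment import Deployment

# Load a deployment exported from Intel Geti and prepare it for local inference
deployment = Deployment.from_folder("./deployment")  # placeholder path
deployment.load_inference_models(device="CPU")
```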
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import cv2\n", + "import fiftyone as fo\n", + "from PIL import Image as PILImage\n", + "from tempfile import NamedTemporaryFile\n", + "from geti_sdk.data_models.shapes import Polygon\n", + "\n", + "def generate_mask_from_polygon_and_bboxes(sample, prediction):\n", + " image = cv2.imread(sample.filepath)\n", + " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", + " img_height, img_width = image.shape[:2]\n", + " print(f\"Image size: {img_width}x{img_height}\")\n", + " detections = []\n", + " for annotation in prediction.annotations:\n", + " if isinstance(annotation.shape, Polygon):\n", + " polygon_points = [(point.x, point.y) for point in annotation.shape.points]\n", + " polygon_points = np.array(polygon_points, dtype=np.int32)\n", + " label = annotation.labels[0].name\n", + " confidence = annotation.labels[0].probability\n", + " x, y, w, h = cv2.boundingRect(polygon_points)\n", + " scaled_x = x / img_width\n", + " scaled_y = y / img_height\n", + " scaled_w = w / img_width\n", + " scaled_h = h / img_height\n", + " bounding_box = [scaled_x, scaled_y, scaled_w, scaled_h]\n", + " mask = np.zeros((img_height, img_width), dtype=np.uint8)\n", + " cv2.fillPoly(mask, [polygon_points], 255)\n", + " cropped_mask = mask[y:y + h, x:x + w]\n", + " mask_resized = cv2.resize(cropped_mask, (w, h), interpolation=cv2.INTER_NEAREST)\n", + " print(f\"Mask size: {mask_resized.shape} (expected: {h}x{w})\")\n", + " with NamedTemporaryFile(delete=False, suffix='.png') as temp_mask_file:\n", + " mask_path = temp_mask_file.name\n", + " cv2.imwrite(mask_path, mask_resized)\n", + " detection = fo.Detection(\n", + " label=label,\n", + " confidence=confidence,\n", + " bounding_box=bounding_box,\n", + " mask_path=mask_path\n", + " )\n", + " detections.append(detection)\n", + " return detections" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For educational purposes, check what is happening on the first (or last) sample. Then you can apply this to the whole dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test on one image\n", + "sample = dataset.first()\n", + "image_path = sample.filepath\n", + "image_data = PILImage.open(image_path)\n", + "image_data = np.array(image_data)\n", + "prediction = deployment.infer(image_data)\n", + "detections = generate_mask_from_polygon_and_bboxes(sample, prediction)\n", + "sample['predicted_segmentations_test'] = fo.Detections(detections=detections)\n", + "sample.save()\n", + "dataset.reload()\n", + "print(dataset)\n", + "print(sample)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "📝 Tip: Replace ```prediction.annotations``` and the Geti-specific shape parsing with your own model's real output structure and masks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run the prediction on the whole dataset\n", + "\n", + "This loop processes each sample in the dataset by loading the image, running inference using the Geti SDK, and generating instance segmentation masks. The function extracts detections with both bounding boxes and masks, ensuring they belong to the same instance. These predictions are then stored in the sample under `\"custom_predictions\"` using `fo.Detections`. Finally, the dataset is reloaded to reflect the updates.
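As a side note, the explicit `for sample in dataset: ... sample.save()` pattern below works fine, but FiftyOne's `iter_samples()` can batch the saves and show progress. An equivalent sketch, assuming the same `deployment` and helper function defined above:

```python
import numpy as np
import fiftyone as fo
from PIL import Image as PILImage

# Batched-save variant of the loop below
for sample in dataset.iter_samples(autosave=True, progress=True):
    image_data = np.array(PILImage.open(sample.filepath))
    prediction = deployment.infer(image_data)
    detections = generate_mask_from_polygon_and_bboxes(sample, prediction)
    sample["custom_predictions"] = fo.Detections(detections=detections)
```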
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Iterate over the samples in the dataset\n", + "for sample in dataset:\n", + " # Load the image as a NumPy array using PIL or OpenCV\n", + " image_path = sample.filepath # Path to the image file\n", + " print(image_path)\n", + " image_data = PILImage.open(image_path)\n", + " image_data = np.array(image_data) # Convert the image to NumPy array\n", + "\n", + " # Run inference on the sample (using Geti SDK's inference)\n", + " prediction = deployment.infer(image_data)\n", + "\n", + " # Generate the segmentation mask and detections using the annotations from the prediction\n", + " detections = generate_mask_from_polygon_and_bboxes(sample, prediction)\n", + "\n", + " # Add the detections as predicted segmentations\n", + " sample[\"custom_predictions\"] = fo.Detections(detections=detections) # Change to `\"predictions_model1\"` for evaluation\n", + "\n", + " # Save the updated sample\n", + " sample.save()\n", + "\n", + "# Reload the dataset to reflect the changes\n", + "dataset.reload()" + ] + }, + { + "cell_type": "markdown", + "id": "bf19b2f0", + "metadata": {}, + "source": [ + "## Compare Predictions in FiftyOne App\n", + "Toggle between `ground_truth`, `sam2_predictions`, and `custom_predictions` in the App to explore and compare different segmentations side-by-side!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "session = fo.launch_app(dataset)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "OSS310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/step3.ipynb b/docs/getting_started/focused_getting_starteds/Segmentation/step3.ipynb new file mode 100644 index 0000000..4ba3acc --- /dev/null +++ b/docs/getting_started/focused_getting_starteds/Segmentation/step3.ipynb @@ -0,0 +1,268 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "730d412d", + "metadata": {}, + "source": [ + "# Step 3: Using SAM 2 in FiftyOne\n", + "\n", + "**Segment Anything 2 (SAM 2)** is a powerful segmentation model released in July 2024 that pushes the boundaries of image and video segmentation. It brings new capabilities to computer vision applications, including the ability to generate precise masks and track objects across frames in videos using just simple prompts.\n", + "\n", + "In this notebook, you'll learn how to:\n", + "- Understand the key innovations in SAM 2\n", + "- Apply SAM 2 to image datasets using bounding boxes, keypoints, or no prompts at all\n", + "- Leverage SAM 2’s video segmentation and mask tracking capabilities with a single-frame prompt\n" + ] + }, + { + "cell_type": "markdown", + "id": "8dbb5b2f", + "metadata": {}, + "source": [ + "## What is SAM 2?\n", + "\n", + "SAM 2 is the next generation of the Segment Anything Model, originally introduced by Meta in 2023. While SAM was designed for zero-shot segmentation on still images, SAM 2 adds robust video segmentation and tracking capabilities. 
With just a bounding box or a set of keypoints on a single frame, SAM 2 can segment and track objects across entire video sequences." + ] + }, + { + "cell_type": "markdown", + "id": "9ac21217", + "metadata": {}, + "source": [ + "## Using SAM 2 for Images\n", + "\n", + "SAM 2 integrates directly with the FiftyOne Model Zoo, allowing you to apply segmentation to image datasets with minimal code. Whether you're working with ground truth bounding boxes, keypoints, or want to explore automatic mask generation, FiftyOne makes the process seamless." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc9580f6", + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone as fo\n", + "import fiftyone.zoo as foz\n", + "\n", + "# Load dataset\n", + "dataset = foz.load_zoo_dataset(\"quickstart\", max_samples=25, shuffle=True, seed=51)\n", + "\n", + "# Load SAM 2 image model\n", + "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-image-torch\")\n", + "\n", + "# Prompt with bounding boxes\n", + "dataset.apply_model(model, label_field=\"segmentations\", prompt_field=\"ground_truth\")\n", + "\n", + "# Launch app to view segmentations\n", + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "markdown", + "id": "67ae8bf1", + "metadata": {}, + "source": [ + "## Using a custom segmentation dataset\n", + "\n", + "We will use a segmentation dataset of coffee beans that is already stored as a FiftyOne dataset: ```pjramg/my_colombian_coffe_FO```\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "19c7120f", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "import fiftyone as fo # base library and app\n", + "import fiftyone.utils.huggingface as fouh # Hugging Face integration\n", + "dataset_ = fouh.load_from_hub(\"pjramg/my_colombian_coffe_FO\", persistent=True, overwrite=True)\n", + "\n", + "# Define the new dataset name\n", + "dataset_name = \"coffee_FO_SAM2\"\n", + "\n", + "# Check if the dataset exists\n", + "if dataset_name in fo.list_datasets():\n", + " print(f\"Dataset '{dataset_name}' exists. Loading...\")\n", + " dataset = fo.load_dataset(dataset_name)\n", + "else:\n", + " print(f\"Dataset '{dataset_name}' does not exist. Creating a new one...\")\n", + " # Clone the dataset with a new name and make it persistent\n", + " dataset = dataset_.clone(dataset_name, persistent=True)" + ] + }, + { + "cell_type": "markdown", + "id": "6bfd92b7", + "metadata": {}, + "source": [ + "### Selecting 100 unique samples from the dataset\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b39652f9", + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone.brain as fob\n", + "\n", + "results = fob.compute_similarity(dataset, brain_key=\"img_sim2\")\n", + "results.find_unique(100)" + ] + }, + { + "cell_type": "markdown", + "id": "c7e91b30", + "metadata": {}, + "source": [ + "### Apply SAM2 to just the 100 unique samples\n", + "\n", + "SAM 2 can also segment entire images without needing any bounding boxes or keypoints. This zero-input mode is useful for generating segmentation masks for general visual analysis or bootstrapping annotation workflows."
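The next cell applies SAM 2 with no prompts at all. If you would rather reuse existing ground-truth instances as box prompts on the same 100 unique samples (as in Step 2), a sketch might look like the following; the `categories_segmentations` prompt field is an assumption carried over from the earlier COCO import, so adjust it to your dataset's schema:

```python
import fiftyone.zoo as foz

# Select the 100 visually unique samples found above
unique_view = dataset.select(results.unique_ids)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Box-prompted SAM 2 on just those samples
unique_view.apply_model(
    model,
    label_field="sam2_box_prompted",
    prompt_field="categories_segmentations",
)
```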
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc1fd9d3", + "metadata": {}, + "outputs": [], + "source": [ + "unique_view = dataset.select(results.unique_ids)\n", + "session.view = unique_view\n", + "\n", + "import fiftyone.zoo as foz\n", + "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-image-torch\")\n", + "\n", + "# Full automatic segmentations\n", + "unique_view.apply_model(model, label_field=\"auto\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "2fe07ff2", + "metadata": {}, + "source": [ + "## Bonus with SAM2" + ] + }, + { + "cell_type": "markdown", + "id": "536ab8bf", + "metadata": {}, + "source": [ + "### Prompting with Keypoints\n", + "\n", + "Keypoint prompts are a great alternative to bounding boxes when working with articulated objects like people. Here, we filter images to include only people, generate keypoints using a keypoint model, and then use those keypoints to prompt SAM 2 for segmentation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eae16d65", + "metadata": {}, + "outputs": [], + "source": [ + "from fiftyone import ViewField as F\n", + "\n", + "# Filter persons only\n", + "dataset = foz.load_zoo_dataset(\"quickstart\")\n", + "dataset = dataset.filter_labels(\"ground_truth\", F(\"label\") == \"person\")\n", + "\n", + "# Apply keypoint detection\n", + "kp_model = foz.load_zoo_model(\"keypoint-rcnn-resnet50-fpn-coco-torch\")\n", + "dataset.default_skeleton = kp_model.skeleton\n", + "dataset.apply_model(kp_model, label_field=\"gt_keypoints\")\n", + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6aa233a", + "metadata": {}, + "outputs": [], + "source": [ + "# Apply SAM 2 with keypoints\n", + "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-image-torch\")\n", + "dataset.apply_model(model, label_field=\"segmentations\", prompt_field=\"gt_keypoints\")\n", + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "markdown", + "id": "cf6a3045", + "metadata": {}, + "source": [ + "## Using SAM 2 for Video\n", + "\n", + "SAM 2 brings game-changing capabilities to video understanding. It can track segmentations across frames from a single bounding box or keypoint prompt provided on the first frame. With this, you can propagate high-quality segmentation masks through entire sequences automatically." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f51f6af", + "metadata": {}, + "outputs": [], + "source": [ + "dataset = foz.load_zoo_dataset(\"quickstart-video\", max_samples=2)\n", + "from fiftyone import ViewField as F\n", + "\n", + "# Remove boxes after first frame\n", + "(\n", + " dataset\n", + " .match_frames(F(\"frame_number\") > 1)\n", + " .set_field(\"frames.detections\", None)\n", + " .save()\n", + ")\n", + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a6ca5e33", + "metadata": {}, + "outputs": [], + "source": [ + "# Apply video model with first-frame prompt\n", + "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-video-torch\")\n", + "dataset.apply_model(model, label_field=\"segmentations\", prompt_field=\"frames.detections\")\n", + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "markdown", + "id": "652223b9", + "metadata": {}, + "source": [ + "## Available SAM 2 Models in FiftyOne\n", + "\n", + "**Image Models:**\n", + "- `segment-anything-2-hiera-tiny-image-torch`\n", + "- `segment-anything-2-hiera-small-image-torch`\n", + "- `segment-anything-2-hiera-base-plus-image-torch`\n", + "- `segment-anything-2-hiera-large-image-torch`\n", + "\n", + "**Video Models:**\n", + "- `segment-anything-2-hiera-tiny-video-torch`\n", + "- `segment-anything-2-hiera-small-video-torch`\n", + "- `segment-anything-2-hiera-base-plus-video-torch`\n", + "- `segment-anything-2-hiera-large-video-torch`" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/getting_started/focused_getting_starteds/Segmentation/summary.md b/docs/getting_started/focused_getting_starteds/Segmentation/summary.md new file mode 100644 index 0000000..6f83a61 --- /dev/null +++ b/docs/getting_started/focused_getting_starteds/Segmentation/summary.md @@ -0,0 +1,44 @@ +# Segmentation Getting Started Series Summary + +This short series walks you through the core components of working with segmentation data in FiftyOne — from loading and visualizing, to predicting and tracking masks. + +## Summary of Steps + +### Step 1: Loading Segmentation Datasets + +Explore how to load segmentation datasets into FiftyOne using both built-in datasets from the zoo and custom datasets in COCO or mask folder format. + +### Step 2: Adding Instance Segmentations + +Learn how to add instance segmentation predictions to your datasets using both pre-trained models like SAM2 and your own models, including how to convert polygons and bounding boxes into masks. + +### Step 3: Segment Anything 2 (SAM2) in FiftyOne + +Dive into the advanced image and video segmentation capabilities of SAM 2, including prompting via bounding boxes or keypoints, and automatic segmentation for visual AI workflows. + +--- + +This series is part of the **Getting Started with FiftyOne** initiative. For more tutorials, head to [FiftyOne Documentation](https://beta-docs.voxel51.com/). + + +## Next Steps + +Now that you've completed the Segmentation Getting Started series, here are some suggested next steps to deepen your journey with FiftyOne: + +- **Explore Model Evaluation** + Learn how to evaluate segmentation models by comparing predictions against ground truth, and identifying failure cases in your dataset. + +- **Try Out FiftyOne Plugins** + Extend your workflow with powerful plugins like video embeddings, active learning tools, and integrations with annotation platforms. 
+ +- **Connect with the Community** + Share your findings, ask questions, or browse community projects on the [FiftyOne Discord](https://community.voxel51.com) or [GitHub Discussions](https://tbd.com). + +- **Load Your Own Dataset** + Adapt these workflows to your real-world segmentation projects. Whether it's agriculture, retail, or industrial inspection — FiftyOne supports it all. + +- **Read the Docs** + Dive deeper into what FiftyOne can do in the [official documentation](https://beta-docs.voxel51.com/). + +We can't wait to see what you'll build next with FiftyOne! +
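As a concrete starting point for the model-evaluation suggestion above, instance segmentation predictions can be scored with mask IoUs via `evaluate_detections()`. Here is a sketch using the coffee dataset and field names from Steps 1–2 (both names are assumptions; substitute your own):

```python
import fiftyone as fo

dataset = fo.load_dataset("coffee")  # assumes the dataset from Steps 1-2

results = dataset.evaluate_detections(
    "custom_predictions",                 # predictions field (assumed)
    gt_field="categories_segmentations",  # ground-truth field (assumed)
    eval_key="eval_instances",
    use_masks=True,                       # IoU on masks instead of boxes
    compute_mAP=True,
)

results.print_report()
print(results.mAP())
```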