Merge branch 'main' into docs/fix-typo-unet-metadata

minsuking · web-flow · commit 833d87e43db1 · 2025-09-29T15:49:35.000+09:00
diff --git a/auto3dseg/README.md b/auto3dseg/README.md
@@ -56,13 +56,44 @@ We provide [a two-minute example](notebooks/auto3dseg_hello_world.ipynb) for use
 
 To further demonstrate the capabilities of **Auto3DSeg**, [here](./tasks/instance22/README.md) is the detailed performance of the algorithm in **Auto3DSeg**, which won 2nd place in the MICCAI 2022 challenge **[INSTANCE22: The 2022 Intracranial Hemorrhage Segmentation Challenge on Non-Contrast Head CT (NCCT)](https://instance.grand-challenge.org/)**
 
+## Running With Your Own Data
+
+To run Auto3DSeg on your own dataset, you need to build a `datalist.json` file and pass it to the AutoRunner.
+
+The datalist format is based on the datasets released by the [Medical Segmentation Decathlon](http://medicaldecathlon.com).
+See the function `load_decathlon_datalist` in `monai/data/decathlon_datalist.py` for a description of the format.
+
+The AutoRunner only requires the `training` list in the JSON file.
+The `fold` key for each training item is optional: if it is missing, the AutoRunner automatically creates cross-validation folds (the number of folds is hard-coded to 5).
+If you add the cross-validation folds beforehand, the AutoRunner uses them by default.
+You can also include a `validation` list in the JSON file, in which case the AutoRunner disables cross-validation and uses the specified validation set.
+Other metadata, such as `modality`, `numTraining`, or `name`, is not used by the AutoRunner, but we recommend using metadata fields to keep track of the name and version of your dataset.
+If you are using multi-modal scans, you can enter lists of image paths for both the `image` and `label` keys; MONAI will stack them into channels.
+In short, your `datalist.json` file should look like this:
+
+```
+{
+    "name": "Example datalist.json",
+    "training":
+        [
+            {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
+            {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
+            ...
+        ]
+}
+```
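If your images and labels live in two folders with matching file names (as in the MSD `imagesTr`/`labelsTr` layout), the training list can be generated instead of written by hand. The sketch below assumes that layout and `.nii.gz` files; `build_datalist` and `save_datalist` are hypothetical helpers, not part of MONAI.

```python
import json
from pathlib import Path


def build_datalist(image_dir, label_dir, name="my_dataset_v1"):
    """Pair each image with the label file of the same name (assumed layout)."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    training = []
    for image_path in sorted(image_dir.glob("*.nii.gz")):
        label_path = label_dir / image_path.name
        if not label_path.exists():
            raise FileNotFoundError(f"no label found for {image_path.name}")
        training.append({"image": str(image_path), "label": str(label_path)})
    # Only "training" is read by the AutoRunner; "name" is for bookkeeping only.
    return {"name": name, "training": training}


def save_datalist(datalist, path="datalist.json"):
    """Write the datalist as indented JSON."""
    with open(path, "w") as f:
        json.dump(datalist, f, indent=4)
```

Failing loudly on a missing label is deliberate: an unpaired image would otherwise only surface as an error once training starts.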
+
+The AutoRunner creates a `work_dir` folder in the directory from which it is run. This folder contains the resulting models and a copy of the datalist file _with_ the cross-validation folds, so you can keep track of which datalist the models were trained on.
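Before handing the file to the AutoRunner, a quick sanity check can catch malformed entries early. The hypothetical `check_datalist` below encodes our reading of the rules described above (a required `training` list, `image`/`label` keys, optional integer folds starting at 0, and a `validation` list overriding folds). Requiring consecutive fold indices is an extra assumption, and this is not the AutoRunner's own validation logic.

```python
def check_datalist(datalist):
    """Return a list of problems found in a datalist dict (empty list means OK)."""
    training = datalist.get("training")
    if not isinstance(training, list) or not training:
        return ["missing or empty 'training' list"]

    problems = []
    for i, item in enumerate(training):
        for key in ("image", "label"):
            if key not in item:
                problems.append(f"training[{i}] has no '{key}'")

    # Folds are optional, but if used they should be set on every item.
    folds = [item["fold"] for item in training if "fold" in item]
    if folds:
        if len(folds) != len(training):
            problems.append("'fold' is set on some training items but not all")
        elif not all(isinstance(f, int) for f in folds):
            problems.append("fold values must be integers")
        elif sorted(set(folds)) != list(range(max(folds) + 1)):
            problems.append("fold indices should be consecutive integers starting at 0")

    # A validation list disables cross-validation, so pre-set folds go unused.
    if "validation" in datalist and folds:
        problems.append("'validation' list present: pre-set folds will not be used")
    return problems
```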
+
+See the description below or the file [run_with_minimal_input.md](docs/run_with_minimal_input.md) to use your datalist with the AutoRunner.
+
 ## Reference Python APIs for Auto3DSeg
 
 **Auto3DSeg** offers users different levels of APIs to run pipelines that suit their needs.
 
 ### 1. Run with Minimal Input using ```AutoRunner```
 
-The user needs to provide a data list (".json" file) for the new task and data root. A typical data list is as this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). A sample datalist for an existing MSD formatted dataset can be created using [this notebook](notebooks/msd_datalist_generator.ipynb). After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.
+The user needs to provide a data list (".json" file) for the new task and the data root. A typical data list is shown in this [example](tasks/msd/Task05_Prostate/msd_task05_prostate_folds.json). [This notebook](notebooks/msd_crossval_datalist_generator.ipynb) shows how to create a datalist with cross-validation folds from an existing MSD dataset. After creating the data list, the user can create a simple "task.yaml" file (shown below) as the minimum input for **Auto3DSeg**.
 
 ```
 modality: CT
diff --git a/auto3dseg/docs/run_with_minimal_input.md b/auto3dseg/docs/run_with_minimal_input.md
@@ -18,55 +18,39 @@ if os.path.exists(root):
     download_and_extract(resource, compressed_file, root)
 ```
 
-**Step 1.** Provide the following data list (a ".json" file) for a new task and the data root. The typical data list is shown as follows.
+**Step 1.** Provide a `datalist.json` file.
+See the documentation under the `load_decathlon_datalist` function in `monai.data.decathlon_datalist` for details on the file format.
 
+For the AutoRunner, you only need the `training` field with its list of training files:
 ```
 {
-    "training": [
-        {
-            "fold": 0,
-            "image": "image_001.nii.gz",
-            "label": "label_001.nii.gz"
-        },
-        {
-            "fold": 0,
-            "image": "image_002.nii.gz",
-            "label": "label_002.nii.gz"
-        },
-        {
-            "fold": 1,
-            "image": "image_003.nii.gz",
-            "label": "label_001.nii.gz"
-        },
-        {
-            "fold": 2,
-            "image": "image_004.nii.gz",
-            "label": "label_002.nii.gz"
-        },
-        {
-            "fold": 3,
-            "image": "image_005.nii.gz",
-            "label": "label_003.nii.gz"
-        },
-        {
-            "fold": 4,
-            "image": "image_006.nii.gz",
-            "label": "label_004.nii.gz"
-        }
-    ],
-    "testing": [
-        {
-            "image": "image_010.nii.gz"
-        }
-    ]
+    "training":
+        [
+            {"image": "/path/to/image_1.nii.gz", "label": "/path/to/label_1.nii.gz"},
+            {"image": "/path/to/image_2.nii.gz", "label": "/path/to/label_2.nii.gz"},
+            ...
+        ],
+    "testing":
+        [
+            "/path/to/test_image_1.nii.gz",
+            "/path/to/test_image_2.nii.gz",
+            ...
+        ]
 }
 ```
+In each training item, you can add a `fold` field (an integer starting at 0) to pre-specify the cross-validation folds; otherwise, the AutoRunner generates its own folds (always 5). All trained algorithms use the same generated or pre-specified folds, and the datalist containing them can be found in the `work_dir` folder that the AutoRunner creates.
+If you have a validation set, you can include it under a `validation` key with the same format as the `training` list. This will disable cross-validation.
+A `testing` list can also be added; it only requires the image files, not the labels. If it is included, the AutoRunner will output predictions on the testing set after training.
+It is recommended to add a `name` field and any other metadata fields that allow you to track which version of your dataset the models are trained on.
+
+Save the file to `./datalist.json`.
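The steps above can be sketched in Python as follows. This is a minimal sketch, assuming you want to pre-specify 5 folds yourself; `add_folds` is a hypothetical helper, and its seeded shuffle followed by round-robin assignment is our own choice, not necessarily how the AutoRunner generates its folds. The paths and the `name` value are placeholders.

```python
import json
import random


def add_folds(training, num_folds=5, seed=0):
    """Return a copy of `training` with a 'fold' index (0..num_folds-1) on each item."""
    items = [dict(item) for item in training]  # copy so the input is not modified
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)  # fixed seed -> same folds on every run
    for rank, idx in enumerate(order):
        items[idx]["fold"] = rank % num_folds  # round-robin keeps fold sizes within 1
    return items


# Placeholder training items, following the example above.
training = [
    {"image": f"/path/to/image_{i}.nii.gz", "label": f"/path/to/label_{i}.nii.gz"}
    for i in range(1, 11)
]
datalist = {
    "name": "my_dataset_v1",  # hypothetical name/version tag for bookkeeping
    "training": add_folds(training),
    "testing": ["/path/to/test_image_1.nii.gz", "/path/to/test_image_2.nii.gz"],
}

with open("datalist.json", "w") as f:
    json.dump(datalist, f, indent=4)
```

Fixing the seed keeps the fold assignment reproducible across repeated experiments; changing `seed` gives a different split.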
 
 **Step 2.** Prepare "task.yaml" with the necessary information as follows.
 
 ```
-modality: CT
-datalist: "./task.json"
+modality: CT  # or MRI
+datalist: "./datalist.json"
 dataroot: "/workspace/data/task"
 ```
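If you prefer to script the setup, the same `task.yaml` can be written from Python. This sketch uses only the standard library, which is enough for a flat file of quoted `key: value` pairs; the paths are the placeholders from the example above.

```python
from pathlib import Path

task = {
    "modality": "CT",  # or "MRI"
    "datalist": "./datalist.json",
    "dataroot": "/workspace/data/task",
}

# A flat config of quoted strings needs no YAML library.
yaml_text = "".join(f'{key}: "{value}"\n' for key, value in task.items())
Path("task.yaml").write_text(yaml_text)
```

With MONAI installed, the resulting file can then be passed to the runner, for example `AutoRunner(input="./task.yaml").run()` with `AutoRunner` imported from `monai.apps.auto3dseg`.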
 
diff --git a/auto3dseg/notebooks/auto_runner.ipynb b/auto3dseg/notebooks/auto_runner.ipynb
@@ -273,13 +273,9 @@
     "\n",
     "`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. \n",
     "\n",
-    "NOTE: \n",
-    "**Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference.\n",
-    "The number of epochs/iterations of training is specified by the config files in each template.\n",
-    "Users can override these these values in the bundle templates.\n",
-    "But users should consider that some bundle templates may use `num_iterations` and other may use `num_epochs` to iterate.\n",
+    "As an example, the code block below sets training parameters such as the number of epochs. Note that some algorithms may treat this value as a maximum number of epochs.\n",
     "\n",
-    "For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n",
+    "NOTE: \n",
     "The setup works fine for a machine that has GPUs less than or equal to 8.\n",
     "The datalist in this example is only using a subset of the original dataset.\n",
     "Users need to ensure the number of GPUs is not greater than the number that the training dataset can be partitioned.\n",
diff --git a/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb b/auto3dseg/notebooks/msd_crossval_datalist_generator.ipynb
@@ -19,7 +19,15 @@
     "See the License for the specific language governing permissions and  \n",
     "limitations under the License. \n",
     "\n",
-    "# Datalist Generator"
+    "# Datalist Cross-Validation Folds Generator"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook shows how to add cross-validation folds to an existing Medical Segmentation Decathlon datalist, in this case the one for Task09_Spleen. \n",
+    "When running repeated experiments, it can be beneficial to create the cross-validation folds beforehand, so that every run uses the same splits."
    ]
   },
   {