Merged
Commits
37 commits
ae98bb6
Update of metadata (cf convention) and removal of time
Dana-Maggie Jun 11, 2025
35847d7
Moving add_cf_attributes to make_xstats and rename time to timestamp
Dana-Maggie Jun 11, 2025
d521029
Update of version
Dana-Maggie Jun 12, 2025
0384a2d
Check for timestamp as a coordinate not a data variable
Dana-Maggie Jun 12, 2025
9d07870
334 Add init-project function and being auxillary data handler
nepstad Jun 12, 2025
fd8e843
Merge remote-tracking branch 'origin/main' into 334-metadata-helper
nepstad Jun 16, 2025
7155fad
Read in the auxillary metadatafile
Dana-Maggie Jun 16, 2025
a9b22fc
Drop the first row of the auxillary data
Dana-Maggie Jun 16, 2025
acf6417
334 Add cli option for making a filtered montage from processed stats
nepstad Jun 16, 2025
aa1a8d0
Merge remote-tracking branch 'origin/334-metadata-helper' into 334-me…
nepstad Jun 16, 2025
ea998ad
Errorcorrection
Dana-Maggie Jun 16, 2025
c2b0790
334 Add example data to init-project, add convert to png cli cmd
nepstad Jun 16, 2025
8b0e11c
Merge branch '334-metadata-helper' of github.com:SINTEF/pyopia into 3…
Dana-Maggie Jun 16, 2025
580cc85
Interpolate auxillary data variables on data based on their timestep
Dana-Maggie Jun 16, 2025
0d06307
Test for auxillary data interpolation
Dana-Maggie Jun 16, 2025
4a7bf7e
334 Remove duplicated globale attribute setting for netcdf output
nepstad Jun 16, 2025
8f5ad31
334 Fix issues with metadata.txt generator
nepstad Jun 16, 2025
99eaedb
Update of the input of aux_data
Dana-Maggie Jun 17, 2025
03414db
Update run format, so it does not run all tests but only this
Dana-Maggie Jun 17, 2025
364423b
334 Fix init-project template writing issue, add raw image shape to m…
nepstad Jun 17, 2025
cfc207a
334 Add classifier weights hash metadata, fix tests
nepstad Jun 17, 2025
24353cc
334 Fix wrong read_csv argument
nepstad Jun 17, 2025
425a9af
334 Bump minor version
nepstad Jun 17, 2025
6366a12
334 Move auxillary data handling to StatasToDisc step, add units and …
nepstad Jul 7, 2025
224b398
334 Revert silcam.py
nepstad Jul 7, 2025
4d1193b
334 Update silcam config generator, fix auxillary data not specified …
nepstad Jul 7, 2025
639f965
334 Fix linting issues
nepstad Jul 7, 2025
df87b13
334 Add auxillary data to image_stats
nepstad Jul 8, 2025
2af4aa3
334 Change metadata handling to pydantic model and json input file
nepstad Jul 9, 2025
f0d34c7
Merge remote-tracking branch 'origin/334-metadata-helper' into 334-me…
nepstad Jul 9, 2025
c8d5e7e
334 Add new metadata module
nepstad Jul 9, 2025
69a7167
Change units from micrometeres to number of pixels
Dana-Maggie Jul 9, 2025
70af160
334 Fix auxillary data calls when no file is specified
nepstad Jul 10, 2025
ccb45f4
334 Add error handling and print info for AuxillaryData load failure
nepstad Sep 4, 2025
40aa9f9
334 Update processing docs to use init-project command
nepstad Sep 4, 2025
6d92cf5
334 Fix docs typeos
nepstad Sep 4, 2025
143de93
334 Add SilCam seavox instrument identifier to metadata
nepstad Sep 5, 2025
9 changes: 7 additions & 2 deletions README.md
@@ -6,10 +6,15 @@ A Python Ocean Particle Image Analysis toolbox
# Quick tryout of PyOPIA

1) Install [uv](https://docs.astral.sh/uv/getting-started/installation)
2) Run PyOPIA classification tests on database particles
2) Initialize PyOPIA project with a small example image dataset and run processing
```bash
uv run --python 3.12 --with git+https://github.com/SINTEF/pyopia --with tensorflow==2.16.2 --with keras==3.5.0 python -m pyopia.tests.test_classify
uvx --python 3.12 --from pyopia[classification] pyopia --init-project pyopiatest --example-data
cd pyopiatest
uvx --python 3.12 --from pyopia[classification] pyopia process config.toml
```
3) Inspect the processed particle statistics in the processed/ folder

See the documentation for more information on how to install and use PyOPIA.

# Documentation:

13 changes: 12 additions & 1 deletion docs/intro.md
@@ -44,7 +44,18 @@ cd mypyopiaproject
uv add pyopia[classification]
```

To run PyOPIA, either use uv (uv run pyopia --help), or activate the venv first (source .venv/bin/activate), before running pyopia (pyopia --help).
To run PyOPIA, either use uv
```
uv run pyopia --help
```

or activate the venv before running PyOPIA (without uv)

```
source .venv/bin/activate
pyopia --help
```
Note that the activation command differs between operating systems.

The optional [classification] extra installs tensorflow, which is required by PyOPIA's Classification module.

4 changes: 2 additions & 2 deletions docs/notebooks/cli.ipynb
@@ -160,7 +160,7 @@
"```\n",
"\n",
"Will put an example config.toml file in the current directory.\n",
"Some elements of the pipelines are instrument specific, so either `silcam` or `holo` must be specified. In future, we will add a generic pipline that uses a an imread function that can load most image types - for the moement, though you will need to setup your own pipeline config if you are not using a silcam or holo pipeline.\n",
"Some elements of the pipelines are instrument specific, so either `silcam` or `holo` must be specified. In future, we will add a generic pipeline that uses an imread function that can load most image types - for the moment, though, you will need to set up your own pipeline config if you are not using a silcam or holo pipeline.\n",
"\n",
"Generate a config for a silcam pipeline:\n",
"```\n",
@@ -423,7 +423,7 @@
"\n",
"See {func}`pyopia.cli.process` for more details\n",
"\n",
"Please have a look at the page about analysing {ref}(big-data) if you have a lot of images and/or a lot of particles."
"Please have a look at the page about analysing {ref}`big-data` if you have a lot of images and/or a lot of particles."
]
},
{
51 changes: 43 additions & 8 deletions docs/notebooks/processing_raw_data.ipynb
@@ -29,16 +29,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Get yourself a config file.\n",
"You can do this either by copy-paste from the page on {ref}`toml-config` into a new toml file (you might call it 'silcam-config.toml', for example), or from generating a very basic config from the command line tool: `pyopia generate-config`, e.g. for silcam:\n",
"## 1) Create a new project folder with a config file and metadata template\n",
"To start a new image processing project with PyOPIA, you can use the 'init-project' command (here called 'myproject'):\n",
"\n",
"```\n",
"pyopia generate-config silcam 'rawdatapath/*.silc' 'modelpath/keras_model.keras' 'proc_folder_path' 'testdata'\n",
"pyopia init-project myproject\n",
"```\n",
"\n",
"If you want help on what these options are, do: `pyopia generate-config --help`\n",
"If you want help and additional options for this command, do: `pyopia init-project --help`\n",
"\n",
"You should now have a toml file (e.g. called 'silcam-config.toml')"
"You should now have a new project folder ('myproject') containing a config file ('config.toml') and a README file with suggestions for steps to perform before starting processing. Several other input files and subfolders are also generated:\n",
"\n",
"```\n",
"myproject/\n",
"├── auxillarydata\n",
"│   └── auxillary_data.csv\n",
"├── config.toml\n",
"├── images\n",
"├── metadata.json\n",
"├── processed\n",
"├── pyopia-default-classifier-20250409.keras\n",
"└── README\n",
"```"
]
},
{
@@ -51,19 +63,42 @@
"\n",
"If you need detailed help on arguments specific to a pipeline class, then you may wish to refer to the API documentation for that specific class.\n",
"\n",
"If you want to do classification, you need to give the `model_path` argument within `[steps.classifier]` a path to a trained keras model. You can download a silcam example [here](https://pysilcam.blob.core.windows.net/test-data/silcam-classification_database_20240822-200-20240829T091048.zip)"
"Particle classification is provided by [steps.classifier], which points to a pre-trained Keras CNN model. A default classifier for PyOPIA is provided by the init-project command."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Add project-relevant metadata\n",
"\n",
"PyOPIA generates a self-describing netCDF file during processing, which in addition to particle statistics contains some basic metadata. These are in part taken from the 'metadata.json' file generated in the previous step.\n",
"\n",
"The generated template file 'metadata.json' contains several items that should be filled out, such as 'title' and 'creator_name'. Also check that you are happy with the default license proposed (CC BY-SA). \n",
"\n",
"\n",
"You can add your own metadata items in this file as well."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4) Add auxillary data\n",
"\n",
"A typical image dataset will be associated with some auxillary data variables, e.g. temperature, salinity and depth for a profiling setup deployed at sea. This information can optionally be incorporated into the particle statistics netCDF that PyOPIA generates, to ease post-processing of the data. Such information should be added as time series in the auxillary data file ('auxillary_data.csv'). Each row in this file should consist of a time stamp and one or more auxillary data elements. The time stamps are interpolated to match each image being processed, so they need not match exactly, but should cover the same time period. See the generated template file for more information ('auxillarydata/auxillary_data.csv').\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Process!\n",
"## 5) Process!\n",
"\n",
"Run the command line processing which simply needs to know which config file you want it to work on, e.g.:\n",
"\n",
"```\n",
"pyopia process silcam-config.toml\n",
"pyopia process config.toml\n",
"```"
]
},
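The auxillary data CSV layout described in the notebook above (two comment lines, a units row, a long-name row, a variable-name row, then one timestamped measurement per line) can be parsed with pandas roughly as follows. This is a minimal sketch using an inline string instead of a file; it mirrors the template shipped by this PR but is not the exact PyOPIA implementation:

```python
import io

import pandas as pd

# Template layout: two comment lines, a units row, a long-name row,
# a variable-name row, then one timestamped measurement per line.
CSV = """% COMMENT LINE: example
% COMMENT LINE: example
,metres,degC
Time of measurement,Depth at measurement location,Temperature at measurement location
time,depth,temperature
2022-06-08T18:40:00.00000,0.0,5.0
2022-06-08T18:41:00.00000,5.0,6.0
"""

# Data rows start after the first four comment/header lines
data = pd.read_csv(io.StringIO(CSV), skiprows=4)
data["time"] = data["time"].astype("datetime64[ns]")
data = data.set_index("time")

# Units and long names come from the third and fourth lines;
# the first column (time) has no unit, so real variables start at index 1
units = pd.read_csv(io.StringIO(CSV), skiprows=2, nrows=0).columns
long_names = pd.read_csv(io.StringIO(CSV), skiprows=3, nrows=0).columns

print(list(data.columns))  # ['depth', 'temperature']
```

Because the header rows are read with `nrows=0`, pandas returns only their column labels, which is a convenient way to recover per-variable units and descriptions without a custom parser.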
2 changes: 1 addition & 1 deletion pyopia/__init__.py
@@ -1 +1 @@
__version__ = "2.10.0"
__version__ = "2.11.0"
106 changes: 106 additions & 0 deletions pyopia/auxillarydata.py
@@ -0,0 +1,106 @@
import pandas as pd
import logging
import xarray as xr

logger = logging.getLogger()

AUXILLARY_DATA_FILE_TEMPLATE = """% COMMENT LINE: PLEASE UPDATE THIS FILE WITH PROJECT RELEVANT DATA. EACH COLUMN WILL BECOME A NETCDF VARIABLE.
% COMMENT LINE: ONE LINE PER MEASUREMENT, TIME IS INTERPOLATED TO IMAGE DATA TIMES IN PYOPIA. FOLLOWING LINES ARE UNITS, DESCRIPTION AND VARIABLE NAME.
,metres,degC
Time of measurement,Depth at measurement location,Temperature at measurement location
time,depth,temperature
2022-06-08T18:40:00.00000,0.0,5.0
2022-06-08T18:41:00.00000,5.0,6.0
2022-06-08T18:42:00.00000,10.0,7.0
2022-06-08T18:43:00.00000,20.0,8.0
""" # noqa: E501


class AuxillaryData:
    """
    Handle auxillary data for PyOPIA particle statistics file.

    Auxillary data variables may include (image) depth, longitude, latitude, etc.
    This class parses a well-defined .csv input format; see the example below.

    Parameters
    ----------
    auxillary_data_path : str
        Path to the auxillary data .csv file created by the end user


    Example of auxillary data file
    ------------------------------
    % COMMENT LINE:
    % COMMENT LINE:
    ,m,degC,psu
    Time of measurement,Depth at measurement location,Temperature at measurement location,Salinity at measurement location
    time,depth,temperature,salinity
    2025-03-19T16:59:29.950729,0.0,5.0,34
    2025-03-19T17:59:29.950729,0.0,5.0,34


    Note
    ----
    Each column will become a netCDF variable.
    The first two lines are comments and are ignored.
    The third and fourth rows contain the units and description for each variable.
    """

    def __init__(self, auxillary_data_path=None):
        self.auxillary_data_path = auxillary_data_path

        # Create empty dataframe for cases where no file was specified, or an error occurred reading it
        self.auxillary_data = pd.DataFrame(index=pd.Index([], name="time")).to_xarray()
        if auxillary_data_path is not None:
            try:
                self.auxillary_data = self.load_auxillary_data(auxillary_data_path)
            except RuntimeError as e:
                print(f"Failed to load auxillary data from file: {auxillary_data_path}")
                logging.error(
                    f"Failed to load auxillary data from file: {auxillary_data_path}"
                )
                logging.error(e)

    def load_auxillary_data(self, auxillary_data_path):
        """Load and format auxillary data from .csv file"""

        # Load in the auxillary data file
        auxillary_data = pd.read_csv(auxillary_data_path, skiprows=4)

        # Load units and description rows
        units = pd.read_csv(auxillary_data_path, skiprows=2, nrows=0).columns
        long_names = pd.read_csv(auxillary_data_path, skiprows=3, nrows=0).columns

        # Set time as the index and make sure its type is datetime64[ns]
        auxillary_data["time"] = auxillary_data["time"].astype("datetime64[ns]")
        auxillary_data = auxillary_data.set_index("time")

        # Transform into xarray, add units
        auxillary_data = auxillary_data.to_xarray()
        for i, col in enumerate(auxillary_data.data_vars):  # Iterate over each column
            auxillary_data[col].attrs["units"] = units[i + 1]
            auxillary_data[col].attrs["long_name"] = long_names[i + 1]

        logging.info(auxillary_data)

        return auxillary_data

    def add_auxillary_data_to_xstats(self, xstats):
        """Add auxillary data to a PyOPIA xstats object"""
        logging.info("Adding auxillary data to xstats and storing to new file")

        # Add each auxillary data variable to xstats, interpolated to xstats times
        for data_var in self.auxillary_data.data_vars:
            xstats[data_var] = xr.DataArray(
                data=self.auxillary_data[data_var]
                .astype(float)
                .interp(time=xstats["timestamp"]),
                dims=xstats.dims,
                coords=xstats.coords,
                attrs=self.auxillary_data[data_var].attrs,
            )

        return xstats
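The core of `add_auxillary_data_to_xstats` is xarray's time interpolation: auxillary measurements on their own (typically coarser) time axis are resampled onto the per-image timestamps. A minimal self-contained sketch of that mechanism (assuming scipy is available, which xarray's `interp` requires; the dataset names here are illustrative, not PyOPIA's):

```python
import pandas as pd
import xarray as xr

# Auxillary measurements on their own, coarser time axis
aux = pd.DataFrame(
    {"depth": [0.0, 10.0], "temperature": [5.0, 7.0]},
    index=pd.DatetimeIndex(["2022-06-08T18:40", "2022-06-08T18:42"], name="time"),
).to_xarray()

# Image statistics indexed by per-image timestamps (one image here, taken
# halfway between the two auxillary measurements)
xstats = xr.Dataset(coords={"timestamp": pd.to_datetime(["2022-06-08T18:41"])})

# Interpolate each auxillary variable onto the image timestamps,
# as add_auxillary_data_to_xstats does
for var in aux.data_vars:
    xstats[var] = aux[var].astype(float).interp(time=xstats["timestamp"])

print(float(xstats["depth"]))  # 5.0 (linear interpolation between 0.0 and 10.0)
```

Because the indexer is a DataArray on the `timestamp` dimension, the result is automatically aligned with the image records, which is why the auxillary timestamps "need not match exactly, but should cover the same time period" as the docs put it.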
21 changes: 8 additions & 13 deletions pyopia/cf_metadata.json
@@ -1,49 +1,49 @@
{"major_axis_length": {
"standard_name": "major_axis_length",
"long_name": "The length of the major axis of the ellipse that has the same normalized second central moments as the region",
"units": "micrometer",
"units": "Pixels",
"calculation_method": "Computed using skimage.measure.regionprops (axis_major_length)",
"pyopia_process_level": 1},
"minor_axis_length": {
"standard_name": "minor_axis_length",
"long_name": "The length of the minor axis of the ellipse that has the same normalized second central moments as the region",
"units": "micrometer",
"units": "Pixels",
"calculation_method": "Computed using skimage.measure.regionprops (axis_minor_length)",
"pyopia_process_level": 1},
"equivalent_diameter": {
"standard_name": "equivalent_circular_diameter",
"long_name": "Diameter of a circle with the same area as the particle",
"units": "micrometer",
"units": "Pixels",
"calculation_method": "Computed using skimage.measure.regionprops (equivalent_diameter)",
"pyopia_process_level": 1},
"minr": {
"standard_name": "minimum_row_index",
"long_name": "Minimum row index of the particle bounding box",
"units": "pixels",
"units": "Pixels",
"calculation_method": "Extracted from skimage.measure.regionprops (bbox[0])",
"pyopia_process_level": 1},
"maxr": {
"standard_name": "maximum_row_index",
"long_name": "Maximum row index of the particle bounding box",
"units": "pixels",
"units": "Pixels",
"calculation_method": "Extracted from skimage.measure.regionprops (bbox[2])",
"pyopia_process_level": 1},
"minc": {
"standard_name": "minimum_column_index",
"long_name": "Minimum column index of the particle bounding box",
"units": "pixels",
"units": "Pixels",
"calculation_method": "Extracted from skimage.measure.regionprops (bbox[1])",
"pyopia_process_level": 1},
"maxc": {
"standard_name": "maximum_column_index",
"long_name": "Maximum column index of the particle bounding box",
"units": "pixels",
"units": "Pixels",
"calculation_method": "Extracted from skimage.measure.regionprops (bbox[3])",
"pyopia_process_level": 1},
"saturation": {
"standard_name": "image_saturation",
"long_name": "Percentage saturation of the image",
"units": "percent",
"units": "Percent",
"calculation_method": "Computed as the percentage of the image covered by particles relative to the maximum acceptable coverage",
"pyopia_process_level": 1},
"index": {
@@ -58,11 +58,6 @@
"units": "",
"calculation_method": "Generated during particle export",
"pyopia_process_level": 1},
"time": {
"standard_name": "time",
"long_name": "Time of particle observation",
"calculation_method": "Extracted from the timestamp of the observation",
"pyopia_process_level": 0},
"timestamp": {
"standard_name": "timestamp",
"long_name": "Timestamp of particle observation",
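The cf_metadata.json entries above map variable names to CF-style attributes (standard_name, long_name, units, and so on). A hedged sketch of how such a mapping can be attached to an xarray dataset — the dictionary subset and dataset here are illustrative, and this is not necessarily how PyOPIA applies the file internally:

```python
import xarray as xr

# A small subset of the metadata shown above (units now given in pixels)
CF_METADATA = {
    "equivalent_diameter": {
        "standard_name": "equivalent_circular_diameter",
        "long_name": "Diameter of a circle with the same area as the particle",
        "units": "Pixels",
    }
}

# A toy particle-statistics dataset with one matching variable
ds = xr.Dataset({"equivalent_diameter": ("index", [12.0, 30.5])})

# Attach the attributes to each variable present in the dataset
for name, attrs in CF_METADATA.items():
    if name in ds:
        ds[name].attrs.update(attrs)

print(ds["equivalent_diameter"].attrs["units"])  # Pixels
```

Attributes set this way are written into the netCDF output by xarray, which is what makes the resulting particle statistics file self-describing.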
6 changes: 6 additions & 0 deletions pyopia/classify.py
@@ -3,6 +3,7 @@
"""

import os
import hashlib
import numpy as np
import pandas as pd
import logging
@@ -125,6 +126,11 @@ def load_model(self):
        path, filename = os.path.split(model_path)
        self.model = keras.models.load_model(model_path)

        # Create a hash of the model weights file
        with open(model_path, "rb") as f:
            digest = hashlib.file_digest(f, "sha256")
        self.model_hash = digest.hexdigest()

        # Try to create model output class name list from last model layer name
        class_labels = None
        try: