Add section 'requirements for processing modules'

phackstock · phackstock · commit b40b4bbf711c · 2025-01-23T17:12:21.000+01:00
diff --git a/scenario-databases.rst b/scenario-databases.rst
@@ -81,6 +81,66 @@ The region-aggregation and validation is configured via a project-specific GitHu
 repository, usually named `https://github.com/iiasa/<project>-workflow`_. Please contact
 the respective project managers or the Scenario Services team if you need access.
 
+The workflow for processing files uploaded via the IIASA Scenario Explorer
+is implemented in a modular fashion.
+This makes it straightforward to execute programs, code and tools developed
+by (non-IIASA) research partners as part of the processing workflow.
+
+Requirements for processing modules
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Any module (a.k.a. program, code or tool) must adhere
+to the following standards of best-practice software development.
+The aim of these guidelines is to ensure reliability of our services,
+minimize maintenance requirements, and guarantee reproducibility of results
+across platforms.
+
+General requirements
+````````````````````
+
+- The program, code or tool must be implemented in Python (≥3.7) or R;
+  compiled executables are not acceptable for security reasons.
+- The source code must be distributed either
+
+  - via an online version-controlled repository (preferably GitHub)
+    to which the IIASA admin team has access; or
+  - as a package that can be installed via a package manager (pip, conda, CRAN).
+
+- The program must run on a Debian-based Linux distribution (preferably Ubuntu).
+- The dependencies must be clearly stated, e.g. as
+
+  - a Dockerfile (describing the execution environment, library dependencies etc.),
+  - Python package dependencies according to the packaging user guide
+    (e.g. as environment.yml, requirements.txt etc.), or
+  - R dependencies.
+
+- The license must be clearly stated.
+- The documentation of the program, code or tool must include:
+
+  - Purpose of the program and individual top-level functions
+  - Instructions on how to run the program
+  - Expected input (variables, region mappings) and standard output
+  - Explanation of any settings and optional parameters
+
+Application programming interface
+`````````````````````````````````
+
+**Option 1**:
+
+The module is called via a command-line interface (CLI)
+and takes the following arguments:
+
+- :code:`input`: path to an IAMC-formatted file (:code:`xlsx` or :code:`csv`)
+- :code:`output`: path where to write an output file
+  (usually derived timeseries data) in the same format
+- Any relevant settings and optional parameters must also be specified
+  via the CLI
+
+e.g. :code:`"python process.py --input path-to-input-file.xlsx --output path-to-output-file.xlsx"`
+
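A minimal sketch of what a module satisfying Option 1 could look like. The only parts taken from the requirements above are the ``--input`` and ``--output`` arguments; the helper names (``read_iamc``, ``write_iamc``, ``process``, ``main``) are hypothetical, and the pass-through ``process`` merely stands in for the partner's actual computation.

```python
import argparse
from pathlib import Path

import pandas as pd


def read_iamc(path):
    """Read an IAMC-formatted file (csv or xlsx) into a DataFrame."""
    path = Path(path)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    return pd.read_excel(path)


def write_iamc(df, path):
    """Write the result in the format implied by the output file extension."""
    path = Path(path)
    if path.suffix == ".csv":
        df.to_csv(path, index=False)
    else:
        df.to_excel(path, index=False)


def process(df):
    """Hypothetical placeholder for the module's actual computation."""
    return df.copy()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Process an IAMC-formatted file")
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--output", required=True, help="path for the output file")
    # Any further settings would be added here as additional CLI options.
    args = parser.parse_args(argv)
    write_iamc(process(read_iamc(args.input)), args.output)


# In a script such as process.py, main() would be called under the usual
# `if __name__ == "__main__":` guard.
```

Keeping the argument parsing in ``main(argv)`` means the same entry point can be exercised from tests without spawning a subprocess.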
+**Option 2** (applicable for packages/functions written in Python):
+
+Importable Python functions that take and return :class:`pandas.DataFrame` (with columns
+following the IAMC format) or :class:`pyam.IamDataFrame` objects can be called as part
+of the processing workflow. Any settings or optional parameters must be given as keyword
+arguments to the top-level function, preferably with the option to set them via a
+settings or configuration file.
+
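As an illustration of Option 2, the sketch below shows an importable function that takes and returns a :class:`pandas.DataFrame`. It assumes the wide IAMC layout (index columns followed by one column per year); the function name ``process`` and its keyword arguments ``variable`` and ``factor`` are hypothetical and only serve to show settings being passed as keyword arguments.

```python
import pandas as pd

# Index columns of the IAMC format; all other columns are treated as years.
IAMC_INDEX = ["model", "scenario", "region", "variable", "unit"]


def process(df: pd.DataFrame, *, variable: str, factor: float = 1.0) -> pd.DataFrame:
    """Return a copy of `df` in which the rows of `variable` are scaled by `factor`."""
    result = df.copy()
    year_cols = [c for c in result.columns if c not in IAMC_INDEX]
    # Cast to float first so that scaling never has to change a column's dtype.
    result[year_cols] = result[year_cols].astype(float)
    mask = result["variable"] == variable
    result.loc[mask, year_cols] = result.loc[mask, year_cols] * factor
    return result
```

Because all settings are keyword arguments, they can be collected in a dictionary loaded from a configuration file and passed on as ``process(df, **settings)``.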
+
 Executing scenario processing locally
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -115,4 +175,4 @@ Read the `User Guide`_ of the **nomenclature** package for more information!
 
 .. _`https://github.com/iiasa/<project>-workflow`: https://github.com/iiasa
 
-.. _`User Guide`: https://nomenclature-iamc.readthedocs.io/en/stable/user_guide/local-usage.html
+.. _`User Guide`: https://nomenclature-iamc.readthedocs.io/en/stable/user_guide/local-usage.html