Add section 'requirements for processing modules' #17

scenario-databases.rst

Scenario processing
-------------------

When submitting a scenario (a.k.a. "run") to an IIASA database instance, the server
executes a scenario-processing workflow including *region-aggregation* and *scenario
validation* prior to saving the scenario to the database. The processing uses the
**nomenclature** package (`read the docs <https://nomenclature-iamc.readthedocs.io>`_).

The region-aggregation and validation are configured via a project-specific GitHub_
repository, usually named `https://github.com/iiasa/<project>-workflow`_. Please contact
the respective project managers or the Scenario Services team if you need access.

You can also run the project workflow locally (on your computer) before submitting to
an IIASA database instance, to make sure that the validation and processing work.
See :ref:`local-processing` for more information.

The workflow for processing files uploaded via the IIASA Scenario Explorer is
implemented in a modular fashion. Programs, code and tools developed by (non-IIASA)
research partners can be executed as part of the processing workflow if they follow
the :ref:`processing-requirements`.

.. _GitHub: https://www.github.com

.. _`https://github.com/iiasa/<project>-workflow`: https://github.com/iiasa

user-guide.rst

Detailed User Guides
--------------------

.. toctree::
   :maxdepth: 1

   user-guide/local-processing
   user-guide/processing-requirements

.. _common-definitions: https://github.com/iamconsortium/common-definitions

user-guide/processing-requirements.rst

.. _processing-requirements:

Requirements for processing modules
===================================

Any module (a.k.a. program, code or tool) must adhere to the following standards of
best-practice software development. The aim of these guidelines is to ensure reliability
of our services, minimize maintenance requirements, and guarantee reproducibility of
results across platforms.

General requirements
--------------------

- The program, code or tool must be implemented in Python (≥3.10) or R; compiled
  executables are not acceptable for security reasons.
- The source code must be distributed either

  - via an online version-controlled repository (preferably GitHub) to which the
    IIASA admin team has access, or
  - as a package installable via a package manager (pip, conda, CRAN).

- The program must run on a Debian-based Linux distribution (preferably Ubuntu).
- The dependencies must be clearly stated, for example:

  - the execution environment and library dependencies as a Dockerfile,
  - Python package dependencies following the Python Packaging User Guide
    (e.g., as :code:`environment.yml` or :code:`requirements.txt`),
  - R package dependencies.

- The license must be clearly stated.
- The documentation of the program, code or tool must include:

  - the purpose of the program and of the individual top-level functions,
  - instructions on how to run the program,
  - the expected input (variables, region mappings) and the standard output,
  - an explanation of all settings and optional parameters.

Application programming interface
---------------------------------

**Option 1**:

The module is called via a command-line interface (CLI) and takes the following
arguments:

- :code:`input`: path to an IAMC-formatted file (:code:`xlsx` or :code:`csv`)
- :code:`output`: path to which the output file (usually derived timeseries data)
  is written, in the same format
- any relevant settings and optional parameters, which must also be specified
  via the CLI

For example: :code:`python process.py --input path-to-input-file.xlsx --output path-to-output-file.xlsx`
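A minimal sketch of a module implementing the Option 1 interface follows. It is illustrative only: the :code:`--factor` parameter and the placeholder scaling step are hypothetical, and because the sketch uses only the Python standard library it reads and writes IAMC-formatted :code:`csv` files (reading :code:`xlsx` would need an additional library such as pandas).

```python
# Minimal sketch of a processing module with the CLI described above.
# Assumptions (hypothetical): the module reads and writes IAMC-formatted csv
# files only, and the "processing" is a placeholder that scales all values.
import argparse
import csv

# Columns identifying a timeseries in the (wide) IAMC format; all other
# columns are treated as years holding numeric values.
IAMC_INDEX = ["model", "scenario", "region", "variable", "unit"]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example IAMC processing module")
    parser.add_argument("--input", required=True,
                        help="path to an IAMC-formatted csv file")
    parser.add_argument("--output", required=True,
                        help="path where the output csv file is written")
    # Hypothetical optional parameter, exposed via the CLI like any other setting
    parser.add_argument("--factor", type=float, default=1.0,
                        help="scaling factor for the placeholder processing step")
    return parser.parse_args(argv)


def process_rows(rows, factor):
    """Placeholder processing step: multiply every year value by `factor`."""
    for row in rows:
        out = {}
        for column, value in row.items():
            if column.lower() in IAMC_INDEX or value == "":
                out[column] = value  # keep identifiers and missing values as-is
            else:
                out[column] = float(value) * factor
        yield out


def main(argv=None):
    args = parse_args(argv)
    with open(args.input, newline="") as f_in:
        reader = csv.DictReader(f_in)
        fieldnames = reader.fieldnames
        rows = list(process_rows(reader, args.factor))
    # Write the derived timeseries in the same (csv) format as the input
    with open(args.output, "w", newline="") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return 0


# When saved as process.py, finish with the usual entry point so that
# `python process.py --input ... --output ...` works from a shell:
#
# if __name__ == "__main__":
#     raise SystemExit(main())
```

Accepting an optional argument list in :code:`main()` lets the same module be tested from Python, e.g. :code:`main(["--input", "in.csv", "--output", "out.csv"])`, without starting a subprocess.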

**Option 2** (applicable to packages/functions written in Python):

Importable Python functions that take and return :class:`pandas.DataFrame` objects (with
columns following the IAMC format) or :class:`pyam.IamDataFrame` objects can be called as
part of the processing workflow. Any settings or optional parameters must be given as
keyword arguments to the top-level function, preferably with the option to set them via a
settings or configuration file.
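The Option 2 interface can be sketched as follows, assuming a wide IAMC-formatted :class:`pandas.DataFrame` with lower-case index columns and one column per year. The function :code:`aggregate_components`, its keyword arguments and the JSON settings loader are hypothetical illustrations, not part of any existing IIASA, nomenclature or pyam API.

```python
# Minimal sketch of an importable processing function (Option 2).
# Assumptions (hypothetical): the function and settings-file names are
# illustrative, and the data is a wide IAMC-formatted pandas.DataFrame with
# lower-case index columns plus one column per year.
import json

import pandas as pd

IAMC_INDEX = ["model", "scenario", "region", "variable", "unit"]


def aggregate_components(
    df: pd.DataFrame, *, components: list[str], target: str
) -> pd.DataFrame:
    """Return a new IAMC-formatted DataFrame with `target` = sum of `components`.

    Parameters
    ----------
    df : IAMC-formatted data (index columns plus one column per year)
    components : variables to be summed
    target : name of the derived variable
    """
    year_columns = [c for c in df.columns if c not in IAMC_INDEX]
    subset = df[df["variable"].isin(components)]
    # Sum the component timeseries per model, scenario, region and unit;
    # min_count=1 keeps a year empty (NaN) if all components are missing
    derived = subset.groupby(
        ["model", "scenario", "region", "unit"], as_index=False
    )[year_columns].sum(min_count=1)
    derived["variable"] = target
    return derived[IAMC_INDEX + year_columns]


def load_settings(path: str) -> dict:
    """Read keyword arguments for the top-level function from a JSON file."""
    with open(path) as f:
        return json.load(f)
```

Called as :code:`aggregate_components(df, **load_settings("settings.json"))` (with a hypothetical :code:`settings.json`), all optional parameters come from a settings file while remaining plain keyword arguments. Grouping by unit means that components reported in different units are not silently added together. Passing a :class:`pyam.IamDataFrame` instead would require a conversion step that this sketch does not cover.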