From 90f50ff1cb1f1a4ebea7954efdf67ccd3820b0c9 Mon Sep 17 00:00:00 2001 From: Measrainsey Meng Date: Thu, 16 Jan 2025 09:56:13 +0100 Subject: [PATCH 1/9] Move distrib.rst to new distrib/ folder --- doc/{ => distrib}/distrib.rst | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename doc/{ => distrib}/distrib.rst (100%) diff --git a/doc/distrib.rst b/doc/distrib/distrib.rst similarity index 100% rename from doc/distrib.rst rename to doc/distrib/distrib.rst From 7dad4152c692cad88232c145899275ab22fae1b2 Mon Sep 17 00:00:00 2001 From: Measrainsey Meng Date: Thu, 16 Jan 2025 10:08:33 +0100 Subject: [PATCH 2/9] Rename distrib/distrib.rst to distrib/overview.rst --- doc/distrib/{distrib.rst => overview.rst} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename doc/distrib/{distrib.rst => overview.rst} (100%) diff --git a/doc/distrib/distrib.rst b/doc/distrib/overview.rst similarity index 100% rename from doc/distrib/distrib.rst rename to doc/distrib/overview.rst From 21a267ad27e8dad37210da9d0ae038faa3b6d99d Mon Sep 17 00:00:00 2001 From: Measrainsey Meng Date: Thu, 16 Jan 2025 10:35:36 +0100 Subject: [PATCH 3/9] Change title --- doc/distrib/overview.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/distrib/overview.rst b/doc/distrib/overview.rst index 32a154e7ca..8a23de623b 100644 --- a/doc/distrib/overview.rst +++ b/doc/distrib/overview.rst @@ -1,5 +1,5 @@ -Distributed computing -********************* +Overview and tooling +******************** This page introduces considerations, tools, and features for using **distributed** or **high-throughput computing** with MESSAGEix-GLOBIOM. From d6db4b39d18d23a4ddeffc1fb819a36e861805f1 Mon Sep 17 00:00:00 2001 From: Measrainsey Meng Date: Thu, 16 Jan 2025 10:36:03 +0100 Subject: [PATCH 4/9] Update index.rst to reflect new distributed computing location --- doc/index.rst | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/doc/index.rst b/doc/index.rst index 099326f28e..5a2343a860 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -22,7 +22,6 @@ Among other tasks, the tools allow modelers to: cli quickstart repro - distrib bibliography API reference @@ -105,6 +104,12 @@ Commonly used classes may be imported directly from :mod:`message_ix_models`. pkg-data/codelists pkg-data/iiasa-se +.. toctree:: + :maxdepth: 2 + :caption: Distributed computing + + distrib/overview + .. toctree:: :maxdepth: 2 :caption: Development From 6b1952f0fb41401952ed17cb1f4333915081e792 Mon Sep 17 00:00:00 2001 From: Measrainsey Meng Date: Thu, 16 Jan 2025 13:14:39 +0100 Subject: [PATCH 5/9] Add first draft of UniCC guide --- doc/distrib/unicc.rst | 813 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 813 insertions(+) create mode 100644 doc/distrib/unicc.rst diff --git a/doc/distrib/unicc.rst b/doc/distrib/unicc.rst new file mode 100644 index 0000000000..b6c64fd78d --- /dev/null +++ b/doc/distrib/unicc.rst @@ -0,0 +1,813 @@ +UniCC User Guide +**************** + +This is a guide on how to get set up on the Unified IIASA Computing Cluster (UnICC) and how to run MESSAGEix scenarios on the cluster. +Please note that this guide is only for IIASA staff and collaborators who have access to the UnICC. + +.. contents:: + :local: + :backlinks: none + +Prerequisites and Good-to-Knows +=============================== + +Access to the UnICC +------------------- + +To access the UnICC, an IIASA account is required. 
With your IIASA account, create a ticket with Information and Communication Technologies (ICT) to request access to the UnICC.
The intranet page on the UnICC can be found `here <https://iiasahub.sharepoint.com/sites/ict/SitePages/Scientific-Computing.aspx>`__.
On the intranet page, the Slurm User Guide file has a section on how to request access to the UnICC, including what information needs to be provided to ICT in your request:

1. Are there any existing shared project folders inside the cluster that you need access to?
2. Do you need a new shared project folder inside the cluster? In this case, please specify the project name (default size 1 TB) and the names of the users who need access to the folder.
3. Please note that existing home folders will be automatically attached.
4. Please describe which already existing P: drive folder(s) you need access to from inside the cluster.
5. Please note, a 5 GB home folder will be automatically created for you in the cluster.

Storage Space
~~~~~~~~~~~~~

When requesting access to the UnICC, you will likely be given 5 GB of space in your home directory by default.
While setting up the MESSAGE environment, it is easy to hit this limit: repositories like ``message_data`` are large, and the GAMS installation alone is almost 2 GB.
So, request more space upfront or ask for an increase later (it is possible to request 50 GB of storage space, and to increase that even further later on).

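To keep an eye on your usage, you can check how full your home directory is with standard Linux tools (a minimal sketch; the cluster may also provide a dedicated quota command, so treat this as an approximation):

.. code:: bash

   # free space on the filesystem holding your home directory
   df -h ~

   # the largest items in your home directory
   du -sh ~/* 2>/dev/null | sort -rh | head
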
Network Drives Access on the Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the questionnaire above for the ticket, specify which P: drive folders you need access to.
Access to your H: drive on the cluster is granted automatically; every user's H: drive is located on the cluster at ``/hdrive/all_users/[username]``.

If a shared project folder was requested, it will be located in ``/projects/[project name]``.

Using MESSAGE Environments on H: Drive vs Setting Up New MESSAGE Environments
------------------------------------------------------------------------------

This guide walks through the process of installing a MESSAGEix environment from source on the cluster (in your home directory).
Because the H: drive can be accessed on the cluster, it may also be possible to keep repositories and MESSAGEix environments on your H: drive and simply activate those environments from there, saving the trouble of creating new ones.

Note on Working in Terminal
---------------------------

The rest of this document assumes you are working in a terminal window on the UnICC cluster, not in a notebook.

Throughout this guide, ``nano`` is used to edit files. If ``nano`` is not familiar, use ``vim``, ``emacs``, or any other text editor you are comfortable with.

Git-Related Setup
=================

Generate SSH Key
----------------

An SSH key is needed to clone GitHub repositories over SSH.

Follow the instructions at
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent
and
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account

Run:

.. code:: bash

   ssh-keygen -t ed25519 -C "you@email.com" # replace with your own email

You will be prompted as follows:

.. code:: bash

   Generating public/private ed25519 key pair.
   Enter file in which to save the key (/h/u142/username/.ssh/id_ed25519):
   Enter passphrase (empty for no passphrase):

(Save your passphrase somewhere safe.)

Add SSH Key to SSH-Agent
------------------------

Start ssh-agent in the background:

::

   eval "$(ssh-agent -s)"

Add your SSH private key to the ssh-agent:

::

   ssh-add ~/.ssh/id_ed25519

Add SSH Key to GitHub Account
-----------------------------

Run:

::

   cat ~/.ssh/id_ed25519.pub

Copy the output.

On GitHub, go to Settings > SSH and GPG keys.

Click on “New SSH key”.

Give the new SSH key a name and paste the key.

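You can verify that the key works by testing the connection to GitHub (this is GitHub's standard connectivity check; the exact greeting text may vary):

.. code:: bash

   ssh -T git@github.com
   # Expected reply:
   # Hi username! You've successfully authenticated, but GitHub does not provide shell access.
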
Creating Personal Access Tokens
-------------------------------

A personal access token is needed to clone ``message_data`` over HTTPS.

Refer to
https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
for instructions.

In Settings > Developer settings > Personal access tokens:

1. Click “Tokens (classic)”.
2. Select Generate new token > Generate new token (classic).
3. Enter the token name “IIASA UnICC”.
4. Select “No expiration”.

Add Email and Username to Global Git Config
-------------------------------------------

.. code:: bash

   git config --global user.email "you@email.com" # replace with your GitHub email
   git config --global user.username "username" # replace with your GitHub username
   git config --global user.name "Firstname Lastname" # replace with your name

Auto Load Python and Java on Startup
------------------------------------

Add the following to ``~/.bash_profile`` (by entering
``nano ~/.bash_profile``):

.. code:: bash

   module purge
   module load Python/3.11.5-GCCcore-13.2.0
   module load Java
   module load git-lfs

This ensures that the correct Python version is loaded (and added to
``$PATH``) and that Java is loaded (and added to ``$PATH``) each time a terminal session starts.

Create Virtual Environment
--------------------------

Many people on the team use ``conda``, but this guide uses Python’s built-in
``venv`` to create the virtual environment.

*IMPORTANT NOTE*: Creating a virtual environment
by just running ``python -m venv my_env`` causes issues when trying
to activate the environment in a Slurm job. It works just fine
interactively on the node, but within a job it fails to
activate. The reason is that the default ``python`` command on the
interactive node creates an environment using the default Python
instance, inherited from Jupyter, which is not accessible from the
compute nodes where the Slurm job will run. So it is necessary to create
the environment the following way.

In the home directory (``~``), run the following to create and activate
the virtual environment (note that if, per the instructions
earlier, ``module purge`` and ``module load`` are already in your
``~/.bash_profile``, the first two steps do not have to be repeated):

.. code:: bash

   module purge
   module load Python/3.11.5-GCCcore-13.2.0
   python3 -m venv env/env_name
   source ~/env/env_name/bin/activate

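As a quick sanity check, confirm that the activated environment's interpreter lives inside the environment and was created from the module-provided Python (the exact version shown depends on the loaded module):

.. code:: bash

   # should print a path inside ~/env/env_name
   which python3

   # should match the loaded Python module, e.g. 3.11.x
   python3 --version
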
Install MESSAGEix Ecosystem by Source
=====================================

Get ``message_ix`` Repository
-----------------------------

Run:

.. code:: bash

   git clone https://github.com/username/message_ix.git # replace with your own fork or the IIASA repo
   cd message_ix
   git remote add upstream https://github.com/iiasa/message_ix
   git pull upstream main
   git fetch --all --tags

Install ``message_ix``
----------------------

1. Navigate to the local ``message_ix`` repo root directory.

2. Ensure you’re on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up to date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .[docs,reporting,tests,tutorial]

6. Check that ``message_ix`` is installed correctly:

   .. code:: bash

      message-ix show-versions

Get ``ixmp`` Repository
-----------------------

.. code:: bash

   git clone https://github.com/username/ixmp.git # replace with your own fork or the IIASA repo
   cd ixmp
   git remote add upstream https://github.com/iiasa/ixmp
   git pull upstream main
   git fetch --all --tags

Install ``ixmp``
----------------

1. Navigate to the local ``ixmp`` repo root directory.

2. Ensure you’re on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up to date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .[docs,tests,tutorial]

Get ``message-ix-models`` Repository
------------------------------------

.. code:: bash

   git clone https://github.com/username/message-ix-models.git # replace with your own fork or the IIASA repo
   cd message-ix-models
   git remote add upstream https://github.com/iiasa/message-ix-models
   git fetch --all --tags
   git pull upstream main

Install ``message-ix-models``
-----------------------------

1. Navigate to the local ``message-ix-models`` root directory.

2. Ensure you’re on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up to date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .

Install ``git-lfs``
-------------------

UnICC already has ``git lfs`` installed on the system, but you may still need to install large file storage for ``message_data``
or ``message-ix-models``. Note that you may not have to, if you don't need to access the large files in these repositories for your work.
The benefit of not installing them is that you save the storage space they would otherwise use. If you do need access to those files, follow the instructions below.
The same instructions can be followed from the root directory of either ``message_data`` or ``message-ix-models``.

Load ``git lfs`` (if it is included in your
``~/.bash_profile`` as described earlier, this line doesn’t have to be run):

.. code:: bash

   module load git-lfs

Then, within the root directory of ``message-ix-models`` or
``message_data``, run the following:

.. code:: bash

   git lfs install

Then fetch and pull the LFS files (this might take a while):

.. code:: bash

   git lfs fetch --all
   git lfs pull

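If you want to see which files LFS manages in a repository, for example to gauge how much will be downloaded, you can list them first:

.. code:: bash

   # list the files tracked by git-lfs in the current repository
   git lfs ls-files
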
Get ``message_data`` Repository
-------------------------------

.. code:: bash

   git clone git@github.com:username/message_data.git # replace with your own fork or the IIASA repo
   cd message_data
   git remote add upstream https://github.com/iiasa/message_data
   git fetch --all --tags

Install ``message_data``
------------------------

1. Navigate to the local ``message_data`` root directory.

2. Ensure you're on the branch you want to be on:

   .. code:: bash

      git checkout branch # replace "branch" with the branch you want to be on

3. Ensure the branch is up to date:

   .. code:: bash

      git pull upstream branch

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source with all options:

   .. code:: bash

      pip install --no-build-isolation --editable .[ci,dl,scgen,tests]

   If the above doesn’t work, remove the ``--no-build-isolation``:

   .. code:: bash

      pip install --editable .[ci,dl,scgen,tests]

Also fetch and pull the LFS files:

.. code:: bash

   git lfs fetch --all
   git lfs pull

Install GAMS
------------

Go to the following website to download GAMS:
https://www.gams.com/download/

Click on the Linux download link; when the download popup
window shows up, right-click and copy the link instead. Use
that link in the terminal to download the file:

.. code:: bash

   mkdir -p ~/downloads # create the folder if it doesn't exist yet
   cd ~/downloads
   wget https://d37drm4t2jghv5.cloudfront.net/distributions/46.5.0/linux/linux_x64_64_sfx.exe

The Linux installation instructions are here:
https://www.gams.com/46/docs/UG_UNIX_INSTALL.html

Create a directory where GAMS will be installed and navigate
to it (in this case, a folder called ``~/opt/gams``):

.. code:: bash

   mkdir -p ~/opt/gams
   cd ~/opt/gams

Run the installation file by entering its filename (complete
with path) on the command line:

.. code:: bash

   ~/downloads/linux_x64_64_sfx.exe # replace with your own path

You may receive a permissions error:

.. code:: bash

   bash: /home/username/downloads/linux_x64_64_sfx.exe: Permission denied

If so, make the file executable:

.. code:: bash

   chmod 754 /home/username/downloads/linux_x64_64_sfx.exe # replace path with your own path to the .exe file

Then try to run the executable file again:

.. code:: bash

   ~/downloads/linux_x64_64_sfx.exe

This should start the installation of GAMS and create a folder in
``~/opt/gams`` (or wherever GAMS is being installed) called
``gams46.5_linux_x64_64_sfx``. Navigate into this folder:

.. code:: bash

   cd gams46.5_linux_x64_64_sfx

From within ``/home/username/opt/gams/gams46.5_linux_x64_64_sfx``,
run the ``gams`` command to see if it works (at this point the full path of the ``gams`` command has to be used, which is
``/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams``):

.. code:: bash

   → /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   --- Job ? Start 06/11/24 14:18:48 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** GAMS Demo, for EULA and demo limitations see G240131/0001CB-GEN
   *** https://www.gams.com/latest/docs/UG%5FLicense.html
   *** DC0000 00
   ***
   *** Licensed platform : Generic platforms
   *** The installed license is valid.
   *** Evaluation expiration date (GAMS base module) : Jun 29, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 14:18:48 elapsed 0:00:00.001

Based on the output, there is already a ``gamslice.txt`` (located in
``~/opt/gams/gams46.5_linux_x64_64_sfx``), whose contents
can be checked:

.. code:: bash

   → cat gamslice.txt
   GAMS_Demo,_for_EULA_and_demo_limitations_see___G240131/0001CB-GEN
   https://www.gams.com/latest/docs/UG%5FLicense.html_______________
   1496554900_______________________________________________________
   0801346905_______________________________________________________
   DC0000_______g_1_______________________________C_Eval____________

This is a demo license, so rename it to
``gamslice_demo.txt`` so it can be replaced with a proper license:

.. code:: bash

   mv gamslice.txt gamslice_demo.txt

Copy one of the GAMS licenses from the ECE program folder and put it
into a folder called ``gams`` on your H: drive. Within UnICC, the H: drive can
be accessed via ``/hdrive/all_users/username/``.

Then, copy the GAMS license from the H: drive to the GAMS installation
location (the paths will differ depending on where the file is saved on your own H: drive):

.. code:: bash

   cp /hdrive/all_users/username/gams/gamslice_wCPLEX_2024-12-20.txt /home/username/opt/gams/gams46.5_linux_x64_64_sfx/

Then, within the ``/home/username/opt/gams/gams46.5_linux_x64_64_sfx/``
folder, rename the ``gamslice_wCPLEX_2024-12-20.txt`` file to just
``gamslice.txt``:

.. code:: bash

   mv gamslice_wCPLEX_2024-12-20.txt gamslice.txt

Now, when the ``gams`` command is called, the output looks like this:

.. code:: bash

   → /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   --- Job ? Start 06/11/24 14:24:43 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** Small MUD - 5 User License S230927|0002AP-GEN
   *** IIASA, Information and Communication Technologies Dep.
   *** DC216 01M5CODICLPTMB
   *** License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
   ***
   *** Licensed platform : Generic platforms
   *** The installed license is valid.
   *** Maintenance expiration date (GAMS base module): Dec 20, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 14:24:43 elapsed 0:00:00.000

Then add the GAMS path to your ``~/.bash_profile``:

.. code:: bash

   # add GAMS to path
   export PATH=$PATH:/home/username/opt/gams/gams46.5_linux_x64_64_sfx

Also add the GAMS aliases:

.. code:: bash

   # add GAMS to aliases
   alias gams=/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   alias gamslib=/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslib

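For the new ``PATH`` entry and aliases to take effect in the current session, reload the profile (or log out and back in), then confirm that ``gams`` is found:

.. code:: bash

   source ~/.bash_profile
   which gams # should print /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
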
Now, running just ``gams`` anywhere in the terminal gives the following
output:

.. code:: bash

   → gams
   --- Job ? Start 06/11/24 15:14:28 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** Small MUD - 5 User License S230927|0002AP-GEN
   *** IIASA, Information and Communication Technologies Dep.
   *** DC216 01M5CODICLPTMB
   *** License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
   ***
   *** Licensed platform : Generic platforms
   *** The installed license is valid.
   *** Maintenance expiration date (GAMS base module): Dec 20, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 15:14:28 elapsed 0:00:00.000

You can also test whether GAMS is working properly by running
``gams trnsport``:

.. code:: bash

   → gams trnsport
   --- Job trnsport Start 06/11/24 15:15:00 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   --- Applying:
       /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gmsprmun.txt
   --- GAMS Parameters defined
       Input /home/username/opt/gams/gams46.5_linux_x64_64_sfx/trnsport.gms
       ScrDir /home/username/opt/gams/gams46.5_linux_x64_64_sfx/225a/
       SysDir /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   Licensee: Small MUD - 5 User License S230927|0002AP-GEN
             IIASA, Information and Communication Technologies Dep. DC216
             /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
             License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
             The maintenance period of the license will expire on Dec 20, 2024
   Processor information: 2 socket(s), 128 core(s), and 256 thread(s) available
   GAMS 46.5.0 Copyright (C) 1987-2024 GAMS Development. All rights reserved
   --- Starting compilation
   --- trnsport.gms(66) 3 Mb
   --- Starting execution: elapsed 0:00:00.022
   --- trnsport.gms(43) 4 Mb
   --- Generating LP model transport
   --- trnsport.gms(64) 4 Mb
   --- 6 rows 7 columns 19 non-zeroes
   --- Range statistics (absolute non-zero finite values)
   --- RHS    [min, max] : [ 2.750E+02, 6.000E+02] - Zero values observed as well
   --- Bound  [min, max] : [ NA, NA] - Zero values observed as well
   --- Matrix [min, max] : [ 1.260E-01, 1.000E+00]
   --- Executing CPLEX (Solvelink=2): elapsed 0:00:00.053

   IBM ILOG CPLEX 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux

   --- GAMS/CPLEX Link licensed for continuous and discrete problems.
   --- GMO setup time: 0.00s
   --- GMO memory 0.50 Mb (peak 0.50 Mb)
   --- Dictionary memory 0.00 Mb
   --- Cplex 22.1.1.0 link memory 0.00 Mb (peak 0.00 Mb)
   --- Starting Cplex

   Version identifier: 22.1.1.0 | 2022-11-28 | 9160aff4d
   CPXPARAM_Advance 0
   CPXPARAM_Simplex_Display 2
   CPXPARAM_MIP_Display 4
   CPXPARAM_MIP_Pool_Capacity 0
   CPXPARAM_MIP_Tolerances_AbsMIPGap 0
   Tried aggregator 1 time.
   LP Presolve eliminated 0 rows and 1 columns.
   Reduced LP has 5 rows, 6 columns, and 12 nonzeros.
   Presolve time = 0.00 sec. (0.00 ticks)

   Iteration  Dual Objective  In Variable            Out Variable
   1          73.125000       x(seattle,new-york)    demand(new-york) slack
   2          119.025000      x(seattle,chicago)     demand(chicago) slack
   3          153.675000      x(san-diego,topeka)    demand(topeka) slack
   4          153.675000      x(san-diego,new-york)  supply(seattle) slack

   --- LP status (1): optimal.
   --- Cplex Time: 0.00sec (det. 0.01 ticks)


   Optimal solution found
   Objective: 153.675000

   --- Reading solution for model transport
   --- Executing after solve: elapsed 0:00:00.482
   --- trnsport.gms(66) 4 Mb
   *** Status: Normal completion
   --- Job trnsport.gms Stop 06/11/24 15:15:01 elapsed 0:00:00.483

Set Up ``ixmp_dev``
-------------------

If you are a MESSAGEix developer with access to the ``ixmp_dev`` database, configure your access so that ``ixmp`` can connect to it.

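As a sketch of what this involves, a named platform can be registered with the ``ixmp`` command-line interface.
The driver, URL, and credentials below are placeholders; the real connection details must be obtained from ICT or the database administrators.

.. code:: bash

   # Register a platform named "ixmp_dev", backed by an Oracle database via JDBC
   # (all connection details here are placeholders)
   ixmp platform add ixmp_dev jdbc oracle database.server.example:1521/SERVICE db_user db_password

   # Confirm the configured platforms
   ixmp platform list
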
Running MESSAGEix on the cluster
================================

Example script
--------------
Here is a simple Python script to grab, clone, and solve a MESSAGE scenario.
Create it by calling ``nano ~/jobs/message/solve.py``, then pasting the following:

.. code:: python

   import message_ix

   # select scenario
   model_orig = "model" # replace with name of real model
   scen_orig = "scenario" # replace with name of real scenario

   # target scenario
   model_tgt = "unicc_test"
   scen_tgt = scen_orig + "_cloned"
   comment = "Cloned " + model_orig + "/" + scen_orig

   # load scenario
   print("Loading scenario...")
   s, mp = message_ix.Scenario.from_url("ixmp://ixmp_dev/" + model_orig + "/" + scen_orig)

   # clone scenario
   print("Cloning scenario...")
   s_new = s.clone(model_tgt, scen_tgt, comment, keep_solution=False)

   # solve the cloned scenario
   print("Solving scenario...")
   s_new.set_as_default()
   s_new.solve("MESSAGE")

   # close db
   print("Closing database...")
   mp.close_db()


Submitting Jobs
---------------

To submit a job, create a new file called ``job.do``. It doesn’t
have to be called that, and it can have any file extension; for example,
it could be called ``submit.job`` or even ``hi.jpeg``, and those would all
work. So, run:

.. code:: bash

   nano job.do

In the editor, write/paste:

.. code:: bash

   #!/bin/bash
   #SBATCH --time=3:00:00
   #SBATCH --mem=40G
   #SBATCH --mail-type=BEGIN,END,FAIL
   #SBATCH --mail-user=username@iiasa.ac.at
   #SBATCH -o /home/username/out/solve_%j.out
   #SBATCH -e /home/username/err/solve_%j.err

   module purge
   source /opt/apps/lmod/8.7/init/bash
   module load Python/3.11.5-GCCcore-13.2.0
   module load Java

   echo "Activating environment..."
   source ~/env/env-name/bin/activate

   echo "Running python script..."
   python ~/jobs/message/solve.py

This script requests the following:

* 3 hours of run time
* 40 GB of memory
* Send an email when the job begins and ends (or fails)
* Send the email to the address provided
* Save the output of the job (not the solved scenario, just any print statements in the Python script and the like) in ``/home/username/out/``, in a file called ``solve_%j.out``, where ``%j`` is the job ID. Note that Slurm does not expand ``~`` or environment variables in ``#SBATCH`` directives, so full paths are used here.
* Same as above, but save the errors in an ``err`` folder. This is helpful when the script outputs a lot of warnings or errors: there is one file for errors/warnings and a separate file for the regular output.

You can choose to forgo saving the outputs and errors to files, but it is helpful to have them saved somewhere in case you need to refer back to them or to see what happened during the job.
If using the exact same script as above, you will have to manually create the ``out`` and ``err`` folders in the home directory first, if they don't already exist.
You can do this by running:

.. code:: bash

   mkdir ~/out
   mkdir ~/err

It is important (I think) to load the Python and Java modules. I’m not
sure why the ``source /opt/apps/lmod/8.7/init/bash`` line is there, but
ICT included that in an email to me when I was asking for help.

NOTE ON MEMORY: If this is not specified, the default amount of memory
that gets assigned to the job is 2 GB. More CPUs per job could also be
requested instead, which would also give more memory (2 GB times the
number of CPUs), but it is simpler to just request more memory. This
is IMPORTANT: for example, a job with 8 GB of memory failed during
reporting, which requires a lot of memory.

To submit the job, run the following (assuming you are in the folder
where ``job.do`` is located):

.. code:: bash

   sbatch job.do

The ``sbatch`` command is what submits the job; whatever argument
comes after it is your job file.

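On successful submission, ``sbatch`` prints the ID assigned to the job.
You can then follow the job's log output as it is written (the path below assumes the ``out`` folder used in the script above):

.. code:: bash

   → sbatch job.do
   Submitted batch job 1234567

   # follow the job's output file while the job runs
   tail -f /home/username/out/solve_1234567.out
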
To check the status of your own job(s):

.. code:: bash

   squeue -u username

From 1f3a463f03599a7715666704519f883326081b5d Mon Sep 17 00:00:00 2001
From: Measrainsey Meng
Date: Thu, 16 Jan 2025 13:15:09 +0100
Subject: [PATCH 6/9] Add UniCC guide to index.rst

---
 doc/index.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/index.rst b/doc/index.rst
index 5a2343a860..b3cc332f50 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -109,6 +109,7 @@ Commonly used classes may be imported directly from :mod:`message_ix_models`.
    :caption: Distributed computing
 
    distrib/overview
+   distrib/unicc
 
 .. toctree::
    :maxdepth: 2

From 35c9260d34e4f562b4c58fcf3d64186975313de9 Mon Sep 17 00:00:00 2001
From: Measrainsey Meng
Date: Thu, 16 Jan 2025 13:16:23 +0100
Subject: [PATCH 7/9] Change title

---
 doc/distrib/unicc.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/distrib/unicc.rst b/doc/distrib/unicc.rst
index b6c64fd78d..39551ac2b5 100644
--- a/doc/distrib/unicc.rst
+++ b/doc/distrib/unicc.rst
@@ -1,5 +1,5 @@
-UniCC User Guide
-****************
+Running MESSAGEix on UniCC
+**************************
 
 This is a guide on how to get set up on the Unified IIASA Computing Cluster (UnICC) and how to run MESSAGEix scenarios on the cluster.
 Please note that this guide is only for IIASA staff and collaborators who have access to the UnICC.

From c65db28660c31bd3539a534e606feac90f7c37d6 Mon Sep 17 00:00:00 2001
From: Measrainsey Meng
Date: Wed, 22 Jan 2025 07:06:47 +0100
Subject: [PATCH 8/9] Add information on checking jobs statuses

---
 doc/distrib/unicc.rst | 203 ++++++++++++++++++++++++++++++++----------
 1 file changed, 156 insertions(+), 47 deletions(-)

diff --git a/doc/distrib/unicc.rst b/doc/distrib/unicc.rst
index 39551ac2b5..e8e0252ae1 100644
--- a/doc/distrib/unicc.rst
+++ b/doc/distrib/unicc.rst
@@ -697,39 +697,39 @@ Running MESSAGEix on the cluster
Example script
--------------
Here is a simple Python script to grab, clone, and solve a MESSAGE scenario.
-Create it by calling ``nano ~/jobs/message/solve.py``, then pasting the following:
+Create it by calling ``nano ~/job/message/solve.py``, then pasting the following:

.. code:: python

   import message_ix

   # select scenario
   model_orig = "model" # replace with name of real model
   scen_orig = "scenario" # replace with name of real scenario

   # target scenario
   model_tgt = "unicc_test"
   scen_tgt = scen_orig + "_cloned"
   comment = "Cloned " + model_orig + "/" + scen_orig

   # load scenario
   print("Loading scenario...")
   s, mp = message_ix.Scenario.from_url("ixmp://ixmp_dev/" + model_orig + "/" + scen_orig)

   # clone scenario
   print("Cloning scenario...")
   s_new = s.clone(model_tgt, scen_tgt, comment, keep_solution=False)

   # solve the cloned scenario
   print("Solving scenario...")
   s_new.set_as_default()
   s_new.solve("MESSAGE")

   # close db
   print("Closing database...")
   mp.close_db()

Submitting Jobs
---------------
@@ -742,30 +742,30 @@ work. So, run:

.. code:: bash

-   nano job.do
+   nano ~/job/message/job.do

In the editor, write/paste:

.. code:: bash

   #!/bin/bash
   #SBATCH --time=3:00:00
   #SBATCH --mem=40G
   #SBATCH --mail-type=BEGIN,END,FAIL
   #SBATCH --mail-user=username@iiasa.ac.at
   #SBATCH -o /home/username/out/solve_%j.out
   #SBATCH -e /home/username/err/solve_%j.err

   module purge
   source /opt/apps/lmod/8.7/init/bash
   module load Python/3.11.5-GCCcore-13.2.0
   module load Java

   echo "Activating environment..."
   source ~/env/env-name/bin/activate

   echo "Running python script..."
-   python ~/jobs/message/solve.py
+   python ~/job/message/solve.py

This script requests the following:
@@ -789,13 +789,6 @@ It is important (I think) to load the Python and Java modules. I’m not
sure why the ``source /opt/apps/lmod/8.7/init/bash`` line is there, but
ICT included that in an email to me when I was asking for help.

-NOTE ON MEMORY: If this is not specified, the default amount of memory
-that gets assigned to the job is 2 GB. More CPUs per job could also be
-requested instead, which would also give more memory (2 GB times the
-number of CPUs), but it is simpler to just request more memory. This
-is IMPORTANT: for example, a job with 8 GB of memory failed during
-reporting, which requires a lot of memory.
-
To submit the job, run the following (assuming you are in the folder
where ``job.do`` is located):

.. code:: bash

   sbatch job.do

The ``sbatch`` command is what submits the job; whatever argument
comes after it is your job file.

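If something is wrong after submitting (for example, a typo in the job script), a job can be cancelled with ``scancel``:

.. code:: bash

   scancel 1234567 # cancel a specific job by ID
   scancel -u username # cancel all of your own jobs
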
Checking queue
--------------

To check the status of your own job(s):

.. code:: bash

   squeue -u username

While the job is waiting/pending, your queue may look like this:

.. code:: bash

   JOBID   PARTITION NAME USER     ST TIME NODES NODELIST(REASON)
   1234567 batch     job1 username PD 0:00 1     (Resources)

The ``ST`` column shows the status of the job; ``PD`` means pending.

When the job is running, the queue may look like this:

.. code:: bash

   JOBID   PARTITION NAME USER     ST TIME NODES NODELIST(REASON)
   1234567 batch     job1 username R  0:01 1     node1

Usually my jobs run right away or within a few minutes of being submitted,
but sometimes they can sit in the queue for a while. This is usually because
there are a lot of jobs in the queue and the cluster is busy.

To check where all jobs submitted by all users are in the queue:

.. code:: bash

   squeue

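Relatedly, ``sinfo`` shows the state of the cluster's partitions and nodes, which can help explain why a job is stuck pending:

.. code:: bash

   sinfo # state of every partition and node
   sinfo -s # condensed per-partition summary
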
Checking job run information
----------------------------

To check information about a specific job, a helpful command is (replace
``1234567`` with the actual job ID):

.. code:: bash

   scontrol show jobid 1234567

Your output will look something like this:

.. code:: bash

   JobId=404543 JobName=job.do
   UserId=username(32712) GroupId=username(60100) MCS_label=N/A
   Priority=10000 Nice=0 Account=default QOS=normal
   JobState=FAILED Reason=NonZeroExitCode Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=1:0
   DerivedExitCode=0:0
   RunTime=00:00:11 TimeLimit=03:00:00 TimeMin=N/A
   SubmitTime=2025-01-22T05:56:31 EligibleTime=2025-01-22T05:56:31
   AccrueTime=2025-01-22T05:56:31
   StartTime=2025-01-22T05:56:35 EndTime=2025-01-22T05:56:46 Deadline=N/A
   PreemptEligibleTime=2025-01-22T05:56:35 PreemptTime=None
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-01-22T05:56:35 Scheduler=Backfill
   Partition=generic AllocNode:Sid=10.42.153.116:248
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=compute2
   BatchHost=compute2
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=40G,node=1,billing=1
   AllocTRES=cpu=1,mem=40G,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   JOB_GRES=(null)
   Nodes=compute2 CPU_IDs=2 Mem=40960 GRES=
   MinCPUsNode=1 MinMemoryNode=40G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/username/job/message/job.do
   WorkDir=/home/username
   StdErr=/home/username/err/solve_404543.err
   StdIn=/dev/null
   StdOut=/home/username/out/solve_404543.out
   Power=
   MailUser=username@iiasa.ac.at MailType=BEGIN,END,FAIL

Here you see the job information, including the submit time, the associated commands/files, and the output files.
You can also see the resources requested and allocated for the job, such as the number of nodes, CPUs, and memory.

The ``JobState`` field shows the status of the job. If it is ``FAILED``, the
``Reason`` field shows why it failed. The ``ExitCode`` field shows the exit
code of the job: ``0:0`` means the job ran successfully, while a non-zero
code such as ``1:0`` means the job failed.

When my job fails, I usually check both the ``err`` and
``out`` files to see what happened. The ``err`` file shows any
errors or warnings that occurred during the job, and the ``out`` file
shows any print statements or other output from the Python script.

Another useful command to check recent jobs and their information is:

.. code:: bash

   sacct -l

However, this shows a lot of information, so it might be better to
run a more specific command like:

.. code:: bash

   sacct --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList

Note on memory
--------------

If the memory is not specified, the default amount assigned to the job
is 2 GB. More CPUs per job could also be requested instead, which would
also give more memory (2 GB times the number of CPUs), but it is simpler
to just request more memory. I especially recommend this if you're running
legacy reporting, which requires a lot of memory; your job might fail if
you don't request enough.

From 77aacda239552ce2862f7e05d1e8f3e6262f35b2 Mon Sep 17 00:00:00 2001
From: Measrainsey Meng
Date: Wed, 22 Jan 2025 07:21:01 +0100
Subject: [PATCH 9/9] Add note on reducing run time

---
 doc/distrib/unicc.rst | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/doc/distrib/unicc.rst b/doc/distrib/unicc.rst
index e8e0252ae1..69c618887c 100644
--- a/doc/distrib/unicc.rst
+++ b/doc/distrib/unicc.rst
@@ -910,6 +910,35 @@ run a more specific command like:
 
    sacct --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList
 

Resources to request for reducing MESSAGEix run time
----------------------------------------------------

The following information is based on non-scientific "testing" (goofing around), so take it with a grain of salt.
I have found that requesting more CPUs per task can help reduce the run time of a MESSAGEix solve.

For example, a MESSAGE job with ``#SBATCH --cpus-per-task=4`` took over 30 minutes to finish,
whereas the same job with ``#SBATCH --cpus-per-task=16`` took about 20 minutes to finish.
I also tried changing ``#SBATCH --ntasks=1`` to ``#SBATCH --ntasks=4``, but that didn't seem to make a difference in run time.

So usually my ``SBATCH`` job request settings look like this:

.. code:: bash

   #SBATCH --time=20:00:00
   #SBATCH --mem=100G
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=16

I usually request lots of run time (20 hours) and lots of memory (100 GB) because I don't want my job to fail for those reasons.

I keep ``--nodes=1`` because I don't know enough about running on multiple nodes, and I don't really do any parallel computing,
so I don't think I need to request more than one node.

In general, I'm sure there are other settings to play around with to optimize job run time,
perhaps including CPLEX options, but I haven't looked into that; this is just what I've found so far.

Note on memory
--------------