Commit d55e143

Updates to Tutorial 4
1 parent 11ce4d0 commit d55e143

1 file changed: +52 −41 lines

tutorial4/README.md (+52 −41)
@@ -3,40 +3,39 @@
 ## Table of Contents
 <!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->

-1. [Checklist](#checklist)
-1. [(Delete) - Remote Web Service Access](#delete---remote-web-service-access)
-1. [Prometheus](#prometheus)
-1. [Edit YML Configuration File](#edit-yml-configuration-file)
-1. [Configuring Prometheus as a Service](#configuring-prometheus-as-a-service)
-1. [SSH Port Forwarding](#ssh-port-forwarding)
-1. [Dynamic SSH Forwarding (SOCKS Proxy)](#dynamic-ssh-forwarding-socks-proxy)
-1. [Configuring Your Browser](#configuring-your-browser)
-1. [X11 Forwarding](#x11-forwarding)
-1. [Grafana](#grafana)
-1. [Configuring Grafana Dashboards](#configuring-grafana-dashboards)
-1. [Node Exporter](#node-exporter)
-1. [Configuring Node Exporter as a Service](#configuring-node-exporter-as-a-service)
-1. [Slurm Scheduler and Workload Manager](#slurm-scheduler-and-workload-manager)
-1. [Prerequisites](#prerequisites)
-1. [Head Node Configuration (Server)](#head-node-configuration-server)
-1. [Compute Node Configuration (Clients)](#compute-node-configuration-clients)
-1. [Configure Grafana Dashboard for Slurm](#configure-grafana-dashboard-for-slurm)
-1. [Using Terraform to Automate the Deployment of your OpenStack Instances](#using-terraform-to-automate-the-deployment-of-your-openstack-instances)
-1. [Using Ansisble to Automate the Configuration of your VMs](#using-ansisble-to-automate-the-configuration-of-your-vms)
-1. [Introduction to Continuous Integration](#introduction-to-continuous-integration)
-1. [GitHub](#github)
-1. [TravisCI](#travisci)
-1. [CircleCI](#circleci)
-1. [GROMACS Protein Visualisation](#gromacs-protein-visualisation)
-1. [Running Qiskit from a Remote Jupyter Notebook Server](#running-qiskit-from-a-remote-jupyter-notebook-server)
+- [Student Cluster Compeititon - Tutorial 4](#student-cluster-compeititon---tutorial-4)
+- [Table of Contents](#table-of-contents)
+- [Checklist](#checklist)
+- [(Delete) - Remote Web Service Access](#delete---remote-web-service-access)
+- [Prometheus](#prometheus)
+- [Edit YML Configuration File](#edit-yml-configuration-file)
+- [Configuring Prometheus as a Service](#configuring-prometheus-as-a-service)
+- [SSH Port Forwarding](#ssh-port-forwarding)
+- [Dynamic SSH Forwarding (SOCKS Proxy)](#dynamic-ssh-forwarding-socks-proxy)
+- [Configuring Your Browser](#configuring-your-browser)
+- [X11 Forwarding](#x11-forwarding)
+- [Grafana](#grafana)
+- [Configuring Grafana Dashboards](#configuring-grafana-dashboards)
+- [Node Exporter](#node-exporter)
+- [Configuring Node Exporter as a Service](#configuring-node-exporter-as-a-service)
+- [Slurm Scheduler and Workload Manager](#slurm-scheduler-and-workload-manager)
+- [Prerequisites](#prerequisites)
+- [Head Node Configuration (Server)](#head-node-configuration-server)
+- [Compute Node Configuration (Clients)](#compute-node-configuration-clients)
+- [Configure Grafana Dashboard for Slurm](#configure-grafana-dashboard-for-slurm)
+- [Using Terraform to Automate the Deployment of your OpenStack Instances](#using-terraform-to-automate-the-deployment-of-your-openstack-instances)
+- [Using Ansisble to Automate the Configuration of your VMs](#using-ansisble-to-automate-the-configuration-of-your-vms)
+- [Introduction to Continuous Integration](#introduction-to-continuous-integration)
+- [GitHub](#github)
+- [TravisCI](#travisci)
+- [CircleCI](#circleci)
+- [GROMACS Protein Visualisation](#gromacs-protein-visualisation)
+- [Running Qiskit from a Remote Jupyter Notebook Server](#running-qiskit-from-a-remote-jupyter-notebook-server)

 <!-- markdown-toc end -->

 # Checklist

-Tutorial 4 demonstrates environment module manipulation and the compilation and optimisation of HPC benchmark software. This introduces the reader to the concepts of environment management and workspace sanity, as well as compilation of software on Linux.
-
-
 This tutorial demonstrates _cluster monitoring_ and _workload scheduling_. These two components are critical to a typical HPC environment. Monitoring is a widely used component in system administration (including enterprise datacentres and corporate networks). Monitoring allows administrators to be aware of what is happening on any system that is being monitored and is useful to proactively identify where any potential issues may be. A workload scheduler ensures that users' jobs are handled properly to fairly balance all scheduled jobs with the resources available at any time.

 In this tutorial you will:
@@ -194,11 +193,11 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
 
 1. Make sure the clocks, i.e. chrony daemons, are synchronized across the cluster.
 
-2. Generate a SLURM and MUNGE user on all of your nodes:
+2. Generate a **SLURM** and **MUNGE** user on all of your nodes:
 
-   - **If you have FreeIPA authentication working**
-     - Create the users using the FreeIPA web interface. **Do NOT add them to the sysadmin group**.
-   - **If you do NOT have FreeIPA authentication working**
+   - **If you have your Ansible user module working**
+     - Create the users as shown in Tutorial 2. **Do NOT add them to the sysadmin group**.
+   - **If you do NOT have your Ansible user module working**
      - `useradd slurm`
      - Ensure that users and groups (UIDs and GIDs) are synchronized across the cluster. Read up on the appropriate [/etc/shadow](https://linuxize.com/post/etc-shadow-file/) and [/etc/passwd](https://www.cyberciti.biz/faq/understanding-etcpasswd-file-format/) files.
 
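If you take the manual `useradd` route, the simplest way to keep UIDs and GIDs synchronized is to create the accounts with explicit, matching IDs on every node. The following is a minimal sketch and not part of the tutorial itself; the IDs 966 and 967 are arbitrary examples, and any unused values work as long as they are identical on the head node and all compute nodes.

```bash
# Run the same commands on the head node and on every compute node.
# 966/967 are example IDs only -- pick any free IDs, but keep them identical cluster-wide.
sudo groupadd -g 966 munge
sudo useradd -m -d /var/lib/munge -u 966 -g munge -s /sbin/nologin munge

sudo groupadd -g 967 slurm
sudo useradd -m -d /var/lib/slurm -u 967 -g slurm -s /bin/bash slurm
```

You can confirm the result with `id munge` and `id slurm` on each node; the output should match exactly across the cluster.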
@@ -213,10 +212,11 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
 [...@headnode ~]$ sudo dnf install epel-release
 ```
 
-Then we can install MUNGE, pulling the development source code from the `powertools` repository:
+Then we can install MUNGE, pulling the development packages from the `crb` ("CodeReady Builder") repository:
 
 ```bash
-[...@headnode ~]$ sudo dnf --enablerepo=powertools install munge munge-libs munge-devel
+[...@headnode ~]$ sudo dnf config-manager --set-enabled crb
+[...@headnode ~]$ sudo dnf install munge munge-libs munge-devel
 ```
 
 2. Generate a MUNGE key for client authentication:
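A small note on the `crb` step above: `dnf config-manager` is provided by the `dnf-plugins-core` package, which is usually preinstalled on EL-based images but not always. A quick check, as a sketch:

```bash
# Install the plugin if config-manager is missing, then confirm crb is enabled
[...@headnode ~]$ sudo dnf install -y dnf-plugins-core
[...@headnode ~]$ dnf repolist --enabled | grep -i crb
```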
@@ -230,18 +230,29 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
 3. Using `scp`, copy the MUNGE key to your compute node to allow it to authenticate:
 
    1. SSH into your compute node and create the directory `/etc/munge`. Then exit back to the head node.
+
+   2. Since MUNGE has not yet been installed on your compute node, first copy the key to a temporary location that your user can read:
+      ```bash
+      [...@headnode ~]$ sudo cp /etc/munge/munge.key /tmp/munge.key && sudo chown user:user /tmp/munge.key
+      ```
+      **Replace `user` with the name of the user that you are running these commands as.**
 
-   2. `scp /etc/munge/munge.key <compute_node_name_or_ip>:/etc/munge/munge.key`
+   3. Copy the key to your compute node:
+      ```bash
+      [...@headnode ~]$ scp /tmp/munge.key <compute_node_name_or_ip>:/tmp/munge.key
+      ```
+
+   4. Move the key into place on the compute node:
+      ```bash
+      [...@headnode ~]$ ssh <compute_node_name_or_ip> 'sudo mv /tmp/munge.key /etc/munge/munge.key'
+      ```
 
 4. **Start** and **enable** the `munge` service
 
 5. Install dependency packages:
 
 ```bash
-[...@headnode ~]$ sudo dnf --enablerepo=powertools install python3 gcc openssl openssl-devel pam-devel numactl \
-                  numactl-devel hwloc lua readline-devel ncurses-devel man2html libibmad libibumad \
-                  rpm-build perl-ExtUtils-MakeMaker rrdtool-devel lua-devel hwloc-devel \
-                  perl-Switch libssh2-devel mariadb-devel
+[...@headnode ~]$ sudo dnf --enablerepo=crb install python3 gcc openssl openssl-devel pam-devel numactl numactl-devel hwloc lua readline-devel ncurses-devel man2html libibmad libibumad rpm-build perl-ExtUtils-MakeMaker rrdtool-devel lua-devel hwloc-devel perl-Switch libssh2-devel mariadb-devel -y
 [...@headnode ~]$ sudo dnf groupinstall "Development Tools"
 ```
 
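One detail worth flagging for the key-copy steps above: after the `mv`, `/etc/munge/munge.key` on the compute node is owned by root, while the `munge` daemon expects to own the key and will refuse to start if it is readable by anyone else. A minimal sketch of the fix-up and a quick cross-node authentication test follows; the compute-node prompt and the hostname placeholder are assumptions following the tutorial's conventions.

```bash
# On the compute node: restore ownership and restrict permissions on the key
[...@computenode ~]$ sudo chown munge:munge /etc/munge/munge.key
[...@computenode ~]$ sudo chmod 400 /etc/munge/munge.key

# Start and enable the service on both nodes, then test authentication across the cluster
[...@headnode ~]$ sudo systemctl enable --now munge
[...@headnode ~]$ munge -n | ssh <compute_node_name_or_ip> unmunge
```

If `unmunge` reports `STATUS: Success (0)`, MUNGE authentication between the two nodes is working.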
@@ -261,7 +272,7 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
 
 This should successfully generate Slurm RPMs in the directory that you invoked the `rpmbuild` command from.
 
-9. Copy these RPMs to your compute node to install later, using `scp`.
+9. Copy these RPMs to your compute node to install later, using `scp`.
 
 10. Install Slurm server
 
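For step 9, a minimal sketch of the copy; the `~/rpmbuild/RPMS/x86_64/` path is the default location `rpmbuild` writes packages to and is an assumption here, so adjust it if you built elsewhere or for a different architecture.

```bash
# Copy the freshly built Slurm RPMs to the compute node for later installation
[...@headnode ~]$ scp ~/rpmbuild/RPMS/x86_64/slurm-*.rpm <compute_node_name_or_ip>:/tmp/
```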