Commit f4e2453

Merge remote-tracking branch 'origin/dev' into tut3/intel
2 parents 7bb3c68 + d55e143 commit f4e2453

2 files changed

+52
-45
lines changed

README.md

+28-30
Original file line numberDiff line numberDiff line change
@@ -3,44 +3,42 @@ CHPC 2024 Student Cluster Competition
33

44
Welcome to the **Center for High Performance Computing (CHPC)'s Student Cluster Competition (SCC)** - Team Selection Round. This round requires each team to build a **prototype multi-node compute cluster** within the National Integrated Cyber Infrastructure Systems (NICIS) **virtual compute cloud** (described below).
55

6-
The goal of this tutorial is to introduce you to the competition platform and familiarise you with some Linux and systems administration concepts. This competition provides you with a fixed set of virtual resources, that you will use to initialize a set a set of virtual machines instances based on your choice _or flavor_ of **
6+
The goal of this document is to introduce you to the competition platform and familiarise you with some Linux and systems administration concepts. This competition provides you with a fixed set of virtual resources that you will use to initialize a set of virtual machine instances based on your choice _or flavor_ of **
77

88
# Table of Contents
99

1010
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
1111

12-
1. [CHPC 2024 Student Cluster Competition](#chpc-2024-student-cluster-competition)
13-
1. [Table of Contents](#table-of-contents)
1412
1. [Structure of the Competition](#structure-of-the-competition)
15-
1. [Getting Help](#getting-help)
16-
1. [GitHub Discussions Page](#github-discussions-page)
17-
1. [GitHub Issues Page](#github-issues-page)
18-
1. [Using Chat GPT4](#using-chat-gpt4)
19-
1. [Timetable](#timetable)
20-
1. [Scoring](#scoring)
21-
1. [Instructions for Mentors](#instructions-for-mentors)
22-
1. [Hands-Off Rule *(You may not touch the keyboard)*](#hands-off-rule-you-may-not-touch-the-keyboard)
23-
1. [Cheat Sheet](#cheat-sheet)
13+
1. [Getting Help](#getting-help)
14+
1. [GitHub Discussions Page](#github-discussions-page)
15+
1. [GitHub Issues Page](#github-issues-page)
16+
1. [Using Chat GPT4](#using-chat-gpt4)
17+
1. [Timetable](#timetable)
18+
1. [Scoring](#scoring)
19+
1. [Instructions for Mentors](#instructions-for-mentors)
20+
1. [Hands-Off Rule *(You may not touch the keyboard)*](#hands-off-rule-you-may-not-touch-the-keyboard)
21+
1. [Cheat Sheet](#cheat-sheet)
2422
1. [Deliverables](#deliverables)
25-
1. [Project](#project)
26-
1. [Technical Knowledge Assessment](#technical-knowledge-assessment)
23+
1. [Project](#project)
24+
1. [Technical Knowledge Assessment](#technical-knowledge-assessment)
2725
1. [Links to Livestreams and Lecture Recordings](#links-to-livestreams-and-lecture-recordings)
28-
1. [Day 1 - Welcome, Introduction and Getting Started](#day-1---welcome-introduction-and-getting-started)
29-
1. [Day 2 - HPC Hardware, HPC Networking and Systems Administration](#day-2---hpc-hardware-hpc-networking-and-systems-administration)
30-
1. [Day 3 - Benchmarking, Compilation and Parallel Computing](#day-3---benchmarking-compilation-and-parallel-computing)
31-
1. [Day 4 - HPC Administration and Application Visualization](#day-4---hpc-administration-and-application-visualization)
32-
1. [Day 5 - Career Guidance](#day-5---career-guidance)
26+
1. [Day 1 - Welcome, Introduction and Getting Started](#day-1---welcome-introduction-and-getting-started)
27+
1. [Day 2 - HPC Hardware, HPC Networking and Systems Administration](#day-2---hpc-hardware-hpc-networking-and-systems-administration)
28+
1. [Day 3 - Benchmarking, Compilation and Parallel Computing](#day-3---benchmarking-compilation-and-parallel-computing)
29+
1. [Day 4 - HPC Administration and Application Visualization](#day-4---hpc-administration-and-application-visualization)
30+
1. [Day 5 - Career Guidance](#day-5---career-guidance)
3331
1. [Tutorial Glossary and Section Overview](#tutorial-glossary-and-section-overview)
34-
1. [Tutorial 1](#tutorial-1)
35-
1. [Tutorial 2](#tutorial-2)
36-
1. [Tutorial 3](#tutorial-3)
37-
1. [Tutorial 4](#tutorial-4)
32+
1. [Tutorial 1](#tutorial-1)
33+
1. [Tutorial 2](#tutorial-2)
34+
1. [Tutorial 3](#tutorial-3)
35+
1. [Tutorial 4](#tutorial-4)
3836
1. [Contributing to the Project](#contributing-to-the-project)
39-
1. [Steps to follow when editing existing content](#steps-to-follow-when-editing-existing-content)
40-
1. [Syntax and Style](#syntax-and-style)
37+
1. [Steps to follow when editing existing content](#steps-to-follow-when-editing-existing-content)
38+
1. [Syntax and Style](#syntax-and-style)
4139
1. [Collaborating with your Team and Storing you Progress on GitHub](#collaborating-with-your-team-and-storing-you-progress-on-github)
42-
1. [Forking the Tutorials into Your Own Team's Private GitHub Repository](#forking-the-tutorials-into-your-own-teams-private-github-repository)
43-
1. [Editing the Git Markdown Files to Track Your Team's Progress](#editing-the-git-markdown-files-to-track-your-teams-progress)
40+
1. [Forking the Tutorials into Your Own Team's Private GitHub Repository](#forking-the-tutorials-into-your-own-teams-private-github-repository)
41+
1. [Editing the Git Markdown Files to Track Your Team's Progress](#editing-the-git-markdown-files-to-track-your-teams-progress)
4442

4543
<!-- markdown-toc end -->
4644

@@ -204,15 +202,14 @@ Tutorial 1 deals with introducing concepts to users and getting them started wit
204202

205203
## Tutorial 2
206204

207-
Tutorial 2 deals with reverse proxy access for internal websites, central authentication and shared file systems.
205+
Tutorial 2 deals with understanding the roles of the head and compute nodes, adding a compute node to create your cluster, and configuring Linux services such as the firewall and time server.
208206
1. [Checklist](tutorial2/README.md#checklist)
209207
1. [Spinning Up a Compute Node in OpenStack](tutorial2/README.md#spinning-up-a-compute-node-in-openstack)
210208
1. [Compute Node Considerations](tutorial2/README.md#compute-node-considerations)
211209
1. [Accessing Your Compute Node](tutorial2/README.md#accessing-your-compute-node)
212210
1. [IP Addresses and Routing](tutorial2/README.md#ip-addresses-and-routing)
213211
1. [Command Line Proxy Jump Directive](tutorial2/README.md#command-line-proxy-jump-directive)
214212
1. [Setting a Temporary Password on your Compute Node](tutorial2/README.md#setting-a-temporary-password-on-your-compute-node)
215-
1. [Generating SSH Keys on Your Head Node](tutorial2/README.md#generating-ssh-keys-on-your-head-node)
216213
1. [Understanding the Roles of the Head Node and Compute Nodes](tutorial2/README.md#understanding-the-roles-of-the-head-node-and-compute-nodes)
217214
1. [Basic System Monitoring](tutorial2/README.md#basic-system-monitoring)
218215
1. [Terminal Multiplexers](tutorial2/README.md#terminal-multiplexers)
@@ -242,6 +239,7 @@ Tutorial 2 deals with reverse proxy access for internal websites, central authen
242239
1. [Mounting An NFS Mount](tutorial2/README.md#mounting-an-nfs-mount)
243240
1. [Making The NFS Mount Permanent](tutorial2/README.md#making-the-nfs-mount-permanent)
244241
1. [Passwordless SSH](tutorial2/README.md#passwordless-ssh)
242+
1. [Generating SSH Keys on Your Head Node](tutorial2/README.md#generating-ssh-keys-on-your-head-node)
245243
1. [Understanding `~/.ssh/authorized_keys`](tutorial2/README.md#understanding-ssh/authorized_keys)
246244
1. [User Permissions and Ownership](tutorial2/README.md#user-permissions-and-ownership)
247245
1. [User Account Management](tutorial2/README.md#user-account-management)
@@ -357,7 +355,7 @@ You are strongly encouraged to contribute and improve the project by [Opening an
357355
In order to effectively manage the various workflows and stages of development, testing and deployment, the project is comprised of three primary branches:
358356
* `main`: *Stable* and production-ready deployment branch of the project.
359357
* `stag`: *Staging* branch which mirrors production and is used for integration testing of new features.
360-
* `dev`: *Development* branch fore incorporating new features and bug fixes.
358+
* `dev`: *Development* branch for incorporating new features and bug fixes.
361359

362360
Editing the content directly requires Git, which you can use from a terminal application, [Git for Windows PowerShell](https://git-scm.com/book/en/v2/Appendix-A:-Git-in-Other-Environments-Git-in-PowerShell) or [Git for MobaXTerm](https://www.geeksforgeeks.org/how-to-install-git-on-mobaxterm/).
363361

tutorial4/README.md

+24-15
Original file line numberDiff line numberDiff line change
@@ -34,9 +34,6 @@
3434

3535
# Checklist
3636

37-
Tutorial 4 demonstrates environment module manipulation and the compilation and optimisation of HPC benchmark software. This introduces the reader to the concepts of environment management and workspace sanity, as well as compilation of software on Linux.
38-
39-
4037
This tutorial demonstrates _cluster monitoring_ and _workload scheduling_. These two components are critical to a typical HPC environment. Monitoring is a widely used component in system administration (including enterprise datacentres and corporate networks). Monitoring allows administrators to be aware of what is happening on any system that is being monitored and is useful to proactively identify where any potential issues may be. A workload scheduler ensures that users' jobs are handled properly to fairly balance all scheduled jobs with the resources available at any time.
4138

4239
In this tutorial you will:
@@ -194,11 +191,11 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
194191
195192
1. Make sure the clocks, i.e. chrony daemons, are synchronized across the cluster.
196193
197-
2. Generate a SLURM and MUNGE user on all of your nodes:
194+
2. Generate a **SLURM** and **MUNGE** user on all of your nodes:
198195
199-
- **If you have FreeIPA authentication working**
200-
- Create the users using the FreeIPA web interface. **Do NOT add them to the sysadmin group**.
201-
- **If you do NOT have FreeIPA authentication working**
196+
- **If you have the Ansible User Module working**
197+
- Create the users as shown in Tutorial 2. **Do NOT add them to the sysadmin group**.
198+
- **If you do NOT have the Ansible User Module working**
202199
- `useradd slurm`
203200
- Ensure that users and groups (UIDs and GIDs) are synchronized across the cluster. Read up on the appropriate [/etc/shadow](https://linuxize.com/post/etc-shadow-file/) and [/etc/passwd](https://www.cyberciti.biz/faq/understanding-etcpasswd-file-format/) files.
204201
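One way to check the UID/GID requirement above is to compare `passwd`-format entries collected from two nodes. The following is a minimal sketch, not part of the tutorial itself: the helper names `passwd_ids` and `ids_match` and the file paths in the usage note are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that a user has the same UID:GID in two passwd-format files.
# The function names below are illustrative assumptions, not tutorial commands.

# Print "UID:GID" for user $1 from passwd-format file $2 (prints nothing if absent).
passwd_ids() {
    awk -F: -v u="$1" '$1 == u { print $3 ":" $4 }' "$2"
}

# Return 0 only when the user exists in both files with an identical UID:GID.
ids_match() {
    local user="$1" a b
    a="$(passwd_ids "$user" "$2")"
    b="$(passwd_ids "$user" "$3")"
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, assuming you saved the output of `getent passwd slurm munge` from the head node as `head.passwd` and from the compute node as `compute.passwd`, then `ids_match slurm head.passwd compute.passwd && echo "slurm IDs match"` reports whether the `slurm` user is consistent on both nodes.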
@@ -213,10 +210,11 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
213210
[...@headnode ~]$ sudo dnf install epel-release
214211
```
215212
216-
Then we can install MUNGE, pulling the development source code from the `powertools` repository:
213+
Then we can install MUNGE, pulling the development source code from the `crb` "CodeReady Builder" repository:
217214
218215
```bash
219-
[...@headnode ~]$ sudo dnf --enablerepo=powertools install munge munge-libs munge-devel
216+
[...@headnode ~]$ sudo dnf config-manager --set-enabled crb
217+
[...@headnode ~]$ sudo dnf install munge munge-libs munge-devel
220218
```
221219
222220
2. Generate a MUNGE key for client authentication:
@@ -230,18 +228,29 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
230228
3. Using `scp`, copy the MUNGE key to your compute node to allow it to authenticate:
231229
232230
1. SSH into your compute node and create the directory `/etc/munge`. Then exit back to the head node.
231+
232+
2. Since MUNGE has not yet been installed on your compute node, first copy the key to a temporary location on the head node:
233+
```bash
234+
[...@headnode ~]$ sudo cp /etc/munge/munge.key /tmp/munge.key && sudo chown user:user /tmp/munge.key
235+
```
236+
**Replace `user` with the name of the user that you are running these commands as.**
233237
234-
2. `scp /etc/munge/munge.key <compute_node_name_or_ip>:/etc/munge/munge.key`
238+
3. Copy the file to your compute node:
239+
```bash
240+
[...@headnode ~]$ scp /tmp/munge.key <compute_node_name_or_ip>:/tmp/munge.key
241+
```
242+
243+
4. Move the file to the correct location on the compute node:
244+
```bash
245+
[...@headnode ~]$ ssh <compute_node_name_or_ip> 'sudo mv /tmp/munge.key /etc/munge/munge.key'
246+
```
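The key transfer steps above can be summarised as a small dry-run helper that only prints the commands, so you can review them before running anything. This is a sketch under assumptions: the function name `munge_key_copy_plan` is hypothetical, and `/tmp/munge.key` is assumed as the staging path on both nodes.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the commands that stage the MUNGE key
# on the head node, copy it to a compute node, and move it into place.
# Arguments: $1 = compute node name or IP, $2 = user running the commands.
munge_key_copy_plan() {
    local node="$1" user="$2"
    printf '%s\n' \
        "sudo cp /etc/munge/munge.key /tmp/munge.key" \
        "sudo chown ${user}:${user} /tmp/munge.key" \
        "scp /tmp/munge.key ${node}:/tmp/munge.key" \
        "ssh ${node} 'sudo mv /tmp/munge.key /etc/munge/munge.key'" \
        "rm -f /tmp/munge.key"
}
```

For example, `munge_key_copy_plan <compute_node_name_or_ip> "$(id -un)"` prints the five commands in order. The last one removes the staged copy from the head node so the key does not linger in `/tmp`. Once MUNGE is installed on the compute node, the key typically also needs to be owned by the `munge` user with restrictive permissions (for example mode `0400`) before the service will start.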
235247
236248
4. **Start** and **enable** the `munge` service
237249
238250
5. Install dependency packages:
239251
240252
```bash
241-
[...@headnode ~]$ sudo dnf --enablerepo=powertools install python3 gcc openssl openssl-devel pam-devel numactl \
242-
numactl-devel hwloc lua readline-devel ncurses-devel man2html libibmad libibumad \
243-
rpm-build perl-ExtUtils-MakeMaker rrdtool-devel lua-devel hwloc-devel \
244-
perl-Switch libssh2-devel mariadb-devel
253+
[...@headnode ~]$ sudo dnf --enablerepo=crb install python3 gcc openssl openssl-devel pam-devel numactl numactl-devel hwloc lua readline-devel ncurses-devel man2html libibmad libibumad rpm-build perl-ExtUtils-MakeMaker rrdtool-devel lua-devel hwloc-devel perl-Switch libssh2-devel mariadb-devel -y
245254
[...@headnode ~]$ sudo dnf groupinstall "Development Tools"
246255
```
247256
@@ -261,7 +270,7 @@ The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource
261270

262271
This should successfully generate Slurm RPMs in the directory that you invoked the `rpmbuild` command from.
263272

264-
9. Copy these RPMs to your compute node to install later, using `scp`.
273+
9. Copy these RPMs to your compute node to install later, using `scp`.
265274

266275
10. Install Slurm server
267276
