diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/README.md b/2025-HPCIC/infrastructure/dry-run-w-nginx/README.md new file mode 100644 index 0000000..bc694d4 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/README.md @@ -0,0 +1,114 @@ +# Deploy perftools-hpcic-2025-dry-run to AWS Elastic Kubernetes Service (EKS) + +These config files and scripts can be used to deploy the perftools-hpcic-2025-dry-run tutorial to EKS. + +The sections below walk you through the steps for deploying your cluster. All commands in these +sections should be run from the same directory as this README. + +## Step 1: Create EKS cluster + +To create an EKS cluster with your configured settings, run the following: + +```bash +$ ./create_cluster.sh +``` + +Be aware that this step can take upwards of 15-30 minutes to complete. + +## Step 2: Configure Kubernetes within the EKS cluster + +After creating the cluster, we need to configure Kubernetes and its add-ons. In particular, +we need to set up the Kubernetes autoscaler, which will allow our tutorial to scale to as +many users as our cluster's resources can possibly handle. + +To configure Kubernetes and the autoscaler, run the following: + +```bash +$ ./configure_kubernetes.sh +``` + +## Step 3: Deploy JupyterHub to the EKS cluster + +With the cluster properly created and configured, we can now deploy JupyterHub to the cluster +to manage everything else about our tutorial. + +To deploy JupyterHub, run the following: + +```bash +$ ./deploy_jupyterhub.sh +``` + +## Step 4: Verify that everything is working + +After deploying JupyterHub, we need to make sure that all the necessary components +are working properly. + +To check this, run the following: + +```bash +$ ./check_jupyterhub_status.sh +``` + +If everything worked properly, you should see output like this: + +``` +NAME READY STATUS RESTARTS AGE +continuous-image-puller-2gqrw 1/1 Running 0 30s +continuous-image-puller-gb7mj 1/1 Running 0 30s +hub-8446c9d589-vgjlw 1/1 Running 0 30s +proxy-7d98df9f7-s5gft 1/1 Running 0 30s +user-scheduler-668ff95ccf-fw6wv 1/1 Running 0 30s +user-scheduler-668ff95ccf-wq5xp 1/1 Running 0 30s +``` + +Be aware that the hub pod (i.e., hub-8446c9d589-vgjlw above) may take a minute or so to start. + +If something went wrong, you will have to edit the config YAML files to get things working. Before +trying to work things out yourself, check the FAQ to see if your issue has already been addressed. The debugging commands sketched after the list below can also help you pinpoint the failing component. + +Depending on what file you edit, you may have to run different commands to update the EKS cluster and +deployment of JupyterHub. Follow the steps below to update: +1. If you only edited `helm-config.yaml`, try to update just the deployment of JupyterHub by running `./update_jupyterhub_deployment.sh` +2. If step 1 failed, fully tear down the JupyterHub deployment with `./tear_down_jupyterhub.sh` and then re-deploy it with `./deploy_jupyterhub.sh` +3. If you edited `cluster-autoscaler.yaml` or `storage-class.yaml`, tear down the JupyterHub deployment with `./tear_down_jupyterhub.sh`. Then, reconfigure Kubernetes with `./configure_kubernetes.sh`, and re-deploy JupyterHub with `./deploy_jupyterhub.sh` +4. If you edited `eksctl-config.yaml`, fully tear down the cluster with `cleanup.sh`, and then restart from the top of this README
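+
+If the status output is not enough to diagnose a problem, a few more commands can help you pinpoint the failing component. This is only a sketch; `<pod-name>` is a placeholder for a pod name taken from the `check_jupyterhub_status.sh` output:
+
+```bash
+# Show the events and status details for a misbehaving pod
+$ kubectl --namespace=default describe pod <pod-name>
+
+# Print the JupyterHub hub pod's log (a wrapper around 'kubectl logs')
+$ ./check_hub_log.sh
+
+# Print the log of the init container inside a specific user pod
+$ ./check_init_container_log.sh <pod-name>
+```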
+ +## Step 5: Get the public cluster URL + +Now that everything's ready to go, we need to get the public URL to the cluster. + +To do this, run the following: + +```bash +$ ./get_jupyterhub_url.sh +``` + +Note that it can take several minutes after the URL is available for it to actually redirect +to JupyterHub. + +## Step 6: Distribute URL and password to attendees + +Now that we have our public URL, we can give the attendees everything they need to join the tutorial. + +For attendees to access JupyterHub, they simply need to enter the public URL (from step 5) in their browser of choice. +This will take them to a login page. The login credentials are as follows: +* Username: anything the attendee wants (note: this should be unique for every user. Otherwise, users will share pods.) +* Password: the password specified towards the top of `helm-config.yaml` + +Once the attendees log in with these credentials, the Kubernetes autoscaler will spin up a pod for them (and grab new +resources, if needed). This pod will contain a JupyterLab instance with the tutorial materials and environment already +prepared for them. + +At this point, you can start presenting your interactive tutorial! + +## Step 7: Clean up everything + +Once you are done with your tutorial, you should clean up everything so that there are no continuing, unnecessary expenses +to your AWS account. To do this, simply run the following: + +```bash +$ ./cleanup.sh +``` + +After cleaning everything up, you can verify that everything has been cleaned up by going to the AWS web console +and ensuring nothing from your tutorial still exists in CloudFormation and EKS. \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/check_hub_log.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_hub_log.sh new file mode 100755 index 0000000..1c13e91 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_hub_log.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +hub_pod_id=$(kubectl get pods -n default --no-headers=true | awk '/hub/{print $1}') +kubectl logs $hub_pod_id \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/check_init_container_log.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_init_container_log.sh new file mode 100755 index 0000000..f4fd398 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_init_container_log.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +if [ $# -ne 1 ]; then + echo "Usage: ./check_init_container_log.sh <pod_name>" + exit 1 +fi + +kubectl logs $1 -c init-tutorial-service \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/check_jupyterhub_status.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_jupyterhub_status.sh new file mode 100755 index 0000000..10b4261 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/check_jupyterhub_status.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!"
+ echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +kubectl --namespace=default get pods + +echo "If there are issues with any pods, you can get more details with:" +echo " $ kubectl --namespace=default describe pod " \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/cleanup.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/cleanup.sh new file mode 100755 index 0000000..f2287cf --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/cleanup.sh @@ -0,0 +1,42 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +if ! command -v eksctl >/dev/null 2>&1; then + echo "ERROR: 'eksctl' is required to create a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://eksctl.io/installation/" + exit 1 +fi + +if ! command -v helm >/dev/null 2>&1; then + echo "ERROR: 'helm' is required to configure and launch JupyterHub on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://helm.sh/docs/intro/install/" + exit 1 +fi + +# Temporarily allow errors in the script so that the script won't fail +# if the JupyterHub deployment failed or was previously torn down +set +e +echo "Tearing down JupyterHub and uninstalling everything related to Helm:" +helm uninstall perftools-hpcic-2025-dry-run-jupyter +set -e + +echo "" +echo "Deleting all pods from the EKS cluster:" +kubectl delete pod --all-namespaces --all --force + +echo "" +echo "Deleting the EKS cluster:" +eksctl delete cluster --config-file ./eksctl-config.yaml --wait + +echo "" +echo "Everything is now cleaned up!" \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/cluster-autoscaler.yaml b/2025-HPCIC/infrastructure/dry-run-w-nginx/cluster-autoscaler.yaml new file mode 100644 index 0000000..0ee15ef --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/cluster-autoscaler.yaml @@ -0,0 +1,272 @@ +# The roles defined in this config file set permissions on several Kubernetes resources. +# +# Resources referred to: +# * events: resource representing information/responses generated from actions or changes taken against the cluster +# * endpoints: resource representing REST API endpoints within the cluster +# * pods/eviction: resource that terminates and removes pods when created +# * pods/status: resource used to query or edit the status of pods +# * nodes: resource representing the physical or virtual nodes of the cluster +# * namespaces: resource representing a group of isolated resources within the cluster +# * pods: resource representing a unit of computation that is deployed to a node +# * services: resource representing a networked application running in a pod and exposed over the network (either internal to the cluster or external to the broader internet) +# * replicationcontrollers: legacy resource for managing horizontal scaling (i.e., scale-out). 
Used for broader support across clouds + * persistentvolumeclaims: resource representing a request for storage by a user + * persistentvolumes: resource representing actual storage + * replicasets: resource that creates replica pods that are used to ensure some minimum number of identical pods in the cluster + * daemonsets: resource that ensures copies of pods are deployed to new nodes and removed from removed nodes + * poddisruptionbudgets: resource that represents the cluster policy regarding the minimum number of pods that must remain available + during voluntary disruptions (i.e., pod/node eviction not caused by something like hardware failure) + * statefulsets: resource that maintains pod state + * storageclasses: resource that describes different types of storage. Often used for things like QoS levels + * csinodes: resource that describes a node's ability to interact with one or more storage providers. Mainly used by Kubernetes's scheduler + * csidrivers: resource that provides information on the drivers for a single storage provider installed on a node + * csistoragecapacities: resource that describes the available storage from different providers + * jobs: resource that represents one-off tasks spread across one or more pods that must run to completion. Useful for certain types of setup and elasticity work + * leases: resource that allows different pods, nodes, or kubelets (the Kubernetes daemon on a node) to lock shared resources. Think of it like a mutex + * configmaps: resource representing non-confidential key-value pair info. Often used to decouple environment-specific configuration from container images +--- +# Create a Service Account that will act as the internal user during the creation +# of the autoscaling infrastructure and have all the appropriate roles and permissions assigned +# to do its work +apiVersion: v1 +kind: ServiceAccount +metadata: + labels: + k8s-addons: cluster-autoscaler.addons.k8s.io + k8s-app: cluster-autoscaler + name: cluster-autoscaler + namespace: kube-system +--- +# Create a ClusterRole to set permissions for associated +# users across the entire cluster +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: cluster-autoscaler + labels: + k8s-addons: cluster-autoscaler.addons.k8s.io + k8s-app: cluster-autoscaler +rules: + # Allow associated users to create or partially update events and endpoints + - apiGroups: [""] + resources: ["events", "endpoints"] + verbs: ["create", "patch"] + # Allow associated users to evict pods + - apiGroups: [""] + resources: ["pods/eviction"] + verbs: ["create"] + # Allow associated users to update pod statuses + - apiGroups: [""] + resources: ["pods/status"] + verbs: ["update"] + # Allow associated users to get and update the state of the autoscaler + - apiGroups: [""] + resources: ["endpoints"] + resourceNames: ["cluster-autoscaler"] + verbs: ["get", "update"] + # Allow associated users to be notified of changes to, list, get the state of, + # and fully update information related to nodes + - apiGroups: [""] + resources: ["nodes"] + verbs: ["watch", "list", "get", "update"] + # Allow associated users to be notified of changes to, list, and get the state of + # namespaces, pods, services, replication controllers, persistent volume claims, and + # persistent volumes + - apiGroups: [""] + resources: + - "namespaces" + - "pods" + - "services" + - "replicationcontrollers" + - "persistentvolumeclaims" + - "persistentvolumes" + verbs: ["watch", "list", "get"] + # Allow associated users to be notified of changes to, list, and get the state of + # replica sets and daemon sets + - apiGroups: ["extensions"] + resources: ["replicasets", "daemonsets"] + verbs: ["watch", "list", "get"] + # Allow associated users to be notified of changes to and list pod disruption budgets + - apiGroups: ["policy"] + resources: ["poddisruptionbudgets"] + verbs: ["watch", "list"] + # Allow associated users to be notified of changes to, list, and get the state of + # stateful sets, replica sets, and daemon sets + - apiGroups: ["apps"] + resources: ["statefulsets", "replicasets", "daemonsets"] + verbs: ["watch", "list", "get"] + # Allow associated users to be notified of changes to, list, and get the state of + # all resources related to available storage + - apiGroups: ["storage.k8s.io"] + resources: + ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"] + verbs: ["watch", "list", "get"] + # Allow associated users to get the state of, list, be notified of changes to, and partially update + # jobs launched in the cluster + - apiGroups: ["batch", "extensions"] + resources: ["jobs"] + verbs: ["get", "list", "watch", "patch"] + # Allow associated users to create leases + - apiGroups: ["coordination.k8s.io"] + resources: ["leases"] + verbs: ["create"] + # Allow associated users to get the state of and fully update leases in the autoscaler + - apiGroups: ["coordination.k8s.io"] + resourceNames: ["cluster-autoscaler"] + resources: ["leases"] + verbs: ["get", "update"] +--- +# Create a Role to set permissions within the 'kube-system' namespace +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: cluster-autoscaler + # The permissions in this Role apply to the 'kube-system' namespace + namespace: kube-system + labels: + k8s-addons: cluster-autoscaler.addons.k8s.io + k8s-app: cluster-autoscaler +rules: + # Allow associated users to create, list, and be notified of changes to config maps + - apiGroups: [""] + resources: ["configmaps"] + verbs: ["create", "list", "watch"] + # Allow associated users to delete, get the state of, fully update, and be notified of + # changes to config maps in the autoscaler's status and priority-expander subresources + - apiGroups: [""] + resources: ["configmaps"] + resourceNames: + - "cluster-autoscaler-status" + - "cluster-autoscaler-priority-expander" + verbs: ["delete", "get", "update", "watch"] +--- +# Grant permissions defined by the ClusterRole +# to users defined by the ServiceAccount +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: cluster-autoscaler + labels: + k8s-addons: cluster-autoscaler.addons.k8s.io + k8s-app: cluster-autoscaler +# Use the ClusterRole named "cluster-autoscaler" in the binding +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: cluster-autoscaler +# Use the ServiceAccount named "cluster-autoscaler" +# in the "kube-system" namespace in the binding +subjects: + - kind: ServiceAccount + name: cluster-autoscaler + namespace: kube-system +--- +# Grant permissions defined by the Role +# to users defined by the ServiceAccount +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: cluster-autoscaler + namespace: kube-system + labels: + k8s-addons: cluster-autoscaler.addons.k8s.io + k8s-app: cluster-autoscaler +# Use the Role named "cluster-autoscaler" in the binding +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: cluster-autoscaler +# Use the ServiceAccount named "cluster-autoscaler" +# in the "kube-system" namespace in the binding +subjects: + - kind: ServiceAccount + name: cluster-autoscaler + namespace: kube-system +--- +# Define deployment rules for pods and ReplicaSets +apiVersion: apps/v1 +kind: Deployment +metadata: + name: cluster-autoscaler + namespace: kube-system + labels: + app: cluster-autoscaler +spec: + replicas: 1 # Number of pods to run + # Apply to pods that have a label called 'app' + # with value 'cluster-autoscaler' + selector: + matchLabels: + app: cluster-autoscaler + # Definition of created pods + template: + metadata: + labels: + app: cluster-autoscaler + # Allow Prometheus to collect monitoring data over port 8085 + annotations: + prometheus.io/scrape: "true" + prometheus.io/port: "8085" + spec: + priorityClassName: system-cluster-critical + securityContext: + # Run the autoscaler as a non-root user + runAsNonRoot: true + runAsUser: 65534 + fsGroup: 65534 + # Use the default seccomp profile as specified by the + # container runtime + seccompProfile: + type: RuntimeDefault + serviceAccountName: cluster-autoscaler + # The container(s) to run within the pod. + # Since we're running an autoscaler, the autoscaler is the + # pod's only container; it scales the cluster by resizing the + # EKS node groups rather than by running other containers itself + containers: + # The main container for the pod will be the + # Kubernetes autoscaling container + - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2 + name: cluster-autoscaler + resources: + # Maximum amount of compute resources allowed + limits: + cpu: 100m + memory: 600Mi + # Minimum amount of compute resources required + # Defaults to 'limits' if not specified + requests: + cpu: 100m + memory: 600Mi + command: + - ./cluster-autoscaler + - --v=4 + - --stderrthreshold=info + - --cloud-provider=aws + - --skip-nodes-with-local-storage=false + - --expander=least-waste + - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/perftools-hpcic-2025-dry-run + volumeMounts: + # Mount the CA SSL/TLS certificates into the container + - name: ssl-certs + mountPath: /etc/ssl/certs/ca-certificates.crt + readOnly: true + # Always pull the digest of the image from the + # container registry. If the locally cached digest is + # the same as the pulled digest, use the cached container image. 
+ # Otherwise, pull the container from the registry + imagePullPolicy: "Always" + securityContext: + # Don't let the pod have more privileges than the + # parent process + allowPrivilegeEscalation: false + capabilities: + # Remove all capabilities + drop: + - ALL + # Root filesystem (i.e., '/') is read-only + readOnlyRootFilesystem: true + volumes: + - name: ssl-certs + hostPath: + path: "/etc/ssl/certs/ca-bundle.crt" \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/config.toml b/2025-HPCIC/infrastructure/dry-run-w-nginx/config.toml new file mode 100644 index 0000000..134f542 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/config.toml @@ -0,0 +1,55 @@ +tutorial_name = "perftools-hpcic-2025-dry-run" + +[aws.eksctl] +cluster_name = "perftools-hpcic-2025-dry-run" +cluster_deployment_region = "us-west-1" +cluster_availability_zones = [ + "us-west-1a", + "us-west-1c", +] + +[[aws.eksctl.cluster_node_groups]] +zone = "us-west-1a" +instance_type = "c7i.12xlarge" +volume_size = 30 +desired_size = 2 +min_size = 2 +max_size = 8 + +[[aws.eksctl.cluster_node_groups]] +zone = "us-west-1c" +instance_type = "c7i.12xlarge" +volume_size = 30 +desired_size = 2 +min_size = 2 +max_size = 8 + +[aws."Kubernetes autoscaler"] +cpu_max = "100m" +memory_max = "600Mi" +cpu_min = "100m" +memory_min = "600Mi" + +[aws.Helm] +max_concurrent_users = 14 +hub_password = "hpctutorial25" +hub_db_capacity = "32Gi" +ebs_storage_type = "gp3" +use_nginx = true +hub_container_image = "jupyterhub/k8s-hub" +hub_container_tag = "4.2.0" +spawner_container_image = "ghcr.io/llnl/reproducible-benchmarking-spawn" +spawner_container_tag = "hpcic-2025" +spawner_image_entrypoint = "/entrypoint.sh 32" +cpu_min = "32" +cpu_max = "32" +mem_min = "64G" +mem_max = "64G" +provide_extra_shmem = true +init_container_image = "ghcr.io/llnl/reproducible-benchmarking-init" +init_container_tag = "hpcic-2025" +init_image_entrypoint = "/entrypoint.sh" + +[aws."utility scripts"] +jupyterhub_helm_version = "4.2.0" +ebs_csidriver_version = "v1.45.0" diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/configure_kubernetes.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/configure_kubernetes.sh new file mode 100755 index 0000000..f21c899 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/configure_kubernetes.sh @@ -0,0 +1,30 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +echo "Configuring the Cluster Autoscaler:" +kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=v1.45.0" +kubectl apply -f ./cluster-autoscaler.yaml +echo "" +echo "Configuring the Storage Class:" +kubectl apply -f ./storage-class.yaml + +echo "" +echo "Patching the cluster to make the configured storage class the default:" +kubectl patch storageclass gp3 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' +kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}' + +echo "Adding nginx to EKS cluster using kubectl:" +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.13.0/deploy/static/provider/cloud/deploy.yaml + +echo "" +echo "Done configuring Kubernetes!" 
+echo "" +echo "Next, you should run deploy_jupyterhub.sh to actually deploy JupyterHub and the tutorial." \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/create_cluster.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/create_cluster.sh new file mode 100755 index 0000000..f631168 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/create_cluster.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v eksctl >/dev/null 2>&1; then + echo "ERROR: 'eksctl' is required to create a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://eksctl.io/installation/" + exit 1 +fi + +echo "Creating EKS cluster with eksctl:" +eksctl create cluster --config-file ./eksctl-config.yaml + +echo "Done creating the EKS cluster!" +echo "" +echo "Next, you should run configure_kubernetes.sh to configure Kubernetes on the cluster." \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/deploy_jupyterhub.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/deploy_jupyterhub.sh new file mode 100755 index 0000000..7aec41e --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/deploy_jupyterhub.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v helm >/dev/null 2>&1; then + echo "ERROR: 'helm' is required to configure and launch JupyterHub on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://helm.sh/docs/intro/install/" + exit 1 +fi + +echo "Adding JupyterHub to EKS cluster using Helm:" +helm repo add jupyterhub https://hub.jupyter.org/helm-chart/ +helm repo update +echo "" +echo "Installing the Helm chart and deploying JupyterHub to EKS:" +helm install perftools-hpcic-2025-dry-run-jupyter jupyterhub/jupyterhub --version 4.2.0 --values ./helm-config.yaml + +echo "" +echo "Done deploying JupyterHub!" +echo "" +echo "Next, you should ensure all the pods spawned correctly with check_jupyterhub_status.sh," +echo "and you should get the cluster URL with get_jupyterhub_url.sh." +echo "" +echo "If something went wrong, you can edit the helm-config.yaml file to try to fix the issue." +echo "After editing helm-config.yaml, you can normally reconfigure and relaunch JupyterHub using" +echo "the update_jupyterhub_deployment.sh script. If that doesn't work or if you need to edit" +echo "storage-class.yaml or cluster-autoscaler.yaml, you should first tear down JupyterHub with" +echo "tear_down_jupyterhub.sh, and then you should bring Jupyter back up by rerunning deploy_jupyterhub.sh." +echo "" +echo "If everything went smoothly, the cluster URL is what you should share with attendees." +echo "" +echo "Attendees can get a Jupyter environment to work in by going to that URL and logging in" +echo "with a username of their choice and the password specified in helm-config.yaml." +echo "" +echo "Note: users should have unique usernames. If two users have the same username, they will" +echo " share the same pod." +echo "" +echo "After you are done with your tutorial, you should finally run cleanup.sh to bring down" +echo "the EKS cluster and all associated resources." 
\ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/eksctl-config.yaml b/2025-HPCIC/infrastructure/dry-run-w-nginx/eksctl-config.yaml new file mode 100644 index 0000000..8ceeb30 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/eksctl-config.yaml @@ -0,0 +1,107 @@ +apiVersion: eksctl.io/v1alpha5 +kind: ClusterConfig +# Define the name of the cluster and the deployment region +metadata: + name: perftools-hpcic-2025-dry-run + region: us-west-1 + +# Create the IAM policies needed to enable the autoscaler and storage +iam: + withOIDC: true + serviceAccounts: + - metadata: + name: cluster-autoscaler + namespace: kube-system + labels: + aws-usage: "cluster-ops" + app.kubernetes.io/name: cluster-autoscaler + + # https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md + attachPolicy: + Version: "2012-10-17" + Statement: + - Effect: Allow + Action: + - "autoscaling:DescribeAutoScalingGroups" + - "autoscaling:DescribeAutoScalingInstances" + - "autoscaling:DescribeLaunchConfigurations" + - "autoscaling:DescribeTags" + - "autoscaling:SetDesiredCapacity" + - "autoscaling:TerminateInstanceInAutoScalingGroup" + - "ec2:DescribeLaunchTemplateVersions" + Resource: "*" + + - metadata: + name: ebs-csi-controller-sa + namespace: kube-system + labels: + aws-usage: "cluster-ops" + app.kubernetes.io/name: aws-ebs-csi-driver + attachPolicy: + Version: "2012-10-17" + Statement: + - Effect: Allow + Action: + - "ec2:AttachVolume" + - "ec2:CreateSnapshot" + - "ec2:CreateTags" + - "ec2:CreateVolume" + - "ec2:DeleteSnapshot" + - "ec2:DeleteTags" + - "ec2:DeleteVolume" + - "ec2:DescribeInstances" + - "ec2:DescribeSnapshots" + - "ec2:DescribeTags" + - "ec2:DescribeVolumes" + - "ec2:DetachVolume" + Resource: "*" + +# Specify the availability zones from which nodes will be obtained +availabilityZones: +- "us-west-1a" +- "us-west-1c" + +# Define rules for nodegroups for each availability zone +managedNodeGroups: + - name: node-group-us-west-1a + # Set policies/permissions to autoscale + iam: + withAddonPolicies: + autoScaler: true + # Instance type to allocate + instanceType: c7i.12xlarge + # Size of each node's storage volume in this availability zone, in gigabytes + volumeSize: 30 + # Number of nodes to start with in this availability zone + desiredCapacity: 2 + # Minimum number of nodes that will always be allocated in this availability zone + minSize: 2 + # Maximum number of nodes that will ever be allocated in this availability zone + maxSize: 8 + privateNetworking: true + availabilityZones: + - us-west-1a + tags: + k8s.io/cluster-autoscaler/enabled: "true" + k8s.io/cluster-autoscaler/jupyterhub: "owned" + - name: node-group-us-west-1c + # Set policies/permissions to autoscale + iam: + withAddonPolicies: + autoScaler: true + # Instance type to allocate + instanceType: c7i.12xlarge + # Size of each node's storage volume in this availability zone, in gigabytes + volumeSize: 30 + # Number of nodes to start with in this availability zone + desiredCapacity: 2 + # Minimum number of nodes that will always be allocated in this availability zone + minSize: 2 + # Maximum number of nodes that will ever be allocated in this availability zone + maxSize: 8 + privateNetworking: true + availabilityZones: + - us-west-1c + tags: + k8s.io/cluster-autoscaler/enabled: "true" + k8s.io/cluster-autoscaler/jupyterhub: "owned"
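The `eksctl-config.yaml` above defines two managed node groups (one per availability zone) that the autoscaler grows and shrinks between `minSize` and `maxSize`. As a quick sanity check after Step 1, you can confirm they came up as configured. This is only a sketch (not one of the provided scripts), and the cluster name and region must match your `eksctl-config.yaml`:

```bash
# List the managed node groups eksctl created for the tutorial cluster
$ eksctl get nodegroup --cluster perftools-hpcic-2025-dry-run --region us-west-1

# Confirm that the instances have registered as Kubernetes nodes
$ kubectl get nodes -o wide
```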
diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/get_jupyterhub_url.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/get_jupyterhub_url.sh new file mode 100755 index 0000000..e7c105c --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/get_jupyterhub_url.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v kubectl >/dev/null 2>&1; then + echo "ERROR: 'kubectl' is required to configure a Kubernetes cluster on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://kubernetes.io/docs/tasks/tools/#kubectl" + exit 1 +fi + +kubectl --namespace ingress-nginx get -o json svc ingress-nginx-controller | jq '.status.loadBalancer.ingress[0].hostname' diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/helm-config.yaml b/2025-HPCIC/infrastructure/dry-run-w-nginx/helm-config.yaml new file mode 100644 index 0000000..d445eb3 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/helm-config.yaml @@ -0,0 +1,120 @@ +# Uncomment if you need to debug your deployment of Jupyter. +# For more information on debugging, see: +# https://z2jh.jupyter.org/en/stable/administrator/debug.html +# debug: +# enabled: true + +hub: + # Maximum number of users that can be spawning their JupyterLab environments (i.e., pods) at the same time + concurrentSpawnLimit: 14 + config: + # Define a password for login + DummyAuthenticator: + password: hpctutorial25 + JupyterHub: + admin_access: true + authenticator_class: dummy + + # Define storage quantity for JupyterHub's persistent database + # We also explicitly set the storage class name here, even though + # the storage class defined in storage-class.yaml is already + # marked as the default + db: + pvc: + storage: 32Gi + storageClassName: gp3 + + # Specify the hub image for the tutorial. + # The hub image should be based on the jupyterhub/k8s-hub image. + # Its job is twofold: + # 1) If desired, replace the login page (at /usr/local/share/jupyterhub/templates/login.html) with a custom HTML login page + # 2) Set the user + image: + name: jupyterhub/k8s-hub + tag: "4.2.0" + pullPolicy: Always + + # Define resource usage for JupyterHub + # For large tutorials, it is recommended to set these higher + # We are just using default resource usage + + # Define custom hostname for JupyterHub + # We are not using a custom hostname + +ingress: + enabled: true + ingressClassName: "nginx" + +# Based on optimization recommendations from: +# https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders +# scheduling: +# podPriority: +# enabled: true +# userPlaceholder: +# replicas: 3 + +# Define the spawner and init containers for each attendee's pod +singleuser: + # Specify the spawner image for the tutorial. + # The spawner image should do the following: + # 1) Install any necessary software + # 2) Define the user for the tutorial (we usually default to jovyan) + # 3) If custom Python packages are needed, it's often recommended to install a custom Jupyter kernel with `IPython kernel install` + # 4) If you want a custom Jupyter launcher UI, install the appropriate packages and update JUPYTER_APP_LAUNCHER_PATH + # 5) Copy any necessary local scripts or files and ensure proper permissions + image: + name: ghcr.io/llnl/reproducible-benchmarking-spawn + tag: "hpcic-2025" + pullPolicy: Always + # Specify the minimum (i.e., guarantee) and maximum (i.e., limit) amount of resources per user + cpu: + limit: 32 + guarantee: 32 + memory: + limit: "64G" + guarantee: "64G" + # If needed, specify a custom entrypoint into the spawner image. + # For more information, look at the documentation for Docker ENTRYPOINT and CMD directives: + # https://www.docker.com/blog/docker-best-practices-choosing-between-run-cmd-and-entrypoint/ + cmd: ["/entrypoint.sh", "32"] + # Specify the init image for the tutorial. + # This image is optional, but it can be used to do last-second configuration or installation of files + # before the user gains control of the pod. + # + # A good use case for the init image is to set permissions and ensure the tutorial user will be able to + # access the files for your tutorial. An example Dockerfile for the init image may look like: + # + # Dockerfile: + # FROM alpine/git + # ENV NB_USER=jovyan \ + # NB_UID=1000 \ + # HOME=/home/jovyan + # + # RUN adduser \ + # -D \ + # -g "Default user" \ + # -u ${NB_UID} \ + # -h ${HOME} \ + # ${NB_USER} + # + # COPY ./init-entrypoint.sh /entrypoint.sh + # + # The 'command' field for the init container specifies the entrypoint for the container. For the Dockerfile + # above, the entrypoint should be "/entrypoint.sh". This script could look something like this: + # + # entrypoint.sh (would be ./init-entrypoint.sh on your local computer) + # chown -R 1000 /home/jovyan + initContainers: + - name: init-tutorial-service + image: ghcr.io/llnl/reproducible-benchmarking-init:hpcic-2025 + command: ["/entrypoint.sh"] + imagePullPolicy: Always + storage: + type: none + extraVolumes: + - name: shm-volume + emptyDir: + medium: Memory + extraVolumeMounts: + - name: shm-volume + mountPath: /dev/shm diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/storage-class.yaml b/2025-HPCIC/infrastructure/dry-run-w-nginx/storage-class.yaml new file mode 100644 index 0000000..b83a030 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/storage-class.yaml @@ -0,0 +1,7 @@ +apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: gp3 +provisioner: kubernetes.io/aws-ebs +volumeBindingMode: WaitForFirstConsumer +reclaimPolicy: Delete \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/tear_down_jupyterhub.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/tear_down_jupyterhub.sh new file mode 100755 index 0000000..6cfa310 --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/tear_down_jupyterhub.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v helm >/dev/null 2>&1; then + echo "ERROR: 'helm' is required to configure and launch JupyterHub on AWS with this script!" + echo " Installation instructions can be found here:" + echo " https://helm.sh/docs/intro/install/" + exit 1 +fi + +helm uninstall perftools-hpcic-2025-dry-run-jupyter + +echo "Helm's JupyterHub deployment is torn down." +echo "If any attendee pods are remaining, you can delete them with 'kubectl delete pod <pod name>'" +echo "" +echo "To recreate the JupyterHub deployment, just run deploy_jupyterhub.sh again." \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run-w-nginx/update_jupyterhub_deployment.sh b/2025-HPCIC/infrastructure/dry-run-w-nginx/update_jupyterhub_deployment.sh new file mode 100755 index 0000000..e273c0c --- /dev/null +++ b/2025-HPCIC/infrastructure/dry-run-w-nginx/update_jupyterhub_deployment.sh @@ -0,0 +1,14 @@ +#!/usr/bin/env bash + +set -e + +if ! command -v helm >/dev/null 2>&1; then + echo "ERROR: 'helm' is required to configure and launch JupyterHub on AWS with this script!"
+ echo " Installation instructions can be found here:" + echo " https://helm.sh/docs/intro/install/" + exit 1 +fi + +helm upgrade perftools-hpcic-2025-dry-run-jupyter jupyterhub/jupyterhub --values ./helm-config.yaml + +echo "The JupyterHub deployment is updated!" \ No newline at end of file diff --git a/2025-HPCIC/infrastructure/dry-run/eksctl-config.yaml b/2025-HPCIC/infrastructure/dry-run/eksctl-config.yaml index 18404d7..8ceeb30 100644 --- a/2025-HPCIC/infrastructure/dry-run/eksctl-config.yaml +++ b/2025-HPCIC/infrastructure/dry-run/eksctl-config.yaml @@ -61,10 +61,8 @@ availabilityZones: - "us-west-1a" - "us-west-1c" - # Define rules for nodegroups for each availability zone managedNodeGroups: - - name: node-group-us-west-1a # Set policies/permissions to autoscale iam: @@ -86,7 +84,6 @@ managedNodeGroups: tags: k8s.io/cluster-autoscaler/enabled: "true" k8s.io/cluster-autoscaler/jupyterhub: "owned" - - name: node-group-us-west-1c # Set policies/permissions to autoscale iam: diff --git a/2025-HPCIC/infrastructure/dry-run/get_jupyterhub_url.sh b/2025-HPCIC/infrastructure/dry-run/get_jupyterhub_url.sh index ddfd250..d58ce0b 100755 --- a/2025-HPCIC/infrastructure/dry-run/get_jupyterhub_url.sh +++ b/2025-HPCIC/infrastructure/dry-run/get_jupyterhub_url.sh @@ -9,4 +9,4 @@ if ! command -v kubectl >/dev/null 2>&1; then exit 1 fi -kubectl get -o json service proxy-public | jq '.status.loadBalancer.ingress[0].hostname' \ No newline at end of file +kubectl get -o json service proxy-public | jq '.status.loadBalancer.ingress[0].hostname' diff --git a/2025-HPCIC/infrastructure/dry-run/helm-config.yaml b/2025-HPCIC/infrastructure/dry-run/helm-config.yaml index dc217c0..22505c4 100644 --- a/2025-HPCIC/infrastructure/dry-run/helm-config.yaml +++ b/2025-HPCIC/infrastructure/dry-run/helm-config.yaml @@ -36,12 +36,9 @@ hub: # Define resource usage for JupyterHub # For large tutorials, it is recommended to set these higher - # We are just using defualt resource usage - # Define custom hostname for JupyterHub - # We are not using a custom hostname @@ -118,4 +115,3 @@ singleuser: extraVolumeMounts: - name: shm-volume mountPath: /dev/shm - \ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/README.md b/2025-eScience/infrastructure/dry-run/README.md index 97a71ff..7764877 100644 --- a/2025-eScience/infrastructure/dry-run/README.md +++ b/2025-eScience/infrastructure/dry-run/README.md @@ -111,4 +111,4 @@ $ ./cleanup.sh ``` After cleaning everything up, you can verify that everything has been cleaned up by going to the AWS web consle -and ensuring nothing from your tutorial still exists in CloudFormation and EKS. +and ensuring nothing from your tutorial still exists in CloudFormation and EKS. \ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/cleanup.sh b/2025-eScience/infrastructure/dry-run/cleanup.sh index 0174bbb..d5dc2a9 100755 --- a/2025-eScience/infrastructure/dry-run/cleanup.sh +++ b/2025-eScience/infrastructure/dry-run/cleanup.sh @@ -39,4 +39,4 @@ echo "Deleting the EKS cluster:" eksctl delete cluster --config-file ./eksctl-config.yaml --wait echo "" -echo "Everything is now cleaned up!" +echo "Everything is now cleaned up!" 
\ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/cluster-autoscaler.yaml b/2025-eScience/infrastructure/dry-run/cluster-autoscaler.yaml index 3c884ae..4b0dcbf 100644 --- a/2025-eScience/infrastructure/dry-run/cluster-autoscaler.yaml +++ b/2025-eScience/infrastructure/dry-run/cluster-autoscaler.yaml @@ -269,4 +269,4 @@ spec: volumes: - name: ssl-certs hostPath: - path: "/etc/ssl/certs/ca-bundle.crt" + path: "/etc/ssl/certs/ca-bundle.crt" \ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/config.toml b/2025-eScience/infrastructure/dry-run/config.toml index 1157a2d..6ca7b51 100644 --- a/2025-eScience/infrastructure/dry-run/config.toml +++ b/2025-eScience/infrastructure/dry-run/config.toml @@ -35,10 +35,11 @@ max_concurrent_users = 14 hub_password = "hpctutorial25" hub_db_capacity = "32Gi" ebs_storage_type = "gp3" +use_nginx = true hub_container_image = "jupyterhub/k8s-hub" hub_container_tag = "4.2.0" spawner_container_image = "ghcr.io/llnl/reproducible-benchmarking-spawn" -spawner_container_tag = "hpcic-2025" +spawner_container_tag = "escience-2025" spawner_image_entrypoint = "/entrypoint.sh 32" cpu_min = "32" cpu_max = "32" @@ -46,7 +47,7 @@ mem_min = "64G" mem_max = "64G" provide_extra_shmem = true init_container_image = "ghcr.io/llnl/reproducible-benchmarking-init" -init_container_tag = "hpcic-2025" +init_container_tag = "escience-2025" init_image_entrypoint = "/entrypoint.sh" [aws."utility scripts"] diff --git a/2025-eScience/infrastructure/dry-run/configure_kubernetes.sh b/2025-eScience/infrastructure/dry-run/configure_kubernetes.sh index 5c4bee6..f21c899 100755 --- a/2025-eScience/infrastructure/dry-run/configure_kubernetes.sh +++ b/2025-eScience/infrastructure/dry-run/configure_kubernetes.sh @@ -21,6 +21,9 @@ echo "Patching the cluster to make the configured storage class the default:" kubectl patch storageclass gp3 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}' +echo "Adding nginx to EKS cluster using kubectl:" +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.13.0/deploy/static/provider/cloud/deploy.yaml + echo "" echo "Done configuring Kubernetes!" echo "" diff --git a/2025-eScience/infrastructure/dry-run/deploy_jupyterhub.sh b/2025-eScience/infrastructure/dry-run/deploy_jupyterhub.sh index b7c91b2..c1e4bf8 100755 --- a/2025-eScience/infrastructure/dry-run/deploy_jupyterhub.sh +++ b/2025-eScience/infrastructure/dry-run/deploy_jupyterhub.sh @@ -37,4 +37,4 @@ echo "Note: users should have unique usernames. If two users have the same usern echo " share the same pod." echo "" echo "After you are done with your tutorial, you should finally run cleanup.sh to bring down" -echo "the EKS cluster and all associated resources." +echo "the EKS cluster and all associated resources." 
\ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/eksctl-config.yaml b/2025-eScience/infrastructure/dry-run/eksctl-config.yaml index 7301cf7..4aec00f 100644 --- a/2025-eScience/infrastructure/dry-run/eksctl-config.yaml +++ b/2025-eScience/infrastructure/dry-run/eksctl-config.yaml @@ -61,10 +61,8 @@ availabilityZones: - "us-west-1a" - "us-west-1c" - # Define rules for nodegroups for each availability zone managedNodeGroups: - - name: node-group-us-west-1a # Set policies/permissions to autoscale iam: @@ -86,7 +84,6 @@ managedNodeGroups: tags: k8s.io/cluster-autoscaler/enabled: "true" k8s.io/cluster-autoscaler/jupyterhub: "owned" - - name: node-group-us-west-1c # Set policies/permissions to autoscale iam: diff --git a/2025-eScience/infrastructure/dry-run/get_jupyterhub_url.sh b/2025-eScience/infrastructure/dry-run/get_jupyterhub_url.sh index ddfd250..e7c105c 100755 --- a/2025-eScience/infrastructure/dry-run/get_jupyterhub_url.sh +++ b/2025-eScience/infrastructure/dry-run/get_jupyterhub_url.sh @@ -9,4 +9,4 @@ if ! command -v kubectl >/dev/null 2>&1; then exit 1 fi -kubectl get -o json service proxy-public | jq '.status.loadBalancer.ingress[0].hostname' \ No newline at end of file +kubectl --namespace ingress-nginx get -o json svc ingress-nginx-controller | jq '.status.loadBalancer.ingress[0].hostname' diff --git a/2025-eScience/infrastructure/dry-run/helm-config.yaml b/2025-eScience/infrastructure/dry-run/helm-config.yaml index 2a20e7b..987a48c 100644 --- a/2025-eScience/infrastructure/dry-run/helm-config.yaml +++ b/2025-eScience/infrastructure/dry-run/helm-config.yaml @@ -36,14 +36,14 @@ hub: # Define resource usage for JupyterHub # For large tutorials, it is recommended to set these higher - # We are just using defualt resource usage - # Define custom hostname for JupyterHub - # We are not using a custom hostname +ingress: + enabled: true + ingressClassName: "nginx" # Based on optimization recommendations from: # https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders @@ -118,4 +118,3 @@ singleuser: extraVolumeMounts: - name: shm-volume mountPath: /dev/shm - diff --git a/2025-eScience/infrastructure/dry-run/tear_down_jupyterhub.sh b/2025-eScience/infrastructure/dry-run/tear_down_jupyterhub.sh index d0bb101..3fbb027 100755 --- a/2025-eScience/infrastructure/dry-run/tear_down_jupyterhub.sh +++ b/2025-eScience/infrastructure/dry-run/tear_down_jupyterhub.sh @@ -14,4 +14,4 @@ helm uninstall escience-2025-dry-run-jupyter echo "Helm's JupyterHub deployment is torn down." echo "If any attendee pods are remaining, you can delete them with 'kubectl delete pod '" echo "" -echo "To recreate the JupyterHub deployment, just run deploy_jupyterhub.sh again." +echo "To recreate the JupyterHub deployment, just run deploy_jupyterhub.sh again." \ No newline at end of file diff --git a/2025-eScience/infrastructure/dry-run/update_jupyterhub_deployment.sh b/2025-eScience/infrastructure/dry-run/update_jupyterhub_deployment.sh index dfdca5d..96bc7a5 100755 --- a/2025-eScience/infrastructure/dry-run/update_jupyterhub_deployment.sh +++ b/2025-eScience/infrastructure/dry-run/update_jupyterhub_deployment.sh @@ -11,4 +11,4 @@ fi helm upgrade escience-2025-dry-run-jupyter jupyterhub/jupyterhub --values ./helm-config.yaml -echo "The JupyterHub deployment is updated!" +echo "The JupyterHub deployment is updated!" 
\ No newline at end of file diff --git a/2025-eScience/infrastructure/production/README.md b/2025-eScience/infrastructure/production/README.md index ab35b5e..8b5c55f 100644 --- a/2025-eScience/infrastructure/production/README.md +++ b/2025-eScience/infrastructure/production/README.md @@ -111,4 +111,4 @@ $ ./cleanup.sh ``` After cleaning everything up, you can verify that everything has been cleaned up by going to the AWS web consle -and ensuring nothing from your tutorial still exists in CloudFormation and EKS. +and ensuring nothing from your tutorial still exists in CloudFormation and EKS. \ No newline at end of file diff --git a/2025-eScience/infrastructure/production/cleanup.sh b/2025-eScience/infrastructure/production/cleanup.sh index 0e2ecbd..782e7a0 100755 --- a/2025-eScience/infrastructure/production/cleanup.sh +++ b/2025-eScience/infrastructure/production/cleanup.sh @@ -39,4 +39,4 @@ echo "Deleting the EKS cluster:" eksctl delete cluster --config-file ./eksctl-config.yaml --wait echo "" -echo "Everything is now cleaned up!" +echo "Everything is now cleaned up!" \ No newline at end of file diff --git a/2025-eScience/infrastructure/production/cluster-autoscaler.yaml b/2025-eScience/infrastructure/production/cluster-autoscaler.yaml index fb5fc59..c696784 100644 --- a/2025-eScience/infrastructure/production/cluster-autoscaler.yaml +++ b/2025-eScience/infrastructure/production/cluster-autoscaler.yaml @@ -269,4 +269,4 @@ spec: volumes: - name: ssl-certs hostPath: - path: "/etc/ssl/certs/ca-bundle.crt" + path: "/etc/ssl/certs/ca-bundle.crt" \ No newline at end of file diff --git a/2025-eScience/infrastructure/production/config.toml b/2025-eScience/infrastructure/production/config.toml index 35ed978..7243c94 100644 --- a/2025-eScience/infrastructure/production/config.toml +++ b/2025-eScience/infrastructure/production/config.toml @@ -35,6 +35,7 @@ max_concurrent_users = 30 hub_password = "hpctutorial25" hub_db_capacity = "32Gi" ebs_storage_type = "gp3" +use_nginx = true hub_container_image = "jupyterhub/k8s-hub" hub_container_tag = "4.2.0" spawner_container_image = "ghcr.io/llnl/reproducible-benchmarking-spawn" diff --git a/2025-eScience/infrastructure/production/configure_kubernetes.sh b/2025-eScience/infrastructure/production/configure_kubernetes.sh index 5c4bee6..f21c899 100755 --- a/2025-eScience/infrastructure/production/configure_kubernetes.sh +++ b/2025-eScience/infrastructure/production/configure_kubernetes.sh @@ -21,6 +21,9 @@ echo "Patching the cluster to make the configured storage class the default:" kubectl patch storageclass gp3 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}' +echo "Adding nginx to EKS cluster using kubectl:" +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.13.0/deploy/static/provider/cloud/deploy.yaml + echo "" echo "Done configuring Kubernetes!" echo "" diff --git a/2025-eScience/infrastructure/production/deploy_jupyterhub.sh b/2025-eScience/infrastructure/production/deploy_jupyterhub.sh index 5a391e9..2debb70 100755 --- a/2025-eScience/infrastructure/production/deploy_jupyterhub.sh +++ b/2025-eScience/infrastructure/production/deploy_jupyterhub.sh @@ -37,4 +37,4 @@ echo "Note: users should have unique usernames. If two users have the same usern echo " share the same pod." 
echo "" echo "After you are done with your tutorial, you should finally run cleanup.sh to bring down" -echo "the EKS cluster and all associated resources." +echo "the EKS cluster and all associated resources." \ No newline at end of file diff --git a/2025-eScience/infrastructure/production/eksctl-config.yaml b/2025-eScience/infrastructure/production/eksctl-config.yaml index 6a7ea06..d561587 100644 --- a/2025-eScience/infrastructure/production/eksctl-config.yaml +++ b/2025-eScience/infrastructure/production/eksctl-config.yaml @@ -61,10 +61,8 @@ availabilityZones: - "us-east-1a" - "us-east-1b" - # Define rules for nodegroups for each availability zone managedNodeGroups: - - name: node-group-us-east-1a # Set policies/permissions to autoscale iam: @@ -86,7 +84,6 @@ managedNodeGroups: tags: k8s.io/cluster-autoscaler/enabled: "true" k8s.io/cluster-autoscaler/jupyterhub: "owned" - - name: node-group-us-east-1b # Set policies/permissions to autoscale iam: diff --git a/2025-eScience/infrastructure/production/get_jupyterhub_url.sh b/2025-eScience/infrastructure/production/get_jupyterhub_url.sh index ddfd250..e7c105c 100755 --- a/2025-eScience/infrastructure/production/get_jupyterhub_url.sh +++ b/2025-eScience/infrastructure/production/get_jupyterhub_url.sh @@ -9,4 +9,4 @@ if ! command -v kubectl >/dev/null 2>&1; then exit 1 fi -kubectl get -o json service proxy-public | jq '.status.loadBalancer.ingress[0].hostname' \ No newline at end of file +kubectl --namespace ingress-nginx get -o json svc ingress-nginx-controller | jq '.status.loadBalancer.ingress[0].hostname' diff --git a/2025-eScience/infrastructure/production/helm-config.yaml b/2025-eScience/infrastructure/production/helm-config.yaml index b259210..6fecb7a 100644 --- a/2025-eScience/infrastructure/production/helm-config.yaml +++ b/2025-eScience/infrastructure/production/helm-config.yaml @@ -36,14 +36,14 @@ hub: # Define resource usage for JupyterHub # For large tutorials, it is recommended to set these higher - # We are just using defualt resource usage - # Define custom hostname for JupyterHub - # We are not using a custom hostname +ingress: + enabled: true + ingressClassName: "nginx" # Based on optimization recommendations from: # https://z2jh.jupyter.org/en/latest/administrator/optimization.html#scaling-up-in-time-user-placeholders @@ -118,4 +118,3 @@ singleuser: extraVolumeMounts: - name: shm-volume mountPath: /dev/shm - diff --git a/2025-eScience/infrastructure/production/tear_down_jupyterhub.sh b/2025-eScience/infrastructure/production/tear_down_jupyterhub.sh index d748b69..e218c3b 100755 --- a/2025-eScience/infrastructure/production/tear_down_jupyterhub.sh +++ b/2025-eScience/infrastructure/production/tear_down_jupyterhub.sh @@ -14,4 +14,4 @@ helm uninstall escience-2025-tutorial-jupyter echo "Helm's JupyterHub deployment is torn down." echo "If any attendee pods are remaining, you can delete them with 'kubectl delete pod '" echo "" -echo "To recreate the JupyterHub deployment, just run deploy_jupyterhub.sh again." +echo "To recreate the JupyterHub deployment, just run deploy_jupyterhub.sh again." 
\ No newline at end of file diff --git a/2025-eScience/infrastructure/production/update_jupyterhub_deployment.sh b/2025-eScience/infrastructure/production/update_jupyterhub_deployment.sh index d2d2add..5c7e7cc 100755 --- a/2025-eScience/infrastructure/production/update_jupyterhub_deployment.sh +++ b/2025-eScience/infrastructure/production/update_jupyterhub_deployment.sh @@ -11,4 +11,4 @@ fi helm upgrade escience-2025-tutorial-jupyter jupyterhub/jupyterhub --values ./helm-config.yaml -echo "The JupyterHub deployment is updated!" +echo "The JupyterHub deployment is updated!" \ No newline at end of file
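Both versions of the README close by recommending a manual check of the AWS web console after running `cleanup.sh`. If you prefer to verify from the command line, something like the following works. This is only a sketch (not part of the provided scripts); it assumes the AWS CLI is configured for the tutorial's account, and the region must match the one in `eksctl-config.yaml` (us-west-1 for the dry runs, us-east-1 for production):

```bash
# The tutorial cluster should no longer appear in the list of EKS clusters
$ eksctl get cluster --region us-west-1

# No active CloudFormation stacks created by eksctl for the tutorial should remain
$ aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE
```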