new Mac/Linux launch script framework: modular, extensible and robust (#477)
Nerogar · Sep 29, 2024 · 2da3d15
1 parent d10d86e
commit 2da3d15
Showing 9 changed files with 473 additions and 179 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,6 @@
 .idea
+.python-version
+/.venv*
 /venv*
 /debug*
 /workspace*

diff --git a/LAUNCH-SCRIPTS.md b/LAUNCH-SCRIPTS.md
@@ -0,0 +1,68 @@
+# OneTrainer Launch Scripts
+
+
+## Mac and Linux Systems
+
+### The launch system consists of the following scripts:
+
+- `install.sh`: Ensures that you have a valid Python runtime environment and installs all requirements if necessary.
+- `update.sh`: Updates OneTrainer to the latest version and upgrades any outdated requirements in your Python runtime environment.
+- `start-ui.sh`: Launches the main OneTrainer interface.
+- `run-cmd.sh`: Executes a custom script (such as "train"), and supports providing command-line arguments. See the "running custom script commands" guide section for more details.
+
+
+### All of the scripts accept the following *optional* environment variables to customize their behavior:
+
+- `OT_CONDA_CMD`: Sets a custom Conda command or an absolute path to the binary (useful when it isn't in the user's `PATH`). If nothing is provided, we detect and use `CONDA_EXE` if available, which is a variable that's set by Conda itself and always points at the user's installed Conda binary.
+
+- `OT_CONDA_ENV`: Sets the name of the Conda environment. Defaults to `onetrainer`.
+
+- `OT_PYTHON_CMD`: Sets the Host's Python executable. It's used for creating the Python Venvs. This can be used to force the usage of a specific Python version's binary (such as `python3.10`) whenever the host has multiple versions installed. However, it's *always* recommended to use Conda or Pyenv instead, rather than relying on the host's unreliable system-wide Python binaries (which might change or be removed with system updates), so we don't recommend changing this option unless you *really* know what you're doing. Defaults to `python`.
+
+- `OT_PYTHON_VENV`: Sets the name (or an absolute/relative path) of the Python Venv. If a name or relative path is used, it will be relative to the OneTrainer directory. Defaults to `venv`.
+
+- `OT_PREFER_VENV`: If set to `true`, Conda will be ignored even if it exists on the system, and Python Venv will be used instead. This ensures that people who use `pyenv` (to choose which Python version to run on the host) can easily set up their desired Python Venv environments. Defaults to `false`.
+
+- `OT_CUDA_LOWMEM_MODE`: If set to `true`, it enables aggressive garbage collection in PyTorch to help with low-memory GPUs. Defaults to `false`.
+
+- `OT_PLATFORM_REQUIREMENTS`: Allows you to override which platform-specific "requirements" file you want to install. Defaults to `detect`, which automatically detects whether you have an AMD or NVIDIA GPU. But people with multi-GPU systems can use this setting to force a specific GPU acceleration framework's requirements. Valid values are `requirements-rocm.txt` for AMD, `requirements-cuda.txt` for NVIDIA, and `requirements-default.txt` for non-AMD/NVIDIA systems.
+
+- `OT_SCRIPT_DEBUG`: If set to `true`, it enables additional debug logging in the scripts. Defaults to `false`.
+
+
+### Examples of how to use the custom environment variables:
+
+- You can provide custom environment variables directly on the command line, as follows: `env OT_PREFER_VENV="true" OT_CUDA_LOWMEM_MODE="true" OT_PLATFORM_REQUIREMENTS="requirements-cuda.txt" ./start-ui.sh`.
+- You can add them to your user's persistent environment variables, so that they are always active. The process varies depending on your operating system. On Linux, you can place them in `~/.config/environment.d/onetrainer.conf` (on all Systemd-based distros), which is a plaintext file with *one variable per line,* such as `OT_CUDA_LOWMEM_MODE="true"`. Beware that changes to `environment.d` require a *complete system restart* to take effect (there is no command for reloading them live). To verify that your environment has been set persistently, you can then open a terminal window and run `printenv <variable name>` (such as `printenv OT_CUDA_LOWMEM_MODE`) to see if your custom values have taken effect.
+- If you're launching OneTrainer from your own, custom scripts, then you can instead `export` the new values (which tells the shell to pass those environment variables onto child processes). For example, by having a line such as `export OT_CUDA_LOWMEM_MODE="true"` before your script calls `./OneTrainer/start-ui.sh`.
+- If you're running OneTrainer inside a Docker/Podman container, you can instead use the [ENV](https://docs.docker.com/reference/dockerfile/#env) instruction in your `Dockerfile` / `Containerfile` to set the variables, such as `ENV OT_CUDA_LOWMEM_MODE="true"`.
+
+
+### Installing the required Python version for OneTrainer:
+
+- If you've received a warning that your system's Python version is incorrect, then your system doesn't have Conda installed, and has instead tried to create a Python Venv with your host's default Python version. If that version is incompatible with OneTrainer, then you'll have to resolve the problem by manually installing a compatible version.
+- Begin by deleting the `venv` sub-directory inside the OneTrainer directory, to erase the invalid Python Venv (which was created with the wrong Python version).
+- Now you'll have to choose which solution you prefer.
+- The most beginner-friendly solution is to install [Miniconda](https://docs.anaconda.com/miniconda/) on your system. OneTrainer will then automatically install and manage the correct Python version for you. You can stop reading here if you're choosing this solution. Everything will work automatically after that.
+- Alternatively, if you prefer a more lightweight and advanced solution, then you can use [pyenv](https://github.com/pyenv/pyenv), which allows you to set the exact Python version to use for OneTrainer's directory. If you're on Linux, then read their "[automatic installer](https://github.com/pyenv/pyenv?tab=readme-ov-file#automatic-installer)" section and follow the instructions. If you're on a Mac instead, then read their "[Homebrew](https://github.com/pyenv/pyenv?tab=readme-ov-file#homebrew-in-macos)" section (which is an open-source package manager for Macs).
+- After installing pyenv, you will also need to install the [Python build dependencies](https://github.com/pyenv/pyenv/wiki#suggested-build-environment) on your system, since pyenv installs each Python version by compiling them directly from the official source code.
+- Restart your shell, and then try the `pyenv doctor` command, which ensures that pyenv is loaded and verifies that your system contains all required dependencies for installing Python.
+- Run `pyenv install <python version>` to install whichever Python version is currently required by OneTrainer. You can look at the `OT_CONDA_USE_PYTHON_VERSION` variable at the top of the `lib.include.sh` file in OneTrainer's project directory, to see which Python version is recommended by OneTrainer at the moment.
+- Lastly, you must navigate to the OneTrainer directory, and then run `pyenv local <python version>` to force OneTrainer to use that version of Python. Your choice will be stored persistently in the hidden `.python-version` file, and can be changed again in the future by running the command again.
+- You can now run `python --version` to verify that the `python` command in OneTrainer is being mapped to the correct Python version by pyenv.
+- Everything is now ready for running OneTrainer!
+
+
+### Running custom script commands:
+
+- Always use `run-cmd.sh` when you want to execute any of OneTrainer's CLI tasks. It automatically validates the chosen target script's name, configures the runtime environment correctly, and then runs the target script with your given command-line arguments.
+- For example, to run the training CLI script, you would use `./run-cmd.sh train --config-path <path to your config>`.
+- The names of all valid scripts can be seen in OneTrainer's `scripts/` sub-directory.
+- To learn more about the available command-line arguments for each script, you can execute them with the `-h` (help) argument: `./run-cmd.sh <script name> -h`. For example, if you want to learn more about the "train" script, you would run `./run-cmd.sh train -h`.
+
+
+### Creating your own launch scripts and automating tasks:
+
+- If you want to automate various OneTrainer CLI tasks, then you should call `run-cmd.sh` from your own scripts (see previous guide section), since it's capable of running *any* OneTrainer command with your own command-line arguments.
+- To run multiple tasks in the same script, make a separate call to `run-cmd.sh` for each task. Call it as many times as needed, once for each custom script and set of command-line arguments that you want to run.
+- It's highly recommended that you use `set -e` at the top of your own scripts (see `install.sh` for an example of that), since it tells Bash to exit your script if any of the OneTrainer commands fail. Otherwise your script will continue running even if a previous step has failed, which is usually not what you want!
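
Reviewer note: the automation advice in the last section of `LAUNCH-SCRIPTS.md` can be sketched as a small wrapper script. This is an illustration only, not part of the commit. The `OT_DIR` and `DRY_RUN` variables, the `run_task` helper, and the two config paths are hypothetical names chosen for the example; only `run-cmd.sh`, the `train` script name, `--config-path`, `set -e`, and `OT_CUDA_LOWMEM_MODE` come from the documentation above.

```bash
#!/usr/bin/env bash

# Abort as soon as any step fails, as the documentation recommends.
set -e

# Hypothetical: location of the OneTrainer checkout (override via OT_DIR).
OT_DIR="${OT_DIR:-./OneTrainer}"

# Exported so that every run-cmd.sh child process inherits the setting.
export OT_CUDA_LOWMEM_MODE="true"

# Hypothetical helper: one separate run-cmd.sh call per task.
# In this sketch DRY_RUN defaults to "true", so the command is only printed.
# Set DRY_RUN=false to actually execute it.
run_task() {
    if [[ "${DRY_RUN:-true}" == "true" ]]; then
        echo "${OT_DIR}/run-cmd.sh $*"
    else
        "${OT_DIR}/run-cmd.sh" "$@"
    fi
}

# Hypothetical config paths; replace them with your own files.
run_task train --config-path "configs/first.json"
run_task train --config-path "configs/second.json"
```

With `DRY_RUN=false`, a failure in the first training run stops the script before the second one starts, which is the behavior that `set -e` is recommended for.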
diff --git a/README.md b/README.md
@@ -95,7 +95,9 @@ All functionality is split into different scrips located in the `scripts` direct
 - `generate_masks.py` A utility to automatically create masks for your dataset
 - `calculate_loss.py` A utility to calculate the training loss of every image in your dataset
 
-To learn more about the different parameters, execute `<scipt-name> -h`. For example `python scripts\train.py -h`
+To learn more about the different parameters, execute `<script-name> -h`. For example `python scripts\train.py -h`
+
+If you are on Mac or Linux, read [the launch script documentation](LAUNCH-SCRIPTS.md) for detailed information about how to run OneTrainer and its various scripts on your system.
 
 ## Contributing
 

diff --git a/install.sh b/install.sh
@@ -1,67 +1,7 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
-# Let user specify python and venv directly.
-if [[ -z "${python_cmd}" ]]; then
-    python_cmd="python"
-fi
-if [[ -z "${python_venv}" ]]; then
-    python_venv=venv
-fi
+set -e
 
-#change the environment name for conda to use
-conda_env=ot
-#change the environment name for python to use (only needed if Anaconda3 or miniconda is not installed)
+source "${BASH_SOURCE[0]%/*}/lib.include.sh"
 
-if [ -e /dev/kfd ]; then
-	PLATFORM_REQS=requirements-rocm.txt
-elif [ -x "$(command -v nvcc)" ]; then
-	PLATFORM_REQS=requirements-cuda.txt
-else
-	PLATFORM_REQS=requirements-default.txt
-fi
-
-if ! [ -x "$(command -v ${python_cmd})" ]; then
-	echo 'error: python not installed or found!'
-elif [ -x "$(command -v ${python_cmd})" ]; then
-	major=$(${python_cmd} -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(major)')
-	minor=$(${python_cmd} -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(minor)')
-
-	#check major version of python
-	if [[ "$major" -eq "3" ]];
-		then
-			#check minor version of python
-			if [[ "$minor" -le "10" ]];
-				then
-					if ! [ -x "$(command -v conda)" ]; then
-						echo 'conda not found; python version correct; use native python'
-						if ! [ -d $python_venv ]; then
-							${python_cmd} -m venv $python_venv
-						fi
-						source $python_venv/bin/activate
-						if [[ -z "$VIRTUAL_ENV" ]]; then
-    							echo "warning: No VIRTUAL_ENV set. exiting."
-						else
-    							${python_cmd} -m pip install -r requirements-global.txt -r $PLATFORM_REQS
-						fi
-					elif [ -x "$(command -v conda)" ]; then
-						#check for venv
-						if conda info --envs | grep -q ${conda_env}; 
-							then
-								bash --init-file <(echo ". \"$HOME/.bashrc\"; conda activate $conda_env; ${python_cmd} -m pip install -r requirements-global.txt -r $PLATFORM_REQS")
-							else 
-								conda create -y -n $conda_env python==3.10;
-								bash --init-file <(echo ". \"$HOME/.bashrc\"; conda activate $conda_env; ${python_cmd} -m pip install -r requirements-global.txt -r $PLATFORM_REQS")
-						fi
-					fi
-				else
-					echo 'error: wrong python version installed:'$major'.'$minor
-					echo 'OneTrainer requires the use of python 3.10, please refer to the anaconda project to setup a virtual environment with that version. https://anaconda.org/anaconda/python'
-			fi
-		else
-			echo 'error: wrong python version installed:'$major'.'$minor
-			echo 'OneTrainer requires the use of python 3.10, either install python3 on your system or refer to the anaconda project to setup a virtual environment with that version. https://anaconda.org/anaconda/python'
-	fi
-fi
-
-#create workdirs
-#TODO
+prepare_runtime_environment
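
Reviewer note: the removed `install.sh` logic chose a platform requirements file by checking for `/dev/kfd` (AMD) and `nvcc` (NVIDIA), while the new documentation describes an `OT_PLATFORM_REQUIREMENTS` override that defaults to `detect`. A minimal sketch of how those two pieces could fit together is shown below. It is an assumption for illustration, not the code in `lib.include.sh`, and `detect_platform_requirements` is a hypothetical function name.

```bash
#!/usr/bin/env bash

# Hypothetical helper, NOT the function shipped in lib.include.sh.
# Prints the platform-specific requirements file that would be installed.
detect_platform_requirements() {
    local choice="${OT_PLATFORM_REQUIREMENTS:-detect}"

    if [[ "${choice}" != "detect" ]]; then
        # An explicit override wins, e.g. on multi-GPU systems.
        echo "${choice}"
    elif [[ -e /dev/kfd ]]; then
        # Same heuristic as the old install.sh: AMD ROCm device node exists.
        echo "requirements-rocm.txt"
    elif command -v nvcc >/dev/null 2>&1; then
        # Same heuristic as the old install.sh: NVIDIA CUDA compiler is on PATH.
        echo "requirements-cuda.txt"
    else
        echo "requirements-default.txt"
    fi
}

echo "Platform requirements: $(detect_platform_requirements)"
```

Running the sketch with `OT_PLATFORM_REQUIREMENTS="requirements-cuda.txt"` set in the environment skips detection and prints the forced file name.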