
improve Linux wrapper scripts #474

Closed · wants to merge 1 commit

Conversation

@Arcitec (Contributor) commented Sep 23, 2024

Edit: Superseded by #477


  • Unify handling of environment variables.
  • Make all important variables configurable via the environment rather than requiring edits to the script files (see the sketch after this list).
  • Add support for a custom Python command in all scripts.
  • Add support for venv directories with spaces in the path (especially useful when the venv is on another disk).
  • Restructure a few things into more correct Bash code.
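
A minimal sketch of that pattern, assuming hypothetical variable names (OT_PYTHON and OT_VENV_DIR are illustrative, not necessarily the PR's exact names):

```bash
# Hypothetical names for illustration; the PR's actual variables may differ.
# Each setting has a sane default but can be overridden from the environment.
python_cmd="${OT_PYTHON:-python3}"
venv_dir="${OT_VENV_DIR:-venv}"

# Quoting every expansion keeps venv paths with spaces working,
# e.g. a venv on another disk such as "/mnt/big disk/onetrainer-venv".
[[ -d "$venv_dir" ]] || "$python_cmd" -m venv "$venv_dir"
source "$venv_dir/bin/activate"
```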

@Arcitec (Contributor, Author) commented Sep 23, 2024

Things can be improved even further, but I don't have the energy for a full rewrite. This change at least gets the scripts to a unified level where they're much more helpful out of the box. :)

@Nerogar (Owner) commented Sep 23, 2024

There is already another open PR (#466) that tries to do the same thing.
Please check whether your PR is still needed, or whether the two can be integrated.

@Arcitec (Contributor, Author) commented Sep 23, 2024

@Nerogar Ah. I had a 20-second look at the other PR over breakfast, and so far the code looks really bad. It passes data through temp files on disk rather than using "export", and it fetches variables in a bad way. That was just a first glance where I read maybe 20% of the PR. I'll take a closer look when I have free time.

Maybe I'll end up rewriting the whole scripts instead, with much bigger improvements and clean, robust code. I just didn't have time at first. There is so much that could be improved with a blank slate.

@Lolzen (Contributor) commented Sep 23, 2024

> @Nerogar Ah. I had a 20-second look at the other PR over breakfast, and so far the code looks really bad. It passes data through temp files on disk rather than using "export", and it fetches variables in a bad way. That was just a first glance where I read maybe 20% of the PR. I'll take a closer look when I have free time.
>
> Maybe I'll end up rewriting the whole scripts instead, with much bigger improvements and clean, robust code. I just didn't have time at first. There is so much that could be improved with a blank slate.

It stores one variable because it's used for updates, in case a user chooses a different environment than the check detects (NV, AMD, default).
The rest is there so we have one unified script that can be modified and built upon; that's the same idea you addressed in your PR as well: the scripts had become fragmented.

I'm not opposed to you implementing it in any other way, just explaining why things were done the way they were. Reading things through and actually understanding why they are the way they are would help avoid duplicated effort.

@Arcitec (Contributor, Author) commented Sep 24, 2024

@Lolzen Hey, thanks for adding some details! I can see the use case for specifying a different requirements.txt file than what the check can auto-detect, so I'll be sure to add that feature too. :) But as an environment variable rather than an on-disk file.

I'm going to tag you in the updated version of this PR so you can provide feedback and see if there's any other feature you think we need. I'm starting the rewrite without looking at either the original scripts or any other changes, just to start from a clean slate with best practices. Then I'll bring in features from the original scripts + similar changes as this PR + your idea for a requirements.txt override. And then we can see where we go from there. :)

Edit: After looking at OneTrainer's current scripts, it's pretty impressive... how there's practically not a single thing done correctly in how the old scripts were written.

  • No error checking whatsoever, so if a command failed, the old scripts just happily continued with the next line (see the sketch after this list).
  • Or even funny things like checking the Python version on the host and failing if the host's Python is wrong, instead of checking inside the Conda/venv environment... which was totally insane...
  • Or checking whether a Conda environment exists by searching for "ot" anywhere in the output of the conda info --envs command, meaning that an env named "notcorrect" under /home/otheruser/foo would match the old "ot" check, ffs. :D
  • Or checking for "CUDA" by looking for a 10-year-old binary (nvcc) that no longer ships with NVIDIA's driver, thereby failing to detect CUDA on all systems.
  • Or the fact that it fails to detect Conda if the conda shell startup script/hook has been executed.
  • Or starting Conda via an absolutely insane, hacky and very incorrect command: bash --init-file <(echo ". \"$HOME/.bashrc\"; conda activate $conda_env; python scripts/train_ui.py")
  • Or the way update.sh was not doing --force-reinstall for the venv (only for Conda)...
  • Or the fact that --force-reinstall is total nonsense and doesn't upgrade pip packages (it just reinstalls what you already have on disk). That's not how to upgrade packages to the latest versions. :)
  • Or not handling the working directory at all, meaning that the user had to manually cd into OneTrainer before being able to run any of the .sh scripts.
  • Or the very funny way it checked the Python version component by component (major, then minor), printing two subtly different, duplicated error messages for the two situations, instead of just checking both at once.
  • Or the way it recommended the huge, 3+ gigabyte Anaconda distribution, which ships with around 2000 scientific libraries pre-installed, when Miniconda is way better (it's just the package manager, which can install what you need on demand instead of tons of bloat you don't need).
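
For illustration, here's a minimal sketch of fixes for the first and third points (strict error handling and an exact-match env check); this is not the PR's actual code, and the env name "ot" comes from the example above:

```bash
#!/usr/bin/env bash
# Abort on failed commands, unset variables, and pipeline failures,
# instead of happily continuing with the next line.
set -euo pipefail

# Match the env name exactly (first column of `conda env list`),
# so "notcorrect" or "/home/otheruser/foo" can no longer match "ot".
if conda env list | awk 'NF && $1 !~ /^#/ {print $1}' | grep -qxF "ot"; then
    echo "Conda env 'ot' exists."
fi
```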

@Lolzen (Contributor) commented Sep 24, 2024

@Arcitec awesome!
I did the scripts to the best of my knowledge, basically working them into functioning through trial and error, as no one else was providing supported scripts for Linux at the time, and people committed additions for their valid use cases. I'm glad if you can improve them.

As my first and main language is Lua, I'm far more versed there, so I'm pleased if you can improve the scripts by whatever means. Thanks.

@Lolzen (Contributor) commented Sep 25, 2024

> Or starting Conda via an absolutely insane, hacky and very incorrect command: bash --init-file <(echo ". \"$HOME/.bashrc\"; conda activate $conda_env; python scripts/train_ui.py")

I couldn't get it to enter a conda env shell any other way.

> Or the way update.sh was not doing --force-reinstall for the venv (only for Conda)...

That was addressed in the closed PR; a simple oversight.

> Or the fact that --force-reinstall is total nonsense and doesn't upgrade pip packages (it just reinstalls what you already have on disk). That's not how to upgrade packages to the latest versions. :)

Not in general, but if a git pull is done first and the requirements are updated, it should force a reinstall of the newer versions. This was oriented on how sd-ui-forge handled it.

> Or not handling the working directory at all, meaning that the user had to manually cd into OneTrainer before being able to run any of the .sh scripts.

And on Windows you have to double-click the .bat files. Now that's nitpicking.

> Or the very funny way it checked the Python version component by component (major, then minor), printing two subtly different, duplicated error messages for the two situations, instead of just checking both at once.

Python 2 was still the default in e.g. Ubuntu.

@Arcitec (Contributor, Author) commented Sep 26, 2024

> I couldn't get it to enter a conda env shell any other way.

You should never enter a conda shell from a shell script; that's meant for interactive sessions where a user is typing. Scripts should run commands inside the env non-interactively instead, as sketched below.
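
A minimal sketch of the non-interactive alternative (conda run, with the env name onetrainer taken from the examples later in this thread):

```bash
# Run the trainer inside the env without spawning an interactive shell;
# --no-capture-output streams stdout/stderr straight to the terminal.
conda run -n onetrainer --no-capture-output python scripts/train_ui.py
```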

> Not in general, but if a git pull is done first and the requirements are updated, it should force a reinstall of the newer versions. This was oriented on how sd-ui-forge handled it.

It doesn't. Doing it that way just reinstalls the latest cached packages from disk as long as they still satisfy the version constraints, meaning it does not actually update the requirements.

Copying other projects' mistakes can often lead to accidents like this.

> And on Windows you have to double-click the .bat files. Now that's nitpicking.

On Mac and Linux, we don't double-click shell scripts most of the time. Most desktop environments aren't even configured to handle double-clicking shell scripts at all. We have a terminal open and run the script, often from a path that's not directly inside the project directory. So one of the first things a shell script should do is ensure that the working directory is the project directory, so that all resources can be found; see the sketch below.
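
The standard Bash idiom for that is a one-liner (a sketch, not necessarily the PR's exact code):

```bash
# Change to the directory containing this script, no matter where
# the user invoked it from; BASH_SOURCE works even when sourced.
cd -- "$(dirname -- "${BASH_SOURCE[0]}")"
```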

> Python 2 was still the default in e.g. Ubuntu.

That's unrelated to how the check was being done; both components can be validated in a single comparison, as sketched below.
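
For example, a single comparison against the full version tuple covers both components at once (the 3.10 minimum here is an assumption for illustration):

```bash
# One check, one error message, instead of separate major/minor branches.
if ! python -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
    echo "Error: Python 3.10 or newer is required." >&2
    exit 1
fi
```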

Anyway, closing this now. Will submit the complete overhaul pull request in a moment. :)

@Arcitec closed this Sep 26, 2024
@Nerogar (Owner) commented Sep 26, 2024

I can't comment on most of these things. I'm using Windows and don't have much experience with Linux.
Just one thing: --force-reinstall was added to the update command to ensure that all git dependencies are actually updated. In the past, they were not always updated, even when specifying --upgrade.

@Arcitec (Contributor, Author) commented Sep 26, 2024

> Just one thing: --force-reinstall was added to the update command to ensure that all git dependencies are actually updated.

There are two issues with that. The first is that it only downloads newer dependencies if the constraints have changed in a way that forces an upgrade (e.g. numpy==1.12 becoming numpy==1.13); otherwise it keeps the older cached versions. The second is that it reinstalls everything even when there are zero changes, which takes a lot of time.

> In the past, they were not always updated, even when specifying --upgrade.

Yeah, by default pip's upgrade is very conservative and will not upgrade anything that's already installed (if requirements.txt hasn't changed its constraints). The correct approach is to tell it to eagerly upgrade to the latest compatible version of each dependency.

The command for that is:

```
python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r morerequirements.txt
```

This achieves two things:

  1. All dependencies are upgraded to the same versions that a totally fresh install would have.
  2. It only updates dependencies that aren't already at the latest version, saving tons of time: a few seconds instead of minutes.

Furthermore, that exact command can be used for the initial fresh install too. There's no need for separate upgrade and install commands; both can use this exact pip command. If the user doesn't have any packages yet, it installs them as usual. If a few outdated packages were already installed, it ensures they get upgraded during the "fresh install" too. So it's a win-win.

I also recommend always running python -m pip install --upgrade --upgrade-strategy eager pip setuptools as a separate call before installing requirements. This ensures that you have the latest pip dependency resolver, which often contains bugfixes. It must be done as a separate step, otherwise the new resolver won't be active while requirements.txt is being installed. The combined sequence is sketched below.
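
Putting both steps together, a minimal sketch of the full update sequence (requirements-cuda.txt is taken from the examples below; the actual file depends on the user's hardware):

```bash
# Step 1: upgrade the installer tooling in its own pip invocation,
# so the newer resolver is active for the next step.
python -m pip install --upgrade --upgrade-strategy eager pip setuptools

# Step 2: one command serves fresh installs and upgrades alike: missing
# packages are installed, outdated ones are eagerly upgraded, and
# up-to-date ones are skipped.
python -m pip install --upgrade --upgrade-strategy eager \
    -r requirements-global.txt -r requirements-cuda.txt
```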

The Windows batch scripts could be improved with these changes as well. :)


Edit: Here's an exact example of the commands that are executed by the new Mac/Linux scripts.

In Venv:

```
[OneTrainer] + python -m pip install --upgrade --upgrade-strategy eager pip setuptools
[OneTrainer] + python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt
```

In Conda (the exact same commands as Venv, but executed within conda run -n onetrainer --no-capture-output):

```
[OneTrainer] + /home/johnny/.local/share/miniconda/bin/conda run -n onetrainer --no-capture-output python -m pip install --upgrade --upgrade-strategy eager pip setuptools
[OneTrainer] + /home/johnny/.local/share/miniconda/bin/conda run -n onetrainer --no-capture-output python -m pip install --upgrade --upgrade-strategy eager -r requirements-global.txt -r requirements-cuda.txt
```

@O-J1 (Collaborator) commented Sep 26, 2024

Does this mean reinstalling Torch will be a thing of the past?

@Arcitec (Contributor, Author) commented Sep 26, 2024

> Does this mean reinstalling Torch will be a thing of the past?

lol, yes. :)

@Arcitec deleted the improve-wrappers branch September 26, 2024 21:19