Skip to content

fix: make twoliter work in parallel invocation scenarios #478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

mikn
Copy link
Contributor

@mikn mikn commented Mar 6, 2025

Issue number:

Closes #477

Description of changes: Fixed race conditions in Twoliter's verification tag system and tools directory installation to prevent corruption when multiple concurrent processes attempt to manipulate these resources:

  • Generally introduces a file lock mechanism to get exclusive access when doing common file manipulations, including backoff and stale lock management

Verification Tag System Improvements:

  1. Introduces content-based hash comparison to avoid unnecessary rewrites of tag files
  2. Uses said file locking mechanism to ensure atomicity of tag verification
  3. Eliminates redundant cleanup calls in project loading methods
  4. Preserves the original API (cleanup_existing_tags) while improving its implementation

Tools Directory Installation Improvements:

  1. Makes tools installation atomic through above mentioned file-based locking mechanism
  2. Creates tools in a temporary directory with a UUID suffix to prevent partial updates
  3. Safely performs the critical section operations (removing existing directory, renaming temp directory)

The changes (hopefully?) maintain backward compatibility while making Twoliter more robust in environments where multiple processes run in parallel (e.g., Bazel builds).

Testing done:

  1. Added unit tests specifically targeting verification tag operations:
    1. Content-based skipping (verifies unchanged files aren't rewritten)
    2. Concurrent tag writing (validates thread/process safety)
    3. Cleanup functionality (ensures complete and safe tag removal)
  2. Included tests for tools installation to verify atomicity and correctness
  3. Actually ran twoliter with this patch in Bazel with multiple in-parallel variant builds

After this patch, it now looks like this:

[2025-03-06T16:41:33Z WARN  twoliter__bin::project] A Release.toml file was found. Release.toml is deprecated. Please remove it from your project.
[2025-03-06T16:41:33Z DEBUG twoliter__bin::project] Project file loaded from '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/Twoliter.toml'
[2025-03-06T16:41:33Z INFO  twoliter__bin::project::lock] Resolving project references to check against lock file
[2025-03-06T16:41:33Z DEBUG twoliter__bin::project::lock] Loading existing lockfile '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/Twoliter.lock'
[2025-03-06T16:41:33Z DEBUG twoliter__bin::project::lock] Resolving kit 'bottlerocket-core-kit' [email protected]/bottlerocket/bottlerocket-core-kit:v5.4.2
[2025-03-06T16:41:33Z INFO  twoliter__bin::project::lock::image] Resolving dependency image dependency '[email protected]/bottlerocket/bottlerocket-core-kit:v5.4.2'.
[2025-03-06T16:41:33Z DEBUG twoliter__bin::project::lock::image] Fetching image manifest. [email protected]/bottlerocket/bottlerocket-core-kit:v5.4.2 uri="public.ecr.aws/bottlerocket/bottlerocket-core-kit:v5.4.2"
[2025-03-06T16:41:36Z DEBUG twoliter__bin::project::lock::image] Calculated digest for locked image 'public.ecr.aws/bottlerocket/bottlerocket-core-kit:v5.4.2': '08DMHssgSYSdDW2RPw8Oxy+8/jsNA76sCRW0kJG8XNE='
[2025-03-06T16:41:36Z DEBUG twoliter__bin::project::lock::image] Extracting kit metadata from OCI image
[2025-03-06T16:41:38Z DEBUG twoliter__bin::project::lock] Resolving kit 'bottlerocket-kernel-kit' [email protected]/bottlerocket/bottlerocket-kernel-kit:v1.0.6
[2025-03-06T16:41:38Z INFO  twoliter__bin::project::lock::image] Resolving dependency image dependency '[email protected]/bottlerocket/bottlerocket-kernel-kit:v1.0.6'.
[2025-03-06T16:41:38Z DEBUG twoliter__bin::project::lock::image] Fetching image manifest. [email protected]/bottlerocket/bottlerocket-kernel-kit:v1.0.6 uri="public.ecr.aws/bottlerocket/bottlerocket-kernel-kit:v1.0.6"
[2025-03-06T16:41:41Z DEBUG twoliter__bin::project::lock::image] Calculated digest for locked image 'public.ecr.aws/bottlerocket/bottlerocket-kernel-kit:v1.0.6': 'xnixe91yqOdD0iDPSwa0fKnAaCDzU6nwzw1XL+3pbyI='
[2025-03-06T16:41:41Z DEBUG twoliter__bin::project::lock::image] Extracting kit metadata from OCI image
[2025-03-06T16:41:44Z DEBUG twoliter__bin::project::lock] Resolving workspace SDK sdk_set={ProjectImage { image: Image { name: ValidIdentifier("bottlerocket-sdk"), version: Version { major: 0, minor: 50, patch: 1 }, vendor: ValidIdentifier("bottlerocket") }, vendor: Verbatim(VerbatimVendor { vendor_name: ValidIdentifier("bottlerocket"), vendor: Vendor { registry: "public.ecr.aws/bottlerocket" } }) }}
[2025-03-06T16:41:44Z DEBUG twoliter__bin::project::lock] Resolving workspace SDK sdk=ProjectImage { image: Image { name: ValidIdentifier("bottlerocket-sdk"), version: Version { major: 0, minor: 50, patch: 1 }, vendor: ValidIdentifier("bottlerocket") }, vendor: Verbatim(VerbatimVendor { vendor_name: ValidIdentifier("bottlerocket"), vendor: Vendor { registry: "public.ecr.aws/bottlerocket" } }) }
[2025-03-06T16:41:44Z INFO  twoliter__bin::project::lock::image] Resolving dependency image dependency '[email protected]/bottlerocket/bottlerocket-sdk:v0.50.1'.
[2025-03-06T16:41:44Z DEBUG twoliter__bin::project::lock::image] Fetching image manifest. [email protected]/bottlerocket/bottlerocket-sdk:v0.50.1 uri="public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1"
[2025-03-06T16:41:46Z DEBUG twoliter__bin::project::lock::image] Calculated digest for locked image 'public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1': 'HEh3Lx3F6P4OEPnFubF++RMpMW2vlfp/Tc/tGjnBRcM='
[2025-03-06T16:41:46Z DEBUG twoliter__bin::project::lock] Comparing resolved lock to current lock state current_lock=Lock { schema_version: 1, sdk: LockedImage { name: ValidIdentifier("bottlerocket-sdk"), version: Version { major: 0, minor: 50, patch: 1 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1", digest: "HEh3Lx3F6P4OEPnFubF++RMpMW2vlfp/Tc/tGjnBRcM=" }, kit: [LockedImage { name: ValidIdentifier("bottlerocket-core-kit"), version: Version { major: 5, minor: 4, patch: 2 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-core-kit:v5.4.2", digest: "08DMHssgSYSdDW2RPw8Oxy+8/jsNA76sCRW0kJG8XNE=" }, LockedImage { name: ValidIdentifier("bottlerocket-kernel-kit"), version: Version { major: 1, minor: 0, patch: 6 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-kernel-kit:v1.0.6", digest: "xnixe91yqOdD0iDPSwa0fKnAaCDzU6nwzw1XL+3pbyI=" }] } resolved_lock=Lock { schema_version: 1, sdk: LockedImage { name: ValidIdentifier("bottlerocket-sdk"), version: Version { major: 0, minor: 50, patch: 1 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1", digest: "HEh3Lx3F6P4OEPnFubF++RMpMW2vlfp/Tc/tGjnBRcM=" }, kit: [LockedImage { name: ValidIdentifier("bottlerocket-core-kit"), version: Version { major: 5, minor: 4, patch: 2 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-core-kit:v5.4.2", digest: "08DMHssgSYSdDW2RPw8Oxy+8/jsNA76sCRW0kJG8XNE=" }, LockedImage { name: ValidIdentifier("bottlerocket-kernel-kit"), version: Version { major: 1, minor: 0, patch: 6 }, vendor: ValidIdentifier("bottlerocket"), source: "public.ecr.aws/bottlerocket/bottlerocket-kernel-kit:v1.0.6", digest: "xnixe91yqOdD0iDPSwa0fKnAaCDzU6nwzw1XL+3pbyI=" }] }
[2025-03-06T16:41:46Z DEBUG twoliter__bin::common] Acquired lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.verification.lock
[2025-03-06T16:41:46Z DEBUG twoliter__bin::project::lock::verification] Tag file '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.sdk-verified' unchanged, skipping
[2025-03-06T16:41:46Z DEBUG twoliter__bin::project::lock::verification] Tag file '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.kits-verified' unchanged, skipping
[2025-03-06T16:41:46Z DEBUG twoliter__bin::common] Releasing lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.verification.lock
[2025-03-06T16:41:46Z DEBUG twoliter__bin::tools] Installing tools to '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools'
[2025-03-06T16:41:46Z DEBUG twoliter__bin::tools] Installed tools to '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools.tmp.5f1578d0-ddaa-4df4-81a6-11409e5b3a3e'
[2025-03-06T16:41:46Z DEBUG twoliter__bin::common] Acquired lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/.tools.lock
[2025-03-06T16:41:46Z DEBUG twoliter__bin::tools] Successfully installed tools to '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools'
[2025-03-06T16:41:46Z DEBUG twoliter__bin::common] Releasing lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/.tools.lock
[2025-03-06T16:41:46Z DEBUG twoliter__bin::common] Running: Command { std: "cargo" "make" "--disable-check-for-updates" "--makefile" "/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools/Makefile.toml" "--cwd" "/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket" "-e=TLPRIVATE_SDK_IMAGE=public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1" "-e=BUILDSYS_OUTPUT_GENERATION_ID=1" "-e=TWOLITER_TOOLS_DIR=/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools" "-e=BUILDSYS_ARCH=x86_64" "-e=BUILDSYS_VARIANT=metal-dev" "-e=BUILDSYS_VERSION_IMAGE=1.1.0" "-e=GO_MODULES=" "-e=BUILDSYS_UPSTREAM_SOURCE_FALLBACK=false" "build", kill_on_drop: false }

I do understand the thinking behind removing the lock files early on, but there seems to be no actions that you have that checks for their existence before or without actually verifying them, so I do not think it is something worth guarding against. This is quite a lot of cowboy coding though, so feel free to correct me if I have misunderstood something!

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@mikn
Copy link
Contributor Author

mikn commented Mar 7, 2025

So yeah, perhaps "obviously" we hit the race of the tools/ folder being thrashed also (i.e one of the processes tried to read a file that was subsequently removed before the read actually happened). The most robust solution is to do a hash based comparison of the contents of the folder. Again, this is a bit rough but it does the job.

I guess a question for you guys is if you'd prefer me unwrapping twoliter at this point and reimplement what you do in the Makefile.toml and call each individual tool instead of (ab)using twoliter like this?

@cbgbt
Copy link
Contributor

cbgbt commented May 2, 2025

Hey @mikn sorry for the silence here. This has been competing with other priorities of mine, and I know the parallelization concept will require a lot of attention, so I want to make sure I can give this the focus it deserves.

We will work to make some space to review this soon! Thanks for submitting this.

@mikn
Copy link
Contributor Author

mikn commented May 13, 2025

It's completely fine! I have myself been drowning in other work. I finally am updating all of the BR tooling on our side again, trying to upstream as much as possible to slim down our footprint and I once again, after having seen the very generous changes made by @jmt-lab here e616ed0
To switch to twoliter 0.10 on our end.

Ironically, I actually hit parallelization troubles here already:
https://github.com/bottlerocket-os/twoliter/blob/develop/twoliter/embedded/Makefile.toml#L993

because this is the path (when you run "build variant" on the twoliter side): https://github.com/bottlerocket-os/twoliter/blob/develop/twoliter/embedded/Makefile.toml#L1015

Here is the outtake:

[2025-05-13T06:47:57Z DEBUG twoliter__bin::common] Acquired lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.verification.lock
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common::content] Content hash match for '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.sdk-verified': hash=fe164a737c73b24
[2025-05-13T06:47:57Z DEBUG twoliter__bin::project::lock::verification] Tag file '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.sdk-verified' unchanged, skipping
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common::content] Content hash match for '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.kits-verified': hash=68d94eb1488f985a
[2025-05-13T06:47:57Z DEBUG twoliter__bin::project::lock::verification] Tag file '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.kits-verified' unchanged, skipping
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common] Releasing lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/external-kits/.verification.lock
[2025-05-13T06:47:57Z DEBUG twoliter__bin::tools] Installing tools to '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools'
[2025-05-13T06:47:57Z DEBUG twoliter__bin::tools] Unpacked tarball to '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools.tmp.e77de432-249f-45cc-af0e-8d361a78df00'
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common] Acquired lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/.tools.lock
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common::content] Comparing directories: '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools.tmp.e77de432-249f-45cc-af0e-8d361a78df00' and '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools'
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common::content] Directories are identical: 25 files with matching content
[2025-05-13T06:47:57Z INFO  twoliter__bin::tools] Tools directory content matches, skipping update
[2025-05-13T06:47:57Z DEBUG twoliter__bin::tools] Tools in '/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools' are up to date, no changes made
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common] Releasing lock: /home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/.tools.lock
[2025-05-13T06:47:57Z DEBUG twoliter__bin::common] Running: Command { std: "cargo" "make" "--disable-check-for-updates" "--makefile" "/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools/Makefile.toml" "--cwd" "/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket" "-e=TLPRIVATE_SDK_IMAGE=public.ecr.aws/bottlerocket/bottlerocket-sdk:v0.50.1" "-e=BUILDSYS_OUTPUT_GENERATION_ID=1" "-e=TWOLITER_TOOLS_DIR=/home/mikn/.cache/bazel/_bazel_mikn/2c3f33a86f4a406b8ffa9b29662db437/execroot/_main/platform/base/bottlerocket/build/tools" "-e=BUILDSYS_ARCH=x86_64" "-e=BUILDSYS_VARIANT=metal-k8s-1.31-control" "-e=BUILDSYS_VERSION_IMAGE=1.2.1" "-e=GO_MODULES=" "-e=BUILDSYS_UPSTREAM_SOURCE_FALLBACK=false" "build", kill_on_drop: false }
2025-05-13 06:47:59 [ERROR] failed to fetch crates:     Updating git repository `https://github.com/bottlerocket-os/bottlerocket-settings-sdk`
error: failed to get `bottlerocket-defaults-helper` as a dependency of package `settings-defaults-metal-dev v0.3.0 (/tmp/sources/settings-defaults/metal-dev)`

Caused by:
  failed to load source for dependency `bottlerocket-defaults-helper`

Caused by:
  Unable to update https://github.com/bottlerocket-os/bottlerocket-settings-sdk?tag=bottlerocket-defaults-helper-v0.1.1#0cf85735

Caused by:
  failed to create directory `/tmp/.cargo/git/db/bottlerocket-settings-sdk-3ffd24dec00e96c2`

Caused by:
  Permission denied (os error 13)

2025-05-13 06:47:59 [ERROR] `cargo metadata` exited with an error:     Updating git repository `https://github.com/bottlerocket-os/bottlerocket-settings-sdk`
error: failed to get `bottlerocket-defaults-helper` as a dependency of package `settings-defaults-metal-dev v0.3.0 (/tmp/sources/settings-defaults/metal-dev)`

Caused by:
  failed to load source for dependency `bottlerocket-defaults-helper`

Caused by:
  Unable to update https://github.com/bottlerocket-os/bottlerocket-settings-sdk?tag=bottlerocket-defaults-helper-v0.1.1#0cf85735

Caused by:
  failed to create directory `/tmp/.cargo/git/db/bottlerocket-settings-sdk-3ffd24dec00e96c2`

Caused by:
  Permission denied (os error 13)

Error while executing command, exit code: 1
Error: Command was unsuccessful, exit code 105
exit status 1

Now, I don't know why the check licenses flow would mess with the cargo folder... but it does. :D
For now, rather than reverting, I decided to copy the flock implementation from the vendor path to the check licenses task also.

@mikn mikn force-pushed the mk/remove_races branch from f0798ba to 8b2ccbf Compare May 13, 2025 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

race condition opportunities in writing tools and verification tags
2 participants