Set thread counts for client & server using env variables with better defaults #1993

Closed
pomo-mondreganto wants to merge 3 commits

Conversation

pomo-mondreganto

No description provided.

@pomo-mondreganto
Author

pomo-mondreganto commented Nov 29, 2023

A bit of a backstory: we've encountered two issues using sccache in our cluster.

First, when running in a CPU-limited container environment, tokio detects the total number of CPUs on the host instead of the container limit, which would be the correct value (this behaviour exists in all schedulers known to me, e.g. the Golang one). Second, sccache spawns a gigantic tokio scheduler with num_cpus worker threads for each compilation process (client invocation). In our setup (128 logical CPUs, a 64-CPU container limit, 64 concurrent compilation jobs) sccache spawned 128 * 64 threads during compilation, which is completely unnecessary and overflows the thread limit for the container.
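
For context, the change boils down to letting an environment variable cap the worker-thread count instead of always sizing the runtime from num_cpus. A minimal sketch of that idea, reusing the `SCCACHE_CLIENT_WORKER_THREADS` variable this PR introduces (illustrative only; the actual sccache code paths differ):

```rust
use std::env;
use tokio::runtime::Runtime;

/// Build a client-side runtime, preferring an explicit thread count from the
/// environment and falling back to the detected CPU count otherwise.
/// (Sketch only; function name and structure are for illustration.)
fn build_client_runtime() -> std::io::Result<Runtime> {
    let threads = env::var("SCCACHE_CLIENT_WORKER_THREADS")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or_else(num_cpus::get);

    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(threads)
        .enable_all()
        .build()
}
```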

@glandium
Collaborator

> First, when running in a CPU-limited container environment, tokio detects the total number of CPUs on the host instead of the container limit, which would be the correct value (this behaviour exists in all schedulers known to me, e.g. the Golang one).

num_cpus uses sched_getaffinity, and should return the right number. What does your container do to limit CPU?

@pomo-mondreganto
Author

> > First, when running in a CPU-limited container environment, tokio detects the total number of CPUs on the host instead of the container limit, which would be the correct value (this behaviour exists in all schedulers known to me, e.g. the Golang one).
>
> num_cpus uses sched_getaffinity, and should return the right number. What does your container do to limit CPU?

It's a regular Kubernetes installation with the containerd runtime; the containers are limited via the CPU requests/limits set by k8s.

@pomo-mondreganto
Author

I've just checked that in a container with the following settings:

Limits:
  cpu:                3
  ephemeral-storage:  100Gi
  memory:             4Gi
Requests:
  cpu:                3
  ephemeral-storage:  100Gi
  memory:             4Gi

sched_getaffinity returns all cores:

>>> import os
>>> os.sched_getaffinity(0)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127}

@glandium
Collaborator

I guess k8s allocates fewer scheduling slots rather than fewer CPUs.
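
That matches how Kubernetes enforces a CPU limit: it becomes a CFS bandwidth quota in the container's cgroup rather than a restricted affinity mask, so sched_getaffinity (and therefore num_cpus) still sees every core on the node. A rough sketch of how the effective limit could be derived from the cgroup v2 quota, assuming the standard `/sys/fs/cgroup/cpu.max` layout (illustrative only, not part of this PR):

```rust
use std::fs;

/// Best-effort read of the cgroup v2 CPU quota, which is what a Kubernetes
/// cpu limit translates to, as opposed to a CPU affinity mask.
/// Returns None when unlimited or when the file is missing/unparsable.
/// (Sketch only; cgroup v1 fallbacks are omitted.)
fn cgroup_cpu_limit() -> Option<usize> {
    let contents = fs::read_to_string("/sys/fs/cgroup/cpu.max").ok()?;
    let mut parts = contents.split_whitespace();
    let quota: f64 = parts.next()?.parse().ok()?; // "max" fails to parse -> None
    let period: f64 = parts.next()?.parse().ok()?;
    Some((quota / period).ceil() as usize)
}
```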

@codecov-commenter

codecov-commenter commented Nov 30, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (fb0ab0c) 31.11% compared to head (ba4e92b) 30.89%.

Files            Patch %   Lines
src/commands.rs  0.00%     1 Missing and 3 partials ⚠️
src/server.rs    0.00%     1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1993      +/-   ##
==========================================
- Coverage   31.11%   30.89%   -0.23%     
==========================================
  Files          51       51              
  Lines       19419    19425       +6     
  Branches     9341     9356      +15     
==========================================
- Hits         6043     6001      -42     
- Misses       7785     7797      +12     
- Partials     5591     5627      +36     

☔ View full report in Codecov by Sentry.

@pomo-mondreganto
Author

Can someone take a look at the "cancelled" test? There's no output and no reason given for the cancellation.

@sylvestre
Collaborator

Don't bother, it is fine :)

@pomo-mondreganto
Author

So can I consider my work finished here? :)

@pomo-mondreganto
Author

Any plans on merging this? I might be adding some more features in the near future, and I'd prefer implementing them on top of upstream rather than my fork.

@@ -161,6 +161,7 @@ set(CMAKE_MSVC_DEBUG_INFORMATION_FORMAT Embedded)

And you can build code as usual without any additional flags in the command line, useful for IDEs.

To limit the number of threads the sccache process spawns, use the `SCCACHE_SERVER_WORKER_THREADS` and `SCCACHE_CLIENT_WORKER_THREADS` environment variables for the server and client processes respectively.
Collaborator


please explain in which contexts someone might need this

@sylvestre
Collaborator

it would be nice to add tests which verify that it works correctly

@sylvestre
Collaborator

ping ?

@sylvestre
Collaborator

please reopen when ready

@sylvestre sylvestre closed this Feb 20, 2024