usher-sampled: -T flag not respected by Taskflow executors → ~9000 threads on multi-core hosts

Greetings, @yatisht! This is Johannes from Gill's lab. Unfortunately, usher-sampled is overloading one of our servers (hgwdev) at UCSC. See Claude's report below. Hope you're doing well!

## Summary

`usher-sampled`'s `-T`/`--threads` flag only caps Intel TBB's worker pool. The four `tf::Executor` instances introduced with the Taskflow integration are default-constructed, so they each spawn `std::thread::hardware_concurrency()` workers regardless of `-T`. On a 192-core host this produces ~9,000 threads per `usher-sampled` invocation, which dominates the system load average and inflates kernel CPU time.

## Affected lines

All four sites default-construct `tf::Executor` with no argument. Line numbers are against `master` at commit `9d7ecf7a`:

| File | Line |
|---|---:|
| `src/usher-sampled/sampler.cpp` | 91 |
| `src/usher-sampled/main_mapper.cpp` | 492 |
| `src/usher-sampled/place_sample.cpp` | 703 |
| `src/usher-sampled/place_sample_follower.cpp` | 212 |

From Taskflow's header (`taskflow/core/executor.hpp`):

```cpp
explicit Executor(
    size_t N = std::thread::hardware_concurrency(),
    std::shared_ptr<WorkerInterface> wix = nullptr
);
```

## Observed behavior on a 192-core host (UCSC `hgwdev`)

Two concurrent `usher-sampled` runs invoked with `-T 16`:

```
$ ps -o pid,user,nlwp,pcpu,cmd -C usher-sampled
    PID USER     NLWP %CPU CMD
1254718 angie    9824  2103 .../usher-sampled -T 16 -A -e 5 -t emptyTree.nwk -v ... -o ... --optimization_radius 0 --batch_size_per_process 100
3734113 angie    9679  2285 .../usher-sampled -T 16 -A -e 5 -t emptyTree.nwk -v ... -o ... --optimization_radius 0 --batch_size_per_process 100
```

Thread-state breakdown per process. Summed over both processes, only ~95 threads are in state R and ~200 in state D, against ~14k in state S, so most threads are pool workers parked in futex waits rather than doing useful work:

```
PID 1254718: 49 R, 94 D, 7488 S
PID 3734113: 46 R, 103 D, 6733 S
```

Systemwide, the load average is ~420 on 192 cores, with `%idle` ~50%, `%iowait` 0%, and `%sys` ~22% (kernel time approaching user time, which indicates scheduler pressure). Context switches run at ~2M/s systemwide, or ~10–11k per core per second, which is anomalous for what should be a CPU-bound tree-placement workload.

## Cause

`-T` is correctly honored by TBB at `src/usher-sampled/driver/main.cpp:506`:

```cpp
tbb::global_control global_limit(tbb::global_control::max_allowed_parallelism, num_threads);
```

…but Taskflow has its own thread pool that is not subject to this limit, and the `Executor` instances are default-constructed.

## Suggested fix

`num_threads` is a global declared in `src/usher-sampled/driver/main.cpp:42`. Pass it to each Executor constructor:

```diff
- tf::Executor executor;
+ tf::Executor executor(num_threads);
```

at all four sites listed above.

A cleaner alternative would be a single global Taskflow executor configured once at startup (mirroring how TBB is configured), but the in-place fix is the minimal change to restore `-T` semantics.
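One possible shape for the single-executor alternative is a lazily initialized accessor, sketched below. `shared_executor` and `FakeExecutor` are hypothetical names, not part of usher or Taskflow. `FakeExecutor` only keeps the sketch self-contained; in real code `Exec` would presumably be `tf::Executor`. Whether the four call sites can safely share one executor has not been checked here.

```cpp
#include <cassert>
#include <cstddef>

// Returns one process-wide Exec instance, constructed on the first call with
// num_threads workers. Later calls ignore the argument and return the same
// object, so the first caller (for example startup code) fixes the pool size.
template <class Exec>
Exec& shared_executor(std::size_t num_threads) {
    assert(num_threads > 0);
    static Exec instance(num_threads);  // initialization is thread-safe since C++11
    return instance;
}

// Hypothetical stand-in for an executor type; it only records its worker count.
struct FakeExecutor {
    std::size_t workers;
    explicit FakeExecutor(std::size_t n) : workers(n) {}
};
```

Under these assumptions, startup code would call `shared_executor<tf::Executor>(num_threads)` once, right after the TBB limit is set, and each of the four sites would use the returned reference instead of constructing its own executor.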

## Repro

The problem reproduces on any host where `std::thread::hardware_concurrency()` is much larger than the value passed to `-T`. The discrepancy is visible immediately via:

```bash
ps -o pid,nlwp,cmd -C usher-sampled
```

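The `NLWP` column scales with `hardware_concurrency()` rather than with `-T`. Four executors alone would account for `4 × hardware_concurrency()` threads (768 on a 192-core host). The ~9,700 to ~9,800 threads observed on `hgwdev` are roughly `51 × hardware_concurrency()`, which suggests that more than four executor instances are alive at once, for example if an executor is constructed on each call into the affected code.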
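For a scripted check, the same count can be read from the `Threads:` field of `/proc/<pid>/status` on Linux. The sketch below is illustrative only; `parse_thread_count` and `thread_count_of` are hypothetical helper names, not part of usher.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the "Threads:" field from text in /proc/<pid>/status format.
// Returns -1 if the field is absent or its value is not a number.
inline long parse_thread_count(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = -1;
        if ((fields >> key) && key == "Threads:" && (fields >> value)) {
            return value;
        }
    }
    return -1;
}

// Reads the live thread count of a process on Linux.
// Returns -1 if the status file cannot be opened or parsed.
inline long thread_count_of(long pid) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    return status ? parse_thread_count(status) : -1;
}
```

Comparing `thread_count_of(pid)` against the `-T` value gives a quick pass/fail signal for a regression test once the executors honor the flag.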

## Environment

- `usher-sampled` `master` at `9d7ecf7a` (2026-04-30). The four affected sites were introduced in commits `5c32e0aa` ("Adding taskflow"), `782937c2` ("Adding taskflow: place_sample_follower"), and `915325e3` ("Updating files to be consistent with latest usher codebase").
- Linux 5.14, 192 logical CPUs.
- Invocation: `usher-sampled -T 16 -A -e 5 -t emptyTree.nwk -v <vcf.gz> -o <output.pb> --optimization_radius 0 --batch_size_per_process 100`.

