Skip to content

fix(mesh): avoid tx delay build up with setTransmitDelay#9780

Open
m1nl wants to merge 3 commits intomeshtastic:developfrom
m1nl:avoid_tx_delay_build_up
Open

fix(mesh): avoid tx delay build up with setTransmitDelay#9780
m1nl wants to merge 3 commits intomeshtastic:developfrom
m1nl:avoid_tx_delay_build_up

Conversation

@m1nl
Copy link
Contributor

@m1nl m1nl commented Feb 28, 2026

setTransmitDelay increases the delay of the first packet in the TX queue. In the current implementation, setTransmitDelay is called whenever a TX or RX interrupt is received, or when canSendImmediately() returns false.

When tx_after is set, each call to setTransmitDelay increases it further by add_delay. In busy environments, where interrupts occur frequently, this causes tx_after to grow rapidly, resulting in excessive transmission delays.

My PR prevents tx_after from being increased on every call to setTransmitDelay. It is also worth noting that existing logic prevents packets from being sent immediately, since max will select now + add_delay if it is greater than tx_after:

p->tx_after = min(max((unsigned long)p->tx_after, now + add_delay), now + 2 * getTxDelayMsecWeightedWorst(p->rx_snr));
  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other (please specify below)

setTransmitDelay increases delay for the very first packet in tx
queue. Current implementation calls setTransmitDelay whenever
TX or RX interrupt is received or canSendImmediately is false.
In setTransmitDelay when tx_after is set, it will be always
increased by add_delay which makes the value build up in busy
environments very quickly.
@github-actions github-actions bot added the bugfix Pull request that fixes bugs label Feb 28, 2026
@thebentern thebentern requested review from GUVWAF and Copilot February 28, 2026 17:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts setTransmitDelay() to prevent transmission delay (tx_after) from compounding on frequent interrupts, avoiding excessive TX backoff in busy environments.

Changes:

  • Updates tx_after computation to avoid adding add_delay repeatedly when tx_after is already set.

@GUVWAF
Copy link
Member

GUVWAF commented Feb 28, 2026

Note that tx_after is only first set by clampToLateRebroadcastWindow(), which is used only for ROUTER_LATE and CLIENT_BASE roles. Please see the original discussion about this, starting roughly here: #5528 (comment)

It was mostly done this way in order for ROUTER_LATE not to transmit before a normal ROUTER/CLIENT does.

Besides, when p->tx_after + add_delay is getting large, it will still take the minimum of that and now + 2 * getTxDelayMsecWeightedWorst(p->rx_snr), so it will not grow indefinitely.

@m1nl
Copy link
Contributor Author

m1nl commented Feb 28, 2026

Note that tx_after is only first set by clampToLateRebroadcastWindow(), which is used only for ROUTER_LATE and CLIENT_BASE roles. Please see the original discussion about this, starting roughly here: #5528 (comment)

It was mostly done this way in order for ROUTER_LATE not to transmit before a normal ROUTER/CLIENT does.

Besides, when p->tx_after + add_delay is getting large, it will still take the minimum of that and now + 2 * getTxDelayMsecWeightedWorst(p->rx_snr), so it will not grow indefinitely.

Thanks for feedback! At the moment when tx_after is initially set, it guarantees that ROUTER_LATE won't transmit earlier than ROUTER/CLIENT for this particular packet, so adding additional delay to it every time setTransmitDelay is called doesn't make much sense to me. Please also notice that now is variable in now + 2 * getTxDelayMsecWeightedWorst(p->rx_snr), so in reality, tx_delay may grow for a very long time in busy environments as it'll be pushed later over and over.

@GUVWAF
Copy link
Member

GUVWAF commented Feb 28, 2026

it guarantees that ROUTER_LATE won't transmit earlier than ROUTER/CLIENT for this particular packet

This is not the case. When you receive/transmit a packet, the ROUTER/CLIENT in your range likely does so too and also calls setTransmitDelay() again. Hence, at the next opportunity for both (when the channel is idle), if you don't increase tx_after again, tx_after might be before now and then the ROUTER/CLIENT does not get priority.

I'm also curious how @erayd thinks of this.

@m1nl
Copy link
Contributor Author

m1nl commented Feb 28, 2026

When you receive/transmit a packet, the ROUTER/CLIENT in your range likely does so too and also calls setTransmitDelay() again. Hence, at the next opportunity for both (when the channel is idle), if you don't increase tx_after again, tx_after might be before now and then the ROUTER/CLIENT does not get priority.

I think the other argument in max (now + add_delay) ensures that tx_after is always increased (set to now + add_delay at minimum) and ROUTER_LATE behaves the same as ROUTER/CLIENT in the worst case. That is, after tx_after elapses, ROUTER_LATE will begin handling the packet like a regular (non-delayed) packet, effectively matching ROUTER behavior for that packet, which is consistent with its role.

p->tx_after = min(max((unsigned long)p->tx_after, now + add_delay), now + 2 * getTxDelayMsecWeightedWorst(p->rx_snr));

In general, I think my change can be summarized as follows:

Once the initial tx_after is reached, ROUTER_LATE starts processing the packet as a normal, non-delayed packet, while still ensuring it never transmits earlier than ROUTER.

@m1nl
Copy link
Contributor Author

m1nl commented Mar 2, 2026

I performed some testing to show what is the case I'm referring to. Please take a look at traces below.

tx_delay = tx_after - now (ms)
rx_diff = rx_time - time (s)

When a packet with tx_after is at the front of the queue, calling setTransmitDelay increases tx_after even if it is already set to a very high value (e.g., several seconds). This is unnecessary and causes the packet to remain in the queue longer than intended.

TX queue status - packet position=0 tx_delay=0 rx_diff=0 id=3267979337
TX queue status - packet position=4 tx_delay=3264 rx_diff=2 id=3267979337  <- MOVED TO LATE WINDOW
TX queue status - packet position=2 tx_delay=-10256 rx_diff=15 id=3267979337
TX queue status - packet position=1 tx_delay=-10647 rx_diff=16 id=3267979337
TX queue status - packet position=1 tx_delay=6403 rx_diff=22 id=3267979337
TX queue status - packet position=0 tx_delay=6086 rx_diff=22 id=3267979337 <- HERE
TX queue status - packet position=1 tx_delay=6393 rx_diff=23 id=3267979337 <- INCREASE
TX queue status - packet position=0 tx_delay=5337 rx_diff=24 id=3267979337
TX queue status - packet position=1 tx_delay=3154 rx_diff=32 id=3267979337
TX queue status - packet position=1 tx_delay=2725 rx_diff=32 id=3267979337
TX queue status - packet position=2 tx_delay=-3070 rx_diff=38 id=3267979337
TX queue status - packet position=1 tx_delay=-4288 rx_diff=39 id=3267979337
TX queue status - packet position=2 tx_delay=-8671 rx_diff=43 id=3267979337
TX queue status - packet position=1 tx_delay=-8950 rx_diff=44 id=3267979337
TX queue status - packet position=0 tx_delay=1728 rx_diff=1 id=4055790825
TX queue status - packet position=1 tx_delay=1926 rx_diff=7 id=4055790825
TX queue status - packet position=0 tx_delay=629 rx_diff=9 id=4055790825
TX queue status - packet position=1 tx_delay=1700 rx_diff=13 id=4055790825
TX queue status - packet position=0 tx_delay=634 rx_diff=14 id=4055790825
TX queue status - packet position=1 tx_delay=794 rx_diff=17 id=4055790825
TX queue status - packet position=1 tx_delay=587 rx_diff=18 id=4055790825
TX queue status - packet position=1 tx_delay=713 rx_diff=23 id=4055790825
TX queue status - packet position=0 tx_delay=458 rx_diff=24 id=4055790825
TX queue status - packet position=1 tx_delay=2943 rx_diff=28 id=4055790825
TX queue status - packet position=0 tx_delay=2681 rx_diff=28 id=4055790825 <- HERE
TX queue status - packet position=1 tx_delay=3172 rx_diff=30 id=4055790825 <- INCREASE
TX queue status - packet position=1 tx_delay=2295 rx_diff=31 id=4055790825
TX queue status - packet position=1 tx_delay=974 rx_diff=38 id=4055790825
TX queue status - packet position=0 tx_delay=433 rx_diff=39 id=4055790825
TX queue status - packet position=1 tx_delay=2616 rx_diff=40 id=4055790825
TX queue status - packet position=2 tx_delay=2176 rx_diff=40 id=4055790825
TX queue status - packet position=2 tx_delay=1932 rx_diff=41 id=4055790825
TX queue status - packet position=2 tx_delay=986 rx_diff=42 id=4055790825
TX queue status - packet position=3 tx_delay=-770 rx_diff=43 id=4055790825
TX queue status - packet position=2 tx_delay=-1499 rx_diff=44 id=4055790825
TX queue status - packet position=2 tx_delay=-10384 rx_diff=53 id=4055790825
TX queue status - packet position=1 tx_delay=-11133 rx_diff=54 id=4055790825
TX queue status - packet position=1 tx_delay=3098 rx_diff=60 id=4055790825
TX queue status - packet position=0 tx_delay=2656 rx_diff=60 id=4055790825

This also breaks ordering and makes the packet with much higher tx_after be in front of packets, which are heavily late.

TX queue status - packet position=0 tx_delay=5592 rx_diff=41 id=4228545667
TX queue status - packet position=1 tx_delay=960 rx_diff=1 id=151245228
TX queue status - packet position=2 tx_delay=-34362 rx_diff=109 id=2523763692
TX queue status - packet position=3 tx_delay=-102245 rx_diff=103 id=3817850352

Actually we could consider sorting the queue, whenever tx_after gets updated, but this is not in scope of this PR and IMO it's not needed as the queue head will be just adjusted slightly and capped to now + add_delay if initial tx_after passes.

@GUVWAF
Copy link
Member

GUVWAF commented Mar 3, 2026

I do see what you mean and while I still think the original way was the most "correct" in terms of the intended behaviour of ROUTER_LATE, I'm now inclined indeed to go with your approach, as the situation you describe above seems worse than sometimes having the ROUTER_LATE rebroadcast too early.

Still I would like to hear what @erayd thinks of this.

@erayd
Copy link
Contributor

erayd commented Mar 3, 2026

I missed your first tag sorry @GUVWAF - am flat out right now, but should get to this later today. It's supposed to be clamped to something sane, so if the time is blowing out massively I'd consider that a bug (possibly interrupt / notification related, if it's firing the increase before tx_after is reached). I'll have a good look though the logic again, as it's been a while since I touched this part of the code.

@m1nl What was ChUtil on that node doing during your logged tests? If it's getting stuck with delayed packets in the queue, that may mean utilisation is getting too high and it's simply struggling to find some space to transmit where other nodes aren't already using the frequency. "Somebody else is talking" should result in it adding a small additional delay (in any mode, not just ROUTER_LATE) and then trying again.

Not really related to this PR I think, but a better structure for the TX queue would IMO be a good idea. I think we're trying to make the current one do too many things at once, and the data structure used isn't particularly well suited to it any more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants