-
Notifications
You must be signed in to change notification settings - Fork 1.1k
RX window timing depends on LoRaMacProcess call timing #1349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
Would you like me to send you something based on STM32WL? |
@descartes, I'm not sure I follow your offer? I already have code from the STM32CubeWL package working and applied the fix suggested above, which indeed seems to work to fix the problem (I haven't solved atomicity though). I've also confirmed the problem actually exists, by inserting a 100ms delay before each call to |
I was referring to any hardware you need - boards with STM32WL yes, new oscilloscope not so much. |
Ah, I have 2 Nucleo STM32WL55JC boards, so I'm good :-) What would be helpful, I think, is for someone to reproduce this issue (and try the fix) with the original LoRaMac-node code (instead of the STM32CubeWL code which is modified by ST). I'm lacking hardware, a test setup and time for that... |
@matthijskooijman Thanks for the finding and fix proposition. Following your recommendation I have applied below patch diff --git a/src/mac/LoRaMac.c b/src/mac/LoRaMac.c
index edb172cc..3a102afe 100644
--- a/src/mac/LoRaMac.c
+++ b/src/mac/LoRaMac.c
@@ -810,11 +810,15 @@ static void ProcessRadioTxDone( void )
{
Radio.Sleep( );
}
+
// Setup timers
- TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay );
+ CRITICAL_SECTION_BEGIN( );
+ uint32_t offset = TimerGetCurrentTime( ) - TxDoneParams.CurTime;
+ TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
TimerStart( &MacCtx.RxWindowTimer1 );
- TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay );
+ TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset );
TimerStart( &MacCtx.RxWindowTimer2 );
+ CRITICAL_SECTION_END( );
if( MacCtx.NodeAckRequested == true )
{ I have run a quick test using a NucleoL476+SX1276MB1LAS platform. We will push the fix as soon as we have finalized the review process and run further tests. |
Cool, that matches the fix I applied to the STM32CubeWL code as well (except I omitted the critical section for now). Did you also confirm the problem by adding some artificial delay? Easiest is probably to just add a delay inside LoRaMacProcess at the start, something like 100ms (must not be too small, since there is already a fixed 20ms margin in the RX timing, and at higher SF's the margin is even more, so I mainly tested at SF7). About the critical section - I haven't measured, but my intuition is that this produces a fairly long criticial section that would add problematic latency for some kinds of interrupts. I would suggest, as an additional improvement, considering to revise the Timer API adding something like |
Please find in following zip the result for some Rx windows tests. A 100 ms delay has been added just before the The time spent inside the critical section will depend on the number of timers that have been added to the linked list. At a given time I think that the number of timers in the list will be small. However, this will depend deeply on the application being built. For the moment it should be quite safe to keep the critical section. Adding a new API allowing to start a timer with absolute time is indeed a good idea for this kind of operation. |
Agreed, and sounds good. Thanks! |
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349
@matthijskooijman |
@slavendam Hm, you make a good point. When
Which would lead to effectively indefinitely postponing the RX1 handler. @slavendam Could it be that your sketch is indeed not calling You could confirm that this is the case, by printing the value of @mluis1 I wonder if it would be worthwhile to catch and handle this case (by cancelling the timers and wrapping up the transaction, I guess? Could be done after the critical section)? Even though it is not really useful to support applications that call |
From the followup in the other issue, it turns out that the issue @slavendam is seeing is not caused by |
Since commit 60fb1b3 the RX window timers are no longer started in the TxDone interrupt handler, but deferred to
ProcessRadioTxDone()
(called byLoRaMacProcess()
from the main loop). This means there is an arbitrary delay (that depends on how often the application callsLoRaMacProcess()
, but is necessarily non-zero) between the TxDone interrupt and the timers being started byProcessRadioTxDone()
. However, the timers are started as follows:LoRaMac-node/src/mac/LoRaMac.c
Lines 823 to 826 in 2fdce5f
So relative to the moment of starting the timer, not relative to the TxDone timing, so the window is always more or less later than intended.
I have seen this happen in practice with the STM32CubeWL stack, which is a port of LoRaMac-node and also contains this same code. Debug output shows that
LoRaMacProcess()
is called about 20ms after TxDone, and the RX window is late by the same amount of time. I have not verified exact timing with a scope (nor tested on this upstream code - I'm not sure if I have any hardware available for that), but the problem seems clear enough from looking at the code to report it already now.This is probably easy to fix: Just set the timer relative to
TxDoneParams.CurTime
(which is set by the TxDone ISR already) instead of relative to now. Since the timer API only seems to support setting timers relative to now (there is no way to pass an absolute time), I guess this means something like:I think the above code should also be ran atomically (All ISRs disabled) to ensure no significant time passes between the above call to
TimerGetCurrentTime()
and the moment the timer code makes the timestamp absolute:LoRaMac-node/src/system/timer.c
Lines 126 to 127 in 2fdce5f
This atomicity could be prevented if the timer code gained an interface to start a timer for an absolute instead of relative time.
The text was updated successfully, but these errors were encountered: