RX window timing depends on LoRaMacProcess call timing #1349

matthijskooijman · 2022-08-31T10:53:17Z

Since commit 60fb1b3 the RX window timers are no longer started in the TxDone interrupt handler, but deferred to ProcessRadioTxDone() (called by LoRaMacProcess() from the main loop). This means there is an arbitrary delay (that depends on how often the application calls LoRaMacProcess(), but is necessarily non-zero) between the TxDone interrupt and the timers being started by ProcessRadioTxDone(). However, the timers are started as follows:

LoRaMac-node/src/mac/LoRaMac.c

Lines 823 to 826 in 2fdce5f

    
           TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay ); 
        
           TimerStart( &MacCtx.RxWindowTimer1 ); 
        
           TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay ); 
        
           TimerStart( &MacCtx.RxWindowTimer2 );

So relative to the moment of starting the timer, not relative to the TxDone timing, so the window is always more or less later than intended.

I have seen this happen in practice with the STM32CubeWL stack, which is a port of LoRaMac-node and also contains this same code. Debug output shows that LoRaMacProcess() is called about 20ms after TxDone, and the RX window is late by the same amount of time. I have not verified exact timing with a scope (nor tested on this upstream code - I'm not sure if I have any hardware available for that), but the problem seems clear enough from looking at the code to report it already now.

This is probably easy to fix: Just set the timer relative to TxDoneParams.CurTime (which is set by the TxDone ISR already) instead of relative to now. Since the timer API only seems to support setting timers relative to now (there is no way to pass an absolute time), I guess this means something like:

 uint32_t offset = TimerGetCurrentTime() - TxDoneParams.CurTime;
 TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset ); 
 TimerStart( &MacCtx.RxWindowTimer1 ); 
 TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset); 
 TimerStart( &MacCtx.RxWindowTimer2 );

I think the above code should also be ran atomically (All ISRs disabled) to ensure no significant time passes between the above call to TimerGetCurrentTime() and the moment the timer code makes the timestamp absolute:

LoRaMac-node/src/system/timer.c

Lines 126 to 127 in 2fdce5f

    
           elapsedTime = RtcGetTimerElapsedTime( ); 
        
           obj->Timestamp += elapsedTime;

This atomicity could be prevented if the timer code gained an interface to start a timer for an absolute instead of relative time.

The text was updated successfully, but these errors were encountered:

For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349

descartes · 2022-09-20T18:41:24Z

Would you like me to send you something based on STM32WL?

matthijskooijman · 2022-09-20T19:03:13Z

@descartes, I'm not sure I follow your offer? I already have code from the STM32CubeWL package working and applied the fix suggested above, which indeed seems to work to fix the problem (I haven't solved atomicity though). I've also confirmed the problem actually exists, by inserting a 100ms delay before each call to LoRaMacProcess()/

descartes · 2022-09-21T09:16:48Z

I was referring to any hardware you need - boards with STM32WL yes, new oscilloscope not so much.

matthijskooijman · 2022-09-21T09:41:11Z

Ah, I have 2 Nucleo STM32WL55JC boards, so I'm good :-)

What would be helpful, I think, is for someone to reproduce this issue (and try the fix) with the original LoRaMac-node code (instead of the STM32CubeWL code which is modified by ST). I'm lacking hardware, a test setup and time for that...

mluis1 · 2022-09-21T11:41:01Z

@matthijskooijman Thanks for the finding and fix proposition.

Following your recommendation I have applied below patch

diff --git a/src/mac/LoRaMac.c b/src/mac/LoRaMac.c
index edb172cc..3a102afe 100644
--- a/src/mac/LoRaMac.c
+++ b/src/mac/LoRaMac.c
@@ -810,11 +810,15 @@ static void ProcessRadioTxDone( void )
     {
         Radio.Sleep( );
     }
+
     // Setup timers
-    TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay );
+    CRITICAL_SECTION_BEGIN( );
+    uint32_t offset = TimerGetCurrentTime( ) - TxDoneParams.CurTime;
+    TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
     TimerStart( &MacCtx.RxWindowTimer1 );
-    TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay );
+    TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset );
     TimerStart( &MacCtx.RxWindowTimer2 );
+    CRITICAL_SECTION_END( );
 
     if( MacCtx.NodeAckRequested == true )
     {

I have run a quick test using a NucleoL476+SX1276MB1LAS platform.
To perform the quick test I have used the LCTT tool (LoRa-Alliance certication tool) and run the test verifying RXTimingSetupReq MAC command.
The test ran successfully. Please refer to rx_timing_setup_req_test_log.zip for the test tool report.

We will push the fix as soon as we have finalized the review process and run further tests.

matthijskooijman · 2022-09-22T09:15:32Z

Cool, that matches the fix I applied to the STM32CubeWL code as well (except I omitted the critical section for now).

Did you also confirm the problem by adding some artificial delay? Easiest is probably to just add a delay inside LoRaMacProcess at the start, something like 100ms (must not be too small, since there is already a fixed 20ms margin in the RX timing, and at higher SF's the margin is even more, so I mainly tested at SF7).

About the critical section - I haven't measured, but my intuition is that this produces a fairly long criticial section that would add problematic latency for some kinds of interrupts. I would suggest, as an additional improvement, considering to revise the Timer API adding something like TimerStartAbs() or so where you pass an absolute timestamp (and ignore the value passed to TimerSetValue). If this would be available, using it could remove the need for the critical section completely AFAICS. I'm not sure how the timer util API is maintained and developed, maybe changing that is a more complicated process, but it looks like this would be a change in timer.c/h only, without needing changes to the underlying Rtc*() API.

mluis1 · 2022-09-22T14:40:00Z

Please find in following zip the result for some Rx windows tests. A 100 ms delay has been added just before the CRITICAL_SECTION_BEGIN call. The performed tests verify all supported data rates.
rx_window_test_log_with_100_ms_delay.zip

The time spent inside the critical section will depend on the number of timers that have been added to the linked list. At a given time I think that the number of timers in the list will be small. However, this will depend deeply on the application being built.

For the moment it should be quite safe to keep the critical section.

Adding a new API allowing to start a timer with absolute time is indeed a good idea for this kind of operation.
As we are getting closer to execute v4.7.0 release we will postpone such API addition to a future release.
In order to not forget to add this API we will create an issue under our internal Git server as a reminder.

matthijskooijman · 2022-09-22T14:46:25Z

For the moment it should be quite safe to keep the critical section.

Adding a new API allowing to start a timer with absolute time is indeed a good idea for this kind of operation.
As we are getting closer to execute v4.7.0 release we will postpone such API addition to a future release.

Agreed, and sounds good. Thanks!

For the RX window timing, a timer is started after TxDone. However, since this is done in the mainloop rather than the TxDone ISR, the exact timing of the RX window depends on how soon after the interrupt LoRaMacProcess() is called (but it will always be at least a little too late). The TxDone interrupt already captures a timestamp, so this commit uses that to correctly position the RX window. There is still a small race condition where interrupts could still cause a delay, but the influence of that should be small and rare enough in practice, so this is left unfixed here (a proper fix would require changes to the timer subsystem, which are a bit too invasive to make here). This fixes Lora-net/LoRaMac-node#1349

slavendam · 2024-02-01T16:24:54Z

@matthijskooijman
Did you have issues similar to this: stm32duino/STM32LoRaWAN#38
Can it be that some value overflows when IRQ handler/timer is set? Or maybe something connected with RTC timer? I did lot of debuging, but don't have an idea what can be the source of issue...

matthijskooijman · 2024-02-01T16:45:36Z

@slavendam Hm, you make a good point. When ProcessRadioTxDone() (through LoRaMacProcess(), or modem.maintain() in the ST32Duino library) is called very late (i.e. >= 1s after the interrupt), offset could end up being more than MacCtx.RxWindow1Delay, resulting in an underflow in this line:

 TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );

Which would lead to effectively indefinitely postponing the RX1 handler.

@slavendam Could it be that your sketch is indeed not calling modem.maintain() for a long time?

You could confirm that this is the case, by printing the value of MacCtx.RxWindow1Delay and offset here.

@mluis1 I wonder if it would be worthwhile to catch and handle this case (by cancelling the timers and wrapping up the transaction, I guess? Could be done after the critical section)? Even though it is not really useful to support applications that call LoRaMacProcess() so seldomly, if an application accidentally delays calling it on a rare occasion, it would be good if the stack did not freeze up...

matthijskooijman · 2024-02-01T19:43:43Z

From the followup in the other issue, it turns out that the issue @slavendam is seeing is not caused by offset underflowing. I do still think the underflow issue exists, but it is probably obscure enough to just ignore it. I'll leave that here for consideration.

mluis1 added this to the Release Version 4.7.0 milestone Sep 21, 2022

mluis1 added a commit that referenced this issue Sep 22, 2022

Issue #1349 - Fixed Rx windows timer handling

e5076e1

mluis1 closed this as completed Sep 22, 2022

NovaNekmit mentioned this issue Mar 15, 2023

Integer underflow while calculating RXWindow #1522

Open

matthijskooijman mentioned this issue Feb 1, 2024

RX1 window Timer event never reached stm32duino/STM32LoRaWAN#38

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RX window timing depends on LoRaMacProcess call timing #1349

RX window timing depends on LoRaMacProcess call timing #1349

matthijskooijman commented Aug 31, 2022

descartes commented Sep 20, 2022

matthijskooijman commented Sep 20, 2022

descartes commented Sep 21, 2022

matthijskooijman commented Sep 21, 2022

mluis1 commented Sep 21, 2022

matthijskooijman commented Sep 22, 2022

mluis1 commented Sep 22, 2022

matthijskooijman commented Sep 22, 2022

slavendam commented Feb 1, 2024

matthijskooijman commented Feb 1, 2024

matthijskooijman commented Feb 1, 2024

RX window timing depends on LoRaMacProcess call timing #1349

RX window timing depends on LoRaMacProcess call timing #1349

Comments

matthijskooijman commented Aug 31, 2022

descartes commented Sep 20, 2022

matthijskooijman commented Sep 20, 2022

descartes commented Sep 21, 2022

matthijskooijman commented Sep 21, 2022

mluis1 commented Sep 21, 2022

matthijskooijman commented Sep 22, 2022

mluis1 commented Sep 22, 2022

matthijskooijman commented Sep 22, 2022

slavendam commented Feb 1, 2024

matthijskooijman commented Feb 1, 2024

matthijskooijman commented Feb 1, 2024