Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]ulTaskGetRunTimeCount(NULL) returns incorrect result #1202

Open
aohzju opened this issue Dec 3, 2024 · 15 comments
Open

[BUG]ulTaskGetRunTimeCount(NULL) returns incorrect result #1202

aohzju opened this issue Dec 3, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@aohzju
Copy link

aohzju commented Dec 3, 2024

Describe the bug
If the function ulTaskGetRunTimeCounter is called for the current running task (i.e., with the parameter being NULL), the returned time counter value will not include the time since the task was switched in because the counter pxTCB->ulRunTimeCounter is only updated on context switch. If the task has high priority and has been running for long time, the reported time will have remarkable error.
The functions ulTaskGetRunTimePercent and vTaskGetInfo also have this issue.

Target
Any

Host
Any

To Reproduce
By reviewing the code

Expected behavior
The function ulTaskGetRunTimeCounter shall update the running time for the calling task before returning the result.

https://forums.freertos.org/t/issue-with-ultaskgetruntimecounter/21994/3

@aohzju aohzju added the bug Something isn't working label Dec 3, 2024
@aggarg
Copy link
Member

aggarg commented Dec 3, 2024

As explained on the forum post, we update the run time stats on every context switch and it is by design. The case you describe is a corner case where a high priority task runs for long periods and in-between queries the runtime stats - if this long running high priority task requires its own runtime to be accurate, it can call yield before querying the stats. Do you have a usecase for this?

@aohzju
Copy link
Author

aohzju commented Dec 4, 2024

Yes, we are running FreeRTOS SMP. We have some logic inside task code depending on how long (effectively, not wall clock time) an internal state has been staying:

//...
if(state_changed && state == eSomeState)
task_time_cnt1 = ulTaskGetRunTimeCount(NULL);
//...
if(state == eSomeState && ulTaskGetRunTimeCount(NULL) - task_time_cnt1 > 1000)
DoSomething();

Actually we have another case, which is about Watchdog function implementation. I'm planning to have every task to update its own health flag inside its main loop. If the task is running well, it should be able to update its health flag every N milliseconds. But now I guess this might be problematic too as ulTaskGetRumTimeCount(someTaskHandle) (called in the watchdog task) may not return correct time if the task "someTaskHandle" has run for long periods without context switch. We have multiple CPUs and run FreeRTOS SMP, some tasks on certain CPUs do run long time without context switch.

Any suggestions? Much appreciated.

@aggarg
Copy link
Member

aggarg commented Dec 4, 2024

You want to calculate the total time a task has been running which is the sum of the following two:

  1. Time that the task ran till it was last switched out. It can be obtained using ulTaskGetRumTimeCount.
  2. Total time that task has been running since it was last switched in. We need to perform some extra calculations to get this. The following is an example how this can be done:
/* This goes in FreeRTOSConfig.h. */
extern void TaskSwitchedIn( void );
#define traceTASK_SWITCHED_IN() TaskSwitchedIn()

/* This goes in application source file. */
configRUN_TIME_COUNTER_TYPE ulTaskSwitchedInTime[ configNUMBER_OF_CORES ] = { 0U };

void TaskSwitchedIn( void )
{
    ulTaskSwitchedInTime[ portGET_CORE_ID() ] = portGET_RUN_TIME_COUNTER_VALUE();
}

void HighPrioCpuBoundTask( void * param )
{
    for( ;; )
    {
        /* A task can calculate its runtime like the following. Critical section
         * is used to ensure that we are not switched out in the middle of the
         * calculation. */
        taskENTER_CRITICAL();
        {
            configRUN_TIME_COUNTER_TYPE timeSinceLastSwitchedIn, timeTillLastSwitchedOut, totalRunTime;

            timeSinceLastSwitchedIn = portGET_RUN_TIME_COUNTER_VALUE() - ulTaskSwitchedInTime[ portGET_CORE_ID() ];
            timeTillLastSwitchedOut = ulTaskGetRunTimeCount( NULL );

            totalRunTime = timeSinceLastSwitchedIn + timeTillLastSwitchedOut;
        }
        taskEXIT_CRITICAL();
    }
}

Note that the kernel maintains ulTaskSwitchedInTime which is not exposed to the application but can we do that if the above solves your problem. You'll not need to maintain ulTaskSwitchedInTime in the application using traceTASK_SWITCHED_IN.

@aohzju
Copy link
Author

aohzju commented Dec 4, 2024

Yes, your approach works. Thank you very much.
I think I need to do the similar workaround for ulTaskGetRunTimeCount(tskHdl), where tskHdl is not NULL.
Isn't it more graceful if it can be fixed from the kernel itself? After all, the function name ulTaskGetRunTimeCount reads like the total run time, not the time until last switched in.
My system has a CLI, from which I can enter a command to print out the idle time percentage of each CPU like below:

for (int i = 0; i < NUM_OF_C0_CORES; i++) {

        TaskHandle_t idle = xTaskGetIdleTaskHandle(i);

        configRUN_TIME_COUNTER_TYPE perc = ulTaskGetRunTimePercent(idle);

        printf("Core %d idle: %ld, %ld%%\n", i, ulTaskGetRunTimeCounter(idle), perc);

    }

If my system is actually idle, I know for sure all CPUs are running the idle tasks, but the output idle percentage print goes lower and lower with time. (my configUSE_TIME_SLICING is 0), while it should approach 99%.

@aggarg
Copy link
Member

aggarg commented Dec 5, 2024

If you add the following 2 definitions in tasks.c, does that work for you:

#if ( configGENERATE_RUN_TIME_STATS == 1 )

    configRUN_TIME_COUNTER_TYPE ulTaskGetRunTimeCounter2( const TaskHandle_t xTask )
    {
        TCB_t * pxTCB;
        configRUN_TIME_COUNTER_TYPE ulCurrentRunTimeCounter, ulTimeSinceLastSwitchedIn, ulTotalRunTime;

        traceENTER_ulTaskGetRunTimeCounter( xTask );

        pxTCB = prvGetTCBFromHandle( xTask );
        configASSERT( pxTCB != NULL );

        taskENTER_CRITICAL();
        {
            #ifdef portALT_GET_RUN_TIME_COUNTER_VALUE
                portALT_GET_RUN_TIME_COUNTER_VALUE( ulCurrentRunTimeCounter );
            #else
                ulCurrentRunTimeCounter = portGET_RUN_TIME_COUNTER_VALUE();
            #endif

            ulTimeSinceLastSwitchedIn = ulCurrentRunTimeCounter - ulTaskSwitchedInTime[ portGET_CORE_ID() ];
            ulTotalRunTime = pxTCB->ulRunTimeCounter + ulTimeSinceLastSwitchedIn;
        }
        taskEXIT_CRITICAL();

        traceRETURN_ulTaskGetRunTimeCounter( ulTotalRunTime );

        return ulTotalRunTime;
    }

#endif /* if ( configGENERATE_RUN_TIME_STATS == 1 ) */
/*-----------------------------------------------------------*/

#if ( configGENERATE_RUN_TIME_STATS == 1 )

    configRUN_TIME_COUNTER_TYPE ulTaskGetRunTimePercent2( const TaskHandle_t xTask )
    {
        TCB_t * pxTCB;
        configRUN_TIME_COUNTER_TYPE ulTotalTime, ulReturn;

        traceENTER_ulTaskGetRunTimePercent( xTask );

        ulTotalTime = ( configRUN_TIME_COUNTER_TYPE ) portGET_RUN_TIME_COUNTER_VALUE();

        /* For percentage calculations. */
        ulTotalTime /= ( configRUN_TIME_COUNTER_TYPE ) 100;

        /* Avoid divide by zero errors. */
        if( ulTotalTime > ( configRUN_TIME_COUNTER_TYPE ) 0 )
        {
            ulReturn = ulTaskGetRunTimePercent( xTask ) / ulTotalTime;
        }
        else
        {
            ulReturn = 0;
        }

        traceRETURN_ulTaskGetRunTimePercent( ulReturn );

        return ulReturn;
    }

#endif /* if ( configGENERATE_RUN_TIME_STATS == 1 ) */
/*-----------------------------------------------------------*/

@aohzju
Copy link
Author

aohzju commented Dec 5, 2024

Thanks, Gaurav. I added a fix similar to your code in my tasks.c. Shall I use vTaskSuspenAll instead of taskENTER_CRITICAL, as my system is running freertos SMP version?

@aggarg
Copy link
Member

aggarg commented Dec 5, 2024

Shall I use vTaskSuspenAll instead of taskENTER_CRITICAL, as my system is running freertos SMP version?

Both would work but since we are performing a small deterministic calculation, using critical section would be more performant.

Also, instead of modifying tasks.c you can use FREERTOS_MODULE_TEST:

  1. Define FREERTOS_MODULE_TEST in your FreeRTOSConfig.h.
  2. Create a file named tasks_test_access_functions.h and put the above code in that file.

This way you won't need to modify FreeRTOS code. Meanwhile, I will discuss internally if we should incorporate this change.

@aohzju
Copy link
Author

aohzju commented Dec 5, 2024

This is how I fixed the issue:

    configRUN_TIME_COUNTER_TYPE ulTaskGetRunTimeCounter( const TaskHandle_t xTask )
    {
        TCB_t * pxTCB;
        configRUN_TIME_COUNTER_TYPE  ulTotalRunTime;

        traceENTER_ulTaskGetRunTimeCounter( xTask );

        pxTCB = prvGetTCBFromHandle( xTask );
        configASSERT( pxTCB != NULL );

        vTaskSuspendAll();  //Fixed issue: https://github.com/FreeRTOS/FreeRTOS-Kernel/issues/1202
        ulTotalRunTime = pxTCB->ulRunTimeCounter;
        if(taskTASK_IS_RUNNING(pxTCB)) 
            ulTotalRunTime += portGET_RUN_TIME_COUNTER_VALUE() - ulTaskSwitchedInTime[pxTCB->xTaskRunState];
        
        xTaskResumeAll();
        
        traceRETURN_ulTaskGetRunTimeCounter( pxTCB->ulRunTimeCounter );

        //return pxTCB->ulRunTimeCounter;
        return ulTotalRunTime;
    }

I feel vTaskSuspendAll must be used because the API ulTaskGetRunTimeCounter may be called to query the run time of tasks running on other CPUs. taskENTER_CRITICAL cannot prevent context switch from happening on other CPUs.

@aggarg
Copy link
Member

aggarg commented Dec 5, 2024

taskENTER_CRITICAL cannot prevent context switch from happening on other CPUs.

A context switch happening on another core will not invalidate the calculation done in the critical section. Or did I miss something? Thanks for sharing your solution - nice and compact!

@aohzju
Copy link
Author

aohzju commented Dec 5, 2024

Yes, the task whose run time is being queried cannot change run state, see the case below:

        ulTotalRunTime = pxTCB->ulRunTimeCounter; //assuming pxTCB is running at this point
        if(taskTASK_IS_RUNNING(pxTCB))   //But it's switched out up to this line. Then I still get inaccurate total run time.
            ulTotalRunTime += portGET_RUN_TIME_COUNTER_VALUE() - ulTaskSwitchedInTime[pxTCB->xTaskRunState];

@aggarg
Copy link
Member

aggarg commented Dec 6, 2024

//But it's switched out up to this line. Then I still get inaccurate total run time.

This cannot happen if you're in a critical section. Right?

@aohzju
Copy link
Author

aohzju commented Dec 6, 2024

It can happen because the queried task is running on a core other than the core calling ulTaskGetRunTimeCounter. You agree that the critical section does not prevent context switch on another core, right?

@aggarg
Copy link
Member

aggarg commented Dec 9, 2024

You agree that the critical section does not prevent context switch on another core, right?

No, that cannot happen. A critical section acquires both spin locks and a context switch also acquires the same - this ensures a context switch cannot happen on any core when a core is in critical section.

@aohzju
Copy link
Author

aohzju commented Dec 15, 2024

You agree that the critical section does not prevent context switch on another core, right?

No, that cannot happen. A critical section acquires both spin locks and a context switch also acquires the same - this ensures a context switch cannot happen on any core when a core is in critical section.

Ok. Thanks for explaining this. That makes sense. I thought tasks running on other cores can still context-switch as long as they don't try to enter the critical section. Does #1154 affect the behavior of taskENTER_CRITICAL?

@aggarg
Copy link
Member

aggarg commented Dec 16, 2024

Does #1154 affect the behavior of taskENTER_CRITICAL?

No, it would not. That PR only optimizes performance and does not change the functional behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants