Skip to content

sys\malloc_thread_safe: thread safe malloc isn't realy thread-safe #21282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
DeJusten opened this issue Mar 8, 2025 · 0 comments
Open

sys\malloc_thread_safe: thread safe malloc isn't realy thread-safe #21282

DeJusten opened this issue Mar 8, 2025 · 0 comments

Comments

@DeJusten
Copy link
Contributor

DeJusten commented Mar 8, 2025

Description

The malloc() functions in the malloc_thread_safe module are secured using the linker command: '--wrap malloc' (see sys\malloc_thread_safe\Makefile.include). This ensures that the linker replaces all 'malloc' symbols with '__wrap_malloc()' (the latter symbol can be found in sys\malloc_thread_safe\malloc_wrappers.c). In addition, the linker replaces all '__real_malloc' symbols with the 'malloc' symbol (so that __wrap_malloc() can call the actual malloc function).

The linker manual does not talk about symbols but about 'undefined reference'. In my opinion, this excludes, for example, the malloc calls in fully linked libraries (such as libc.a libc_nano.a). Malloc calls there are not 'wrapped'. This can be proven by setting a breakpoint on _malloc_r() and __wrap_malloc() and then calling printf() or puts() for the first time (which calls malloc() the first time). Program execution is now stopped for the first time at _malloc_r().

Various functions in newlib use the malloc() functions. For example, puts() reserves a buffer memory the first time it is called, provided that this has not been previously configured using setvbuf() or _fstat_r() (see *1)).
Further examples:

  • File/directory functions
  • atexit()
  • Regular search: regcomp()
  • tsearch() hsearch() ...
  • Conversion functions: iconv() strftime()

The functionality activated in the ARM GCC compiler for providing thread-local storage 'Emulated TLS' also uses malloc. However, these malloc calls are wrapped (see *2))

Solution to the problem

newlib calls __malloc_lock() and __malloc_unlock() at appropriate points to protect the malloc functions thread-safely. These are defined with weak, so that they can easily be overloaded:

mutex_t malloc_mutex;
void __malloc_lock (struct _reent *ptr)
{
(void)ptr;
mutex_lock(&malloc_mutex);
}
void __malloc_unlock (struct _reent *ptr)
{
(void)ptr;
mutex_unlock(&malloc_mutex);
}

However, this only secures the heap. In fact, there are further 'resources' within newlib that newlib also makes thread safe:

  • Access to stdin, stdout, stderr
  • Access to atexit data structure
  • Access to environment variables
  • Access to time functions
  • Access to directory functions
  • Access to random number generator

Thread safe protection within newlib is achieved via the __retarget_xxx() functions and the provision of mutexes and recursive mutexes. This protection is already included by default in the ESP32 variant cpu/esp32/syscall.c and has been optimized here with regard to memory space reservation of the mutex (not yet fully tested):

#include <stdlib.h>      //fuer calloc()
#include <stdio.h>       //fuer __FILE *stdin, stdout, stderr
#include <sys/lock.h>    //fuer _LOCK_T
//#include <sys/reent.h> //struct _reent; __FILE 

#include "rmutex.h"

struct __lock {
	rmutex_t _lock;
};

struct __lock stdio_lock[3];

struct __lock __lock___sfp_recursive_mutex; /* For additional locking of stdio function against thread cancellation*/
struct __lock __lock___atexit_recursive_mutex; /* For atexit() */
struct __lock __lock___at_quick_exit_mutex; /* For atexit() alternative */
struct __lock __lock___malloc_recursive_mutex; /* For malloc()/realloc()/...*/
struct __lock __lock___env_recursive_mutex;  /*For Environment Variable */
struct __lock __lock___tz_mutex;  /* //For Environment Variable */
struct __lock __lock___dd_hash_mutex;  /* For Directory functionality */
struct __lock __lock___arc4random_mutex; /* For Randomnumbergenerator */
//__File {.. struct __lock *_lock; ..};  /* For every opened File*/

//Aufgrund dessen, dass bei _LOCK_T nicht zwischen Rekursive und nicht Rekursive
//unterschieden wird und somit Rekursive Lock genutzt werden muss, hier die 'Einheitsfunktionen'!
static inline void _lock_init(_LOCK_T *lock)
{
	//assert(lock != NULL);
	//Für stdin/stdout/stderr wird mit __sinit der Lock initialisiert
	//Zwecks Vermeidung von malloc für diese 3 Dateien feste Adresse vergeben
	if(lock == &stdin->_lock)
		*lock=&stdio_lock[0];
	else if(lock == &stdout->_lock)
		*lock=&stdio_lock[1];
	else if(lock == &stderr->_lock)
		*lock=&stdio_lock[2];
	else {
		struct __lock *__lock=(struct __lock *)calloc(1,sizeof(struct __lock));
		assert(__lock == NULL);
		*lock=__lock;
	}
}

static inline void _lock_close(_LOCK_T lock)
{
	//assert(lock != NULL);
	free(lock);
}

static inline void _lock_acquire(_LOCK_T lock)
{
    assert(lock != NULL);   /* lock must not be NULL */
    assert(!irq_is_in());   /* _lock_acquire must not be called in
                               interrupt context */
	
    /* if scheduler is not running, we have not to lock the mutex */
   /* __sinit() ruft __sfp_lock_acquire() auf und hier ist der scheduler noch nicht am laufen */
    if (thread_getpid() == KERNEL_PID_UNDEF) {
        return;
    }

    rmutex_lock(&lock->_lock);
}

static inline void _lock_release(_LOCK_T lock)
{
    assert(lock != NULL);   /* lock must not be NULL */
    assert(!irq_is_in());   /* _lock_acquire must not be called in
                               interrupt context */
    /* if scheduler is not running, we have not to unlock the mutex */
   /* __sinit() ruft __sfp_lock_release() auf und hier ist der scheduler noch nicht am laufen */
    if (thread_getpid() == KERNEL_PID_UNDEF) {
        return;
    }

    rmutex_unlock(&lock->_lock);
}

//__lock___at_quick_exit_mutex / __lock___tz_mutex /
// __lock___dd_hash_mutex      / __lock___arc4random_mutex
void __retarget_lock_init(_LOCK_T *lock)
{
	_lock_init(lock);
}

//__lock___sfp_recursive_mutex    / __lock___atexit_recursive_mutex
//__lock___malloc_recursive_mutex / __lock___env_recursive_mutex
//__File {.. struct __lock *_lock; ..};
void __retarget_lock_init_recursive(_LOCK_T *lock)
{
	_lock_init(lock);
}

void __retarget_lock_close(_LOCK_T lock)
{
	_lock_close(lock);
}
void __retarget_lock_close_recursive(_LOCK_T lock)
{
	_lock_close(lock);
}

void __retarget_lock_acquire (_LOCK_T lock)
{
	_lock_acquire(lock);
}

void __retarget_lock_acquire_recursive (_LOCK_T lock)
{
	_lock_acquire(lock);
}

int __retarget_lock_try_acquire(_LOCK_T lock)
{
	(void) lock;
	assert(1);
	return -1; //???
}

int __retarget_lock_try_acquire_recursive(_LOCK_T lock)
{
	(void) lock;
	assert(1);
	return -1; //???
}

void __retarget_lock_release (_LOCK_T lock)
{
	_lock_release(lock);
}

void __retarget_lock_release_recursive (_LOCK_T lock)
{
	_lock_release(lock);
}

In my opinion, the 'struct _reent' data structure is used to secure processes in which each process is offered its own memory and, among other things, its own file system. An adjustment of this data structure is therefore not necessary/useful (although when using PicoLib, this is set with a thread change)

The disadvantage of the proposed solutions is that malloc_thread_safe() is not only used to represent thread safety, but also to call the functions of malloc_monitor and for malloc_tracing. These features would be lost with the change.

See also #4488

Appendix

*1) Configuration of the output buffer of puts

//__attribute__(constructor) ensures that premain_init() is called immediately after __sinit()
//(both within __libc_init_array())
void __attribute__((constructor)) premain_init(void)
{
#if 0
//No linebuffering, call stdio_write() immediately
//-> Slow, because __sflush_r()/_write()/stdio_write() is called with each character
setvbuf(stdout,NULL,_IONBF,0);
#else
static char linebuf[16];
//LineBuffering into global variable (good compromise)
setvbuf(stdout,linebuf,_IOLBF,sizeof(linebuf));
#endif
}

Alternatively
in the function 'int _fstat_r(struct _reent *r, int fd, struct stat *st)'
set the mode to character device or the block size to match the FIFO size of the UART buffer
st.st_mode=S_IFCHR //To indicate that it is a character device
st.st_blksize=xxx //so that the default size is not used as the block size (see __smakebuf_r() in newlib)

*2) Thread Local Storage
With Thread Local Storage, it would be worth considering whether to deactivate it using a macro:

#define __thread //For Gcc
#define _Thread_local //From C11
#define thread_local //From C++11

Currently, Emulated TLS reserves memory when a variable is accessed for the first time, but this is not released when the task ends! Emulated TLS is also very slow (see [https://github.com/gcc-irror/gcc/blob/master/libgcc/emutls.c].)
Alternatively, you could consider installing newlib as a package and configuring it appropriately so that instead of Emulated TLS, the variant via .tdata .tbss section is used (both are present in the linker script without being used)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant