Skip to content

reconsider a compacting garbage collector #4909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eliasnaur opened this issue May 24, 2025 · 6 comments
Open

reconsider a compacting garbage collector #4909

eliasnaur opened this issue May 24, 2025 · 6 comments
Labels
core enhancement New feature or request

Comments

@eliasnaur
Copy link
Contributor

Extracted from #4889.

@aykevl wrote

In TinyGo, I think it is unlikely we'll ever have a moving garbage collector that would require pinning. Not just because of DMA, but also because it means interrupts can't run while the GC moves the heap.

I've seen this comment before, but I'd like to say I find the current garbage collector too unreliable for dynamic allocation because of memory fragmentation. I frequently run into out-of-memory panics before even half of available memory (512kB) has been filled because of fragmentation. Coupled with the escape analyzer missing key allocations I find myself spending significant effort (and API contortions) to avoid allocations altogether.

I don't see any other way of fixing fragmentation and making the GC reliable than making it moving (e.g. mark-compact GC).

In other words, please reconsider adding a garbage collector that is doesn't fail because of memory fragmentation.

I realize that non-GC runtimes such as the C runtime are also prone to memory fragmentation, which is why embedded programmers shy away from dynamic memory allocation. However, GC languages such as Go are uniquely positioned to eliminate memory fragmentation, thus guaranteeing robustness provided the dynamic heap size fits in available memory.

A practical implementation will need to support pinning of memory used for DMA and interrupts. The recently added runtime.Pinner was added to Big Go for such purposes. Adding Pinner to DMA and interrupt code is a challenge, but I think the trade-off is worth it: there's much more user code than DMA or interrupt code.

@aykevl
Copy link
Member

aykevl commented May 24, 2025

The big issue with a moving GC is that interrupts can't run while the GC is moving memory. Or if they do, it's in a very limited way. Interrupts are kinda really important to run right away, without blocking for a GC.

There might be a few (difficult) ways to make this possible:

  • Interrupts are not allowed to touch heap allocated memory, or even touch pointers to heap allocated memory (since these pointers will be changing). Unfortunately I don't see an easy way of enforcing this in the compiler, and as long as it can't be enforced by the compiler people will be making very subtle mistakes (consider all the places where people allocated memory inside interrupts, until we put a check in place to not allow this. And that's something that is easy to check at runtime).
  • We need some sort of load/store barriers, where every load/store has a check to see whether the GC is currently running and if so works with the GC to do the load/store.
  • All pointer accesses go through a table: every load/store first loads the pointer from the table, and then does the load/store. The GC can then update the table whenever it moves an object.

So in other words, the only options I see to make this work either will make all code very slow (due to load/store barriers or similar overhead) or make interrupts really difficult to write. I don't see a way to avoid this. I've thought long and hard about this problem in the past, I just don't see how this can work. If you have any suggestions, I'm certainly open to new ideas.

There are a few other problems with a moving GC though:

  • It will make interaction with C much, much more difficult. We'd need a separate (non-moving) heap for C, probably.
  • All existing Go code is written in a time when the Go GC isn't moving. I'm fairly confident we'll see a whole load of existing code that relies on this (through various unsafe hacks), so TinyGo will break compatibility there.
  • A moving GC will take a lot of time to write, time we might not have. I probably don't.
  • We'd need to update all existing pointers after an object has moved. This means the GC needs to be fully precise (knowing exactly which values on the stack and in registers are pointers). It is possible, but difficult to do this with LLVM. See https://llvm.org/docs/StackMaps.html for example.
  • Because pointers can now change at any time, they need to be non-integral pointers. This feature exists in LLVM, but is experimental.

Our current GC is also really, really simple. I'm sure there are plenty of ways to make it work better. We currently just allocate objects sequentially until we run out of space and then do a GC cycle. Something simple as grouping allocated objects by size would probably help a lot to reduce fragmentation.

Also, for some embedded systems with enough memory the Boehm GC might be worth using (see #4812 for example).

@eliasnaur
Copy link
Contributor Author

Thank you for taking the time to write down the issues of a moving collector.

The big issue with a moving GC is that interrupts can't run while the GC is moving memory. Or if they do, it's in a very limited way. Interrupts are kinda really important to run right away, without blocking for a GC.

There might be a few (difficult) ways to make this possible:

* Interrupts are not allowed to touch heap allocated memory, or even touch pointers to heap allocated memory (since these pointers will be changing). Unfortunately I don't see an easy way of enforcing this in the compiler, and as long as it can't be enforced by the compiler people will be making very subtle mistakes (consider all the places where people allocated memory inside interrupts, until we put a check in place to not allow this. And that's something that is easy to check at runtime).

I think this is the most viable way.

If interrupts are only allowed to touch non-moving memory, this problem goes away, right? If we then say that memory allocated by interp or init functions is automatically pinned (or allocated in a non-moving heap), most memory touched by interrupts will be non-movable. The exceptions can be manually marked with runtime.Pinner.

That leaves runtime detection, which I agree is essential. Here's an idea:

  1. Disallow dynamic calls from interrupts (interfaces, function pointers).
  2. Compile all code called by interrupts with runtime checks that ensures pointers are pinned.

This rests on the assumptions that normal code and interrupt code overlaps very little and that the call graphs of interrupts contain much less code than normal code. If those hold, I don't think this idea will bloat binaries by much more than the runtime checks themselves.

So in other words, the only options I see to make this work either will make all code very slow (due to load/store barriers or similar overhead) or make interrupts really difficult to write. I don't see a way to avoid this. I've thought long and hard about this problem in the past, I just don't see how this can work. If you have any suggestions, I'm certainly open to new ideas.

There are a few other problems with a moving GC though:

* It will make interaction with C much, much more difficult. We'd need a separate (non-moving) heap for C, probably.

Or simply disallow C code when the non-moving collector is enabled.

* All existing Go code is written in a time when the Go GC isn't moving. I'm fairly confident we'll see a whole load of existing code that relies on this (through various `unsafe` hacks), so TinyGo will break compatibility there.

Absolutely, but then again only when enabling the non-moving collector.

* A moving GC will take a lot of time to write, time we might not have. I probably don't.

Absolutely. This issue is just about reconsidering a non-moving collector, implementation is another story.

* We'd need to update all existing pointers after an object has moved. This means the GC needs to be fully precise (knowing exactly which values on the stack and in registers are pointers). It is possible, but difficult to do this with LLVM. See https://llvm.org/docs/StackMaps.html for example.

That's unfortunate. A robust collector should be fully precise, for the same reason it should be compacting.

Our current GC is also really, really simple. I'm sure there are plenty of ways to make it work better. We currently just allocate objects sequentially until we run out of space and then do a GC cycle. Something simple as grouping allocated objects by size would probably help a lot to reduce fragmentation.

Reducing fragmentation is nice, but IMHO not good enough to fearlessly (to borrow a Rust concept) allocate in long-running TinyGo programs on memory constrained platforms.

Also, for some embedded systems with enough memory the Boehm GC might be worth using (see #4812 for example).

Ditto.

@eliasnaur
Copy link
Contributor Author

We need some sort of load/store barriers, where every load/store has a check to see whether the GC is currently running and if so works with the GC to do the load/store.

Thinking about this some more, I think if even a low-cost runtime check is not viable, a more expensive check is ok. That is, something like the Go race detector that enables slow runtime checks everywhere to detect accesses to unpinned memory from interrupt contexts.

@aykevl
Copy link
Member

aykevl commented May 24, 2025

I'd be open to the addition of a moving GC (if it doesn't interfere too much), but probably won't be working on it myself.

Or simply disallow C code when the non-moving collector is enabled.

We use C for a few important bits of code:

  • The Bluetooth package (for nrf chips).
  • Signals and threads on Linux and MacOS.
  • The WS2812 bitbanging driver.
  • A few bits of inline assembly here and there for the runtime (disabling/restoring interrupts for example).

While some of these could be replaced with Go or assembly versions, it's not always possible or would be very difficult. Which means that we'd probably not be able to enable a moving GC by default. Which means it's much less tested and most people won't see the benefit of having a moving GC.

Again, I'm not opposed to it, I just don't really see it as a viable option for TinyGo. But I could be proven wrong.

That is, something like the Go race detector that enables slow runtime checks everywhere to detect accesses to unpinned memory from interrupt contexts.

Possible, but most people won't be running this I'm afraid.

@eliasnaur
Copy link
Contributor Author

eliasnaur commented May 24, 2025

It will make interaction with C much, much more difficult. We'd need a separate (non-moving) heap for C, probably.

Hang on, why is this the case? In Big Go, pointers passed to C(go) are either automatically pinned or must be pinned by runtime.Pinner. So, C code would be under the same requirements as interrupts (only non-moving pointers may be passed to C). I believe that's what the Big Go -d=checkptr=2 mode checks.

That is, something like the Go race detector that enables slow runtime checks everywhere to detect accesses to unpinned memory from interrupt contexts.

Possible, but most people won't be running this I'm afraid.

Perhaps not, but we'll run it in CI tests. Combined with the race detector and/or checkptr=2 will root out most mistakes in TinyGo code. And most users won't write DMA nor interrupt code.

The situation is similar to the weird crashes that currently happen when a goroutine stack is overflowed. That would also be nice to have a runtime check for, which even if fairly expensive could be suggested to users reporting such crashes, and called out in the FAQ.

@aykevl
Copy link
Member

aykevl commented May 24, 2025

Hang on, why is this the case? In Big Go, pointers passed to C(go) are either automatically pinned or must be pinned by runtime.Pinner. So, C code would be under the same requirements as interrupts (only non-moving pointers may be passed to C). I believe that's what the Big Go -d=checkptr=2 mode does.

Hmm, I think you're right. I was mainly thinking about C malloc and free, but I guess they can be implemented by a Go alloc and pinning the resulting memory.
That said, my understanding is that pinning objects in a moving GC makes things a whole lot more complicated. You'd introduce some necessary fragmentation as a result of pinning some objects, and the GC would need to move the newly moved objects around the pinned objects. It's been a while since I deeply investigated GC algorithms, though.

If you want to read more, I can recommend this book: https://gchandbook.org/
I think I read the 2012 version (many years ago).

@deadprogram deadprogram added enhancement New feature or request core labels May 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants