proposal: all: add bare metal support #73608

abarisani opened this issue May 6, 2025 · 45 comments

@abarisani

abarisani commented May 6, 2025

Proposal Details

I propose the addition of a new GOOS target, such as GOOS=none, to allow Go runtime execution under specific application defined exit functions, rather than arbitrary OS syscalls, enabling freestanding execution without direct OS support.

This is currently implemented in the GOOS=tamago project, but for reasons laid out in the Proposal Background section it is proposed for upstream inclusion.

Go applications built with GOOS=none would run on bare metal, without any underlying OS. All required support is provided by the Go runtime and external driver packages, also written in Go.

Go runtime changes

Note

The changes are also documented in their own repository and pkg.go.dev

A working example of all proposed changes can be found in the GOOS=tamago implementation.

Board support packages or applications would be required (only under GOOS=none) to define the following functions to support the runtime.

If the use of go:linkname is undesirable, different strategies are possible; right now linkname is used as a convenient way to have externally defined functions invoked directly by the runtime early on.

  • cpuinit (example): pre-runtime CPU initialization
// cpuinit handles pre-runtime CPU initialization
TEXT cpuinit(SB),NOSPLIT|NOFRAME,$0

The function must be defined in assembly because native Go statements are not supported before the runtime has started.

  • runtime.hwinit (example): early runtime hardware initialization
// Init takes care of the lower level initialization triggered early in runtime
// setup.
//
//go:linkname Init runtime.hwinit
func Init()
  • runtime.printk (example): standard output (e.g. serial console)
// printk emits a single 8-bit character to standard output
//
//go:linkname printk runtime.printk
func printk(c byte)
  • runtime.initRNG and runtime.getRandomData (examples): random number generation initialization and retrieval
// initRNG initializes random number generation
//
//go:linkname initRNG runtime.initRNG
func initRNG()

// getRandomData generates len(b) random bytes and writes them into b
//
//go:linkname getRandomData runtime.getRandomData
func getRandomData(b []byte) 
  • runtime.nanotime1 (example): system time in nanoseconds
// nanotime1 returns the system time in nanoseconds
//
//go:linkname nanotime1 runtime.nanotime1
func nanotime1() int64
//go:linkname ramStart runtime.ramStart
var ramStart uint

//go:linkname ramSize runtime.ramSize
var ramSize uint

//go:linkname ramStackOffset runtime.ramStackOffset
var ramStackOffset uint
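
As an illustration of how small these hooks can be, the printk function listed above could be sketched as follows. This is a minimal sketch, not the tamago implementation: the go:linkname directive is omitted so that it builds with a standard toolchain, and `uartBuf`, `uartLen`, `printString` and `consoleOutput` are hypothetical names standing in for a real memory-mapped UART register and its driver.

```go
package main

import "fmt"

// uartBuf and uartLen are hypothetical stand-ins for a UART transmit
// register; a real board support package would write each byte to a
// memory-mapped address instead of a buffer.
var (
	uartBuf [256]byte
	uartLen int
)

// printk emits a single 8-bit character to the simulated console.
// It uses a fixed-size array so that it never allocates.
func printk(c byte) {
	if uartLen < len(uartBuf) {
		uartBuf[uartLen] = c
		uartLen++
	}
}

// printString shows how a string reaches the console one byte at a time.
func printString(s string) {
	for i := 0; i < len(s); i++ {
		printk(s[i])
	}
}

// consoleOutput returns everything written so far (demo helper only).
func consoleOutput() string {
	return string(uartBuf[:uartLen])
}

func main() {
	printString("hello from printk\n")
	fmt.Print(consoleOutput())
}
```

The fixed-size buffer is a deliberate choice for the sketch: a hook that may be called while panicking should not depend on the allocator.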

Board support packages or applications can optionally define the following:

  • runtime.Bloc (example): heap memory start address override

  • runtime.Exit (example): runtime termination

  • runtime.Idle (example): CPU idle time management

  • Network I/O through Go's own net package requires the application to set an external Socket function (example, driver example):

// SocketFunc must be set externally by the application on GOOS=tamago to
// provide the network socket implementation. The returned interface must match
// the requested socket and be either net.Conn, net.PacketConn or net.Listener.
var SocketFunc func(ctx context.Context, net string, family, sotype int, laddr, raddr Addr) (interface{}, error)
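
To make the shape of this hook concrete, here is a minimal sketch of an application-provided socket function backed by an in-memory pipe. Standard Go has no SocketFunc in package net, so the sketch declares a local variable `socketFunc` with the same signature; `pipeSocket`, `peer`, `roundTrip` and the `sockStream` constant are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
)

// sockStream is a hypothetical socket type constant used only by this sketch.
const sockStream = 1

// socketFunc mirrors the signature of the proposed SocketFunc hook. It is a
// local variable here because standard Go has no such hook in package net.
var socketFunc func(ctx context.Context, network string, family, sotype int, laddr, raddr net.Addr) (interface{}, error)

// peer keeps the far end of the last in-memory connection so that the demo
// can play the role of the remote host.
var peer net.Conn

// pipeSocket serves stream sockets from net.Pipe instead of a real stack.
func pipeSocket(ctx context.Context, network string, family, sotype int, laddr, raddr net.Addr) (interface{}, error) {
	if sotype != sockStream {
		return nil, errors.New("sketch supports stream sockets only")
	}
	local, remote := net.Pipe()
	peer = remote
	return local, nil
}

// roundTrip requests a socket through the hook, lets the simulated remote
// send msg, and returns what the local side received.
func roundTrip(msg string) (string, error) {
	s, err := socketFunc(context.Background(), "tcp", 0, sockStream, nil, nil)
	if err != nil {
		return "", err
	}
	conn, ok := s.(net.Conn)
	if !ok {
		return "", errors.New("socket is not a net.Conn")
	}
	defer conn.Close()

	remote := peer
	go func() {
		remote.Write([]byte(msg))
		remote.Close()
	}()

	buf, err := io.ReadAll(conn)
	return string(buf), err
}

func main() {
	socketFunc = pipeSocket // set externally by the application
	reply, err := roundTrip("ping")
	fmt.Println(reply, err)
}
```

A real implementation would return connections from a network stack such as gVisor rather than a pipe; the type assertion on the returned interface is the part that carries over.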

The Go runtime would implement the following, or similar, functions to aid interrupt handling:

  • runtime.GetG (example), runtime.WakeG (example), runtime.Wake (example): asynchronous goroutine wake-up
// GetG returns the pointer to the current G and its P.
func GetG() (gp uint64, pp uint64)

// WakeG modifies a goroutine cached timer for time.Sleep (g.timer) to fire as
// soon as possible.
//
// The function is meant to be invoked within Go assembly and its arguments
// must be passed through registers rather than on the frame pointer, see
// definition in sys_tamago_$GOARCH.s for details.
func WakeG()

// Wake modifies a goroutine cached timer for time.Sleep (g.timer) to fire as
// soon as possible.
func Wake(gp uint)
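
The idea of making a pending sleep timer fire as soon as possible can be approximated in ordinary userspace Go. The following is only an analogy under stated assumptions: it uses time.Timer.Reset(0) on a timer owned by the program, whereas the functions above reach into the runtime's own g.timer; `park`, `wake` and `wokenEarly` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// park blocks on the timer, much like a goroutine in time.Sleep, and
// reports how long it stayed parked.
func park(t *time.Timer, parked chan<- time.Duration) {
	start := time.Now()
	<-t.C
	parked <- time.Since(start)
}

// wake is a hypothetical stand-in for an interrupt handler: it rearms the
// pending timer so that it fires as soon as possible.
func wake(t *time.Timer) {
	t.Reset(0)
}

// wokenEarly parks a goroutine for a nominal hour, wakes it right away and
// reports whether it resumed well before the original deadline.
func wokenEarly() bool {
	t := time.NewTimer(time.Hour)
	parked := make(chan time.Duration, 1)
	go park(t, parked)
	wake(t)
	d := <-parked
	return d < time.Minute
}

func main() {
	fmt.Println("woken early:", wokenEarly())
}
```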

Compilation

The compilation of such targets would remain identical to that of standard Go binaries. The loading strategy might differ depending on the hardware, but it would be handled entirely outside the Go distribution, using standard flags as required. Examples:

# Example for Cloud Hypervisor, QEMU and Firecracker KVMs
GOOS=tamago GOARCH=amd64 ${TAMAGO} build -ldflags "-T 0x10010000 -R 0x1000" main.go

# Example for USB armory Mk II
GOOS=tamago GOARM=7 GOARCH=arm ${TAMAGO} build -ldflags "-T 0x80010000 -R 0x1000" main.go

# Example for QEMU RISC-V sifive_u
GOOS=tamago GOARCH=riscv64 ${TAMAGO} build -ldflags "-T 0x80010000 -R 0x1000" main.go

# Example for Linux userspace
GOOS=tamago ${TAMAGO} build main.go

Proposal Background

This proposal follows updates on the TamaGo project, which brings bare metal execution for Go on AMD64, ARM and RISCV64 targets.

While similar proposals (see #37503 and #46802) have already been attempted without success, this latest effort is motivated by considerable advancements and changes in our work.

Notable changes:

  1. There is now fully tested Go standard library support, integrated with vanilla distribution tests. The testing environment for AMD64, ARM and RISCV64 architectures runs under Linux natively or using qemu-user-static via binfmt_misc.

  2. The tamago networking code allows external definition of a single socket function to attach gVisor or any other fake networking stack if desired, this could benefit other Go architectures and allow replacement of existing fake networking for js/wasip1.

  3. Because of what was implemented to support 1. and 2., GOOS=tamago allows the execution of unmodified Go applications as softly isolated userspace code, with OS resources such as networking and filesystems isolated from the actual user OS.

  4. TamaGo now no longer focuses only on ARM embedded systems, but it extends to AMD64 KVM execution such as microVMs.

  5. The overall Go distribution changes are free from hardware dependent code (e.g. peripheral drivers) and changes are unified across different architectures, see for instance the identical implementation for entry points of amd64, arm, riscv64 architectures.

In summary GOOS=tamago has been transformed to a generic implementation that allows execution without relying on operating system calls but rather with a unified "outside world" exit interface, whether implemented for testing, userspace execution, real hardware or paravirtualization.

On ARM we implemented bootloaders, Trusted Execution Environments and full secure OS and applets with this framework.

Thanks to a recent amd64 port, we enabled Pure Go KVMs under Cloud Hypervisor, Firecracker and QEMU.

This also allowed us to implement execution under UEFI, which enabled 100% Go EFI applications and bootloaders such as go-boot (which booted Linux on the Thinkpad I am writing from).

We think adopting these compact Go distribution changes would allow not only preservation of this ecosystem but expansion and innovation of the Go language as a whole, taking it to surprising, unexpected, yet ideal, environments.

The cost of maintaining the patch is, in our opinion, reasonable and if anything beneficial in improving the Go abstraction across architectures and OS specific components.

The only component that is somewhat low-level and sensitive in terms of maintenance between major Go releases is our asynchronous goroutine waking function, which is used to serve external interrupt requests.

However, such a function can be re-implemented in a manner consistent with existing OS signaling or, simply with the awareness and inclusion of GOOS=none, hooked to a much simpler standardized interface within the timer structure, which so far has not been implemented purely to avoid any pollution of non-tamago architectures.

@gopherbot gopherbot added this to the Proposal milestone May 6, 2025

@gabyhelp gabyhelp added the LanguageProposal Issues describing a requested change to the Go language specification. label May 6, 2025
@seankhliao
Member

Looking at the past proposals, this is just a list of new changes (much like the previous), but doesn't actually address the primary concerns of the past proposals, namely the shift of support burden on to the core team.

#35956 (comment)
#37503 (comment)
#46802 (comment)

@abarisani
Author

Looking at the past proposals, this is just a list of new changes (much like the previous), but doesn't actually address the primary concerns of the past proposals, namely the shift of support burden on to the core team.

#35956 (comment) #37503 (comment) #46802 (comment)

You are right, I just separated the actual proposal from the underlying background; hopefully this better explains the reasoning behind the new request.

I've laid out the external functions, which are proposed specifically to shift the burden of any hardware or target specific support away from the core team.

@FiloSottile
Contributor

I would like us to take another look at this. I think there are at least two major changes which affect the maintenance burden vs. value tradeoff evaluation:

  • First, GOOS=none and the OS interaction interface make this much more useful, as it's not just a way to run Go on tiny ARM boards, but lets applications implement support for disparate environments, including hypervisors and pre-OS environments.

  • Second, it's been five years since the first proposal and four years since the last one. @abarisani's team has been maintaining the fork for years now, which lends credibility to their commitment to maintain the port in tree, as well. (In particular, I see they implemented support for running the standard library tests on a regular Linux OS, which is something I had suggested would be necessary to carry a port with a good builder.)

This span of time also makes it possible to have a retrospective look at use cases, and at the actual burden/churn.

I probably over-index on security-sensitive use cases for obvious reasons, but running Go applications sandboxed as Firecracker micro VMs and writing memory-safe UEFI bootloader firmware in Go is extremely exciting to me.

@FiloSottile FiloSottile reopened this May 6, 2025
@FiloSottile FiloSottile removed the LanguageProposal Issues describing a requested change to the Go language specification. label May 6, 2025
@seankhliao
Member

What about the other parts of the porting policy: https://go.dev/wiki/PortingPolicy#requirements-for-a-new-port

Besides @abarisani , who are the named maintainers going to be, and what level of user adoption have we seen with tamago over the years?

@prattmic
Member

prattmic commented May 6, 2025

cc @golang/runtime

@prattmic
Member

prattmic commented May 6, 2025

I am also excited by the idea of bare metal Go.

To me, the major challenges are around compatibility. Presumably such a port should conform to the Go 1 Compatibility Promise. This has two major components:

  1. A stable API for the runtime's dependencies, which will be supported ~indefinitely. Your proposal lays out the current tamago runtime API, and while I agree it needs a better mechanism than linkname, I don't think it is too bad. I suspect we could come up with something we're willing to commit to here. As an aside, I have heard similar requests for a runtime API from folks who would like to run Go on minimal WASM runtimes that don't necessarily support WASI. Presumably this API would work for GOOS=none GOARCH=wasm as well.

  2. A stable subset of the language that can be used to implement the runtime dependency APIs. This is the really scary part to me. "runtime Go" is already a specialized way of writing Go, which is mostly unspecified and unstable. But it is in tree. If a changed compiler heuristic adds an allocation somewhere in the runtime that can't allocate, that's OK, we can change the runtime. With GOOS=none that would be an incompatible change. So I think we'd need to define and support a simple subset of the language that is guaranteed safe to use for implementing these APIs, which does not seem like a small project.

Alternatively, we could punt on this and state that these APIs must not be implemented in Go at all, and instead must be implemented in assembly. That seems unfortunate, though looking at your example, the code is fairly simple so maybe this wouldn't be completely untenable.

@prattmic
Member

prattmic commented May 6, 2025

I see Tamago is currently single-threaded (according to the lock implementation in src/runtime/lock_tamago.go). Presumably the virtualized unikernel use case will want multi-threading. Is this something on your radar, and do you have a sense of how it would affect the runtime API?

@abarisani
Author

What about the other parts of the porting policy: https://go.dev/wiki/PortingPolicy#requirements-for-a-new-port

Besides @abarisani , who are the named maintainers going to be, and what level of user adoption have we seen with tamago over the years?

We are happy to provide two developers, accept responsibility for the port and maintain a builder, along with any other support that is needed.

Our existing effort has been engineered specifically to avoid touching first class ports or change them in any way, the implementation of standard distribution tests has also been specifically worked to support such requirements and a builder.

Concerning adoption, we have WithSecure sponsored projects as well as Transparency.dev Armory Witness projects (bootloader, OS, applet) as primary adopters of tamago (see a list here). We have several thousand units running tamago on a variety of privately contracted projects that unfortunately we cannot speak publicly about. The amd64 port, which enables non-embedded use cases, is however very recent, as it was published only a few months ago. We do think that a side effect of upstream adoption would also be an increase in the popularity of Go in places where people generally don't see a use for it (go-boot being one example).

@abarisani
Author

I am also excited by the idea of bare metal Go.

To me, the major challenges are around compatibility. Presumably such a port should conform to the Go 1 Compatibility Promise. This has two major components:

  1. A stable API for the runtime's dependencies, which will be supported ~indefinitely. Your proposal lays out the current tamago runtime API, and while I agree it needs a better mechanism than linkname, I don't think it is too bad. I suspect we could come up with something we're willing to commit to here. As an aside, I have heard similar requests for a runtime API from folks who would like to run Go on minimal WASM runtimes that don't necessarily support WASI. Presumably this API would work for GOOS=none GOARCH=wasm as well.

This is quite right, I think the proposed changes would well serve architectures like wasm or "soft isolated" targets.

  2. A stable subset of the language that can be used to implement the runtime dependency APIs. This is the really scary part to me. "runtime Go" is already a specialized way of writing Go, which is mostly unspecified and unstable. But it is in tree. If a changed compiler heuristic adds an allocation somewhere in the runtime that can't allocate, that's OK, we can change the runtime. With GOOS=none that would be an incompatible change. So I think we'd need to define and support a simple subset of the language that is guaranteed safe to use for implementing these APIs, which does not seem like a small project.
    Alternatively, we could punt on this and state that these APIs must not be implemented in Go at all, and instead must be implemented in assembly. That seems unfortunate, though looking at your example, the code is fairly simple so maybe this wouldn't be completely untenable.

The initialization code is meant to be really simple, most of it could be actually moved to the application itself but it's part of runtime.hwinit as it provides a nice centralized way and avoids overloading packages init() which is tricky to rely upon due to non-trivial ordering.

It is true that runtime.hwinit needs to be careful about certain hardware specific aspects (such as not printing to a serial console that hasn't been set up yet) but these concerns are application/hardware specific and a burden to BSP implementers and not Go itself, for the rest runtime.hwinit has similar restrictions to runtime.osinit.

The other restriction I can think of is that currently nanotime1() cannot alloc as it can be used very early on while panic'ing.

I think it could be feasible to require a Go assembly implementation for both, though it would mean losing a lot of the convenience of being able to use Go for clean and flexible code.

I do wonder if there is a compile-time technique that would make it possible to check whether an initialization function is suitable or not, in a similar manner to //go:nosplit et al...

@abarisani
Author

I see Tamago is currently single-threaded (according to the lock implementation in src/runtime/lock_tamago.go). Presumably the virtualized unikernel use case will want multi-threading. Is this something on your radar, and do you have a sense of how it would affect the runtime API?

I am working on SMP as we speak; it's our intention to add it to the amd64 target at least. I don't think it will, or should, affect the runtime API. Currently we force gomaxprocs to 1 and we panic if newosproc is ever touched, in a manner identical to wasm. I plan to provide a meaningful newosproc implementation that integrates without changing the conventional runtime scheduler hooks.

It's not trivial, but we are actively working on it, I am also toying with the idea of one runtime per core to have AMP rather than SMP as that suits kinda well the bare metal architecture that Go allows, I guess it might also be possible to have newosproc as externally provided, I need to get my bearings on this for the least friction effort but definitely the core principle would be to avoid runtime API changes one way or the other.

@abarisani
Author

FYI I added our current interrupt handling helpers at the bottom of the Go runtime changes section. I think there is room for improvement on this front in the proposal and something much simpler could possibly be obtained. The rationale for the current waking function was to pierce existing Go runtime structures (namely the timer) without changing anything for other GOOS implementations.

With orchestration of such changes I am sure a much simpler way could be derived, by adding to current timer support, to simply prioritize a waiting/parked goroutine. Nonetheless it's clear that bare metal support would require interrupt handling and that an asynchronous low-level wake up function is required to be invoked in pure (see an example of a handler invoking runtime.WakeG here).

@apparentlymart

apparentlymart commented May 7, 2025

Reading this as someone who previously attempted porting Go to a niche OS for fun 😬 it's nice to see what I'm reading as essentially a "minimum viable GOOS porting API" written out in detail, although I understand the Go team's reservations about supporting it as an external compatibility surface.

Do you think it's plausible that what you've explored here could become a general-purpose abstraction layer to aid in new ports to actual operating systems, rather than only to bare-metal?

Specifically I'm thinking of a model where certain GOOS values could be mapped to either in-tree or out-of-tree Go modules that would get automatically linked in by the toolchain whenever that GOOS is selected, and would provide the same supporting machinery that would've been provided by the end-developer in the bare metal situation you described, but packaged up in a reusable form separate from the application.

I imagine that the main ports to mainstream OSes would still benefit from more direct support throughout the runtime and library, but I'm thinking here about more esoteric targets that are OS-like but not sufficiently popular to justify the maintenance costs of a fully-integrated port.

(If what I've asked here seems too far away from what this proposal was intended to cover then I'm happy to cease talking about it to avoid taking the discussion off-topic. My intention in asking is that framing this more broadly as a way to port to niche OSes and to bare metal might make the proposal easier to accept, assuming that's a viable thing to do without adding a huge amount of extra complexity and compatibility-promises.)

@tdunning

tdunning commented May 7, 2025

How do you see the overlap with the usage of TinyGo?

@abarisani
Author

How do you see the overlap with the usage of TinyGo?

I don't see overlap as TinyGo is a different implementation which also targets entirely different classes of targets, different instruction sets and doesn't provide 100% compatibility with the runtime and/or language, please see our FAQ entry.

@abarisani
Author

abarisani commented May 7, 2025

Reading this as someone who previously attempted porting Go to a niche OS for fun 😬 it's nice to see what I'm reading as essentially a "minimum viable GOOS porting API" written out in detail, although I understand the Go team's reservations about supporting it as an external compatibility surface.

Do you think it's plausible that what you've explored here could become a general-purpose abstraction layer to aid in new ports to actual operating systems, rather than to bare-metal?

Specifically I'm thinking of a model where certain GOOS values could be mapped to either in-tree or out-of-tree Go modules that would get automatically linked in by the toolchain whenever that GOOS is selected, and would provide the same supporting machinery that would've been provided by the end-developer in the bare metal situation you described, but packaged up in a reusable form separate from the application.

I imagine that the main ports to mainstream OSes would still benefit from more direct support throughout the runtime and library, but I'm thinking here about more esoteric targets that are OS-like but not sufficiently popular to justify the maintenance costs of a fully-integrated port.

(If what I've asked here seems too far away from what this proposal was intended to cover then I'm happy to cease talking about it to avoid taking the discussion off-topic. My intention in asking is that framing this more broadly as a way to port to niche OSes and to bare metal might make the proposal easier to accept, assuming that's a viable thing to do without adding a huge amount of extra complexity and compatibility-promises.)

Right now porting to an actual OS lacks an API to access the file-system, but I'd say that apart from that it can work to interact with an arbitrary OS for sure, see how we re-use this API to interact with Linux in an isolated fashion.

I think this is a good point to highlight the value of a GOOS=none, it would make it easier to port Go to arbitrary operating systems through this interface.

@abarisani
Author

abarisani commented May 7, 2025

Alternatively, we could punt on this and state that these APIs must not be implemented in Go at all, and instead must be implemented in assembly. That seems unfortunate, though looking at your example, the code is fairly simple so maybe this wouldn't be completely untenable.

I think the bulk of runtime.hwinit can be moved after runtime.schedinit safely (resulting in splitting runtime.hwinit into a minimal, Go assembly only runtime.hwinit0 and a post-bootstrap runtime.hwinit1 executed right before main), which I think would eliminate this concern.

I have a local working hack which achieves this by simply returning 1ns increments before initialization. The only downside I see is that runtime.schedinit will run under "fake" time, but I don't think that matters.
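
A rough sketch of that idea, assuming the value in question is the one returned by nanotime1: before the hardware timer is ready each call hands out the next nanosecond, and afterwards the real counter is added on top so that time never goes backwards. The names `timerReady`, `bootTicks`, `hwCounter` and `initTimer` are hypothetical and do not come from the tamago tree.

```go
package main

import "fmt"

var (
	timerReady bool  // true once the hardware timer has been initialized
	bootTicks  int64 // fake nanoseconds handed out before initialization
	hwCounter  int64 // hypothetical stand-in for a hardware counter, in ns
)

// nanotime1 returns 1ns increments until the timer is initialized, then
// continues from the last fake value using the hardware counter, so the
// reported time stays monotonic across the switch.
func nanotime1() int64 {
	if !timerReady {
		bootTicks++
		return bootTicks
	}
	return bootTicks + hwCounter
}

// initTimer is a hypothetical late initialization step that would run
// after the scheduler has been set up.
func initTimer() {
	timerReady = true
}

func main() {
	early1 := nanotime1()
	early2 := nanotime1()
	initTimer()
	hwCounter = 1000 // pretend the hardware counter has advanced
	fmt.Println(early1, early2, nanotime1())
}
```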

Therefore if ultimately the proposal hinges on this I think we can address it, otherwise I think in general it would be useful to have a go: pragma to allow/enforce pre-schedinit "runtime Go" code.

@prattmic
Member

prattmic commented May 7, 2025

We discussed this proposal briefly in #43930 (comment) yesterday, where we had similar thoughts to @apparentlymart's #73608 (comment).

That is, there are two ways of thinking about this proposal:

  1. As a GOOS port, like any other. This was my initial thinking, along with my thought that it seems odd to have a port where the Go 1 Compatibility Promise doesn't apply.

  2. Alternatively, as a "framework" for making it easier to maintain out-of-tree ports. Making it easier to maintain out-of-tree ports was a major discussion point on proposal: all: add bare metal ARM support #46802. In this light, I think it is much more reasonable to say we intend this to be reasonably stable, but don't have a full compatibility promise and still expect port maintainers to keep up with changes in new Go versions.

@eliasnaur
Contributor

eliasnaur commented May 7, 2025

How do you see the overlap with the usage of TinyGo?

I don't see overlap as TinyGo is a different implementation which also targets entirely different classes of targets, different instruction sets and doesn't provide 100% compatibility with the runtime and/or language, please see our FAQ entry.

To be clear, your FAQ also mentions embeddedgo, which is a Go port that targets microcontrollers. I use TinyGo today, but would happily switch to "Big" Go and GOOS=none if I could.

I think it would be a mistake to do this proposal without considering the embedded targets. It's true that many (all?) ARM microcontrollers require thumb(v2?) support in the assembler, but I see that as an orthogonal problem (and proposal). I'm not even sure thumb support would require a new GOARCH: as far as I understand, all relevant arm 32-bit processors support thumb. In particular, the Android NDK targets thumbv2 for 32-bit ARM.

With embedded targets, we might see some movement on #6853 :)

I would love to hear what @embeddedgo or @aykevl had to say.

2. Alternatively, as a "framework" for making it easier to maintain out-of-tree ports. Making it easier to maintain out-of-tree ports was a major discussion point on

I have also been thinking of re-proposing GOOS=none for some time and I believe it's more than reasonable to say that the Go compatibility promise will not apply to the runtime interface for GOOS=none.

Looking at the past proposals, this is just a list of new changes (much like the previous), but doesn't actually address the primary concerns of the past proposals, namely the shift of support burden on to the core team.

Please consider the potential decrease in maintenance burden that GOOS=none can bring: "exotic" port or ports without sufficient maintenance can be demoted to an out-of-tree GOOS=none port, thus freeing maintainer time without using the big hammer of removing the port altogether.

@tdunning

tdunning commented May 7, 2025

I have been talking to folks at $dayJob and several have been using TamaGo for years and love it. From the user point of view (with no reasonable view into the sausage factory), it would be great to have these capabilities with standard Go.

@backkem

backkem commented May 7, 2025

There may also be some overlap with the wasip2 discussion in #65333 as the new WASI Preview 2 "component model" also doesn't have a common ABI, it depends on the component's WIT definition.

@abarisani
Author

We discussed this proposal briefly in #43930 (comment) yesterday, where we had similar thoughts to @apparentlymart's #73608 (comment).

That is, there are two ways of thinking about this proposal:

  1. As a GOOS port, like any other. This was my initial thinking, along with my thought that it seems odd to have a port where the Go 1 Compatibility Promise doesn't apply.
  2. Alternatively, as a "framework" for making it easier to maintain out-of-tree ports. Making it easier to maintain out-of-tree ports was a major discussion point on proposal: all: add bare metal ARM support #46802. In this light, I think it is much more reasonable to say we intend this to be reasonably stable, but don't have a full compatibility promise and still expect port maintainers to keep up with changes in new Go versions.

It feels to me that splitting runtime.hwinit into pre-World start and post-World start hooks would strike a very good balance and allow both perspectives (and easy debugging hooks to develop new ports).

Something like (yes probably these are terrible names) runtime.hwinit0 (pre World start, Go Assembly only or, in case Go is used, you are on your own and there is no promise) and runtime.hwinit1 (post World start, malloc works, Go promise holds)

The full compatibility promise would be maintained with clear cut hooks with specific role, implementers can still toy with Go hwinit0 but only with Go asm the promise is given, the early hwinit1 allows full Go initialization for anything that doesn't really need to be there ASAP.

While I still need to do more complete testing, it seems that on ARM I can get a one-liner for hwinit0 (VFP enabling), while on AMD64 everything can be moved post World start.

@embeddedgo

I just wanted to mention that maintaining a side port of vanilla Go targeted to real ARM MCUs (RAM < 1 MB, Thumb2 ISA) isn't so easy.

The allocator and the GC are surprisingly flexible and can definitely work with such small RAM but require some tweaks.

From my point of view the main problem is the fact that the runtime is unaware of the possibility that the call address may differ from the instruction address and this is a "feature" of the Thumb2 ISA (these addresses differ at LSBit). It gives me a headache every time I merge the next Go release into Embedded Go and I don't have enough free time to do it on every PR to the golang/go master branch (BTW, I'm just working on merging go1.24).

I think the first step to support embedded ARM targets is to add support for linux/thumb (Embedded Go has it) and only after that move to the noos/thumb (I personally think the GOOS=none isn't very grep friendly). Eventually the linux/thumb may probably replace linux/arm (AFAIK the Debian linux/armhf uses Thumb2 ISA).

@abarisani
Author

I just wanted to mention that maintaining a side port of vanilla Go targeted to real ARM MCUs (RAM < 1 MB, Thumb2 ISA) isn't so easy.

The allocator and the GC are surprisingly flexible and can definitely work with such small RAM but require some tweaks.

From my point of view the main problem is the fact that the runtime is unaware of the possibility that the call address may differ from the instruction address and this is a "feature" of the Thumb2 ISA (these addresses differ at LSBit). It gives me a headache every time I merge the next Go release into Embedded Go and I don't have enough free time to do it on every PR to the golang/go master branch (BTW, I'm just working on merging go1.24).

I think the first step to support embedded ARM targets is to add support for linux/thumb (Embedded Go has it) and only after that move to the noos/thumb (I personally think the GOOS=none isn't very grep friendly). Eventually the linux/thumb may probably replace linux/arm (AFAIK the Debian linux/armhf uses Thumb2 ISA).

Yes, I can see how MCUs can be a very different beast, which is why TamaGo targets SoCs or AMD64, which are far easier. I'd say that merging major Go releases has probably been relatively easier on the TamaGo side.

Nonetheless it would be nice to see mainstream support for Thumb2 in the future, and the potential interaction with TinyGo on this front.

@eliasnaur
Contributor

eliasnaur commented May 8, 2025

Microcontrollers such as the Raspberry Pi rp2350 support risc-v, and Go supports GOARCH=riscv64. It seems to me that noos/riscv is a shorter path to an embedded port that avoids the thumb2 dependency.

EDIT: GOARCH=riscv64, not riscv.

@embeddedgo

Microcontrollers such as the Raspberry Pi rp2350 support risc-v, and Go supports GOARCH=riscv. It seems to me that noos/riscv is a shorter path to an embedded port that avoids the thumb2 dependency.

AFAIK there is currently no support for 32-bit RISC-V, aka GOARCH=riscv, in go1.24:

$ GOARCH=riscv go build
go: unsupported GOOS/GOARCH pair linux/riscv

But the 32-bit RISC-V ISA is very similar to the 64-bit one (which can't be said about ARM and Thumb2), so it should be quite easy to add it. BTW, Embedded Go supports both RP2350 ARM cores in the latest master-embedded branch. You can find the supported peripherals here.

@prattmic
Member

prattmic commented May 8, 2025

@abarisani #73608 (comment)

To be clear, the safe subset of Go isn't just about "before world started" and "after world started", though that is certainly part of it.

For example, nanotime1 is called in all sorts of precarious situations: by the scheduler when there is no active goroutine or P, by the GC, etc. Thus the implementation of nanotime1 must conform to lots of constraints. Off the top of my head:

  • It must not grow the stack or call into the scheduler (must be recursively //go:nosplit).
  • Must not allocate.
  • Must not write to any pointers pointing to heap objects (must be //go:nowritebarrierrec).

Now, of course nanotime1 is usually going to be trivial so it shouldn't be too hard to conform to these restrictions, but still they are easy to run afoul of.

I think if we view this as supporting out-of-tree ports we don't try to provide language compatibility. It is just too big of a problem. We might want to expose some of the tools we use to make it easier to maintain the runtime. e.g., //go:nowritebarrierrec (currently restricted to std packages). Also the compiler forbids implicit heap allocation in the runtime (it requires explicit new(T)).

@abarisani
Author

@abarisani #73608 (comment)

To be clear, the safe subset of Go isn't just about "before world started" and "after world started", though that is certainly part of it.

For example, nanotime1 is called in all sorts of precarious situations: by the scheduler when there is no active goroutine or P, by the GC, etc. Thus the implementation of nanotime1 must conform to lots of constraints. Off the top of my head:

  • It must not grow the stack or call into the scheduler (must be recursively //go:nosplit).
  • Must not allocate.
  • Must not write to any pointers pointing to heap objects (must be //go:nowritebarrierrec).

Now, of course nanotime1 is usually going to be trivial so it shouldn't be too hard to conform to these restrictions, but still they are easy to run afoul of.

I think if we view this as supporting out-of-tree ports we don't try to provide language compatibility. It is just too big of a problem. We might want to expose some of the tools we use to make it easier to maintain the runtime. e.g., //go:nowritebarrierrec (currently restricted to std packages). Also the compiler forbids implicit heap allocation in the runtime (it requires explicit new(T)).

Understood, nanotime1 must be quick and easy in any case, I think these constraints for it not only can be accepted but should be part of any effective implementation of it.

In all our platforms nanotime1 is typically something like:

return read_systimer()*ARM.TimerMultiplier + ARM.TimerOffset

This can safely and quickly be done in assembly without issues.
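For readers who prefer Go to assembly, the same formula can be sketched as below. This is only an illustration under stated assumptions: readSystimer, fakeCounter, timerMultiplier and timerOffset are hypothetical stand-ins (a real port would read a hardware counter register), and the 25 MHz rate is made up. The //go:nosplit directive is shown because the function must not grow the stack; the function also performs no allocation and no pointer writes.

```go
package main

import "fmt"

// Hypothetical calibration values: a 25 MHz counter ticks every 40 ns.
var (
	timerMultiplier int64 = 40
	timerOffset     int64 = 0
)

// fakeCounter stands in for a memory-mapped hardware counter so that the
// sketch runs on a hosted system.
var fakeCounter int64

// readSystimer returns the current counter value. On real hardware this
// would be a single register read.
//
//go:nosplit
func readSystimer() int64 { return fakeCounter }

// nanotime1 converts counter ticks to nanoseconds. It only does integer
// arithmetic on package-level scalars: no allocation, no heap pointer
// writes, no calls that could reach the scheduler.
//
//go:nosplit
func nanotime1() int64 {
	return readSystimer()*timerMultiplier + timerOffset
}

func main() {
	fakeCounter = 25_000_000 // one second worth of ticks at 25 MHz
	fmt.Println(nanotime1())
}
```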

I think nanotime1 is the only relevant call that must stay within the "safe subset of Go", along with runtime.hwinit0 (if we follow my split proposal). Possibly printk falls under it as well, but an assembly implementation could be required for it too (or else you are on your own).

@deadprogram

As one of the TinyGo maintainers, I think many of us are interested in seeing something like this happen. I'm on the road right now, so do not have time to get into the details of this specific proposal until probably end of next week.

A couple of the other maintainers like @aykevl and @soypat are on vacation until next week. They will surely be able to add something to the conversation about this specific proposal.

I don't see overlap as TinyGo is a different implementation which also targets entirely different classes of targets, different instruction sets

This is mostly correct. There is some overlap, but that is just mostly incidental. The vast majority of targets for each are very different.

and doesn't provide 100% compatibility with the runtime and/or language.

TinyGo has supported the full Go language for several years. It is true that the runtime support can differ.

I will wait until I can look this over in detail before making any specific comment.

Very glad to see movement on this topic, thank you to @abarisani for getting it going (yet again)!

@soypat

soypat commented May 10, 2025

Heya peeps, another TinyGo maintainer here. I work on supporting the Raspberry Pi Pico bare-metal target (RP2040/RP2350) and various host peripherals like PIO, DMA, SPI, I2C, PWM, and so on.

I don’t have much to add to the discussion other than expressing my gratitude for the work @abarisani has put into TamaGo and this proposal. I love that it reads like a Rosetta Stone for how the runtime interfaces with hardware-- there’s clearly a lot of experience behind it. It looks solid and feels like an important stepping stone for Go’s future in the embedded space. Huge cheers to everyone who made this happen c:

Really excited about what this could mean for Go going forward!

@rminnich
Contributor

I would really like to see this happen. Google already uses Tamago, in transparency.dev. But it would also be useful for u-root, which Google also uses.

@embeddedgo

embeddedgo commented May 12, 2025

I've analyzed this proposal more thoroughly and generally agree with it. But in the current form it still lacks support for smaller targets, say Raspberry Pi Pico (RP2350) or STM32H743, which with their 520 KB or 1024 KB of RAM, can be successfully programmed in Go. Below I want to mention four things that come to my mind right now.

  1. Support for execution from ROM/Flash. The current linker assumes the whole binary is loaded into RAM before execution, which isn't possible or practical if you have less than 1 MB of RAM. The required changes aren't too big but don't handle all possible cases. Support for something like a linker script would be desirable.

  2. The way the memory map is defined by this proposal is too modest. To support small-RAM targets, Embedded Go introduces some constants used to modify the original constants in the runtime. As you can see there are too many of them and they are specified directly in the runtime package, so it isn't an ideal solution.

// RP2350 constants

noosScaleDown                   = 8 // must be power of 2
noosStackCacheSize              = 8 * 1024
noosNumStackOrders              = 2
noosHeapAddrBits                = 19 // 512 KiB of SRAM
noosLogHeapArenaBytes           = 15 // 32 KiB
noosArenaBaseOffset             = 0x2000_0000
noosMinPhysPageSize             = 256
noosSpanSetInitSpineCap         = 8
noosStackMin                    = 1024
noosStackSystem                 = 27 * 4 // register stacking at exception entry
noosStackGuard                  = 464
noosFinBlockSize                = 256
noosSweepMinHeapDistance        = 1024
noosDefaultHeapMinimum          = 8 * 1024
noosMemoryLimitHeapGoalHeadroom = 1 << 15
noosGCSweepBlockEntries         = 64
noosGCSweepBufInitSpineCap      = 32
noosGCBitsChunkBytes            = 2 * 1024
noosSemTabSize                  = 31
noosGOGC                        = 30
noosTimeHistMaxBucketBits       = 45

Microcontrollers quite often have more than one RAM region with different (discontinuous) addresses and capabilities (for example DMA capable, non-DMA capable). Support for multiple memory regions isn't strictly required at first, but as available memory is always in short supply we cannot ignore this fact in the long term. For example, in Embedded Go the non-DMA capable memory, if present, is used by the runtime for persistent allocations, leaving more DMA capable RAM for Go programs.

  3. Drivers in Go require efficient support for volatile/MMIO operations from the compiler. This can be fulfilled by a set of types and methods in something like an mmio package, so there is no need for any language change, but the compiler should recognize such a package to provide an optimized implementation (intrinsics).

  4. Some support from the compiler and linker may be required if you want to write interrupt handlers directly in Go. The communication between interrupt handlers and goroutines can be quite efficiently implemented using the netpoller (the latest addition to Embedded Go).
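As an illustration of the third point, a register type in such an mmio package might look roughly like the sketch below. Everything here is an assumption made for illustration: the type U32, its methods, and the uart register layout are hypothetical, and sync/atomic is used only as a portable stand-in for the volatile loads and stores that a compiler intrinsic would emit. It is not the Embedded Go API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// U32 models one 32-bit memory-mapped register. Access goes through methods
// so the compiler could later replace them with single volatile
// load/store instructions (intrinsics).
type U32 struct{ r uint32 }

// Load reads the register; atomic.LoadUint32 stands in for a volatile load.
func (u *U32) Load() uint32 { return atomic.LoadUint32(&u.r) }

// Store writes the register; atomic.StoreUint32 stands in for a volatile store.
func (u *U32) Store(v uint32) { atomic.StoreUint32(&u.r, v) }

// SetBits performs a read-modify-write that sets the bits in mask.
func (u *U32) SetBits(mask uint32) { u.Store(u.Load() | mask) }

// ClearBits performs a read-modify-write that clears the bits in mask.
func (u *U32) ClearBits(mask uint32) { u.Store(u.Load() &^ mask) }

// uart is a hypothetical peripheral register block. On real hardware a
// driver would obtain a *uart from a fixed physical address instead of
// declaring a variable.
type uart struct {
	DR U32 // data register
	SR U32 // status register
	CR U32 // control register
}

func main() {
	var dev uart // ordinary memory, so the sketch runs on a host
	dev.CR.SetBits(1 << 0)
	dev.DR.Store('A')
	fmt.Printf("CR=%#x DR=%q\n", dev.CR.Load(), rune(dev.DR.Load()))
}
```

Keeping all accesses behind a small method set is what would let the compiler recognize the package and lower each call to one instruction, without any language change.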

@abarisani
Author

abarisani commented May 12, 2025

My proposal re-uses the existing compiler and memory allocation, therefore yes it is not currently suitable for such targets.

I am happy to ponder the issue. However, in the first iteration I found addressing such a wide variety of platforms too "large" a problem for me, also because of the existence of TinyGo, to be honest.

Fundamentally TamaGo was born with the clear intention of supporting existing compiler targets and without requiring compiler or linker changes, this is why I never considered such an expansion which I thought too invasive for my knowledge.

Concerning interrupt handling this is currently supported through GetG and WakeG runtime functions which I believe I mentioned in the proposal.

@abarisani
Author

abarisani commented May 12, 2025

I created a stub package to godoc and gomarkdoc the proposed API, I guess this can also serve to get PRs from @embeddedgo to ponder expanding support to MCUs.

I hope this helps; if it creates confusion I am happy to remove it. I also split hardware initialization into pre-world-start and post-world-start phases, as this works fine and helps with some of the proposal requirements.

Please see goos-none-proposal or pkg.go.dev.

@tdunning

It is clear that TamaGo has its sights set on larger systems than most embedded microcontrollers, but it is also clear that microcontrollers have a broad range of capabilities which, at the high end, intrudes into the target range of TamaGo.

My experience is with TinyGo and I see and feel a pretty big difference here. The real kicker is often the size of the runtime. Another is the tight integration with the machine environment, which typically shows up in the form of a large set of machine definitions in terms of special addresses and volatile declarations. At the small end, one user on the TinyGo slack described getting a trivial application to run in 500 bytes of RAM, and a number have reported executable sizes in the tens of kB range. The existence of that sort of target environment makes it clear that having multiple solutions isn't necessarily a bad thing.

As such, I think the fact that TamaGo doesn't cover all target environments isn't so much a defect as simply a concession to reality. It's still awesome work and a serious contribution.

@embeddedgo

Fundamentally TamaGo was born with the clear intention of supporting existing compiler targets and without requiring compiler or linker changes, this is why I never considered such an expansion which I thought too invasive for my knowledge.

As this proposal is titled "add bare metal support" and not "add support for GOOS=tamago", I think it's probably a good idea not to limit this discussion to only one, very specific branch of this subject.

@FiloSottile
Contributor

The proposal as I interpreted it is not about discussing the general topic of bare metal and embedded targets, but about adding a specific OS abstraction interface which requires otherwise minimal changes to the runtime and toolchain.

I think that’s the right first step, with its specific cost/benefit ratio. More extensive changes to support additional targets can be discussed separately and can build on top of this, if accepted.

@abarisani
Author

Fundamentally TamaGo was born with the clear intention of supporting existing compiler targets and without requiring compiler or linker changes, this is why I never considered such an expansion which I thought too invasive for my knowledge.

As this proposal is titled "add bare metal support" and not "add support for GOOS=tamago", I think it's probably a good idea not to limit this discussion to only one, very specific branch of this subject.

I think having an externally defined sbrk function might help, but I am not sure it will be enough given the complex calculations done in the runtime for heap memory allocation.

I am not sure there is a clean way to abstract those, suggestions are welcome.

I agree with Filippo that probably the proposal title doesn't quite catch the spirit of the proposal.

This third attempt was specifically reignited because the focus has shifted: while still serving the bare metal goal of my former proposals, it now centers on a working generic runtime API abstraction (at least for the same class of targets Go currently supports).

@apparentlymart

When thinking about the topic of how this could potentially "scale down" to much smaller systems like traditional microcontrollers, maybe this framing is helpful:

  • GOOS=none could be adopted as a general convention across "mainline" Go, TinyGo, and potentially other implementations to represent the general idea that no operating system is available.

    In particular, this would introduce a standard build tag that library authors could use if they wanted to write libraries that use conditional compilation to exclude certain features that only make sense when an OS is present, while still exporting the subset that can run in the no-OS environment.

    Perhaps over time the standard library would also use this to define a "no-OS subset".

  • Mainline Go could choose to interpret GOOS=none as causing the runtime to build in a mode where it expects various additional symbols to be provided as glue either to direct hardware features or to a lower-level system call interface on an embedded-focused OS that doesn't provide abstractions like file I/O, etc.

    But other implementations that have their own separate runtime implementation would define their own way of treating GOOS=none, which may or may not match the requirements of mainline Go. These differences should ideally affect only the main program, and not libraries.

This proposal is currently focused mostly on the second item above, which is inherently specific to the design details of mainline Go. But the library ecosystem can potentially make use of GOOS=none even in other implementations with different runtime design details, since it's the main program's responsibility to arrange for providing all of the runtime support in a GOOS=none environment.

@clktmr

clktmr commented May 12, 2025

Hey there, I'm maintaining the mips/nintendo64 port of @embeddedgo.

Some remarks:

  • Locking primitives should be on this list, i.e. not default to a single-threaded locking mechanism.
  • printk() accepting only a single byte will have either bad performance or require buffering, which can lead to confusion in printf-style debugging. Is there a reason for not accepting a slice?
  • The runtime's API shouldn't be extended by runtime.GetG and runtime.WakeG. The runtime can provide such functions, but they should be linknamed by another package and exported there.
  • However, I would prefer to see something like Wait() and Signal(), like it's solved in embeddedgo.

@tdunning

The question about printk() was already answered (quite a ways) upthread.

The idea was that printk() is intended to be useful in panic and, as such, should not allocate because of the risk of a recursive panic. In fact, in the sample runtimes posted in this issue, a single byte was pre-allocated to make sure that no allocations are necessary.

This isn't a general purpose printing substrate. It is a debugging tool of last resort.
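A minimal sketch of that idea follows, assuming a byte-at-a-time console primitive. The names uartLog, uartLen and printString are hypothetical, and a fixed array stands in for the UART data register so the sketch runs on a host. It is not the TamaGo implementation.

```go
package main

import "fmt"

// uartLog is a fixed-size, statically allocated buffer standing in for a
// UART data register; nothing here touches the heap.
var (
	uartLog [256]byte
	uartLen int
)

// printk emits a single byte. It allocates nothing, so it remains usable
// while the runtime is panicking. Bytes beyond the buffer are dropped.
//
//go:nosplit
func printk(c byte) {
	if uartLen < len(uartLog) {
		uartLog[uartLen] = c
		uartLen++
	}
}

// printString feeds a string to printk byte by byte. Indexing a string
// does not allocate, so the caller needs no intermediate slice.
func printString(s string) {
	for i := 0; i < len(s); i++ {
		printk(s[i])
	}
}

func main() {
	printString("panic: out of memory\n")
	fmt.Print(string(uartLog[:uartLen]))
}
```

The sketch shows only that a caller can drive a byte-level primitive without building a slice first; whether a slice-based signature could be made equally safe is the open question in this exchange.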

@abarisani
Author

abarisani commented May 13, 2025

Hey there, I'm maintaining the mips/nintendo64 port of @embeddedgo.

Some remarks:

  • Locking primitives should be on this list, i.e. not default to a single-threaded locking mechanism.
  • printk() accepting only a single byte will have either bad performance or require buffering, which can lead to confusion in printf-style debugging. Is there a reason for not accepting a slice?
  • The runtime's API shouldn't be extended by runtime.GetG and runtime.WakeG. The runtime can provide such functions, but they should be linknamed by another package and exported there.
  • However, I would prefer to see something like Wait() and Signal(), like it's solved in embeddedgo.

Hello and thanks for your feedback.

  • We are working on SMP, and yes, that will require an API extension.

  • printk takes a single character because all initial console drivers (e.g. UART) take a single byte in their primitive. For generic support this can be improved, but the panic issue mentioned in the previous comment also stands. I will test and update accordingly. Please note that printk is only used for console stdout.

  • I don't think the runtime has a way to wake a goroutine asynchronously (without polling) and safely from Go assembly. A lot of time was spent finding the best way to implement IRQ handling, and gopark/wake were not suitable. Happy to learn what you have in mind. As WakeG needs to be invoked by an assembly exception handler, and it must be implemented in assembly, I wasn't sure using linkname was appropriate due to the difference in ABI (linkname forces an ABI conversion, afaik). If you can point me to your implementation, I am happy to learn whether our approach can be improved.

@clktmr

clktmr commented May 13, 2025

I don't understand how passing a single byte avoids allocations in printk. At the end of the day, the printk implementation must make sure to avoid allocations? I might be missing something here.

A goroutine waiting on an interrupt is similar to a goroutine waiting on IO. So in embeddedgo a minimal netpoller was implemented. The netpoller will be called by the goroutine scheduler to get a list of ready to run goroutines. In embeddedgo the netpoller might sleep on a futex, but in your case it's even simpler: Just move the g semaphore in pollDesc to pdReady from an interrupt and the goroutine scheduler will wake it for you.

See https://github.com/embeddedgo/go/blob/master-embedded/src/runtime/netpoll_noos.go

@abarisani
Author

I don't understand how passing a single byte avoids allocations in printk. At the end of the day, the printk implementation must make sure to avoid allocations? I might be missing something here.

A goroutine waiting on an interrupt is similar to a goroutine waiting on IO. So in embeddedgo a minimal netpoller was implemented. The netpoller will be called by the goroutine scheduler to get a list of ready to run goroutines. In embeddedgo the netpoller might sleep on a futex, but in your case it's even simpler: Just move the g semaphore in pollDesc to pdReady from an interrupt and the goroutine scheduler will wake it for you.

See https://github.com/embeddedgo/go/blob/master-embedded/src/runtime/netpoll_noos.go

We halt the CPU while waiting for interrupts and do not rely on any polling code.

In other words if we are idling waiting for an interrupt nothing is going on and the CPU is halted.

Is this achievable with your code (which I will study as soon as I have the chance)?

Thanks!

@clktmr

clktmr commented May 13, 2025

Yes, but you should halt the CPU in your netpoll() implementation and also need to wake the CPU after the specified delay if no interrupts occurred. Feel free to message me privately to keep this on topic.
