Potential cache issue leading to inconsistent #23869


Open
Grazfather opened this issue May 12, 2025 · 6 comments · May be fixed by #23921
Labels: bug (Observed behavior contradicts documented or intended behavior), linking


Grazfather commented May 12, 2025

Zig Version

0.14.0

(Built on M1 mac running Sonoma 14.2.1)

Steps to Reproduce and Observed Behavior

While trying to build an image for a microcontroller with a custom entry point, I noticed that my build was not consistently producing a binary with the same header.

Say I have commit A, which works; commit B, which removes the line that sets the Build.Step.Compile.entry symbol; and commit C, which re-adds it. I can sometimes build B and get a binary whose entry point is still set according to the entry field.

This behaviour is inconsistent, and sometimes hard (for others) to reproduce.

I have a repro branch in microzig called build_bug.
You can try to repro by cloning the repo and running the following from the examples/raspberrypi/rp2xxxx directory.

#!/bin/bash
set -x
rm -rf zig-local/ zig-global/ # start from empty caches; -f avoids an error when they do not exist yet
git switch --quiet --detach build_bug^^ # Good
zig build -Dexample=ram_b --release=small --global-cache-dir "$PWD/zig-global" --cache-dir "$PWD/zig-local"
readelf -h zig-out/firmware/ram_blinky.elf | grep Entry
git switch --quiet --detach build_bug^ # Bad?
zig build -Dexample=ram_b --release=small --global-cache-dir "$PWD/zig-global" --cache-dir "$PWD/zig-local"
readelf -h zig-out/firmware/ram_blinky.elf | grep Entry
git switch --quiet build_bug # Good
zig build -Dexample=ram_b --release=small --global-cache-dir "$PWD/zig-global" --cache-dir "$PWD/zig-local"
readelf -h zig-out/firmware/ram_blinky.elf | grep Entry
git switch --quiet --detach build_bug^ # Bad?
zig build -Dexample=ram_b --release=small --global-cache-dir "$PWD/zig-global" --cache-dir "$PWD/zig-local"
readelf -h zig-out/firmware/ram_blinky.elf | grep Entry

Expected Behavior

Expected output is

Entry point address:               0x20000001
Entry point address:               0x20000645
Entry point address:               0x20000001
Entry point address:               0x20000645

But I sometimes get

Entry point address:               0x20000001
Entry point address:               0x20000001
Entry point address:               0x20000001
Entry point address:               0x20000001

This suggests that the build does not always honor the entry field of the Compile step, which changes between these commits.
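Since the stale result only shows up some of the time, it can help to have the script flag it automatically rather than comparing addresses by eye. The following is a minimal sketch, not part of the original repro script: the helper names `entry_addr` and `check_alternating` are hypothetical, and it assumes only the `Entry point address:` line format shown above.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original repro script.

# Print the address from a `readelf -h` "Entry point address:" line on stdin.
entry_addr() { awk '/Entry point address:/ { print $NF }'; }

# check_alternating ADDR...
#   Succeeds only if every address differs from the one before it,
#   which is what the good/bad/good/bad commit sequence should produce.
check_alternating() {
    local prev="" addr
    for addr in "$@"; do
        if [ -n "$prev" ] && [ "$addr" = "$prev" ]; then
            echo "stale result: $addr repeated" >&2
            return 1
        fi
        prev="$addr"
    done
}

# The two outputs from this report, fed in as literal values:
check_alternating 0x20000001 0x20000645 0x20000001 0x20000645 && echo consistent
check_alternating 0x20000001 0x20000001 0x20000001 0x20000001 || echo inconsistent
```

In the repro script, each `readelf -h ... | grep Entry` line could be piped through `entry_addr` and the four results passed to `check_alternating`, so that a loop over many runs stops at the first stale build.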

Grazfather added the bug (Observed behavior contradicts documented or intended behavior) label May 12, 2025
alexrp (Member) commented May 12, 2025

Do you know if this is a regression in 0.14.0?

Grazfather (Author) commented

Unfortunately I don't, and the repo hasn't built on 0.13.0 for some time, but I can try to apply my three commits onto an old branch.

alexrp added this to the 0.15.0 milestone May 12, 2025
alexrp added the linking label May 12, 2025
mlugg (Member) commented May 12, 2025

I don't think that effort would be hugely valuable. Cache bugs often have to do with filesystem races, so it's quite easy for a change to expose an existing cache bug by coincidentally making it more likely to trigger. For instance, that happened with #23110; that bug wasn't actually a 0.14.0 regression, but a change I made in that release cycle happened to make it more likely.

It's possible that this is a manifestation of #23110; that depends on whether it can be repro'd on master (I'm aware you're currently trying to figure out a more consistent repro before trying that out). But that doesn't seem hugely likely to me.

Grazfather (Author) commented May 12, 2025

On master, when I build with a fresh cache, I consistently get the 0x20000001 entry point, which is incorrect for the second and fourth builds.

If I immediately build again with the normal cache directory, I 'correctly' get the weird entry point.

❯ zig version
0.15.0-dev.515+833d4c9ce

❯ bash bla
  Entry point address:               0x20000001
  Entry point address:               0x20000001
  Entry point address:               0x20000001
  Entry point address:               0x20000001

❯ zig build -Dexample=ram_b --release=small

❯ readelf.py -h zig-out/firmware/ram_blinky.elf  |grep Entry
  Entry point address:               0x20000635

❯ git l -3 build_bug
* 8cb66652  2025-05-12  Grazfather   (origin/build_bug, rp_ram_image, build_bug) Revert "no entry override"
* 96bd19f7  2025-05-12  Grazfather   (HEAD) no entry override
* af93e8f2  2025-05-12  Grazfather   wip ram image

castholm (Contributor) commented

Is it possible that the underlying problem is as simple as the -fentry family of options not being included in the hash? From a quick glance at Compilation.zig it doesn't look like it hashes the entry point.

I can also reproduce a similar issue with the following on master:

// build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const exe = b.addExecutable(.{
        .name = "repro",
        .root_module = b.createModule(.{
            .target = b.resolveTargetQuery(.{ .cpu_arch = .wasm32, .os_tag = .wasi }),
            .optimize = .ReleaseSmall,
            .root_source_file = b.path("build.zig"),
        }),
    });
    if (b.option([]const u8, "entry", "")) |name| exe.entry = .{ .symbol_name = name };
    b.installArtifact(exe);
}

pub fn main() void {
    std.debug.print("main\n", .{});
}

export fn other() void {
    std.debug.print("other\n", .{});
}

If I run zig build with a clean cache and inspect the Wasm output, I see that it exports _start. If I then run zig build -Dentry=other with the same cache, the build completes instantly (suggesting a cache hit) and the Wasm output still exports _start. It's only after I clear the cache that I see the other symbol instead.

mlugg (Member) commented May 14, 2025

Hah, you're right. There are actually also a few other link options which aren't put into the cache manifest. I'll put up a PR fixing all of those soon.
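For readers following along, the failure mode discussed above (an option that changes the output but is missing from the cache key) can be illustrated with a toy model. This is only a sketch and not Zig's actual cache implementation: `toy_build`, `digest`, `CACHE_DIR`, and the "artifact" format are all hypothetical, and `cksum` stands in for a real hash.

```shell
#!/bin/bash
# Toy model of a build cache; NOT Zig's actual cache implementation.
# All names here (toy_build, digest, CACHE_DIR) are hypothetical.

CACHE_DIR="$(mktemp -d)"

# Checksum of stdin using POSIX cksum, standing in for a real hash.
digest() { cksum | cut -d' ' -f1; }

# toy_build MODE SOURCE ENTRY
#   MODE=buggy keys the cache on SOURCE only;
#   MODE=fixed keys the cache on SOURCE and ENTRY.
# The "artifact" is just a line recording which entry it was built with.
toy_build() {
    local mode="$1" source="$2" entry="$3" key artifact
    if [ "$mode" = fixed ]; then
        key="$(printf '%s\n%s\n' "$source" "$entry" | digest)"
    else
        key="$(printf '%s\n' "$source" | digest)"  # entry is missing from the key
    fi
    artifact="$CACHE_DIR/$mode-$key"
    # Cache miss: "link" the artifact. Cache hit: reuse whatever is stored.
    [ -f "$artifact" ] || echo "entry=$entry" > "$artifact"
    cat "$artifact"
}

toy_build buggy 'fn main() {}' _start  # miss: built with _start
toy_build buggy 'fn main() {}' other   # stale hit: still reports _start
toy_build fixed 'fn main() {}' _start  # miss: built with _start
toy_build fixed 'fn main() {}' other   # miss: rebuilt with other
```

In the buggy mode the second call never rebuilds, which mirrors the instant build and unchanged _start export described in the Wasm repro. Adding the option to the key, as in the fixed mode, makes the two configurations map to different cache entries.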
