create a separate executable for each fuzz test; change std.testing fuzzing API to unify corpus with unit tests

Currently, all unit tests of a given compilation are compiled into the same binary, and the test runner iterates over all of them and runs them one by one. This saves time and disk space, eliminating the overhead of building an independent executable for each test.

The same is true for fuzz tests. When rebuilding a compilation in fuzz mode, all fuzz tests (and all other non-fuzzing unit tests) are compiled into the same executable. While this saves time and disk space at first, these benefits come at the cost of a larger points-of-interest array. This array is the set of virtual memory addresses that the fuzzer is trying to reach. By keeping this array smaller, each individual fuzz run can be slightly faster. Multiplied by many fuzz runs, the tradeoff tilts in favor of a separate executable for each fuzz test.

To further optimize fuzz run performance, we also want to minimize the overhead of calling into the function under test. Ideally, the hot fuzz loop would be a direct call into the code being tested, rather than a call through a dereferenced function pointer. This is constrained by the `std.testing` API for declaring something as a fuzz test.

This issue proposes changing the current API:

```zig
pub inline fn fuzz(context: anytype, comptime testOne: fn (context: @TypeOf(context), input: []const u8) anyerror!void, options: FuzzInputOptions) anyerror!void
```

Which is used like this:

```zig
test "example fuzz test" {
    try std.testing.fuzz({}, testOne, .{});
}
```

Into this signature:

```zig
pub fn fuzz(comptime testOne: *const fn (input: []const u8) anyerror!void, input: []const u8) anyerror!void
```

Note that the context is gone. Usage is now like this:

```zig
test "example unit test" {
    try std.testing.fuzz(myFuzzTest, "example input");
}
```

Importantly, "example unit test" is *no longer itself a fuzz test*. Instead, it has a *side-effect* of informing the build system about the existence of a fuzz test, as well as one input for the corpus. So, the following unit tests may also exist:

```zig
test "another unit test" {
    try std.testing.fuzz(myFuzzTest, "same fuzz test, different input");
    try std.testing.fuzz(myFuzzTest, "yet a third input to the same fuzz test");
}

test "third unit test" {
    try std.testing.fuzz(myFuzzTest, "this fourth input still does not declare a unique fuzz test");
    //try std.testing.fuzz(foobar, "ok now this one creates a new fuzz test"); // not allowed for reasons explained below
}
```
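For reference, here is a minimal sketch of what a fuzz function such as `myFuzzTest` might look like under the proposed signature. Since the proposed `std.testing.fuzz` does not exist yet, the local `fuzz` function below is a hypothetical stand-in that only models the unit test mode behavior (run the function once on the given input); the property checked by `myFuzzTest` is likewise an assumption chosen for illustration.

```zig
const std = @import("std");

// Hypothetical stand-in for the proposed `std.testing.fuzz`. In unit test
// mode it just runs the fuzz function once on the provided corpus input.
fn fuzz(comptime testOne: *const fn (input: []const u8) anyerror!void, input: []const u8) anyerror!void {
    return testOne(input);
}

// Illustrative fuzz function: the property must hold for every input.
// Splitting on spaces and adding the separators back accounts for all bytes.
fn myFuzzTest(input: []const u8) anyerror!void {
    var it = std.mem.splitScalar(u8, input, ' ');
    var token_bytes: usize = 0;
    var token_count: usize = 0;
    while (it.next()) |token| {
        token_bytes += token.len;
        token_count += 1;
    }
    // splitScalar always yields at least one token, so this cannot underflow.
    try std.testing.expectEqual(input.len, token_bytes + token_count - 1);
}

test "example unit test" {
    try fuzz(myFuzzTest, "example input");
}

test "another unit test" {
    // Same fuzz test, more corpus inputs.
    try fuzz(myFuzzTest, "");
    try fuzz(myFuzzTest, "  leading and trailing  ");
}
```

Each call both checks the property on a known input and, under the proposal, would contribute that input to the corpus of the single fuzz test identified by `myFuzzTest`.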

So this fundamentally changes the algorithm by which the build system discovers fuzz tests: they no longer correspond one-to-one with unit tests. This handles the common case in which unit test coverage acts as a good initial corpus for fuzzing. For instance, consider this one-liner added to parser_test.zig:

```diff
--- a/lib/std/zig/parser_test.zig
+++ b/lib/std/zig/parser_test.zig
@@ -6415,6 +6415,7 @@ fn testParse(source: [:0]const u8, allocator: mem.Allocator, anything_changed: *
     return formatted;
 }
 fn testTransformImpl(allocator: mem.Allocator, fba: *std.heap.FixedBufferAllocator, source: [:0]const u8, expected_source: []const u8) !void {
+    try std.testing.fuzz(fuzzTestOneParse, source);
     // reset the fixed buffer allocator each run so that it can be re-used for each
     // iteration of the failing index
     fba.reset();
```

This means that all the unit test cases from the rest of the file will be collected into the initial corpus.

Finally, circling back to optimizing the fuzz function: since this API declares a comptime-known function pointer as a fuzz test, when a test binary is recompiled in fuzz mode, the one function that was referenced can be wrapped in the test runner like this:

```zig
export fn zig_fuzzer_one(input_ptr: [*]const u8, input_len: usize) void {
    theOneAndOnlyFuzzFunction(input_ptr[0..input_len]) catch |err| { ... };
}
```

And of course this will be inlined and optimized so that it has no overhead.
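To illustrate why a comptime-known function pointer permits a direct call, here is a rough sketch of such a wrapper written as a generic. The names `FuzzEntry` and `countingFuzzTest` are hypothetical, the error handling is an assumption (the original elides it), and the `export` attribute is omitted so the snippet stays self-contained.

```zig
const std = @import("std");

// Hypothetical helper: builds a wrapper around a comptime-known fuzz function.
// Because `testOne` is known at compile time, the call inside `one` is a
// direct call that the optimizer is free to inline.
fn FuzzEntry(comptime testOne: *const fn (input: []const u8) anyerror!void) type {
    return struct {
        fn one(input_ptr: [*]const u8, input_len: usize) void {
            testOne(input_ptr[0..input_len]) catch |err| {
                // Assumption: a failing fuzz function is reported by panicking.
                std.debug.panic("fuzz test failed: {s}", .{@errorName(err)});
            };
        }
    };
}

var calls: usize = 0;

// Illustrative fuzz function that only counts how often it runs.
fn countingFuzzTest(input: []const u8) anyerror!void {
    _ = input;
    calls += 1;
}

test "wrapper calls the fuzz function directly" {
    const input: []const u8 = "abc";
    FuzzEntry(countingFuzzTest).one(input.ptr, input.len);
    try std.testing.expectEqual(@as(usize, 1), calls);
}
```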

In order to accomplish this without complicated and brittle machinery inside the compiler, `std.testing.fuzz` will note the first unit test corresponding to any particular fuzz function pointer. A second, different fuzz function may not be declared in the same unit test. This means that fuzz functions can be identified by unit test index. When recompiling unit tests for the purpose of exposing a single fuzz function, the unit test index can be used for conditional compilation to eliminate most dead code, and then the `std.testing.fuzz` function will simply `@export` the wrapper. This only works if unit tests are forbidden from declaring more than one fuzz test (not to be confused with more than one corpus input).

This restriction could be lifted by adding a more advanced fuzz test declaration function that takes a comptime string, which makes the exported function unique. This id then becomes part of the tuple that identifies a fuzz test (unit test index, comptime string fuzz test id), and is used for conditional compilation to avoid compiling multiple fuzz tests declared in the same unit test into one executable.
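As a rough sketch of how a comptime string id could drive that conditional compilation, consider the following. Everything here is hypothetical: the name `fuzzWithId`, its signature, and the `selected_fuzz_id` constant, which stands in for a value the build system would presumably inject when recompiling for one specific fuzz test.

```zig
const std = @import("std");

const FuzzFn = *const fn (input: []const u8) anyerror!void;

// Hypothetical: the id of the one fuzz test this binary is being built for.
// `null` would mean "ordinary unit test mode, run everything".
const selected_fuzz_id: ?[]const u8 = "parse";

fn isSelected(comptime id: []const u8) bool {
    const want = selected_fuzz_id orelse return true;
    return std.mem.eql(u8, id, want);
}

// Hypothetical declaration function: the comptime id distinguishes several
// fuzz tests declared in the same unit test.
fn fuzzWithId(comptime id: []const u8, comptime testOne: FuzzFn, input: []const u8) anyerror!void {
    // The condition is comptime-known, so for a non-selected id the call to
    // `testOne` is never emitted for that instantiation.
    if (comptime isSelected(id)) {
        return testOne(input);
    }
}

fn parseOne(input: []const u8) anyerror!void {
    _ = input;
}

fn renderOne(input: []const u8) anyerror!void {
    _ = input;
    return error.ShouldNotRun;
}

test "two fuzz tests declared in one unit test" {
    try fuzzWithId("parse", parseOne, "a");
    // Compiled out because "render" is not the selected id.
    try fuzzWithId("render", renderOne, "b");
}
```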

By doing things this way, unit tests and fuzz tests are unified so that the unit tests can test additional things such as expected result, while also declaring the initial corpus data for property-based testing.


create a separate executable for each fuzz test; change std.testing fuzzing API to unify corpus with unit tests #25352
