feat(extensions): add unsigned integer extension types (u8, u16, u32, u64) by kadinrabo · Pull Request #953 · substrait-io/substrait

kadinrabo · 2026-01-29T18:12:01Z

Description

Adds unsigned integer types (u8, u16, u32, u64) as first-class extension types with arithmetic function support and test coverage.

- Self-contained unsigned_integers.yaml with type definitions (string structure encoding) and arithmetic function overloads (add, subtract, multiply, divide, modulus, sum, min, max)
- functions_arithmetic.yaml is untouched
- Test cases in tests/cases/arithmetic_unsigned/, following the arithmetic_decimal convention
- Generic udtArg grammar rule for parsing UDT literals in test cases
- Test framework updated to scan all extension YAML files for function definitions

Closes #944. Follows up on the community agreement, recorded in the Substrait Meeting Notes of 28 Jan 2026, that type variations are not appropriate for unsigned integers because their semantics differ.

grammar/FuncTestCaseParser.g4

benbellick

Thanks for working on this 🙂

Left a few more comments

extensions/unsigned_integers.yaml

benbellick · 2026-02-27T15:22:55Z

extensions/unsigned_integers.yaml

+  -
+    name: "divide"
+    description: >
+      Divide x by y. Partial values are truncated (rounded towards 0).


what happens if y is zero?

We should include the divide description from substrait/extensions/functions_arithmetic.yaml (lines 177 to 186 in 2705258) here:

    -
      name: "divide"
      description: >
        Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0).
        The `on_division_by_zero` option governs behavior in cases where y is 0. If the option is IEEE then
        the IEEE754 standard is followed: all values except +/-infinity return NaN and +/-infinity are unchanged.
        If the option is LIMIT then the result is +/-infinity in all cases.
        If either x or y are NaN then behavior will be governed by `on_domain_error`.
        If x and y are both +/-infinity, behavior will be governed by `on_domain_error`.
      impls:

I can go with something like this, thoughts?

Divide x by y. Partial values are truncated (i.e. rounded towards 0). The `on_division_by_zero` option governs behavior in cases where y is 0. If either x or y are out of range, behavior will be governed by `on_domain_error`.

extensions/unsigned_integers.yaml

benbellick · 2026-02-27T15:35:21Z

tests/cases/arithmetic_unsigned/add.test

Is it possible to have null cases added to these tests? Might be overkill but can't hurt.

I'd prefer to keep parity with the signed tests here, which don't have null cases for add/subtract/multiply either. Open to adding them if you feel strongly, but it might be better as a follow-up.

I'm okay with not doing this for now.

vbarua

This looks good overall to me.

The one thing I would like to see is that for the test files, let's map over not just the basic tests but also the overflow, null handling, etc. tests that are applicable to unsigned integers. I think the floating exception tests are the only ones that don't apply.

extensions/unsigned_integers.yaml

vbarua · 2026-02-27T21:13:52Z

extensions/unsigned_integers.yaml

+  -
+    name: "divide"
+    description: >
+      Divide x by y. Partial values are truncated (rounded towards 0).


We should include the divide description from substrait/extensions/functions_arithmetic.yaml (lines 177 to 186 in 2705258) here:

    -
      name: "divide"
      description: >
        Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0).
        The `on_division_by_zero` option governs behavior in cases where y is 0. If the option is IEEE then
        the IEEE754 standard is followed: all values except +/-infinity return NaN and +/-infinity are unchanged.
        If the option is LIMIT then the result is +/-infinity in all cases.
        If either x or y are NaN then behavior will be governed by `on_domain_error`.
        If x and y are both +/-infinity, behavior will be governed by `on_domain_error`.
      impls:

extensions/unsigned_integers.yaml

tests/cases/arithmetic_unsigned/sum.test

tests/cases/arithmetic_unsigned/add.test

vbarua

Left one comment on the definition of divide, and I think we should pull modulus out of this entirely because the existing definition could use a once-over, but otherwise this looks good to me.

The core set of functions you've added, along with the test files, sets a good example for how to add future functions for unsigned integers and for how to add new types.

vbarua · 2026-03-05T00:12:27Z

extensions/unsigned_integers.yaml

+          division_type:
+            values: [ TRUNCATE, FLOOR ]
+          overflow:
+            values: [ SILENT, SATURATE, ERROR ]


Can... modulus even overflow?
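
A quick sanity check on that question, as a sketch rather than anything taken from the PR: for unsigned operands with a nonzero divisor, the remainder is always strictly smaller than the divisor, so it always fits in the operand type. The helper name `u8_modulus_in_range` is made up for illustration, and plain Python integers stand in for u8 values.

```python
U8_MAX = 2**8 - 1  # largest value representable in a u8


def u8_modulus_in_range() -> bool:
    """Exhaustively check that x % y lies in [0, y) for every u8 x and nonzero u8 y."""
    for x in range(U8_MAX + 1):
        # y == 0 is a division-by-zero case, not an overflow case, so it is skipped.
        for y in range(1, U8_MAX + 1):
            remainder = x % y
            if not (0 <= remainder < y):
                return False
    return True


print(u8_modulus_in_range())  # True
```

Since the remainder is bounded by the divisor, which is itself a valid u8, there is no input for which an overflow option could ever fire.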

vbarua · 2026-03-05T00:13:10Z

extensions/unsigned_integers.yaml

+            value: u!u8
+        options:
+          division_type:
+            values: [ TRUNCATE, FLOOR ]


Is division_type an option we need here? The signed modulus tests define two cases for it (substrait/tests/cases/arithmetic/modulus.test, lines 18 to 20 in e4ce3f8):

    # division_type: Examples demonstrating truncate and floor division types
    modulus(8::i8, -3::i8) [division_type:TRUNCATE] = 2::i8
    modulus(8::i8, -3::i8) [division_type:FLOOR] = -1::i8

I'm not sure this is applicable for unsigned ints.
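
To make the point concrete, here is a small sketch (not code from the PR) of the two rounding modes. The function names `mod_truncate` and `mod_floor` are hypothetical, and plain Python integers stand in for i8/u8 values. The two modes only differ when the operands have opposite signs, which cannot happen for unsigned inputs.

```python
def mod_truncate(x: int, y: int) -> int:
    """Remainder when the quotient is rounded towards zero."""
    quotient = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        quotient = -quotient
    return x - y * quotient


def mod_floor(x: int, y: int) -> int:
    """Remainder when the quotient is rounded towards negative infinity."""
    return x - y * (x // y)  # Python's // already floors


# The signed example from modulus.test: the two modes disagree.
print(mod_truncate(8, -3), mod_floor(8, -3))  # 2 -1

# Every u8 pair with a nonzero divisor: the two modes always agree.
unsigned_modes_agree = all(
    mod_truncate(x, y) == mod_floor(x, y)
    for x in range(256)
    for y in range(1, 256)
)
print(unsigned_modes_agree)  # True
```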

vbarua · 2026-03-05T00:17:04Z

tests/cases/arithmetic_unsigned/add.test

I'm okay with not doing this for now.

vbarua · 2026-03-05T00:19:12Z

extensions/unsigned_integers.yaml

+            value: u!u8
+        options:
+          overflow:
+            values: [ SILENT, SATURATE, ERROR ]


With signed integers, the overflow cases involve a sign change.

    # overflow: Examples demonstrating overflow behavior
    divide(-9223372036854775808::i64, -1::i64) [overflow:ERROR] = <!ERROR>
    divide(-128::i8, -1::i8) [overflow:SATURATE] = 127::i8

With unsigned integers, this case isn't applicable, and in general overflow shouldn't be possible so we can drop this option fully for all of the impls.
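
As an illustration of why divide is the safe case, the sketch below enumerates every 8-bit operand pair with a nonzero divisor and collects the ones whose truncated quotient falls outside the type's range. It is a brute-force check written for this discussion, not part of the PR; `trunc_div` is a hypothetical helper and plain Python integers stand in for i8/u8 values.

```python
I8_MIN, I8_MAX = -128, 127
U8_MAX = 255


def trunc_div(x: int, y: int) -> int:
    """Integer division with the quotient rounded towards zero."""
    quotient = abs(x) // abs(y)
    return -quotient if (x < 0) != (y < 0) else quotient


# Signed: collect all (x, y) whose quotient does not fit in an i8.
signed_overflows = [
    (x, y)
    for x in range(I8_MIN, I8_MAX + 1)
    for y in range(I8_MIN, I8_MAX + 1)
    if y != 0 and not (I8_MIN <= trunc_div(x, y) <= I8_MAX)
]

# Unsigned: collect all (x, y) whose quotient does not fit in a u8.
unsigned_overflows = [
    (x, y)
    for x in range(U8_MAX + 1)
    for y in range(1, U8_MAX + 1)
    if not (0 <= x // y <= U8_MAX)
]

print(signed_overflows)    # [(-128, -1)]
print(unsigned_overflows)  # []
```

The only signed overflow is the sign-change case from the test above. This says nothing about add, subtract, or multiply, which can still leave the unsigned range.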

extensions/unsigned_integers.yaml

vbarua

Changes look good to me, thanks for working on this Kadin.

Looking at this has definitely made me notice some stuff in the existing arithmetic extension that could use improvements as well 🧹

yongchul · 2026-03-18T17:05:14Z

tests/coverage/visitor.py

+        # Type is "u!" + identifier, e.g., "u!u8"
+        type_str = "u!" + ctx.Identifier().getText().lower()


Does this handle YAML dependency references, or is that out of scope for the tests?

By YAML dependency reference, are you talking about this feature for extensions (substrait/text/simple_extensions_schema.yaml, lines 10 to 19 in 00bc3c2)?

    dependencies:
      # For reusing type classes and type variations from other extension files.
      # The keys are namespace identifiers that you can then use as dot-separated
      # prefix for type class and type variation names in functions and the base
      # type class for variations. The values must be extension URNs, following
      # the same format and conventions as those used in the proto plans.
      type: object
      patternProperties:
        "^[a-zA-Z_\\$][a-zA-Z0-9_\\$]*$":
          type: string

I don't think the test cases support dependency references yet (substrait/tests/README.md, lines 92 to 101 in 00bc3c2):

### Spec

```
doc := <version>
       <include>
       (<dependency>)*
       ((<test_group>)?(<test_case>)+\n)+
version := ### SUBSTRAIT_SCALAR_TEST: <test_library_version>
include := ### SUBSTRAIT_INCLUDE: <uri>
dependency := ### SUBSTRAIT_DEPENDENCY: <uri>
```

yongchul

+0 using string for representation (because we go at length in defining representation in core types (e.g., decimal)) but overall looks good to me.

vbarua · 2026-03-25T01:02:20Z

@yongchul what does

> +0 using string for representation ...

mean here? Is that an approval with no vote?

jacques-n · 2026-03-30T21:20:19Z

> +0 using string for representation (because we go at length in defining representation in core types (e.g., decimal)) but overall looks good to me.

I'm more like a -0.1 for the string representation. why not use the corresponding same sized signed types OR fixed binary types (we have those, right?)? (asking for a friend)

vbarua · 2026-03-31T04:42:21Z

> I'm more like a -0.1 for the string representation. why not use the corresponding same sized signed types OR fixed binary types (we have those, right?)? (asking for a friend)

There's a bit of tension here between a good representation for the test cases, and a good representation for actual systems.

For tests, I find them hard to read when I have to convert the i8 values into u8 values in my head. For example, this is what a test for overflow would look like:

    add(-1::u!u8, 1::u!u8)

What I want is to be able to use the string format in the tests, which makes them readable:

    add('255'::u!u8, '1'::u!u8)

and use the signed type encoding for actual systems. Right now we're piggybacking on the struct representation defined for the UDT to define UDT literals in the tests. I'm mulling some changes around this, either to allow multiple struct encodings or to define test encodings separately.
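
For readers unfamiliar with why 255 shows up as -1, here is a minimal sketch of the same-sized signed encoding, assuming a plain two's-complement reinterpretation of the bits. The function names are hypothetical and nothing here reflects how the PR or any Substrait consumer actually stores these values.

```python
def u8_to_i8_bits(value: int) -> int:
    """Reinterpret the bit pattern of a u8 value as a two's-complement i8."""
    if not 0 <= value <= 255:
        raise ValueError(f"{value} is not a valid u8")
    raw = value.to_bytes(1, "big", signed=False)
    return int.from_bytes(raw, "big", signed=True)


def i8_bits_to_u8(value: int) -> int:
    """Inverse mapping: reinterpret an i8 bit pattern as a u8 value."""
    if not -128 <= value <= 127:
        raise ValueError(f"{value} is not a valid i8")
    raw = value.to_bytes(1, "big", signed=True)
    return int.from_bytes(raw, "big", signed=False)


print(u8_to_i8_bits(255))  # -1
print(u8_to_i8_bits(1))    # 1
print(i8_bits_to_u8(-1))   # 255
```

The mapping is lossless in both directions, which is what makes it attractive for real systems, but every u8 value above 127 reads as a negative number, which is the readability problem for test files.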

jacques-n · 2026-03-31T17:20:09Z

> There's a bit of tension here between a good representation for the test cases...

Agree that -1 would suck for test cases. I don't have great ideas on how to resolve the mismatch. One other option I could see is to use the next integer size up for the smaller ones and decimal for the biggest one. Then the representation isn't ugly and is always safe (e.g. a u8 always fits in a u16, so a system that doesn't understand u8 but understands u16 could work with it).
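
A rough sketch of what that widening option could look like, under stated assumptions: the carrier table below is hypothetical, it reads "next integer size up" as the next larger signed type, and it picks a 20-digit, scale-0 decimal for u64 only because 2^64 - 1 has 20 decimal digits. None of these choices come from the PR.

```python
# Bit widths of the unsigned types under discussion.
UNSIGNED_BITS = {"u8": 8, "u16": 16, "u32": 32, "u64": 64}

# Hypothetical carrier types: the next signed size up, and a decimal for u64.
SIGNED_BITS = {"i16": 16, "i32": 32, "i64": 64}
CARRIER = {"u8": "i16", "u16": "i32", "u32": "i64", "u64": "decimal(20,0)"}


def carrier_max(carrier: str) -> int:
    """Largest non-negative value the carrier type can hold."""
    if carrier in SIGNED_BITS:
        return 2 ** (SIGNED_BITS[carrier] - 1) - 1
    # Assumed decimal with 20 digits of precision and scale 0.
    return 10**20 - 1


def fits_in_carrier(unsigned_type: str) -> bool:
    """Check that the unsigned type's maximum fits in its carrier type."""
    unsigned_max = 2 ** UNSIGNED_BITS[unsigned_type] - 1
    return unsigned_max <= carrier_max(CARRIER[unsigned_type])


print(all(fits_in_carrier(t) for t in UNSIGNED_BITS))  # True
```

Under these assumptions every unsigned value keeps its natural, non-negative spelling in the carrier type, at the cost of using a wider type than strictly necessary.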

kadinrabo force-pushed the feat/unsigned-extension-types branch from e4584ca to daccb86 Compare January 30, 2026 15:36

feat(extensions): add unsigned integer extension types (u8, u16, u32, u64)

d19c87a

kadinrabo force-pushed the feat/unsigned-extension-types branch from daccb86 to d19c87a Compare January 30, 2026 15:38

feat(extensions): add arithmetic function impls for unsigned types

a413f3e

kadinrabo closed this Jan 30, 2026

kadinrabo reopened this Jan 30, 2026

benbellick reviewed Jan 30, 2026

grammar/FuncTestCaseParser.g4

kadinrabo force-pushed the feat/unsigned-extension-types branch from 6d28b59 to 4af53f8 Compare January 30, 2026 20:24

kadinrabo closed this Jan 30, 2026

kadinrabo reopened this Jan 30, 2026

kadinrabo added 4 commits January 30, 2026 15:53

feat(tests): add UDT argument support in test framework

b4bd79b

chore: regenerate ANTLR parsers

719ddb5

feat(tests): add unsigned integer test cases

37a1a40

chore: update test counts and baseline

41cee40

kadinrabo force-pushed the feat/unsigned-extension-types branch from 4af53f8 to 7d6f7f9 Compare January 30, 2026 20:53

kadinrabo closed this Jan 30, 2026

kadinrabo marked this pull request as ready for review January 30, 2026 21:06

kadinrabo requested review from EpsilonPrime, cpcloud, jacques-n, vbarua, westonpace and yongchul as code owners January 30, 2026 21:06

kadinrabo reopened this Jan 30, 2026

kadinrabo marked this pull request as draft January 30, 2026 21:07

chore: add dependency on extension_types_numeric

10baf81

kadinrabo force-pushed the feat/unsigned-extension-types branch 3 times, most recently from 8650358 to 10baf81 Compare February 4, 2026 15:53

kadinrabo marked this pull request as ready for review February 4, 2026 15:59

kadinrabo added 2 commits February 27, 2026 13:37

revert gitignore change

7f577ce

add comment explaining extension file scanner change

4bec245

kadinrabo requested review from benbellick, vbarua and yongchul February 27, 2026 04:59

This was referenced Feb 27, 2026

feat(extensions): add fp16, decimal256, and duration extension types #978

Draft

feat(extensions): add large container and date_millis extension types #979

Draft

benbellick reviewed Feb 27, 2026

vbarua reviewed Feb 27, 2026

kadinrabo added 2 commits March 2, 2026 18:26

improve type descriptions and divide description

215fcee

add overflow and null handling test cases

cd6b2a8

vbarua reviewed Mar 5, 2026

kadinrabo added 2 commits March 5, 2026 12:38

remove overflow option from unsigned divide

fce64b0

remove modulus function and tests

04a8882

vbarua added the PMC Ready label (PRs ready for review by PMCs) Mar 6, 2026

vbarua approved these changes Mar 6, 2026

merge upstream/main, regenerate parser and baseline

7c4c54a

yongchul reviewed Mar 18, 2026

yongchul approved these changes Mar 18, 2026

merge upstream/main, fix UDT nullability and grammar ordering

3ac870f

