feat(extensions): add unsigned integer extension types (u8, u16, u32, u64)#953
feat(extensions): add unsigned integer extension types (u8, u16, u32, u64)#953kadinrabo wants to merge 24 commits intosubstrait-io:mainfrom
Conversation
e4584ca to
daccb86
Compare
daccb86 to
d19c87a
Compare
6d28b59 to
4af53f8
Compare
4af53f8 to
7d6f7f9
Compare
8650358 to
10baf81
Compare
benbellick
left a comment
There was a problem hiding this comment.
Thanks for working on this 🙂
Left a few more comments
extensions/unsigned_integers.yaml
Outdated
| - | ||
| name: "divide" | ||
| description: > | ||
| Divide x by y. Partial values are truncated (rounded towards 0). |
There was a problem hiding this comment.
We should include the divide description from
substrait/extensions/functions_arithmetic.yaml
Lines 177 to 186 in 2705258
There was a problem hiding this comment.
I can go with something like this, thoughts?
Divide x by y. Partial values are truncated (i.e. rounded towards 0).
The `on_division_by_zero` option governs behavior in cases where y is 0.
If either x or y are out of range, behavior will be governed by `on_domain_error`.
There was a problem hiding this comment.
Is it possible to have null cases added to these tests? Might be overkill but can't hurt.
There was a problem hiding this comment.
I'd prefer to keep parity with signed tests here, which don't have null cases for add/subtract/multiply either. Open to adding them if you feel strongly, but might be better as a follow up
There was a problem hiding this comment.
I'm okay with not doing this for now.
vbarua
left a comment
There was a problem hiding this comment.
This looks good overall to me.
The one thing I would like to see is that for the test files, let's map over not just the basic tests but also the overflow, null handling, etc tests that are applicable to usigned integers. I think the floating exception tests are the only ones that don't make apply.
extensions/unsigned_integers.yaml
Outdated
| - | ||
| name: "divide" | ||
| description: > | ||
| Divide x by y. Partial values are truncated (rounded towards 0). |
There was a problem hiding this comment.
We should include the divide description from
substrait/extensions/functions_arithmetic.yaml
Lines 177 to 186 in 2705258
vbarua
left a comment
There was a problem hiding this comment.
Left one comment on the definition of divide, and I think we should pull modulus out of this entirely because the existing definition could use a once over, but otherwise looks good to me.
The core set of functions you've added, along with the tests files, set a good example for how to add both future functions for unsigned integers, and also for adding new types.
extensions/unsigned_integers.yaml
Outdated
| division_type: | ||
| values: [ TRUNCATE, FLOOR ] | ||
| overflow: | ||
| values: [ SILENT, SATURATE, ERROR ] |
extensions/unsigned_integers.yaml
Outdated
| value: u!u8 | ||
| options: | ||
| division_type: | ||
| values: [ TRUNCATE, FLOOR ] |
There was a problem hiding this comment.
Is division_type an option we need here. The signed modulus operator defines 2 tests for them:
substrait/tests/cases/arithmetic/modulus.test
Lines 18 to 20 in e4ce3f8
There was a problem hiding this comment.
I'm okay with not doing this for now.
extensions/unsigned_integers.yaml
Outdated
| value: u!u8 | ||
| options: | ||
| overflow: | ||
| values: [ SILENT, SATURATE, ERROR ] |
There was a problem hiding this comment.
With signed integers, the overflow cases involve a sign change.
# overflow: Examples demonstrating overflow behavior
divide(-9223372036854775808::i64, -1::i64) [overflow:ERROR] = <!ERROR>
divide(-128::i8, -1::i8) [overflow:SATURATE] = 127::i8
With unsigned integers, this case isn't applicable, and in general overflow shouldn't be possible so we can drop this option fully for all of the impls.
vbarua
left a comment
There was a problem hiding this comment.
Changes look good to me, thanks for working on this Kadin.
Looking at this has definitely made me notice some stuff in the existing arithmetic extension that could use improvements as well 🧹
| # Type is "u!" + identifier, e.g., "u!u8" | ||
| type_str = "u!" + ctx.Identifier().getText().lower() |
There was a problem hiding this comment.
does this handle YAML dependency reference or that is out of scope in the test?
There was a problem hiding this comment.
By yaml dependency reference, are you talking about this feature for extensions?
substrait/text/simple_extensions_schema.yaml
Lines 10 to 19 in 00bc3c2
I don't think the test cases support dependency references, yet
Lines 92 to 101 in 00bc3c2
yongchul
left a comment
There was a problem hiding this comment.
+0 using string for representation (because we go at length in defining representation in core types (e.g., decimal)) but overall looks good to me.
|
@yongchul what does
from you mean here? Is that an approval with no vote? |
I'm more like a -0.1 for the string representation. why not use the corresponding same sized signed types OR fixed binary types (we have those, right?)? (asking for a friend) |
There's a bit of tension here between a good representation for the test cases, and a good representation for actual systems. For tests, I find it hard to read them when I have to convert the i8 into u8 values in my head. Like this is what a test for overflow would look like: What I want is to be able to use the string format in the tests, which makes them readable and use the signed type encoding for actual systems. Right now we're piggy-backing off of the struct representation defined for the udt to define literals of udt's in the tests. I'm mulling some changes around this, either to allow for multiple struct encodings OR to define test encodings separately. |
Agree that -1 would suck for test cases. I don't have great ideas on how to resolve the mismatch. One other option I could see is to use the next integer size up for the smaller ones and decimal for the biggest one. then representation isn't ugly and always safe (e.g. a u8 always fits in a u16 so a system that doesn't understand u8 but understands u16 could work with it). |
Description
Adds unsigned integer types (u8, u16, u32, u64) as first-class extension types with arithmetic function support and test coverage.
unsigned_integers.yamlwith type definitions (string structure encoding) and arithmetic function overloads (add, subtract, multiply, divide, modulus, sum, min, max)functions_arithmetic.yamlis untouchedtests/cases/arithmetic_unsigned/, following the arithmetic_decimal conventionCloses #944 and follows up community agreement from Substrait Meeting Notes on 28 Jan 2026 that type variations are not appropriate for unsigned integers due to differing semantics.
This change is