Summary
A 2-byte Substrait protobuf (12 00) fed to the public
arrow::engine::DeserializePlan API aborts the process. A
SimpleExtensionDeclaration whose mapping_type oneof is left unset falls
through to default: Unreachable() in GetExtensionSetFromMessage, which is an
unconditional std::abort(). The same root cause is reachable through
arrow::engine::DeserializeExpressions (via the ExtendedExpression entry). Any
service that deserializes an untrusted Substrait plan/expression can be crashed
(denial of service) by a trivially malformed input.
Tested at pinned commit 16fe34250a2ef261790b9cc414fdf0831669cf9f
(25.0.0-SNAPSHOT).
Root Cause
GetExtensionSetFromMessage<T> switches on the
SimpleExtensionDeclaration.mapping_type_case() oneof selector to pick which
extension kind the declaration carries. It handles the three defined arms
(kExtensionTypeVariation, kExtensionType, kExtensionFunction) and then
falls into a default: arm that calls arrow::Unreachable(). But the oneof also
has the unset state MAPPING_TYPE_NOT_SET (the protobuf default when a producer
simply omits the field), which is attacker-reachable and lands on exactly that
default: arm. Unreachable() is a hard std::abort() — the DCHECK is a
no-op under NDEBUG (release builds), so the only effect on untrusted input is a
process abort instead of a non-OK Status.
Vulnerable code (cpp/src/arrow/engine/substrait/util_internal.h:65-86):
std::unordered_map<uint32_t, Id> type_ids, function_ids;
for (const auto& ext : message.extensions()) {
switch (ext.mapping_type_case()) {
case substrait::extensions::SimpleExtensionDeclaration::kExtensionTypeVariation: {
return Status::NotImplemented("Type Variations are not yet implemented");
}
case substrait::extensions::SimpleExtensionDeclaration::kExtensionType: {
const auto& type = ext.extension_type();
std::string_view uri = uris[type.extension_uri_reference()];
type_ids[type.type_anchor()] = Id{uri, type.name()};
break;
}
case substrait::extensions::SimpleExtensionDeclaration::kExtensionFunction: {
const auto& fn = ext.extension_function();
std::string_view uri = uris[fn.extension_uri_reference()];
function_ids[fn.function_anchor()] = Id{uri, fn.name()};
break;
}
default:
Unreachable(); // <-- MAPPING_TYPE_NOT_SET reaches here -> std::abort()
}
}
Unreachable() is an unconditional abort (cpp/src/arrow/util/unreachable.cc:26-29):
[[noreturn]] void Unreachable(const char* message) {
DCHECK(false) << message; // no-op under NDEBUG
std::abort();
}
Two public entries reach the same site:
DeserializePlan (serde.cc:217) -> GetExtensionSetFromPlan
(plan_internal.cc:53) -> GetExtensionSetFromMessage<substrait::Plan>.
DeserializeExpressions (serde.cc:251) -> FromProto
(extended_expression_internal.cc:163) -> GetExtensionSetFromExtendedExpression
(extended_expression_internal.cc:38) ->
GetExtensionSetFromMessage<substrait::ExtendedExpression>.
PoC
2 bytes — a Plan with one empty SimpleExtensionDeclaration (field 2,
length-delimited, zero length). The empty declaration leaves mapping_type
unset, hitting default: Unreachable().
# generate_poc.py — re-create the crash input from bytes
poc = bytes([0x12, 0x00])
open("poc.bin", "wb").write(poc)
Crash input size: 2 bytes (poc/poc.bin, md5 5a68de997d60afa9083b17fe00f7cdf2).
The same root cause is also reachable via DeserializeExpressions with an
ExtendedExpression carrying an empty SimpleExtensionDeclaration.
Reproduction
Build Arrow C++ from source with -DARROW_SUBSTRAIT=ON and AddressSanitizer, then deserialize the
attached 2-byte Substrait plan through the public API:
#include <arrow/engine/substrait/serde.h>
// buf = {0x12, 0x00};
auto result = arrow::engine::DeserializePlan(*buf); // std::abort() here
A SimpleExtensionDeclaration with an unset mapping_type oneof (the protobuf default) falls through
switch (ext.mapping_type_case()) { ... default: Unreachable(); }, and arrow::Unreachable() calls
std::abort() (silent SIGABRT, exit 134):
GetExtensionSetFromMessage<...> util_internal.h
DeserializePlan serde.cc
The default: Unreachable() arm is still present in current master
(cpp/src/arrow/engine/substrait/util_internal.h:82). PoC: 2 bytes 12 00.
Suggested Fix
Replace the default: Unreachable() arm with a returned non-OK Status so the
unset/unknown oneof is rejected gracefully instead of aborting:
--- a/cpp/src/arrow/engine/substrait/util_internal.h
+++ b/cpp/src/arrow/engine/substrait/util_internal.h
@@
- default:
- Unreachable();
+ default:
+ return Status::Invalid(
+ "Invalid Substrait SimpleExtensionDeclaration: mapping_type is unset "
+ "or unknown (case ",
+ static_cast<int>(ext.mapping_type_case()), ")");
}
GetExtensionSetFromMessage already returns Result<ExtensionSet>, so the
Status::Invalid propagates cleanly back through DeserializePlan /
DeserializeExpressions as a non-OK result.
PoC bytes (self-contained)
The trigger input is 2 bytes (poc/poc.bin).
Recreate it exactly with:
base64 -d > poc.bin <<'B64'
EgA=
B64
Hex: 1200
Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).
Summary
A 2-byte Substrait protobuf (
12 00) fed to the publicarrow::engine::DeserializePlanAPI aborts the process. ASimpleExtensionDeclarationwhosemapping_typeoneof is left unset fallsthrough to
default: Unreachable()inGetExtensionSetFromMessage, which is anunconditional
std::abort(). The same root cause is reachable througharrow::engine::DeserializeExpressions(via theExtendedExpressionentry). Anyservice that deserializes an untrusted Substrait plan/expression can be crashed
(denial of service) by a trivially malformed input.
Tested at pinned commit
16fe34250a2ef261790b9cc414fdf0831669cf9f(25.0.0-SNAPSHOT).
Root Cause
GetExtensionSetFromMessage<T>switches on theSimpleExtensionDeclaration.mapping_type_case()oneof selector to pick whichextension kind the declaration carries. It handles the three defined arms
(
kExtensionTypeVariation,kExtensionType,kExtensionFunction) and thenfalls into a
default:arm that callsarrow::Unreachable(). But the oneof alsohas the unset state
MAPPING_TYPE_NOT_SET(the protobuf default when a producersimply omits the field), which is attacker-reachable and lands on exactly that
default:arm.Unreachable()is a hardstd::abort()— theDCHECKis ano-op under
NDEBUG(release builds), so the only effect on untrusted input is aprocess abort instead of a non-OK
Status.Vulnerable code (
cpp/src/arrow/engine/substrait/util_internal.h:65-86):Unreachable()is an unconditional abort (cpp/src/arrow/util/unreachable.cc:26-29):Two public entries reach the same site:
DeserializePlan(serde.cc:217) ->GetExtensionSetFromPlan(
plan_internal.cc:53) ->GetExtensionSetFromMessage<substrait::Plan>.DeserializeExpressions(serde.cc:251) ->FromProto(
extended_expression_internal.cc:163) ->GetExtensionSetFromExtendedExpression(
extended_expression_internal.cc:38) ->GetExtensionSetFromMessage<substrait::ExtendedExpression>.PoC
2 bytes — a
Planwith one emptySimpleExtensionDeclaration(field 2,length-delimited, zero length). The empty declaration leaves
mapping_typeunset, hitting
default: Unreachable().Crash input size: 2 bytes (
poc/poc.bin, md55a68de997d60afa9083b17fe00f7cdf2).The same root cause is also reachable via
DeserializeExpressionswith anExtendedExpressioncarrying an emptySimpleExtensionDeclaration.Reproduction
Build Arrow C++ from source with
-DARROW_SUBSTRAIT=ONand AddressSanitizer, then deserialize theattached 2-byte Substrait plan through the public API:
A
SimpleExtensionDeclarationwith an unsetmapping_typeoneof (the protobuf default) falls throughswitch (ext.mapping_type_case()) { ... default: Unreachable(); }, andarrow::Unreachable()callsstd::abort()(silent SIGABRT, exit 134):The
default: Unreachable()arm is still present in currentmaster(
cpp/src/arrow/engine/substrait/util_internal.h:82). PoC: 2 bytes12 00.Suggested Fix
Replace the
default: Unreachable()arm with a returned non-OKStatusso theunset/unknown oneof is rejected gracefully instead of aborting:
GetExtensionSetFromMessagealready returnsResult<ExtensionSet>, so theStatus::Invalidpropagates cleanly back throughDeserializePlan/DeserializeExpressionsas a non-OK result.PoC bytes (self-contained)
The trigger input is 2 bytes (
poc/poc.bin).Recreate it exactly with:
Hex:
1200Credit
Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).