Skip to content

Commit 3b11990

Browse files
rwgkcopybara-github
authored andcommitted
Add pybind11_protobuf::check_unknown_fields::ExtensionsWithUnknownFieldsPolicy.
PiperOrigin-RevId: 595557800
1 parent 2f613ca commit 3b11990

File tree

6 files changed

+105
-29
lines changed

6 files changed

+105
-29
lines changed

README.md

Lines changed: 29 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -78,13 +78,18 @@ protobuf extensions
7878
are involved, a well-known pitfall is that extensions are silently moved
7979
to the `proto2::UnknownFieldSet` when a message is deserialized in C++,
8080
but the `cc_proto_library` for the extensions is not linked in. The root
81-
cause is an asymmetry in the handling of Python protos vs C++ protos: when
82-
a Python proto is deserialized, both the Python descriptor pool and the C++
83-
descriptor pool are inspected, but when a C++ proto is deserialized, only
81+
cause is an asymmetry in the handling of Python protos vs C++
82+
protos:
83+
when a Python proto is deserialized, both the Python descriptor pool and the
84+
C++ descriptor pool are inspected, but when a C++ proto is deserialized, only
8485
the C++ descriptor pool is inspected. Until this asymmetry is resolved, the
8586
`cc_proto_library` for all extensions involved must be added to the `deps` of
86-
the relevant `pybind_library` or `pybind_extension`, but this is sufficiently
87-
unobvious to be a setup for regular accidents, potentially with critical
87+
the relevant `pybind_library` or `pybind_extension`, or if this is impractial,
88+
`pybind11_protobuf::check_unknown_fields::ExtensionsWithUnknownFieldsPolicy::WeakEnableFallbackToSerializeParse`
89+
or `pybind11_protobuf::AllowUnknownFieldsFor` can be used.
90+
91+
The pitfall is sufficiently unobvious to be a setup for regular accidents,
92+
potentially with critical
8893
consequences.
8994

9095
To guard against the most common type of accident, native_proto_caster.h
@@ -97,14 +102,20 @@ in certain situations:
97102
* and the `proto2::UnknownFieldSet` for the message or any of its submessages
98103
is not empty.
99104

100-
`pybind11_protobuf::AllowUnknownFieldsFor` is an escape hatch for situations in
101-
which
105+
`pybind11_protobuf::check_unknown_fields::ExtensionsWithUnknownFieldsPolicy::WeakEnableFallbackToSerializeParse`
106+
is a **global** escape hatch trading off convenience and runtime overhead: the
107+
convenience is that it is not necessary to determine what `cc_proto_library`
108+
dependencies need to be added, the runtime overhead is that
109+
`SerializePartialToString`/`ParseFromString` is used for messages with unknown
110+
fields, instead of the much faster `CopyFrom`.
102111

103-
* unknown fields existed before the safety mechanism was
104-
introduced.
105-
* unknown fields are needed in the future.
112+
Another escape hatch is `pybind11_protobuf::AllowUnknownFieldsFor`, which
113+
simply disables the safety mechanism for **specific message types**, without
114+
a runtime overhead. This is useful for situations in which unknown fields
115+
are acceptable.
106116

107-
An example of a full error message (with lines breaks here for readability):
117+
An example of a full error message generated by the safety mechanism
118+
(with lines breaks here for readability):
108119

109120
```
110121
Proto Message of type pybind11.test.NestRepeated has an Unknown Field with
@@ -117,14 +128,13 @@ Only if there is no alternative to suppressing this error, use
117128
(Warning: suppressions may mask critical bugs.)
118129
```
119130

120-
The current implementation is a compromise solution, trading off simplicity
121-
of implementation, runtime performance, and precision. Generally, the runtime
122-
overhead is expected to be very small, but fields flagged as unknown may not
123-
necessarily be in extensions.
124-
Alerting developers of new code to unknown fields is assumed to be generally
125-
helpful, but the unknown fields detection is limited to messages with
126-
extensions, to avoid the runtime overhead for the presumably much more common
127-
case that no extensions are involved.
131+
Note that the current implementation of the safety mechanism is a compromise
132+
solution, trading off simplicity of implementation, runtime performance,
133+
and precision. Alerting developers of new code to unknown fields is assumed
134+
to be generally helpful, but the unknown fields detection is limited to
135+
messages with extensions, to avoid the runtime overhead for the presumably
136+
much more common case that no extensions are involved. Because of this,
137+
the runtime overhead for the safety mechanism is expected to be very small.
128138

129139
### Enumerations
130140

pybind11_protobuf/BUILD

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -100,3 +100,15 @@ cc_library(
100100
"@com_google_protobuf//python:proto_api",
101101
],
102102
)
103+
104+
cc_library(
105+
name = "disallow_extensions_with_unknown_fields",
106+
srcs = ["disallow_extensions_with_unknown_fields.cc"],
107+
visibility = [
108+
"//visibility:public",
109+
],
110+
deps = [
111+
":check_unknown_fields",
112+
],
113+
alwayslink = 1,
114+
)

pybind11_protobuf/check_unknown_fields.cc

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -181,9 +181,9 @@ void AllowUnknownFieldsFor(absl::string_view top_message_descriptor_full_name,
181181
unknown_field_parent_message_fqn));
182182
}
183183

184-
std::optional<std::string> CheckAndBuildErrorMessageIfAny(
184+
std::optional<std::string> CheckRecursively(
185185
const ::google::protobuf::python::PyProto_API* py_proto_api,
186-
const ::google::protobuf::Message* message) {
186+
const ::google::protobuf::Message* message, bool build_error_message_if_any) {
187187
const auto* root_descriptor = message->GetDescriptor();
188188
HasUnknownFields search{py_proto_api, root_descriptor};
189189
if (!search.FindUnknownFieldsRecursive(message, 0u)) {
@@ -193,6 +193,9 @@ std::optional<std::string> CheckAndBuildErrorMessageIfAny(
193193
search.FieldFQN())) != 0) {
194194
return std::nullopt;
195195
}
196+
if (!build_error_message_if_any) {
197+
return ""; // This indicates that an unknown field was found.
198+
}
196199
return search.BuildErrorMessage();
197200
}
198201

pybind11_protobuf/check_unknown_fields.h

Lines changed: 35 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,12 +9,45 @@
99

1010
namespace pybind11_protobuf::check_unknown_fields {
1111

12+
class ExtensionsWithUnknownFieldsPolicy {
13+
enum State {
14+
// Initial state.
15+
kWeakDisallow,
16+
17+
// Primary use case: PyCLIF extensions might set this when being imported.
18+
kWeakEnableFallbackToSerializeParse,
19+
20+
// Primary use case: `:disallow_extensions_with_unknown_fields` in `deps`
21+
// of a binary (or test).
22+
kStrongDisallow
23+
};
24+
25+
static State& GetStateSingleton() {
26+
static State singleton = kWeakDisallow;
27+
return singleton;
28+
}
29+
30+
public:
31+
static void WeakEnableFallbackToSerializeParse() {
32+
State& policy = GetStateSingleton();
33+
if (policy == kWeakDisallow) {
34+
policy = kWeakEnableFallbackToSerializeParse;
35+
}
36+
}
37+
38+
static void StrongSetDisallow() { GetStateSingleton() = kStrongDisallow; }
39+
40+
static bool UnknownFieldsAreDisallowed() {
41+
return GetStateSingleton() != kWeakEnableFallbackToSerializeParse;
42+
}
43+
};
44+
1245
void AllowUnknownFieldsFor(absl::string_view top_message_descriptor_full_name,
1346
absl::string_view unknown_field_parent_message_fqn);
1447

15-
std::optional<std::string> CheckAndBuildErrorMessageIfAny(
48+
std::optional<std::string> CheckRecursively(
1649
const ::google::protobuf::python::PyProto_API* py_proto_api,
17-
const ::google::protobuf::Message* top_message);
50+
const ::google::protobuf::Message* top_message, bool build_error_message_if_any);
1851

1952
} // namespace pybind11_protobuf::check_unknown_fields
2053

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
#include "pybind11_protobuf/check_unknown_fields.h"
2+
3+
namespace pybind11_protobuf::check_unknown_fields {
4+
namespace {
5+
6+
static int kSetConfigDone = []() {
7+
ExtensionsWithUnknownFieldsPolicy::StrongSetDisallow();
8+
return 0;
9+
}();
10+
11+
} // namespace
12+
} // namespace pybind11_protobuf::check_unknown_fields

pybind11_protobuf/proto_cast_util.cc

Lines changed: 12 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -819,18 +819,24 @@ py::handle GenericProtoCast(Message* src, py::return_value_policy policy,
819819
// 1. The binary does not have a py_proto_api instance, or
820820
// 2. a) the proto is from the default pool and
821821
// b) the binary is not using fast_cpp_protos.
822-
if ((GlobalState::instance()->py_proto_api() == nullptr) ||
822+
if (GlobalState::instance()->py_proto_api() == nullptr ||
823823
(src->GetDescriptor()->file()->pool() ==
824824
DescriptorPool::generated_pool() &&
825825
!GlobalState::instance()->using_fast_cpp())) {
826826
return GenericPyProtoCast(src, policy, parent, is_const);
827827
}
828828

829-
std::optional<std::string> emsg =
830-
check_unknown_fields::CheckAndBuildErrorMessageIfAny(
831-
GlobalState::instance()->py_proto_api(), src);
832-
if (emsg) {
833-
throw py::value_error(*emsg);
829+
std::optional<std::string> unknown_field_message =
830+
check_unknown_fields::CheckRecursively(
831+
GlobalState::instance()->py_proto_api(), src,
832+
check_unknown_fields::ExtensionsWithUnknownFieldsPolicy::
833+
UnknownFieldsAreDisallowed());
834+
if (unknown_field_message) {
835+
if (!unknown_field_message->empty()) {
836+
throw py::value_error(*unknown_field_message);
837+
}
838+
// Fall back to serialize/parse.
839+
return GenericPyProtoCast(src, policy, parent, is_const);
834840
}
835841

836842
// If this is a dynamically generated proto, then we're going to need to

0 commit comments

Comments
 (0)