Commit 30e3189

Merge pull request #2039 from minrk/pyobj
[DOC] warn about and de-emphasize send/recv_pyobj
2 parents b084632 + f4e9f17 commit 30e3189

8 files changed

+292
-135
lines changed

docs/source/api/zmq.md

Lines changed: 7 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -10,10 +10,10 @@
1010
## Basic Classes
1111

1212
````{note}
13-
For typing purposes, `zmq.Context` and `zmq.Socket` are Generics,
13+
For typing purposes, {class}`.zmq.Context` and {class}`.zmq.Socket` are Generics,
1414
which means they will accept any Context or Socket implementation.
1515
16-
The base `zmq.Context()` constructor returns the type
16+
The base {class}`zmq.Context()` constructor returns the type
1717
`zmq.Context[zmq.Socket[bytes]]`.
1818
If you are using type annotations and want to _exclude_ the async subclasses,
1919
use the resolved types instead of the base Generics:
@@ -32,7 +32,7 @@ sock: zmq.SyncSocket
3232
3333
````
3434

35-
### {class}`Context`
35+
## {class}`Context`
3636

3737
```{eval-rst}
3838
.. autoclass:: Context
@@ -47,7 +47,7 @@ sock: zmq.SyncSocket
4747
4848
```
4949

50-
### {class}`Socket`
50+
## {class}`Socket`
5151

5252
```{eval-rst}
5353
.. autoclass:: Socket
@@ -81,7 +81,7 @@ sock: zmq.SyncSocket
8181
8282
```
8383

84-
### {class}`Frame`
84+
## {class}`Frame`
8585

8686
```{eval-rst}
8787
.. autoclass:: Frame
@@ -90,7 +90,7 @@ sock: zmq.SyncSocket
9090
9191
```
9292

93-
### {class}`MessageTracker`
93+
## {class}`MessageTracker`
9494

9595
```{eval-rst}
9696
.. autoclass:: MessageTracker
@@ -99,9 +99,7 @@ sock: zmq.SyncSocket
9999
100100
```
101101

102-
## Polling
103-
104-
### {class}`Poller`
102+
## {class}`Poller`
105103

106104
```{eval-rst}
107105
.. autoclass:: Poller

docs/source/howto/morethanbindings.md

Lines changed: 16 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -122,11 +122,24 @@ as first-class methods to the {class}`~.zmq.Socket` class. A socket has the meth
122122
{meth}`~.zmq.Socket.send_json` and {meth}`~.zmq.Socket.send_pyobj`, which correspond to sending an
123123
object over the wire after serializing with {mod}`json` and {mod}`pickle` respectively,
124124
and any object sent via those methods can be reconstructed with the
125-
{meth}`~.zmq.Socket.recv_json` and {meth}`~.zmq.Socket.recv_pyobj` methods. Unicode strings are
126-
other objects that are not unambiguously sendable over the wire, so we include
127-
{meth}`~.zmq.Socket.send_string` and {meth}`~.zmq.Socket.recv_string` that simply send bytes
125+
{meth}`~.zmq.Socket.recv_json` and {meth}`~.zmq.Socket.recv_pyobj` methods.
126+
127+
```{warning}
128+
Deserializing with pickle grants the message sender access to arbitrary code execution on the receiver.
129+
Never use `recv_pyobj` on a socket that might receive messages from untrusted sources
130+
before authenticating the sender.
131+
132+
It's always a good idea to enable CURVE security if you can,
133+
or authenticate messages with e.g. HMAC digests or other signing mechanisms.
134+
```
135+
136+
Text strings are other objects that are not unambiguously sendable over the wire, so we include
137+
{meth}`~.zmq.Socket.send_string` and {meth}`~.zmq.Socket.recv_string` that send bytes
128138
after encoding the message ('utf-8' is the default).
129139

140+
These are all convenience methods, and users are encouraged to build their own serialization that best suits their application's needs,
141+
especially concerning performance and security.
142+
130143
```{seealso}
131144
- {ref}`Further information <serialization>` on serialization in pyzmq.
132145
```
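
To make the encoding point in the morethanbindings.md text above concrete: the same text maps to different byte sequences under different encodings, so sender and receiver have to agree on one. The sketch below uses only the standard library. `encode_text` and `decode_text` are hypothetical helper names that mirror what the passage says `send_string` / `recv_string` do (encode to bytes, with 'utf-8' as the default); they are not pyzmq's actual implementation.

```python
def encode_text(text: str, encoding: str = "utf-8") -> bytes:
    """Turn text into the bytes that would go over the wire (hypothetical helper)."""
    return text.encode(encoding)


def decode_text(data: bytes, encoding: str = "utf-8") -> str:
    """Inverse of encode_text; must use the same encoding as the sender."""
    return data.decode(encoding)


text = "na\u00efve"  # contains one non-ASCII character
utf8_bytes = encode_text(text)
utf16_bytes = encode_text(text, "utf-16-le")

# The same text is a different byte sequence under each encoding,
# which is why a bare str is not unambiguously sendable.
print(utf8_bytes != utf16_bytes)
print(decode_text(utf8_bytes) == text)
```

A receiver that decodes with a different encoding than the sender used will get an error or the wrong text, so the encoding is effectively part of the application's wire protocol.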

docs/source/howto/serialization.md

Lines changed: 118 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -8,85 +8,170 @@ When sending messages over a network, you often need to marshall your data into
88

99
## Builtin serialization
1010

11-
PyZMQ is primarily bindings for libzmq, but we do provide three builtin serialization
11+
PyZMQ is primarily bindings for libzmq, but we do provide some builtin serialization
1212
methods for convenience, to help Python developers learn libzmq. Python has two primary
13-
packages for serializing objects: {py:mod}`json` and {py:mod}`pickle`, so we provide
14-
simple convenience methods for sending and receiving objects serialized with these
15-
modules. A socket has the methods {meth}`~.Socket.send_json` and
13+
modules for serializing objects in the standard library: {py:mod}`json` and {py:mod}`pickle`,
14+
so pyzmq provides simple convenience methods for sending and receiving objects serialized with these modules.
15+
A socket has the methods {meth}`~.Socket.send_json` and
1616
{meth}`~.Socket.send_pyobj`, which correspond to sending an object over the wire after
1717
serializing with json and pickle respectively, and any object sent via those
1818
methods can be reconstructed with the {meth}`~.Socket.recv_json` and
1919
{meth}`~.Socket.recv_pyobj` methods.
2020

21-
These methods designed for convenience, not for performance, so developers who want
22-
to emphasize performance should use their own serialized send/recv methods.
21+
```{note}
22+
These methods are meant more for convenience and demonstration purposes, not for performance or safety.
23+
Applications should usually define their own serialized send/recv functions.
24+
```
25+
26+
```{warning}
27+
`send/recv_pyobj` are very basic wrappers around `send(pickle.dumps(obj))` and `pickle.loads(recv())`.
28+
That means calling `recv_pyobj` is explicitly trusting incoming messages with full arbitrary code execution.
29+
Make sure you never use this if your sockets might receive untrusted messages.
30+
You can protect your sockets by e.g.:
31+
32+
- enabling CURVE encryption/authentication, IPC socket permissions, or other socket-level security to prevent unauthorized messages in the first place, or
33+
- using some kind of message authentication, such as HMAC digests, to verify trusted messages **before** deserializing
34+
```
2335

2436
## Using your own serialization
2537

2638
In general, you will want to provide your own serialization that is optimized for your
27-
application or library availability. This may include using your own preferred
28-
serialization ([^cite_msgpack], [^cite_protobuf]), or adding compression via [^cite_zlib] in the standard
29-
library, or the super fast [^cite_blosc] library.
39+
application goals or library availability. This may include using your own preferred
40+
serialization such as [msgpack] or [msgspec],
41+
or adding compression via {py:mod}`zlib` in the standard library,
42+
or the super fast [blosc] library.
43+
44+
```{warning}
45+
If handling a message can _do_ things (especially if using something like pickle for serialization, which, _please_ don't if you can help it),
46+
make sure you don't ever take action on a message without validating its origin.
47+
With pickle/recv_pyobj, **deserializing itself counts as taking an action**
48+
because it includes **arbitrary code execution**!
49+
```
50+
51+
In ZeroMQ, a single message is one _or more_ "Frames" of bytes, which means you should think about serializing your messages not just to bytes, but also consider if _lists_ of bytes might fit best.
52+
Multi-part messages allow for message serialization with a header of metadata, without needing to make copies of potentially large message contents and without losing atomicity of the message delivery.
53+
54+
To write your own serialization, you can either call `send` and `recv` methods directly on zmq sockets,
55+
or you can make use of the {meth}`.Socket.send_serialized` / {meth}`.Socket.recv_serialized` methods.
56+
I would strongly suggest starting with a function that turns a message (however your application defines it) into a sequence of sendable buffers, and the inverse function.
57+
58+
For example:
59+
60+
```python
61+
socket.send_json(msg)
62+
msg = socket.recv_json()
63+
```
64+
65+
is equivalent to
66+
67+
```python
68+
def json_dump_bytes(msg: Any) -> list[bytes]:
69+
    return [json.dumps(msg).encode("utf8")]
3070

31-
There are two simple models for implementing your own serialization: write a function
32-
that takes the socket as an argument, or subclass Socket for use in your own apps.
71+
72+
def json_load_bytes(msg_list: list[bytes]) -> Any:
73+
    return json.loads(msg_list[0].decode("utf8"))
74+
75+
76+
socket.send_multipart(json_dump_bytes(msg))
77+
msg = json_load_bytes(socket.recv_multipart())
78+
# or
79+
socket.send_serialized(msg, serialize=json_dump_bytes)
80+
msg = socket.recv_serialized(json_load_bytes)
81+
```
82+
83+
### Example: pickling Python objects
84+
85+
As an example, pickle is Python's powerful built-in serialization for arbitrary Python objects.
86+
Two potential issues you might face:
87+
88+
1. sometimes it is inefficient, and
89+
1. `pickle.loads` enables arbitrary code execution
3390

3491
For instance, pickles can often be reduced substantially in size by compressing the data.
35-
The following will send *compressed* pickles over the wire:
92+
We also want to make sure we don't call `pickle.loads` on any untrusted messages.
93+
The following will send *compressed* pickles over the wire,
94+
and uses HMAC digests to verify that the sender has access to a shared secret key,
95+
indicating the message came from a trusted source.
3696

3797
```python
98+
import hashlib
99+
import hmac
38100
import pickle
39101
import zlib
40102

41103

42-
def send_zipped_pickle(socket, obj, flags=0, protocol=pickle.HIGHEST_PROTOCOL):
43-
    """pickle an object, and zip the pickle before sending it"""
104+
def sign(key: bytes, msg: bytes) -> bytes:
105+
    """Compute the HMAC digest of msg, given signing key `key`"""
106+
    return hmac.HMAC(
107+
        key,
108+
        msg,
109+
        digestmod=hashlib.sha256,
110+
    ).digest()
111+
112+
113+
def send_signed_zipped_pickle(
114+
    socket, obj, flags=0, *, key, protocol=pickle.HIGHEST_PROTOCOL
115+
):
116+
    """pickle an object, zip and sign the pickled bytes before sending"""
44117
    p = pickle.dumps(obj, protocol)
45118
    z = zlib.compress(p)
46-
    return socket.send(z, flags=flags)
119+
    signature = sign(key, z)
120+
    return socket.send_multipart([signature, z], flags=flags)
47121

48122

49-
def recv_zipped_pickle(socket, flags=0):
50-
    """inverse of send_zipped_pickle"""
51-
    z = socket.recv(flags)
123+
def recv_signed_zipped_pickle(socket, flags=0, *, key):
124+
    """inverse of send_signed_zipped_pickle"""
125+
    sig, z = socket.recv_multipart(flags)
126+
    # check signature before deserializing
127+
    correct_signature = sign(key, z)
128+
    if not hmac.compare_digest(sig, correct_signature):
129+
        raise ValueError("invalid signature")
52130
    p = zlib.decompress(z)
53131
    return pickle.loads(p)
54132
```
55133

134+
### Example: numpy arrays
135+
56136
A common data structure in Python is the numpy array. PyZMQ supports sending
57137
numpy arrays without copying any data, since they provide the Python buffer interface.
58-
However just the buffer is not enough information to reconstruct the array on the
59-
receiving side. Here is an example of a send/recv that allow non-copying
138+
However, just the buffer is not enough information to reconstruct the array on the
139+
receiving side because it arrives as a 1-D array of bytes.
140+
You need just a little more information than that: the shape and the dtype.
141+
142+
Here is an example of a send/recv pair that allows non-copying
60143
sends/recvs of numpy arrays including the dtype/shape data necessary for reconstructing
61144
the array.
145+
This example makes use of multipart messages to serialize the header with JSON
146+
so the array data (which may be large!) doesn't need any unnecessary copies.
62147

63148
```python
64149
import numpy
65150

66151

67-
def send_array(socket, A, flags=0, copy=True, track=False):
152+
def send_array(
153+
    socket: zmq.Socket,
154+
    A: numpy.ndarray,
155+
    flags: int = 0,
156+
    **kwargs,
157+
):
68158
    """send a numpy array with metadata"""
69159
    md = dict(
70160
        dtype=str(A.dtype),
71161
        shape=A.shape,
72162
    )
73163
    socket.send_json(md, flags | zmq.SNDMORE)
74-
    return socket.send(A, flags, copy=copy, track=track)
164+
    return socket.send(A, flags, **kwargs)
75165

76166

77-
def recv_array(socket, flags=0, copy=True, track=False):
167+
def recv_array(socket: zmq.Socket, flags: int = 0, **kwargs) -> numpy.ndarray:
78168
    """recv a numpy array"""
79169
    md = socket.recv_json(flags=flags)
80-
    msg = socket.recv(flags=flags, copy=copy, track=track)
81-
    buf = memoryview(msg)
82-
    A = numpy.frombuffer(buf, dtype=md["dtype"])
170+
    msg = socket.recv(flags=flags, **kwargs)
171+
    A = numpy.frombuffer(msg, dtype=md["dtype"])
83172
    return A.reshape(md["shape"])
84173
```
85174

86-
[^cite_msgpack]: Message Pack serialization library <https://msgpack.org>
87-
88-
[^cite_protobuf]: Google Protocol Buffers <https://github.com/protocolbuffers/protobuf>
89-
90-
[^cite_zlib]: Python stdlib module for zip compression: {py:mod}`zlib`
91-
92-
[^cite_blosc]: Blosc: A blocking, shuffling and loss-less (and crazy-fast) compression library <https://www.blosc.org>
175+
[blosc]: https://www.blosc.org
176+
[msgpack]: https://msgpack.org
177+
[msgspec]: https://jcristharif.com/msgspec/
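
The advice in the serialization.md changes above (turn a message into a sequence of sendable buffers, with a metadata header kept apart from potentially large contents) can be sketched without a socket. This is a minimal illustration using only the standard library; `dump_frames` and `load_frames` are hypothetical names, and the two-frame layout (JSON header, then raw payload) is an assumption chosen for the example, not a format pyzmq defines.

```python
from __future__ import annotations

import json
from typing import Any


def dump_frames(header: dict[str, Any], payload: bytes) -> list[bytes]:
    """Serialize a message as two frames: a small JSON header, then the raw payload."""
    # Only the header is encoded; the payload object is passed through untouched.
    return [json.dumps(header).encode("utf8"), payload]


def load_frames(frames: list[bytes]) -> tuple[dict[str, Any], bytes]:
    """Inverse of dump_frames: decode the header, hand back the payload as-is."""
    header_bytes, payload = frames
    return json.loads(header_bytes.decode("utf8")), payload


payload = bytes(1024)  # stand-in for a large buffer
frames = dump_frames({"kind": "blob", "size": len(payload)}, payload)
header, body = load_frames(frames)
print(header, len(body))
```

With a real socket, functions shaped like these would plug into the multipart calls used in the diff, for example `socket.send_multipart(dump_frames(header, payload))` on the sending side and `load_frames(socket.recv_multipart())` on the receiving side.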

examples/gevent/simple.py

Lines changed: 9 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,4 +1,4 @@
1-
from typing import Optional
1+
from __future__ import annotations
22

33
from gevent import spawn, spawn_later
44

@@ -10,13 +10,13 @@
1010
sock = ctx.socket(zmq.PUSH)
1111
sock.bind('ipc:///tmp/zmqtest')
1212

13-
spawn(sock.send_pyobj, ('this', 'is', 'a', 'python', 'tuple'))
14-
spawn_later(1, sock.send_pyobj, {'hi': 1234})
13+
spawn(sock.send_json, ['this', 'is', 'a', 'list'])
14+
spawn_later(1, sock.send_json, {'hi': 1234})
1515
spawn_later(
16-
    2, sock.send_pyobj, ({'this': ['is a more complicated object', ':)']}, 42, 42, 42)
16+
    2, sock.send_json, ({'this': ['is a more complicated object', ':)']}, 42, 42, 42)
1717
)
18-
spawn_later(3, sock.send_pyobj, 'foobar')
19-
spawn_later(4, sock.send_pyobj, 'quit')
18+
spawn_later(3, sock.send_json, 'foobar')
19+
spawn_later(4, sock.send_json, 'quit')
2020

2121

2222
# client
@@ -27,14 +27,14 @@
2727

2828
def get_objs(sock: zmq.Socket):
2929
    while True:
30-
        o = sock.recv_pyobj()
31-
        print('received python object:', o)
30+
        o = sock.recv_json()
31+
        print('received:', o)
3232
        if o == 'quit':
3333
            print('exiting.')
3434
            break
3535

3636

37-
def print_every(s: str, t: Optional[float] = None):
37+
def print_every(s: str, t: float | None = None):
3838
    print(s)
3939
    if t:
4040
        spawn_later(t, print_every, s, t)
