@@ -8,85 +8,170 @@ When sending messages over a network, you often need to marshall your data into
8
8
9
9
## Builtin serialization
10
10
11
-
PyZMQ is primarily bindings for libzmq, but we do provide three builtin serialization
11
+
PyZMQ is primarily bindings for libzmq, but we do provide some builtin serialization
12
12
methods for convenience, to help Python developers learn libzmq. Python has two primary
13
-
packages for serializing objects: {py:mod}`json` and {py:mod}`pickle`, so we provide
14
-
simple convenience methods for sending and receiving objects serialized with these
15
-
modules. A socket has the methods {meth}`~.Socket.send_json` and
13
+
modules for serializing objects in the standard library: {py:mod}`json` and {py:mod}`pickle`,
14
+
so pyzmq provides simple convenience methods for sending and receiving objects serialized with these modules.
15
+
A socket has the methods {meth}`~.Socket.send_json` and
16
16
{meth}`~.Socket.send_pyobj`, which correspond to sending an object over the wire after
17
17
serializing with json and pickle respectively, and any object sent via those
18
18
methods can be reconstructed with the {meth}`~.Socket.recv_json` and
19
19
{meth}`~.Socket.recv_pyobj` methods.
20
20
21
-
These methods designed for convenience, not for performance, so developers who want
22
-
to emphasize performance should use their own serialized send/recv methods.
21
+
```{note}
22
+
These methods are meant more for convenience and demonstration purposes, not for performance or safety.
23
+
Applications should usually define their own serialized send/recv functions.
24
+
```
25
+
26
+
```{warning}
27
+
`send/recv_pyobj` are very basic wrappers around `send(pickle.dumps(obj))` and `pickle.loads(recv())`.
28
+
That means calling `recv_pyobj` is explicitly trusting incoming messages with full arbitrary code execution.
29
+
Make sure you never use this if your sockets might receive untrusted messages.
30
+
You can protect your sockets by e.g.:
31
+
32
+
- enabling CURVE encryption/authentication, IPC socket permissions, or other socket-level security to prevent unauthorized messages in the first place, or
33
+
- using some kind of message authentication, such as HMAC digests, to verify trusted messages **before** deserializing
34
+
```
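
As a rough illustration of the second option above, the sketch below signs a pickled payload with an HMAC digest and checks the digest **before** calling `pickle.loads`. The names `SECRET_KEY`, `sign_and_pack`, and `verify_and_unpack` are hypothetical helpers invented for this example, not part of pyzmq; the two returned frames are assumed to be sent and received with `socket.send_multipart` / `socket.recv_multipart`.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret for illustration only; a real application
# would load this from secure configuration shared by both peers.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_and_pack(obj, key=SECRET_KEY):
    """Pickle obj and return [signature, payload] frames."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return [signature, payload]


def verify_and_unpack(frames, key=SECRET_KEY):
    """Check the signature first; only unpickle payloads that verify."""
    signature, payload = frames
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest does a constant-time comparison of the two digests
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid message signature")
    return pickle.loads(payload)
```

A sender might then call `socket.send_multipart(sign_and_pack(obj))`, and a receiver `verify_and_unpack(socket.recv_multipart())`. This only shows the ordering of verification and deserialization; key distribution and replay protection are out of scope here.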
23
35
24
36
## Using your own serialization
25
37
26
38
In general, you will want to provide your own serialization that is optimized for your
27
-
application or library availability. This may include using your own preferred
28
-
serialization ([^cite_msgpack], [^cite_protobuf]), or adding compression via [^cite_zlib] in the standard
29
-
library, or the super fast [^cite_blosc] library.
39
+
application goals or library availability. This may include using your own preferred
40
+
serialization such as [msgpack] or [msgspec],
41
+
or adding compression via {py:mod}`zlib` in the standard library,
42
+
or the super fast [blosc] library.
43
+
44
+
```{warning}
45
+
If handling a message can _do_ things (especially if using something like pickle for serialization, which, _please_ don't if you can help it),
46
+
make sure you don't ever take action on a message without validating its origin.
47
+
With pickle/recv_pyobj, **deserializing itself counts as taking an action**
48
+
because it includes **arbitrary code execution**!
49
+
```
50
+
51
+
In ZeroMQ, a single message is one _or more_ "Frames" of bytes, which means you should think about serializing your messages not just to bytes, but also consider if _lists_ of bytes might fit best.
52
+
Multi-part messages allow for message serialization with a header of metadata, without needing to make copies of potentially large message contents and without losing atomicity of the message delivery.
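
A minimal sketch of that header-plus-payload idea, assuming a JSON-encoded metadata frame followed by the raw payload frame. `pack_message` and `unpack_message` are hypothetical names used only for illustration; the resulting list of frames is assumed to be passed to `socket.send_multipart` and read back with `socket.recv_multipart`.

```python
import json


def pack_message(metadata: dict, payload: bytes) -> list:
    """Return [header, payload] frames; the payload is passed through uncopied."""
    header = json.dumps(metadata).encode("utf8")
    return [header, payload]


def unpack_message(frames: list) -> tuple:
    """Inverse of pack_message: decode the header, leave the payload as bytes."""
    header, payload = frames
    return json.loads(header.decode("utf8")), payload
```

Because the metadata lives in its own frame, the large payload never has to be concatenated with the header into a new buffer, and both frames still arrive together as one message.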
53
+
54
+
To write your own serialization, you can either call `send` and `recv` methods directly on zmq sockets,
55
+
or you can make use of the {meth}`.Socket.send_serialized` / {meth}`.Socket.recv_serialized` methods.
56
+
I would strongly suggest starting with a function that turns a message (however your application defines it) into a sequence of sendable buffers, and the inverse function.
57
+
58
+
For example:
59
+
60
+
```python
61
+
socket.send_json(msg)
62
+
msg = socket.recv_json()
63
+
```
64
+
65
+
is equivalent to
66
+
67
+
```python
68
+
def json_dump_bytes(msg: Any) -> list[bytes]:
69
+
    return [json.dumps(msg).encode("utf8")]
30
70
31
-
There are two simple models for implementing your own serialization: write a function
32
-
that takes the socket as an argument, or subclass Socket for use in your own apps.