load performance slower than stdlib json, double that of orjson #261
Replies: 3 comments 4 replies
Not sure why you opened this as a bug and not a discussion, but whatever. I didn't write the C extension myself; it was contributed. If you can find some obvious reason for it being slower than expected, let me know. But CBOR supports a much broader set of types, which can partially explain the performance difference. I also assume that these other projects have more than one developer behind them.
@agronholm I thought that too, but FYI when you create an issue it only offers "Bug" or "Feature" as options, and Discussions aren't enabled on this repo. If you can enable them and convert this, that would be great. It looks like the C extension came in on #51? Though I haven't 👀 C in a long, long time, so I doubt I could be of any assistance.
Hi @agronholm, thanks for maintaining this library! We'd like to use it in pyvespa as we're adding CBOR support to Vespa, as an alternative to JSON for better performance. But I did note in my testing that it's slower than expected, which is how I found this issue.

My analysis and testing show that the main problem is the overhead from many small fp.read() calls. The C decoder calls fp.read() for every small piece of data needed during decoding (1-9 bytes for type headers, lengths, etc.). Each call crosses the C-to-Python boundary, creating Python objects (size int, result bytes) and incurring method call overhead. For documents with many values, this is a significant part of decode time.

By doing fewer and larger reads (into an internal buffer), the performance improves (by 2x or more in my prototype implementation). We can create some PRs if you are willing to review the work (we'll review it internally first).
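To make the read-call overhead concrete, here is a minimal pure-Python sketch of the idea. It is not cbor2's code and not the prototype mentioned above: CountingReader, ChunkedSource and decode_small_uints are hypothetical names made up for this illustration, and the "decoder" only handles one-byte CBOR unsigned integers (0 to 23). It just counts how often the underlying stream's read() gets called with and without an internal buffer.

```python
import io


class CountingReader:
    """Hypothetical file-like wrapper that counts calls to read()."""

    def __init__(self, data: bytes):
        self._fp = io.BytesIO(data)
        self.calls = 0

    def read(self, n: int = -1) -> bytes:
        self.calls += 1
        return self._fp.read(n)


class ChunkedSource:
    """Hypothetical helper: serves small reads from an internal buffer,
    refilling it from the underlying stream in large chunks."""

    def __init__(self, fp, chunk_size: int = 4096):
        self._fp = fp
        self._chunk_size = chunk_size
        self._buf = b""
        self._pos = 0

    def read(self, n: int) -> bytes:
        # Refill until n bytes are buffered or the stream is exhausted.
        while len(self._buf) - self._pos < n:
            chunk = self._fp.read(max(self._chunk_size, n))
            if not chunk:
                break
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        out = self._buf[self._pos:self._pos + n]
        self._pos += len(out)
        return out


def decode_small_uints(read, count: int) -> list:
    """Decode `count` CBOR unsigned ints in 0..23 (major type 0, one byte each)."""
    values = []
    for _ in range(count):
        (initial,) = read(1)  # one tiny read per value, like a type header
        if initial >> 5 != 0 or (initial & 0x1F) >= 24:
            raise ValueError("only one-byte unsigned integers are supported")
        values.append(initial & 0x1F)
    return values


# 24,000 one-byte CBOR integers (0x00..0x17 repeated).
payload = bytes(range(24)) * 1000

plain = CountingReader(payload)
plain_values = decode_small_uints(plain.read, len(payload))

counted = CountingReader(payload)
buffered_values = decode_small_uints(ChunkedSource(counted).read, len(payload))

print("read() calls without buffer:", plain.calls)
print("read() calls with buffer:   ", counted.calls)
```

Both runs decode the same values; the only difference is how many times the underlying stream's read() is invoked. In the real C decoder each of those calls would also pay for crossing into Python, which is the cost described above. One thing a real implementation would have to handle, and this sketch does not, is that buffering can read past the end of the current CBOR item on a shared stream.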
Things to check first
I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release
cbor2 version
5.6.5
Python version
3.12.2
What happened?
Per this I assumed CBOR would be faster than JSON, but my testing seems to indicate otherwise.
For a 5MB JSON file:
cbor_open uses ⬇️, the others use read_bytes.
cbor2/docs/usage.rst, lines 16 to 18 in 071a165
The file sizes are also closer than I'd imagine:
Am I missing something?
I'm able to run from _cbor2 import * without an ImportError, so I assume I am using the optimized C version?
cbor2/cbor2/__init__.py, lines 22 to 27 in 071a165
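One way to double-check which implementation is actually in use is to ask whether the function you call is implemented in C. The sketch below uses only the standard library's inspect.isbuiltin; is_c_accelerated is a hypothetical helper name, and the cbor2 part rests on the assumption that, when the C extension loads, cbor2.loads is the C-level function rather than the pure-Python one.

```python
import inspect


def is_c_accelerated(func) -> bool:
    """Return True if `func` is implemented in C (a builtin), not in Python."""
    return inspect.isbuiltin(func)


try:
    import cbor2
except ImportError:
    cbor2 = None  # cbor2 is not installed in this environment

if cbor2 is not None:
    # Assumption: with the C extension active, cbor2.loads is a C function.
    print("cbor2.loads is C-accelerated:", is_c_accelerated(cbor2.loads))
```

If this prints False, the pure-Python fallback is being used, and that alone would account for a large part of any gap against the stdlib json module.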
How can we reproduce the bug?