load performance slower than stdlib json, double that of orjson #261

davetapley · 2025-08-21T20:32:11Z

Things to check first

I have searched the existing issues and didn't find my bug already reported there
I have checked that my bug is still present in the latest release

cbor2 version

5.6.5

Python version

3.12.2

What happened?

Per this I assumed CBOR would be faster than JSON, but my testing seems to indicate otherwise.

For a 5MB JSON file (total seconds for 100 loads each):

json 2.7858687019997888
orjson 1.316059184999176
cbor 3.216009937999843
cbor_open 3.6368825289991946

The cbor_open variant uses the file-based load from the docs, quoted below; the others use read_bytes.

cbor2/docs/usage.rst

Lines 16 to 18 in 071a165

    
               # Efficiently deserialize from a file 
        
               with open('input.cbor', 'rb') as fp: 
        
                   obj = load(fp)

The file sizes are also closer than I'd imagine:

ll -h 5MB-min.*
-rw-rw-rw- 1 vscode vscode 4.2M Aug 21 20:19 5MB-min.cbor
-rw-rw-rw- 1 vscode vscode 4.5M Aug  6 10:49 5MB-min.json

Am I missing something?

I'm able to from _cbor2 import * without ImportError, so I assume I am using the optimized C version?

cbor2/cbor2/__init__.py

Lines 22 to 27 in 071a165

    
           try: 
        
               from _cbor2 import *  # noqa: F403 
        
           except ImportError: 
        
               # Couldn't import the optimized C version; ignore the failure and leave the 
        
               # pure Python implementations in place.
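A successful import of _cbor2 does not by itself show which implementation cbor2.loads ends up bound to. A minimal sketch of a more direct check follows, assuming that the star import above rebinds cbor2.loads to a C builtin when the extension is available. The helper name is_native is hypothetical and not part of cbor2.

```python
import types


def is_native(func):
    """Return True if `func` is a C-implemented builtin rather than a Python function."""
    return isinstance(func, types.BuiltinFunctionType)


try:
    import cbor2
except ImportError:
    print("cbor2 is not installed")
else:
    # Assumption: with the C extension loaded, cbor2.loads is a builtin function.
    print("cbor2.loads is native:", is_native(cbor2.loads))
```

If this prints False, the pure Python decoder is probably the one being benchmarked.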

How can we reproduce the bug?

from timeit import timeit
import json
import orjson
import cbor2
from pathlib import Path


data = orjson.loads(Path('5MB-min.json').read_bytes())
Path('5MB-min.cbor').write_bytes(cbor2.dumps(data))


def load_json():
    path = Path('5MB-min.json')
    json.loads(path.read_bytes())


def load_orjson():
    path = Path('5MB-min.json')
    orjson.loads(path.read_bytes())


def load_cbor():
    path = Path('5MB-min.cbor')
    cbor2.loads(path.read_bytes())


def load_cbor_open():
    with open('5MB-min.cbor', 'rb') as fp:
        cbor2.load(fp)


print('json', timeit(load_json, number=100))
print('orjson', timeit(load_orjson, number=100))
print('cbor', timeit(load_cbor, number=100))
print('cbor_open', timeit(load_cbor_open, number=100))
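The script above reports the total for 100 calls, and each call also re-reads the file from disk. A hedged sketch of a helper that reports the best per-call time and decodes from bytes already in memory might look like this. The name per_call_ms and the small JSON payload are stand-ins for illustration, not part of the original benchmark.

```python
import json
from timeit import repeat


def per_call_ms(func, number=10, rounds=5):
    """Return the best per-call time in milliseconds across several rounds."""
    totals = repeat(func, number=number, repeat=rounds)
    return min(totals) / number * 1000.0


# Stand-in payload; swap in the bytes of the real file to compare decoders.
payload = json.dumps({"values": list(range(1000))}).encode()

print(f"json.loads: {per_call_ms(lambda: json.loads(payload)):.3f} ms per call")
```

Taking the minimum over several rounds reduces noise from other processes, and preloading the bytes keeps file I/O out of the decode measurement.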

agronholm · 2025-08-21T20:36:35Z

Maintainer

Not sure why you opened this as a bug and not a discussion, but whatever. I didn't write the C extension myself, it was contributed. If you can find some obvious reason for it being slower than expected, let me know. But CBOR supports a massively broader set of types which can partially explain some of the performance difference. I also assume that these other projects have more than one developer behind them.

0 replies

davetapley · 2025-08-21T23:00:11Z

Author

@agronholm I thought that too, but FYI when you create an issue it only offers "Bug" or "Feature" as options, and Discussions aren't enabled on this repo. If you can enable them and convert that would be great.

It looks like the C extension came in with #51? Though I haven't 👀 at C in a long, long time, so I doubt I could be of any assistance.

3 replies

agronholm Aug 22, 2025
Maintainer

You're right, in this repo I hadn't enabled discussions. My bad. I've done that now.

agronholm Aug 23, 2025
Maintainer

Yeah, so I'm not really sure what the other projects do differently that would help the performance. Optimizing this library is unfortunately an undertaking I'm not willing to devote time to as my attention is already stretched thin across my different projects.

davetapley Aug 23, 2025
Author

I know that feeling 😆

And no worries 💯

andreer · 2025-12-12T14:15:31Z

Hi @agronholm, thanks for maintaining this library!

We'd like to use it in pyvespa as we're adding CBOR support to Vespa, as an alternative to JSON for better performance. But I did note in my testing that it's slower than expected, which is how I found this issue.

My analysis and testing show that the main problem is the overhead from many small fp.read() calls: the C decoder calls fp.read() for every small piece of data needed during decoding (1-9 bytes for type headers, lengths, etc.). Each call crosses the C-to-Python boundary, creating Python objects (the size int, the result bytes) and incurring method call overhead. For documents with many values, this is a significant part of decode time.

By doing fewer and larger reads (into an internal buffer), the performance improves (by 2X or more in my prototype implementation).
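To illustrate the idea only (this is not the prototype, which presumably lives in the C extension), here is a pure Python sketch that counts how many read() calls reach the underlying file with and without an internal buffer. The class names, the 4096-byte chunk size and the 3-byte piece size are all assumptions made for the example.

```python
import io


class CountingReader:
    """Wraps a binary file object and counts the read() calls that reach it."""

    def __init__(self, raw):
        self._raw = raw
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return self._raw.read(size)


class BufferedSource:
    """Serves small reads from an internal buffer, refilling it in large chunks."""

    def __init__(self, fp, chunk_size=4096):
        self._fp = fp
        self._chunk_size = chunk_size
        self._buf = b""
        self._pos = 0

    def read(self, size):
        # Refill until `size` unread bytes are buffered or the source is exhausted.
        while len(self._buf) - self._pos < size:
            chunk = self._fp.read(max(self._chunk_size, size))
            if not chunk:
                break
            self._buf = self._buf[self._pos:] + chunk
            self._pos = 0
        data = self._buf[self._pos:self._pos + size]
        self._pos += len(data)
        return data


def read_in_small_pieces(source, piece_size=3):
    """Mimics a decoder that asks for only a few bytes at a time."""
    out = bytearray()
    while True:
        piece = source.read(piece_size)
        if not piece:
            return bytes(out)
        out += piece


payload = bytes(range(256)) * 40

direct = CountingReader(io.BytesIO(payload))
direct_result = read_in_small_pieces(direct)

buffered = CountingReader(io.BytesIO(payload))
buffered_result = read_in_small_pieces(BufferedSource(buffered))

print("read() calls without buffer:", direct.calls)
print("read() calls with buffer:", buffered.calls)
```

Both paths return the same bytes, but the buffered one makes far fewer calls on the underlying file object, which is the overhead described above.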

We can create some PRs, if you are willing to review the work (we'll review it internally first).

1 reply

agronholm Dec 12, 2025
Maintainer

Sounds good. This library is well overdue for a generic overhaul but I haven't had the time and energy to do that, and I have tons of other projects to maintain too.
