The sdks that AI Bridge currently uses to interact with LLM clients and servers makes some trade-offs that work well for clients, but not for a gateway.
To scale bridge, we need to reduce the CPU and memory footprint of JSON handling in bridge.
See here for more information.
Pull requests already exist to address some of these issues:
Both of these also need to be applied to OpenAI.