Expose jsonish parser to language bindings #998

antoniomdk · 2024-10-01T00:36:21Z

In some cases, the BAML DSL adds friction to adoption: tooling is limited (IDE integrations, linters, etc.), and it requires using a custom LLM client.

I think the key differentiator of BAML is the flexible parser + TypeScript-ish schema generator. If we can get access to both tools as Python/TS functions, it will open the door to many integrations that rely on commonly used LLM clients like the OpenAI SDK.

The current bindings already expose primitives to generate BAML types from code, so we could have a function that takes the type object + LLM response and returns a parsed object. A second function would be needed to generate the schema representation from the type object, so that it can be passed as part of the prompt.
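A minimal sketch of what that two-function surface could look like, using only the Python standard library. Everything here is hypothetical: `render_schema` and `parse_response` are illustrative names, not part of the BAML bindings, and the lenient parsing shown is a far simpler stand-in for what jsonish actually does.

```python
import json
import re
from dataclasses import dataclass
from typing import get_args, get_origin, get_type_hints


@dataclass
class Resume:
    name: str
    skills: list[str]


def _type_name(tp) -> str:
    """Map a Python type to a TypeScript-ish name (illustrative subset only)."""
    if get_origin(tp) is list:
        return _type_name(get_args(tp)[0]) + "[]"
    return {str: "string", int: "int", float: "float", bool: "bool"}.get(tp, "any")


def render_schema(tp) -> str:
    """Hypothetical: render a dataclass as a schema string to embed in a prompt."""
    hints = get_type_hints(tp)
    lines = [f"  {name}: {_type_name(t)}," for name, t in hints.items()]
    return "{\n" + "\n".join(lines) + "\n}"


def parse_response(tp, raw: str):
    """Hypothetical: pull a JSON object out of a chatty LLM reply and build `tp`."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    candidate = raw[start : end + 1]
    # Drop trailing commas before a closing bracket. This naive regex would
    # also touch string contents, which a real parser has to handle properly.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    data = json.loads(candidate)
    return tp(**{name: data[name] for name in get_type_hints(tp)})


schema = render_schema(Resume)  # goes into the prompt sent via any LLM client
raw = 'Sure, here you go: {"name": "Ada", "skills": ["math", "code",],} Hope that helps!'
resume = parse_response(Resume, raw)
```

With this shape, the call to the model stays entirely in user code (for example, the OpenAI SDK), and the library only touches the strings going in and coming out.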

Note: forgive me if some of this functionality is already exposed, I couldn't find it in the codebase.

hellovai · 2024-10-09T23:11:19Z

Thanks for the request, @antoniomdk. As of right now, we don't expose such a parser, as we aren't sure what that increase in surface area means for the language long term. I'll check and see how we could expose things (however, this gets a bit more complicated for some new features we are working on, which don't translate well to Python / TS).

Do you have specific tooling you're looking for when it comes down to IDE integrations or linters?

antoniomdk · 2024-10-10T09:59:17Z

Thanks for the reply!

I think in general, the two-step process of generating the client and then integrating it with the code can introduce some challenges. For example, as BAML provides a custom language, custom IDE integrations are needed (I know there's a VS Code integration, but PyCharm/WebStorm are very common too). Also, I think BAML could work very well with LangChain for agents that need to call tools, but keeping the client in sync with the rest of the code requires custom CI pipelines or build scripts. Finally, if we were able to use the OpenAI SDK, we could integrate with other tools, for example, OpenTelemetry for observability.

I've been taking a look at the jsonish code, and I understand how it is tied to the rest of the BAML primitives and why it'd be hard to expose it as it is. That's why I was suggesting that we could use the dynamic type primitives already exposed to the language bindings to build a BAML type, and then pass that to the jsonish module along with the raw response from the LLM.

I am not aware of the new features and how they will translate into Py/TS, so please let me know if I'm not in the right direction here.

On a separate note, the BAML language resembles TypeScript a lot, and I have seen many people doing wild stuff with tsc transformers and the TS type system (e.g., Typia, Arktype, ts-macro, etc.). I was thinking that if BAML programs could be created from TS code via tsc, that would open up a big ecosystem for this project.

lukeramsden · 2024-10-21T13:49:24Z

To chime in with my use case - integrating with the rest of the system would be a lot easier if we could use BAML as a library, where we generate the output to send to the LLM and parse the response separately from the actual call to the provider API. We have concerns around routing to different models (see tools like NotDiamond), metering usage, observability/tracing, and others that are more application-specific. These currently require some fairly un-fun hacks to get working with BAML (essentially creating an in-process proxy server that hooks into the OpenAI-generic request/response format to do some of these things).
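A rough sketch of the "library mode" described above, where prompt construction and response parsing are plain functions and the provider call is supplied by the caller. All names (`run_step`, `build_prompt`, `send`, `parse`, `on_call`) are hypothetical and not part of BAML; the provider is faked so the example runs offline.

```python
import json
import time
from typing import Any, Callable


def run_step(
    build_prompt: Callable[[], str],
    send: Callable[[str], str],
    parse: Callable[[str], Any],
    on_call: Callable[[dict], None],
) -> Any:
    """Build a prompt, let the caller-owned `send` talk to the model, then parse."""
    prompt = build_prompt()
    started = time.perf_counter()
    raw = send(prompt)  # routing, retries, and tracing live in the caller's code
    on_call(
        {
            "prompt_chars": len(prompt),
            "response_chars": len(raw),
            "seconds": time.perf_counter() - started,
        }
    )
    return parse(raw)


def fake_provider(prompt: str) -> str:
    """Stand-in for a real client call; returns a canned reply."""
    return '{"city": "Paris"}'


events: list[dict] = []
result = run_step(
    build_prompt=lambda: "Extract the city as JSON: I live in Paris.",
    send=fake_provider,
    parse=json.loads,  # a lenient, schema-aware parser would slot in here
    on_call=events.append,
)
```

Because `send` is an ordinary callable, model routing or usage metering can be swapped in without a proxy server sitting between the library and the provider.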

sanguivore-easyco · 2024-12-19T17:39:27Z

My case is similar to @lukeramsden above, where I want to use the jsonish parser from a context where I want more dynamic control over the exact LLM calls and just want to call the parser with the relevant strings. Ideally, I'd be able to access this from a JVM (I'm using Clojure). And I'd be happy to have an interface that returns something like JsonObject or something similarly not bound to language-specific idiomatic typing (happy to bind from a dynamic interface to my more fine-grained types in my own code if need be), even if I need to supply some sort of schema to the parser.
