Assistant: Provide a way to "see" data objects #7114
Comments
Some thoughts:
For tabular data (data frames), we could take advantage of the RPCs (schemas, raw values, summary stats, histograms, frequency tables, and so on) provided by the data explorer comm (but without opening a data explorer UI tab). I was just thinking today separately that it would make sense for the assistant to be able to access all of the statistical summaries displayed in the data explorer and compute statistics with filters applied, etc.
I just opened #7299 as one thing to think about -- my initial thought was to add a stateless data querying API to the variables comm that provides an initial subset of capabilities that already exist within the data explorer comm, but this might come with drawbacks. The challenge I see is that it is currently difficult to use the data explorer comm directly without also opening the UI/editor tab. I'll keep thinking some more about this -- the assistant will need to be made aware of changes to datasets, which will invalidate previous tool calls, so this is probably an argument to refactor things so that the assistant can open a data explorer comm without needing to open the UI tab. If the user tries to view a dataset that's already being examined by the assistant, we can simply open another data explorer comm for the same variable.
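To illustrate the invalidation concern above, here is a minimal sketch of how results from earlier tool calls could be marked stale when a dataset changes. Everything in it is hypothetical: `DatasetQueryCache`, `notify_changed`, and the version counter are illustrative names, not part of the data explorer comm or the variables comm, and a real implementation would react to whatever update events the comm actually delivers.

```python
class DatasetQueryCache:
    """Cache tool-call results per dataset, keyed by a version counter.

    Hypothetical sketch: `notify_changed` stands in for whatever update
    event a headless comm client would receive; it is not a real API.
    """

    def __init__(self):
        self._versions = {}  # variable name -> current version number
        self._results = {}   # (variable, query) -> (version, result)

    def notify_changed(self, variable):
        # Bump the version so results cached for the old version are stale.
        self._versions[variable] = self._versions.get(variable, 0) + 1

    def get(self, variable, query, compute):
        """Return (result, recomputed) for a named query on a variable."""
        version = self._versions.get(variable, 0)
        cached = self._results.get((variable, query))
        if cached is not None and cached[0] == version:
            return cached[1], False  # cached result is still valid
        result = compute()
        self._results[(variable, query)] = (version, result)
        return result, True  # stale or missing, so recomputed


# Usage: a plain list stands in for a dataset in the user's session.
data = {"df": [1, 2, 3]}
cache = DatasetQueryCache()
first, _ = cache.get("df", "sum", lambda: sum(data["df"]))
data["df"].append(4)
cache.notify_changed("df")  # without this call, the old sum would be reused
second, recomputed = cache.get("df", "sum", lambda: sum(data["df"]))
```

The point of the sketch is only that some change signal has to reach the assistant's side; a purely stateless query API would have no place to receive it.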
This is my thinking atm. I'm not sure if we need any changes to the OpenRPC schema, or if this can be done in the IDE where we can directly reuse types. Maybe by decoupling Also worth keeping in mind that we may more generally want custom UIs in the assistant chat pane for tools that interact with comms, and headless comm clients could be useful for that too.
If you ask Assistant about some data in your environment, especially if it is large, it will typically try to execute R or Python functions that return plain-text versions of the information.
While this kind of works, these tools (`summary` and `str`) do not format information in a way that is intended for (or, often, even legible to) an LLM. For example, here's the summary the model asked for. It relies on whitespace formatting and has multiple columns, so it's difficult or impossible for the model to parse it.

This problem was also observed by @jcheng5 when working with DataBot, which is why DataBot converts data to JSON before sending it to the model.
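As a rough illustration of the JSON approach, here is a minimal sketch of a structured summary that does not depend on whitespace alignment. It is not DataBot's or Positron's code: `summarize_columns` is a hypothetical helper, the field names are made up, and a plain dict of lists stands in for a data frame so the example needs only the standard library.

```python
import json
import statistics
from collections import Counter


def summarize_columns(columns, max_levels=5):
    """Build a JSON-serializable summary of a column-oriented table.

    `columns` maps column name -> list of values (None marks a missing
    value). Hypothetical helper, not an existing Positron or DataBot API.
    """
    summary = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        entry = {
            "name": name,
            "n": len(values),
            "n_missing": len(values) - len(present),
        }
        # Treat a column as numeric only if every present value is a number.
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            entry["type"] = "numeric"
            entry["min"] = min(present)
            entry["max"] = max(present)
            entry["mean"] = statistics.fmean(present)
            entry["median"] = statistics.median(present)
        else:
            entry["type"] = "categorical"
            counts = Counter(str(v) for v in present).most_common(max_levels)
            entry["top_values"] = [{"value": v, "count": c} for v, c in counts]
        summary.append(entry)
    return summary


# Made-up example data; each column becomes one self-describing JSON object.
example = {
    "height": [1.5, 2.5, None, 4.0],
    "count": [6, 4, 6, 8],
    "group": ["a", "a", "b", None],
}
print(json.dumps(summarize_columns(example), indent=2))
```

Because every statistic is attached to an explicit key, the model never has to infer which number belongs to which column from its position on the line.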
To provide Assistant with better tools for working with data, we should implement a tool that can give it information about a data set that is well-structured. Specifically: