Assistant: Provide a way to "see" data objects #7114

jmcphers · 2025-04-03T20:58:34Z

If you ask Assistant about some data in your environment, especially if it is large, it will typically try to execute R or Python functions that return plain-text versions of the information.

While this kind of works, these tools (summary and str) do not format information in a way that is intended for an LLM, or often even legible to one. For example, here's the summary the model asked for. It relies on whitespace alignment and wraps its columns across several blocks, so it's difficult or impossible for the model to parse.

summary(diamonds)
     carat               cut        color        clarity     
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
 Max.   :5.0100                     I: 5422   VVS1   : 3655  
                                    J: 2808   (Other): 2531  
     depth           table           price             x         
 Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
 1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
 Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
 Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
 3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
 Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
                                                                 
       y                z         
 Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.720   1st Qu.: 2.910  
 Median : 5.710   Median : 3.530  
 Mean   : 5.735   Mean   : 3.539  
 3rd Qu.: 6.540   3rd Qu.: 4.040  
 Max.   :58.900   Max.   :31.800

This problem was also observed by @jcheng5 when working with DataBot, which is why DataBot converts data to JSON before sending it to the model.

To provide Assistant with better tools for working with data, we should implement a tool that gives it well-structured information about a data set. Specifically:

Unlike "execute code", the tool need not require confirmation since it is only reading information. This will allow the model to repeatedly look at data without pausing to ask the user to run code to see what the result looks like.
The tool should return structured data in JSON. Existing models do really well with this format.
Ideally, the tool should be usable to get a structured representation of any data type. (We might be able to use the existing variables comm?)
Ideally, the execute code tool could also emit structured JSON when the result of execution is a data frame/table, for the model to consume easily.
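
To make the "structured data in JSON" point concrete, here is a rough sketch of the kind of payload such a tool might return in place of the summary() text above. The function name describe_columns, the field names, and the sample values are all made up for illustration; this is not an existing Positron or DataBot API, just one possible shape.

```python
import json
import statistics
from collections import Counter


def describe_columns(columns, max_levels=5):
    """Summarize a column-oriented table as a JSON string.

    `columns` maps a column name to a list of values (None means missing).
    Hypothetical helper: the name and output fields are assumptions.
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        entry = {"n": len(values), "missing": len(values) - len(present)}
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            entry["type"] = "numeric"
            entry["min"] = min(present)
            entry["max"] = max(present)
            entry["mean"] = statistics.fmean(present)
            entry["median"] = statistics.median(present)
        else:
            # Anything non-numeric is treated as categorical and counted.
            entry["type"] = "categorical"
            counts = Counter(str(v) for v in present)
            entry["top_levels"] = dict(counts.most_common(max_levels))
        summary[name] = entry
    return json.dumps(summary, indent=2)


# Toy values only; not the real diamonds data.
sample = {
    "carat": [0.23, 0.21, 0.29, None],
    "cut": ["Ideal", "Premium", "Premium", "Good"],
}
print(describe_columns(sample))
```

Each statistic is attached to a named key under its column, so nothing depends on whitespace alignment or on columns wrapping across blocks.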


seeM · 2025-04-04T06:09:18Z

Some thoughts:

We could implement this in the main thread following the example of EditTool if needed.
The mechanism for getting structured variable data could also be useful if/when we decouple language servers from kernels.
Many dataframe types include a text/html mime bundle in execute results that we could use for the execute code part.
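
A rough sketch of the text/html idea, assuming the execute result arrives as a Jupyter-style mime bundle (a dict keyed by MIME type) and that the HTML is a simple table with a single header row. The names TableRows and table_from_bundle are hypothetical, and real data frame HTML (index columns, multi-level headers, truncated rows) would need more handling than this.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_from_bundle(bundle):
    """Return {"columns": [...], "rows": [...]} or None if no HTML table."""
    html = bundle.get("text/html")
    if html is None:
        return None
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return None
    header, *body = parser.rows
    return {"columns": header, "rows": body}


# Toy bundle for illustration.
bundle = {
    "text/plain": "     cut  price\n0  Ideal    326",
    "text/html": (
        "<table><tr><th>cut</th><th>price</th></tr>"
        "<tr><td>Ideal</td><td>326</td></tr></table>"
    ),
}
print(json.dumps(table_from_bundle(bundle)))
```

Cell values come back as strings here, so a real tool would probably want type information from somewhere other than the HTML.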

wesm · 2025-04-15T19:56:18Z

For tabular data (data frames), we could take advantage of the RPCs (schemas, raw values, summary stats, histograms, frequency tables, and so on) provided by the data explorer comm (but without opening a data explorer UI tab). I was just thinking today separately that it would make sense for the assistant to be able to access all of the statistical summaries displayed in the data explorer and compute statistics with filters applied, etc.

wesm · 2025-04-17T22:14:31Z

I just opened #7299 as one thing to think about -- my initial thought was to add a stateless data querying API to the variables comm that provides an initial subset of capabilities that already exist within the data explorer comm, but this might come with drawbacks. The challenge I see is that it is currently difficult to use the data explorer comm directly without also opening the UI/editor tab. I'll keep thinking some more about this -- the assistant will need to be made aware of changes to datasets which will invalidate previous tool calls, so this is probably an argument to refactor things so that the assistant can open a data explorer comm without needing to open the UI tab. If the user tries to view a dataset that's already being examined by the assistant, we can simply open another data explorer comm for the same variable.
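
To illustrate the invalidation concern, here is a minimal sketch of how earlier tool results could be marked stale once a dataset changes. ToolResultCache, notify_changed, and the per-variable version counter are hypothetical names for illustration; this is not how the data explorer comm or the variables comm works today.

```python
class ToolResultCache:
    """Cache tool results per (variable, query) and drop them on change."""

    def __init__(self):
        self._versions = {}  # variable name -> change counter
        self._results = {}   # (variable, query) -> (version, result)

    def notify_changed(self, variable):
        """Record that `variable` was modified (e.g. on a change event)."""
        self._versions[variable] = self._versions.get(variable, 0) + 1

    def get(self, variable, query, compute):
        """Return a cached result if still valid, else call compute()."""
        version = self._versions.get(variable, 0)
        key = (variable, query)
        cached = self._results.get(key)
        if cached is not None and cached[0] == version:
            return cached[1]
        result = compute()
        self._results[key] = (version, result)
        return result


cache = ToolResultCache()
schema = cache.get("diamonds", "schema", lambda: ["carat", "cut", "price"])
cache.notify_changed("diamonds")  # the user mutated the data frame
schema = cache.get("diamonds", "schema", lambda: ["carat", "cut"])
print(schema)
```

The same version counter could also be reported back to the model so it knows that an earlier answer about the dataset no longer applies.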

seeM · 2025-04-22T11:03:00Z

[...] this is probably an argument to refactor things so that the assistant can open a data explorer comm without needing to open the UI tab

This is my thinking atm. I'm not sure if we need any changes to the OpenRPC schema, or if this can be done in the IDE where we can directly reuse types. Maybe by decoupling DataExplorerClientInstance from the UI?

Also worth keeping in mind that we may more generally want custom UIs in the assistant chat pane for tools that interact with comms, and headless comm clients could be useful for that too.

jmcphers added the area: assistant Issues related to Positron Assistant label Apr 3, 2025

petetronic added this to the 2025.05.0 Pre-Release milestone Apr 7, 2025

petetronic assigned melissa-barca Apr 7, 2025

wesm mentioned this issue Apr 17, 2025

Allow for OpenRPC message components to be shared between comms #7299

Open

petetronic modified the milestones: 2025.05.0 Pre-Release, 2025.06.0 Pre-Release Apr 28, 2025

melissa-barca modified the milestones: 2025.06.0 Pre-Release, Release Candidate May 8, 2025
