-
Hi! Thanks for kicking off this discussion. As a user, I can say unequivocally that flexibility in how the AI features are implemented matters most. For someone with unlimited RAM and CPU resources, llama.cpp might be perfect, but there are plenty of situations where you'd want the option to use cloud-based LLMs or LLMs deployed on a local network. A private mode can (and perhaps should) be the default, but all the other options deserve to exist too.
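A rough sketch of what that flexibility could look like, purely as illustration: the `AIConfig` and `resolve_endpoint` names below are hypothetical, not anything ODE actually ships, and all three modes are assumed to speak an OpenAI-compatible API.

```python
# Hypothetical sketch of a backend-selection config: "private" (local llama.cpp)
# is the default, with LAN or cloud deployments as opt-in alternatives.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AIConfig:
    mode: str = "private"                       # "private" | "lan" | "cloud"
    endpoint: str = "http://127.0.0.1:8080/v1"  # local llama.cpp server by default
    api_key: Optional[str] = None               # only needed for cloud providers

def resolve_endpoint(config: AIConfig) -> str:
    """Pick the base URL for whichever deployment mode the user chose."""
    if config.mode == "cloud" and config.api_key is None:
        raise ValueError("cloud mode requires an API key")
    return config.endpoint
```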
-
I'm thinking model risk assessment could be integrated. My personal impression is that many (not necessarily most, and probably not all) current deployments lack the safety, security, and risk-averse controls needed to safeguard end users. How is, or will, this infrastructure be implemented with risk in mind? I have this and other questions on the subject.
-
The model space is constantly evolving. It would be best to define a set of test cases and evaluate them regularly. Better still: build the test cases so they are fed with random data each time, so no one can train their model against one specific test set. Then let users evaluate their models voluntarily and submit their evaluations with the click of a button; take care to record the model version and setup. This way you will quickly build up a database. Publish it on GitHub. If it gets adopted as a standard evaluation, it will bring more exposure to the ODE.
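A minimal sketch of that randomized-evaluation idea, assuming a hypothetical `ask_model(prompt) -> str` callable wired to whatever model is under test:

```python
# Sketch: generate fresh test cases from random data each run, so no model
# can be tuned to a fixed test set. ask_model() is a hypothetical callable.
import random
from typing import Callable, Optional

def make_sum_case(rng: random.Random) -> tuple[str, str]:
    """A trivially checkable case built from random inputs."""
    values = [rng.randint(1, 100) for _ in range(5)]
    prompt = f"What is the sum of {values}? Answer with the number only."
    return prompt, str(sum(values))

def evaluate(ask_model: Callable[[str], str], n: int = 20,
             seed: Optional[int] = None) -> dict:
    rng = random.Random(seed)
    passed = sum(
        ask_model(prompt).strip() == expected
        for prompt, expected in (make_sum_case(rng) for _ in range(n))
    )
    # Record model name/version and setup alongside the score before submitting.
    return {"cases": n, "passed": passed, "accuracy": passed / n}
```

A fixed `seed` is only for reproducing a specific run; leaving it `None` gives fresh data every evaluation.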
Better hardware, and caching of previous results when they are correct. In general: prefer small, specialised models.
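For the caching point, a tiny sketch (hypothetical `ask_model` and `verify` callables; only answers that pass a deterministic check get cached):

```python
# Sketch: cache keyed by a hash of the exact prompt; store only verified answers.
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_ask(ask_model: Callable[[str], str],
               verify: Callable[[str, str], bool], prompt: str) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]         # reuse a previously verified answer
    answer = ask_model(prompt)
    if verify(prompt, answer):     # cache only answers the check accepts
        _cache[key] = answer
    return answer
```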
What are you going to achieve with this question?
Hallucinations in LLMs are not a bug; they are a feature. Do not use an LLM to analyse data directly. Data extraction is possible but needs to be cross-checked. The usual approach is to use the LLM to understand what the user wants and to propose solutions; the actual data wrangling should be done by deterministic algorithms that can be checked for correctness. You may want to offer a fine-tuned small local LLM for that.
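One possible shape of that split, as a sketch: the model is prompted to return a small JSON plan (the schema here is invented for illustration), and only deterministic, unit-testable code touches the data.

```python
# Sketch: the LLM proposes a structured plan; deterministic code executes it.
# The plan schema ({"operation": ..., "column": ...}) is a made-up example.
import json

ALLOWED_OPS = {"drop_empty_rows", "strip_whitespace"}

def parse_plan(llm_output: str) -> dict:
    """Reject anything outside the whitelist instead of trusting the model."""
    plan = json.loads(llm_output)
    if plan.get("operation") not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {plan.get('operation')}")
    return plan

def apply_plan(rows: list[dict], plan: dict) -> list[dict]:
    """Deterministic executor: every branch can be unit-tested."""
    col = plan["column"]
    if plan["operation"] == "drop_empty_rows":
        return [r for r in rows if r.get(col) not in (None, "")]
    # strip_whitespace
    return [{**r, col: str(r.get(col, "")).strip()} for r in rows]
```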
Besides response quality: mostly response time and cost, I guess.
First get a good overview of model performance (see question 1); then this question may have a solution. If you need professional help implementing any of these topics, I know a decent company.
-
Some thoughts on these questions:
Yes, if the LLM prompting happens behind the scenes. Prompting over data requires skills well beyond what non-technical people have, so the application should actively assist them. There are typical operations that could/should/would be implemented to help, but users should reach them through predefined prompts that return structured outputs. Modern LLMs help a lot with analysing column names, auto-generating documentation for data, detecting semantic types, etc.
Yes, but it is very expensive to implement in local-only mode. It may require an iterative agentic approach. Several startups implement this kind of assistance, but almost all of them use cloud-based or other network-based LLMs.
Yes, but LLMs are secondary to rule-based validation. Tools like Great Expectations and Soda are examples of data validation frameworks without AI; an LLM could generate these rules to speed up rule definition.
This could be done even without LLMs by providing a proper guide. LLMs could be used to validate names against predefined rules, but it is not necessary. In any case, it is not about free-form prompting by the user; it is about predefined prompts and buttons like "Suggest column names using LLM" (see the sketch after this list).
Sure, but these best practices should be defined first.
It is the next step after detecting errors and inconsistencies with less resource-intensive methods.
Educating users about AI, data, and metadata is not the same as a tool that helps them improve metadata and data quality.
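As a sketch of the "predefined prompts with structured outputs" point (and the column-name button mentioned above), assuming a local llama.cpp server and an invented JSON shape for the response:

```python
# Sketch: one predefined prompt behind a "Suggest column names using LLM" button.
# Uses llama.cpp's OpenAI-compatible /v1/chat/completions endpoint; the JSON
# shape requested from the model is an assumption for illustration.
import json
import requests

def build_prompt(names: list[str]) -> str:
    return (
        "For each column name, return only JSON of the form "
        '{"columns": [{"name": "...", "semantic_type": "...", '
        '"suggested_name": "...", "description": "..."}]}. '
        f"Column names: {names}"
    )

def analyze_columns(names: list[str],
                    base_url: str = "http://127.0.0.1:8080") -> dict:
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": build_prompt(names)}],
            "temperature": 0,  # keep suggestions as reproducible as possible
        },
        timeout=60,
    )
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]
    return json.loads(content)  # still validate against the expected shape
```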
-
Which models are more suitable for the tasks we have in mind? - The ones that are purposefully trained for them.
-
Three months later and things move quickly. Have you looked at the IBM Granite models lately? They feel like GPT-3 but run on a few GB locally. I don't need a model with tons of general knowledge. I need one model that understands text, one that can code Python, and one that analyses images. Each needs only a few GB and could run locally; the first can call the others and hand the problem over.

Also, I recently discovered https://github.com/sst/opencode and I am impressed. Yes, it runs in the terminal and the selling point is code, but that is far from all. You can add models that are not coding-specific and have them munch through your text data with ease. You can use coding models (the Grok Code Fast 1 that currently ships with it for free is pretty decent, but you can add a different one if you like) to write Python scripts that analyse your data. Just follow test-driven development when working with it, and it is such a powerful tool. You can define assistants ("agents") in Markdown files that take care of the work, and everything lives in the directory you work from, as Markdown text. ODE could build on that: add it as a backend, let it call ODE as a tool, and vice versa.
-
Local LLMs have already been integrated into the main branch, and the beta feature will be released soon. Since the technical integration is complete, it is time to start thinking about performance, quality, use cases, etc.
Context
The current setup uses a llama.cpp server, so we have the flexibility to use any model we want and to customize it.
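For illustration, a minimal way to talk to that server from Python, assuming it was started with something like `llama-server -m model.gguf --port 8080` (swapping models just means pointing the server at a different GGUF file):

```python
# Sketch: call the llama.cpp server's native /completion endpoint.
import requests

def complete(prompt: str, base_url: str = "http://127.0.0.1:8080") -> str:
    resp = requests.post(
        f"{base_url}/completion",
        json={"prompt": prompt, "n_predict": 128, "temperature": 0.2},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["content"]  # llama.cpp returns generated text in "content"
```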
The current landscape of models and use cases is huge, so there are a couple of questions that require research or consultation with an expert in the field. The goal of the integration is to explore LLM capabilities when working with data and metadata, or in other words: how useful are LLMs when working with data?
Use cases/Hypothesis
Some questions or unknowns: