diff --git a/docs/callbacks.mdx b/docs/callbacks.mdx deleted file mode 100644 index 85d9f186f07..00000000000 --- a/docs/callbacks.mdx +++ /dev/null @@ -1,318 +0,0 @@ ---- -title: Callbacks -sidebarTitle: Callbacks ---- - -Callbacks enable the retrieval of a model's status when it transitions to either 'complete' or 'error'. This is accomplished by sending a POST request to the specified URL, with the payload structured as follows: - -```json -{ - "name": "my_model", - "version": 1, - "active": true, - "predictor_id": 123, - "project_name": "mindsdb", - "predictor_created_at": "Fri, 09 Sep 2023 11:50:27 GMT", - "old_status": "training", - "new_status": "complete", - "changed_at": "Fri, 09 Sep 2023 11:59:00 GMT" -} -``` - -Let's break down each key: - - `name`: name of the model - - `version`: version of the trained model - - `active`: indicates whether the newly trained model is active - - `predictor_id`: unique identifier for the trained model - - `project_name`: the name of the project in which the model was created - - `predictor_created_at`: time at which the model was created - - `old_status`: status the model had prior to receiving the `new_status` - - `new_status`: current status of the model - - `changed_at`: time when the model's status transitioned from `old_status` to `new_status` - - -Please note that this feature works only in MindsDB Cloud and only for [MindsDB Starter](/setup/cloud-starter) users. - - -## Callbacks API - -Callback API provides information on how to add, get, edit, and delete callbacks. 
- -### Add a Callback - -This is the Request format: - -```json -POST /cloud/callback/model_status -Content-Type: application/json -{ - // url will be called on model status change - "url": "https://my.endpoint.com/", - "filter": { - "model_name": ".*", - "project_name": ".*" - }, - "attempt": { - "count": 5, - "http_timeout": 10, - "interval": 10 - } -} -``` - -And this is the Response format: - -```json -Status: 200 OK -Content-Type: application/json -{ - // id of created 'callback' - "id": 123456 -} -``` - -Only the `url` key is required in the Request. All others are optional. - - `url` - this represents the address that will be utilized to send a POST request, containing detailed information about the model's status - - `filter` - this is used to restrict the triggering of callbacks: - - `model_name` - python-style regular expression used to filter the names of models that trigger a callback - - `project_name` - python-style regular expression used to filter the names of projects - - `attempt` - this outlines the configuration of the number and frequency of attempts to send callbacks - - `count` - maximum number of attempts - - `interval` - represents the interval between failed attempts - - `http_timeout` - time allotted to wait for a successful response. 
- -### Get a Callback - -This is the Request format: - -```json -GET /cloud/callback/model_status -``` - -And this is the Response format: - -```json -Status: 200 OK -Content-Type: application/json -[{ - "id": 123456, - "created_at": "Fri, 09 Sep 2023 11:50:27 GMT", - "url": "https://my.endpoint.com/" -}] -``` - -### Edit a Callback - -This is the Request format: - -```json -PUT /cloud/callback/model_status/ -Content-Type: application/json -{ - // new callback url - "url": "https://my.endpoint.com/" -} -``` - -And this is the Response format: - -```json -Status: 200 OK -``` - -### Delete a Callback - -This is the Request format: - -```json -DELETE /cloud/callback/model_status/ -``` - -And this is the Response format: - -```json -Status: 200 OK -``` - -## Handling Callbacks using Python SDK - -Below is an example of using callbacks with the `home_rentals` model. Be sure that your endpoint `HOSTNAME` is accessible from the internet. - - -Please note that `localhost` is not accessible from the internet - you can make `localhost` accessible via multiple ways, like [ngrok tunnel](https://ngrok.com/). 
- - -```python -import requests -import mindsdb_sdk -from flask import Flask, request - -MODEL_NAME = 'home_rentals' -HOSTNAME = 'my.endpoint.com' -PORT = 5000 - -app = Flask(__name__) - -con = mindsdb_sdk.connect( - 'https://cloud.mindsdb.com', - login='name@email.com', - password='password' -) - -# add callback -con.api.session.post( - 'https://cloud.mindsdb.com/cloud/callback/model_status', - json={ - 'url': f'https://{HOSTNAME}:{PORT}/' - } -) - -@app.route('/', methods=['POST']) -def callback(): - data = request.json - if data['version'] == 1: - # let's retrain the model - model = con.models.get(MODEL_NAME) - model.retrain() - elif data['version'] == 2: - # let's make a prediction - model = con.models.get(MODEL_NAME) - prediction = model.predict({"sqft": 1000}) - print(prediction) - return '', 200 - -# connect to database -db = con.databases.create( - 'example_db', - engine='postgres', - connection_args={ - "user": "demo_user", - "password": "demo_password", - "host": "samples.mindsdb.com", - "port": "5432", - "database": "demo" - } -) - -# train base model -model = con.models.create( - MODEL_NAME, - predict='rental_price', - query=db.tables.get('demo_data.home_rentals') -) - -if __name__ == '__main__': - app.run(host='0.0.0.0', port=PORT) -``` - -## Handling Callbacks using JavaScript SDK - -Below is an example of using callbacks with the `home_rentals` model. Be sure that the callback `HOSTNAME` is accessible from MindsDB Cloud. 
- -```js -import express from 'express'; -import axios from 'axios'; -import * as MindsDB from 'mindsdb-js-sdk'; -const mdb = MindsDB.default.default; - -const app = express(); -const PORT = 54321; -const MODEL_NAME = 'home_rentals'; -const HOSTNAME = 'my.endpoint.com'; - -// Create an Axios instance with interceptors for timeout and error handling -const customAxios = axios.create(); -customAxios.interceptors.request.use(config => { - config.timeout = 120000; // Set a request timeout of 120 seconds - return config; -}); -customAxios.interceptors.response.use( - response => response, - error => { - console.error('Axios Error:', error.message); - return Promise.reject(error); - } -); - -// Connect to MindsDB Cloud -try { - await mdb.connect({ - host: 'https://cloud.mindsdb.com', - user: 'name@email.com', - password: 'password', - httpClient: customAxios, - }); -} catch (error) { - console.error('MindsDB Cloud Connection Error:', error); - process.exit(1); -} - -// Connect to the database -await mdb.Databases.createDatabase('example_db', 'postgres', { - user: 'demo_user', - password: 'demo_password', - host: 'samples.mindsdb.com', - port: '5432', - database: 'demo', -}); - -// Express Middleware to parse JSON requests -app.use(express.json()); - -// Define an Express route to handle model status updates -app.post('/model-status', async (req, res) => { - const data = req.body; - - if (data.new_status !== 'complete') { - console.error(`Error! 
Got model status: ${data.new_status}`); - return res.status(400).send({ error: 'Invalid model status' }); - } - - const trained_model_version = data.version; - console.log(`Model training completed, model version=${trained_model_version}`); - - if (trained_model_version === 1) { - // Base model finished training, let's retrain it - let model = await mdb.Models.getModel(MODEL_NAME, 'mindsdb'); - model.retrain(); - } else if (trained_model_version === 2) { - // Model retraining finished, let's make a prediction - let model = await mdb.Models.getModel(MODEL_NAME, 'mindsdb'); - let prediction = await model.query({ - where: [ - 'sqft = 823', - 'location = "good"', - 'neighborhood = "downtown"', - 'days_on_market = 10', - ], - }); - console.log(`Prediction: ${JSON.stringify(prediction)}`); - } - - res.sendStatus(200); -}); - -// Start the Express server -app.listen(PORT, () => { - console.log(`Express server started on port ${PORT}`); -}); - -// Add a callback to notify MindsDB Cloud about the endpoint -try { - await customAxios.post('https://cloud.mindsdb.com/cloud/callback/model_status', { - url: `https://${HOSTNAME}:${PORT}/model-status`, - }); -} catch (error) { - console.error('Callback Error:', error); -} - -// Train a model -try { - await mdb.Models.trainModel(MODEL_NAME, 'rental_price', 'mindsdb', { - integration: 'example_db', - select: 'SELECT * FROM demo_data.home_rentals', - }); -} catch (error) { - console.error('Model Training Error:', error); -} -``` diff --git a/docs/contribute/ml-handlers.mdx b/docs/contribute/ml-handlers.mdx deleted file mode 100644 index e9d9013597c..00000000000 --- a/docs/contribute/ml-handlers.mdx +++ /dev/null @@ -1,258 +0,0 @@ ---- -title: Build an AI/ML Handler -sidebarTitle: Build an AI/ML Handler -icon: "gear" ---- - -In this section, you'll find how to create new machine learning (ML) handlers within MindsDB. - - - **Prerequisite** - - You should have the latest version of the MindsDB repository installed locally. 
Follow [this guide](/contribute/install/) to learn how to install MindsDB for development. - - -## What are Machine Learning Handlers? - -ML handlers act as a bridge to any ML framework. You use ML handlers to create ML engines using [the CREATE ML_ENGINE command](/sql/create/ml-engine/). So you can expose ML models from any supported ML engine as an AI table. - - - **Database Handlers** - - To learn more about handlers and how to implement a database handler, visit our [doc page here](/contribute/data-handlers/). - - -## Creating a Machine Learning Handler - -You can create your own ML handler within MindsDB by inheriting from the [BaseMLEngine](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L123) class. - -By providing the implementation for some or all of the methods contained in the `BaseMLEngine` class, you can connect with the machine learning library or framework of your choice. - -### Core Methods - -Apart from the `__init__()` method, there are five methods, of which two must be implemented. We recommend checking actual examples in the codebase to get an idea of what goes into each of these methods, as they can change a bit depending on the nature of the system being integrated. - -Let's review the purpose of each method. - -| Method | Purpose | -|-------------------|--------------------------------------------------------------------------------------| -| `create()` | It creates a model inside the engine registry. | -| `predict()` | It calls a model and returns prediction data. | -| `update()` | Optional. It updates an existing model without resetting its internal structure. | -| `describe()` | Optional. It provides global model insights. | -| `create_engine()` | Optional. It connects with external sources, such as REST API. 
| - -Authors can opt for adding private methods, new files and folders, or any combination of these to structure all the necessary work that will enable the core methods to work as intended. - - - **Other Common Methods** - - Under the `mindsdb.integrations.libs.utils` library, contributors can find various methods that may be useful while implementing new handlers. - - Also, there is a wrapper class for the `BaseMLEngine` instances called [BaseMLEngineExec](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/libs/ml_exec_base.py#L157). It is automatically deployed to take care of modifying the data responses into something that can be used alongside data handlers. - - -### Implementation - -Here are the methods that must be implemented while inheriting from the [BaseMLEngine](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L123) class: - -* [The create() method](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L151) saves a model inside the engine registry for later usage. - -```py -def create(self, target: str, df: Optional[pd.DataFrame] = None, args: Optional[Dict] = None) -> None: - """ - Saves a model inside the engine registry for later usage. - Normally, an input dataframe is required to train the model. - However, some integrations may merely require registering the model instead of training, in which case `df` can be omitted. - Any other arguments required to register the model can be passed in an `args` dictionary. - """ -``` - -* [The predict() method](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L162) calls a model with an input dataframe and optionally, arguments to modify model's behaviour. This method returns a dataframe with the predicted values. 
- -```py -def predict(self, df: pd.DataFrame, args: Optional[Dict] = None) -> pd.DataFrame: - """ - Calls a model with some input dataframe `df`, and optionally some arguments `args` that may modify the model behavior. - The expected output is a dataframe with the predicted values in the target-named column. - Additional columns can be present, and will be considered row-wise explanations if their names finish with `_explain`. - """ -``` - -And here are the optional methods that you can implement alongside the mandatory ones if your ML framework allows it: - -* [The update() method](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L171) is used to update, fine-tune, or adjust an existing model without resetting its internal state. - -```py -def finetune(self, df: Optional[pd.DataFrame] = None, args: Optional[Dict] = None) -> None: - """ - Optional. - Used to update/fine-tune/adjust a pre-existing model without resetting its internal state (e.g. weights). - Availability will depend on underlying integration support, as not all ML models can be partially updated. - """ -``` - -* [The describe() method](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L181) provides global model insights, such as framework-level parameters used in training. - -```py -def describe(self, key: Optional[str] = None) -> pd.DataFrame: - """ - Optional. - When called, this method provides global model insights, e.g. framework-level parameters used in training. - """ -``` - -* [The create_engine() method](https://github.com/mindsdb/mindsdb/blob/3d9090acb0b8b3b0e2a96e2c93dad436f5ebef90/mindsdb/integrations/libs/base.py#L189) is used to connect with the external sources, such as REST API. - -```py -def create_engine(self, connection_args: dict): - """ - Optional. - Used to connect with external sources (e.g. 
a REST API) that the engine will require to use any other methods. - """ -``` - -## MindsDB ML Ecosystem - -MindsDB has recently decoupled some modules out of its AutoML package in order to leverage them in integrations with other ML engines. The three modules are as follows: - -1. The [type_infer](https://github.com/mindsdb/type_infer) module that implements automated type inference for any dataset.

-Below is the description of the input and output of this module.

-**Input:** tabular dataset.

-**Output:** best guesses of what type of data each column contains. - -2. The [dataprep_ml](https://github.com/mindsdb/dataprep_ml) module that provides data preparation utilities, such as data cleaning, analysis, and splitting. Data cleaning procedures include column-wise cleaners, column-wise missing value imputers, and data splitters (train-val-test split, either simple or stratified).

-Below is the description of the input and output of this module.

-**Input:** tabular dataset.

-**Output:** cleaned dataset, plus insights useful for data analysis and model building. - -3. The [mindsdb_evaluator](https://github.com/mindsdb/mindsdb_evaluator) module that provides utilities for evaluating the accuracy and calibration of ML models.

-Below is the description of the input and output of this module.

-**Input:** model predictions and the input data used to generate these predictions, including corresponding ground truth values of the column to predict.

-**Output:** accuracy metrics that evaluate prediction accuracy and calibration metrics that check whether model-emitted probabilities are calibrated. - -We recommend that new contributors use [type_infer](https://github.com/mindsdb/type_infer) and [dataprep_ml](https://github.com/mindsdb/dataprep_ml) modules when writing ML handlers to avoid reimplementing thin AutoML layers over and over again; it is advised to focus on mapping input data and user parameters to the underlying framework’s API. - -For now, using the [mindsdb_evaluator](https://github.com/mindsdb/mindsdb_evaluator) module is not required, but will be in the short to medium term, so it’s important to be aware of it while writing a new integration. - - -**Example** - -Let’s say you want to write an integration for `TPOT`. Its high-level API exposes classes that are either for classification or regression. But as a handler designer, you need to ensure that arbitrary ML tasks are dispatched properly to each class (i.e., not using a regressor for a classification problem and vice versa). First, `type_infer` can help you by estimating the data type of the target variable (so you immediately know what class to use). Additionally, to quickly get a stratified train-test split, you can leverage `dataprep_ml` splitters and continue to focus on the actual usage of TPOT for the training and inference logic. - - - -We would appreciate your feedback regarding usage & feature roadmap for the above modules, as they are quite new. - - -## Step-by-Step Instructions - - - - - - 1. Set up MindsDB using the [self-hosted pip](/setup/self-hosted/pip/source) installation method. - 2. Make sure you can run the [quickstart example](/quickstart) locally. If you run into errors, check your bash terminal output. - 3. Create a new git branch to store your changes. - - - - - - 1. Check that you can run the existing handler tests with `python -m pytest tests/unit/ml_handlers/`. 
If you get the `ModuleNotFoundError` error, try adding the `__init__.py` file to any subdirectory that doesn't have it. - - 2. Copy the simple tests from a relevant handler. - - 3. Change the SQL query to reference your handler. Specifically, set `USING engine={HandlerName}`. - - 4. Run your new test. Please note that it should fail as you haven’t yet added your handler. The exception should be `Can't find integration_record for handler ...`. - - - - - - 1. Create a new directory in `mindsdb/integrations/handlers/`. You must name the new directory `{HandlerName}_handler/`. - - 2. Copy the `.py` files from the [OpenAI handler folder](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/openai_handler), including: `__about__.py`, `__init__.py`, `openai_handler.py`, `creation_args.py`, and `model_using_args.py`. - - - Note that the arguments used at model creation time (stored in `creation_args.py`) and the arguments used at prediction time (stored in `model_using_args.py`) should be stored in separate files in order to be able to hide sensitive information such as API keys. - - By default, when querying for `connection_data` from the `information_schema.ml_engines` table or `training_options` from the `information_schema.models` table, all sensitive information is hidden. To unhide it, use this command: - - ```sql - set show_secrets=true; - ``` - - - 3. Change the contents of `.py` files to match your new handler. Also, change the name of the `openai_handler.py` file to match your handler. - - 4. Modify the `requirements.txt` file to install your handler’s dependencies. You may get conflicts with other packages, but you can ignore them for now. - - 5. Create a new blank class for your handler in the `{HandlerName}_handler.py` file. Like for other handlers, this should be a subclass of the `BaseMLEngine` class. - - 6. Add your new handler class to the testing DB. 
In the `tests/unit/executor_test_base.py` file starting at line 91, you can see how other handlers are added with `db.session.add(...)`. Copy that and modify it to add your handler. - - 7. Run your new test. Please note that it should still fail but with a different exception message. - - - - - - 1. Define a `create()` method that deals with the model setup arguments. This will add your handler to the models table. Depending on the framework, you may also train the model here using the `df` argument. - - 2. Save relevant arguments/trained models at the end of your `create` method. This allows them to be accessed later. Use the `engine_storage` attributes; you can find examples in other handlers' folders. - - 3. Define a `predict()` method that makes model predictions. This method must return a dataframe with format matching the input, except with a column containing your model’s predictions of the target. The input df is a subset of the original df with the rows determined by the conditions in the predict SQL query. - - 4. Don’t debug the `create()` and `predict()` methods with the `print()` statement because they’re inside a subthread. Instead, write relevant info to disk. - - 5. Once your first test passes, add new tests for any important cases. You can also add tests for any helper functions you write. - - - - - - 1. Launch the MindsDB server locally with `python -m mindsdb`. Again, any issues will appear in the terminal output. - - 2. Check that your handler has been added to the local server database. You can view the list of handlers with `SELECT * from information_schema.handlers`. - - 3. Run the relevant tutorial from the panel on the right side. For regular data, this is `Predict Home Rental Prices`. And for time series data, this is `Forecast Quarterly House Sales`. Specify `USING ENGINE={your_handler}` while creating a model. - - 4. Don’t debug the `create()` and `predict()` methods with the `print()` statement because they’re inside a subthread. 
Instead, write relevant info to disk. - - 5. You should get sensible results if your handler has been well-implemented. Make sure you try the predict step with a range of parameters. - - - - - - 1. You need to fork the MindsDB repository. Follow [this guide](https://github.com/mindsdb/mindsdb/blob/main/CONTRIBUTING.md) to start a PR. - - 2. If relevant, add your tests and new dependencies to the CI config. This is at `.github/workflows/mindsdb.yml`. - - - - - - -Please note that `pytest` is the recommended testing package. Use `pytest` to confirm your ML handler implementation is correct. - - - -**Templates for Unit Tests** - -If you implement a time-series ML handler, create your unit tests following the structure of the [StatsForecast unit tests](https://github.com/mindsdb/mindsdb/blob/main/tests/unit/ml_handlers/test_statsforecast.py). - -If you implement an NLP ML handler, create your unit tests following the structure of the [Hugging Face unit tests](https://github.com/mindsdb/mindsdb/blob/main/tests/unit/ml_handlers/test_huggingface.py). - - - -## Check out our Machine Learning Handlers! - -To see some ML handlers that are currently in use, we encourage you to check out the following ML handlers inside the MindsDB repository: - -* [HuggingFace](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/huggingface_handler) -* [OpenAI](https://github.com/mindsdb/mindsdb/blob/main/mindsdb/integrations/handlers/openai_handler) - -And here are [all the handlers available in the MindsDB repository](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers). 
diff --git a/docs/data_catalog/integrations/overview.mdx b/docs/data_catalog/integrations/overview.mdx deleted file mode 100644 index 52099e4fe2b..00000000000 --- a/docs/data_catalog/integrations/overview.mdx +++ /dev/null @@ -1,72 +0,0 @@ ---- -title: Data Catalog for Integrations -sidebarTitle: Overview ---- - -As of now, the Data Catalog is available for the following integrations: - -- [Snowflake](/integrations/data-integrations/snowflake) -- [Salesforce](/integrations/app-integrations/salesforce) -- [BigQuery](/integrations/data-integrations/google-bigquery) -- [MS SQL Server](/integrations/data-integrations/microsoft-sql-server) -- [MySQL](/integrations/app-integrations/mysql) -- [Oracle](/integrations/data-integrations/oracle) -- [PostgreSQL](/integrations/data-integrations/postgresql) - -### Enabling the Data Catalog - -To enable the Data Catalog feature in MindsDB, update your `config.json` file by setting the `data_catalog` flag to `true`: - -```json -{ - "data_catalog": { - "enabled": true - } -} -``` - -Follow this doc page to learn how to [start MindsDB with custom configuration](/setup/custom-config). - - -Note that the data catalog is generated for a data source only after this data source is connected to an agent. - -Here is an example: - -```sql -CREATE DATABASE snowflake_data -WITH - ENGINE = 'snowflake', - PARAMETERS = { - "account": "abc123-xyz987", - "user": "username", - "password": "password", - "database": "database_name", - "schema": "schema_name", - "warehouse": "warehouse_name" - }; - -CREATE AGENT my_agent -USING - include_tables= ['snowflake_data.table_name', ...]; -``` - -Now you can [query the data catalog](/data_catalog/integrations/query) generated for the `snowflake_data` integration. - - -### How It Works - -When you create an [agent](/mindsdb_sql/agents/agent) in MindsDB that connects to one of the supported integrations, the Data Catalog automatically: - -1. Inspects the data source. -2. 
Extracts metadata for all accessible tables and columns. -3. Stores this information in a dedicated catalog schema (`DATA_CATALOG`). -4. Makes this metadata available to agents and users via both SQL queries and internal reasoning. - - -**Current Limitations** - -This feature is still evolving and has some known limitations: - -- **One-Time Snapshot**: Metadata is generated only once—at the time the agent is created. If the data schema changes (e.g., new columns, renamed tables), the Data Catalog will not automatically update. A refresh mechanism is planned in a future release. -- **No Manual Feedback**: If any metadata appears to be incorrect (e.g., wrong row counts or data types), there is currently no way for users to flag or correct it. A feedback system will be introduced soon. - diff --git a/docs/data_catalog/integrations/query.mdx b/docs/data_catalog/integrations/query.mdx deleted file mode 100644 index 627806ff145..00000000000 --- a/docs/data_catalog/integrations/query.mdx +++ /dev/null @@ -1,121 +0,0 @@ ---- -title: Querying Data Catalog for Integrations -sidebarTitle: Querying Data Catalog ---- - -MindsDB exposes collected metadata from connected data sources via virtual tables in the `INFORMATION_SCHEMA` schema. These views allow users to inspect and query the Data Catalog using familiar SQL syntax. - -## Available Data Catalog Tables - -To filter results for a specific data integration, use `WHERE TABLE_SCHEMA = ''`. - -### `INFORMATION_SCHEMA.META_TABLES` - -Provides high-level metadata about available tables in a given integration. - -Here are the available columns: - -- `TABLE_NAME` (string): Name of the table. -- `TABLE_TYPE` (string, optional): Type of table (e.g., `BASE TABLE`, `VIEW`). -- `TABLE_SCHEMA` (string, optional): Schema name or integration name. -- `TABLE_DESCRIPTION` (string, optional): Description of the table. -- `ROW_COUNT` (integer, optional): Estimated row count. 
- -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_TABLES -WHERE TABLE_SCHEMA = 'integration_name'; -``` - -### `INFORMATION_SCHEMA.META_COLUMNS` - -Returns detailed column-level metadata for all tables in the specified integration. - -Here are the available columns: - -- `TABLE_NAME` (string): Name of the table. -- `COLUMN_NAME` (string): Column name. -- `DATA_TYPE` (string): Data type of the column. -- `COLUMN_DESCRIPTION` (string, optional): Description of the column. -- `IS_NULLABLE` (boolean, optional): Whether nulls are allowed. -- `COLUMN_DEFAULT` (string, optional): Default value, if any. - -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_COLUMNS -WHERE TABLE_SCHEMA = 'integration_name'; -``` - -### `INFORMATION_SCHEMA.META_COLUMN_STATISTICS` - -Provides statistical insights about each column’s values and distribution. - -Here are the available columns: - -- `TABLE_NAME` (string): Name of the table. -- `COLUMN_NAME` (string): Column name. -- `MOST_COMMON_VALUES` (array of strings, optional) -- `MOST_COMMON_FREQUENCIES` (array of integers, optional) -- `NULL_PERCENTAGE` (float, optional) -- `MINIMUM_VALUE` (string, optional) -- `MAXIMUM_VALUE` (string, optional) -- `DISTINCT_VALUES_COUNT` (integer, optional) - -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_COLUMN_STATISTICS -WHERE TABLE_SCHEMA = 'integration_name'; -``` - -### `INFORMATION_SCHEMA.META_KEY_COLUMN_USAGE` - -Describes the primary key columns for tables in the integration. - -Here are the available columns: - -- `TABLE_NAME` (string): Name of the table. -- `COLUMN_NAME` (string): Column name. 
-- `ORDINAL_POSITION` (integer, optional) -- `CONSTRAINT_NAME` (string, optional) - -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_KEY_COLUMN_USAGE -WHERE TABLE_SCHEMA = 'integration_name'; -``` - -### `INFORMATION_SCHEMA.META_TABLE_CONSTRAINTS` - -Lists table-level constraints, including primary and foreign keys. - -Here are the available columns: - -- `TABLE_NAME` (string): Name of the table. -- `CONSTRAINT_NAME` (string, optional) -- `CONSTRAINT_TYPE` (string): e.g., PRIMARY KEY, FOREIGN KEY - -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_TABLE_CONSTRAINTS -WHERE TABLE_SCHEMA = 'integration_name'; -``` - -### `INFORMATION_SCHEMA.META_HANDLER_INFO` - -Returns a textual summary of the integration implementation, including supported SQL features and capabilities. - -Here are the available columns: - -- `HANDLER_INFO` (string): Description. - -Here is how to query it for a specific data integration: - -```sql -SELECT * FROM INFORMATION_SCHEMA.META_HANDLER_INFO -WHERE TABLE_SCHEMA = 'integration_name'; -``` diff --git a/docs/data_catalog/overview.mdx b/docs/data_catalog/overview.mdx deleted file mode 100644 index 1a6b351e896..00000000000 --- a/docs/data_catalog/overview.mdx +++ /dev/null @@ -1,16 +0,0 @@ ---- -title: Data Catalog -sidebarTitle: Overview ---- - -The **Data Catalog** in MindsDB plays a key role in enhancing the context available to [agents](/mindsdb_sql/agents/agent) when querying data sources. By automatically indexing and storing metadata, such as table names, column types, constraints, and statistics, the catalog empowers agents to understand the structure and semantics of the data, leading to more accurate and efficient query generation. 
- -### Why It Matters - -When agents interpret natural language questions or generate SQL queries, access to metadata improves their ability to: - -- Understand relationships between tables and fields. -- Infer joins, filters, and aggregations more intelligently. -- Avoid syntax errors due to missing or unknown schema information. - -This metadata layer provides agents with the necessary context to avoid making uninformed queries. diff --git a/docs/docs.json b/docs/docs.json index ff5a1352eb2..8e82e94f7e3 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -100,6 +100,7 @@ "integrations/app-integrations/plaid", "integrations/app-integrations/pypi", "integrations/app-integrations/reddit", + "integrations/app-integrations/rest-api", "integrations/app-integrations/salesforce", "integrations/app-integrations/sendinblue", "integrations/app-integrations/shopify", @@ -153,6 +154,7 @@ "integrations/data-integrations/microsoft-access", "integrations/data-integrations/microsoft-sql-server", "integrations/data-integrations/monetdb", + "integrations/data-integrations/mongodb", "integrations/data-integrations/mysql", "integrations/data-integrations/oceanbase", "integrations/data-integrations/opengauss", @@ -233,19 +235,6 @@ } ] }, - { - "group": "Data Catalog", - "pages": [ - "data_catalog/overview", - { - "group": "Data Catalog for Integrations", - "pages": [ - "data_catalog/integrations/overview", - "data_catalog/integrations/query" - ] - } - ] - }, { "group": "Bring Your Own Models", "pages": [ @@ -1061,26 +1050,10 @@ "source": "/data-integrations/yugabytedb", "destination": "/integrations/data-integrations/yugabytedb" }, - { - "source": "/custom-model/openai", - "destination": "/integrations/ai-engines/openai" - }, - { - "source": "/custom-model/huggingface", - "destination": "/integrations/ai-engines/huggingface" - }, - { - "source": "/custom-model/langchain", - "destination": "/integrations/ai-engines/langchain" - }, { "source": "/custom-model/byom", "destination": 
"/integrations/ai-engines/byom" }, - { - "source": "/custom-model/ray-serve", - "destination": "/integrations/ai-engines/ray-serve" - }, { "source": "/custom-model/mlflow", "destination": "/integrations/ai-engines/mlflow" @@ -1305,122 +1278,6 @@ "source": "/agents/knowledge-bases", "destination": "/mindsdb_sql/agents/knowledge-bases" }, - { - "source": "/finetune/openai", - "destination": "/use-cases/automated_finetuning/openai" - }, - { - "source": "/finetune/classification", - "destination": "/use-cases/automated_finetuning/classification" - }, - { - "source": "/finetune/regression", - "destination": "/use-cases/automated_finetuning/regression" - }, - { - "source": "/sql/tutorials/chatbots_agents", - "destination": "/use-cases/ai_agents/chatbots_agents" - }, - { - "source": "/sql/tutorials/create-chatbot", - "destination": "/use-cases/ai_agents/create-chatbot" - }, - { - "source": "/sql/tutorials/create-chatbot-kb", - "destination": "/use-cases/ai_agents/create-chatbot-kb" - }, - { - "source": "/sql/tutorials/recommenders", - "destination": "/use-cases/ai-powered_data_retrieval/recommenders" - }, - { - "source": "/sql/tutorials/image-generator", - "destination": "/use-cases/data_enrichment/image-generator" - }, - { - "source": "/nlp/json-from-text", - "destination": "/use-cases/data_enrichment/json-from-text" - }, - { - "source": "/nlp/text-summarization-inside-mysql-with-openai", - "destination": "/use-cases/data_enrichment/text-summarization-inside-mysql-with-openai" - }, - { - "source": "/nlp/sentiment-analysis-inside-mysql-with-openai", - "destination": "/use-cases/data_enrichment/sentiment-analysis-inside-mysql-with-openai" - }, - { - "source": "/nlp/question-answering-inside-mysql-with-openai", - "destination": "/use-cases/data_enrichment/question-answering-inside-mysql-with-openai" - }, - { - "source": "/sql/tutorials/text-sentiment-hf", - "destination": "/use-cases/data_enrichment/text-sentiment-hf" - }, - { - "source": 
"/sql/tutorials/hugging-face-examples", - "destination": "/use-cases/data_enrichment/hugging-face-examples" - }, - { - "source": "/sql/tutorials/hugging-face-inference-api-examples", - "destination": "/use-cases/data_enrichment/hugging-face-inference-api-examples" - }, - { - "source": "/sql/tutorials/house-sales-forecasting", - "destination": "/use-cases/predictive_analytics/house-sales-forecasting" - }, - { - "source": "/sql/tutorials/eeg-forecasting", - "destination": "/use-cases/predictive_analytics/eeg-forecasting" - }, - { - "source": "/sql/tutorials/customer-churn", - "destination": "/use-cases/in-database_ml/customer-churn" - }, - { - "source": "/sql/tutorials/home-rentals", - "destination": "/use-cases/in-database_ml/home-rentals" - }, - { - "source": "/sql/tutorials/slack-chatbot", - "destination": "/use-cases/ai_workflow_automation/slack-chatbot" - }, - { - "source": "/sql/tutorials/twitter-chatbot", - "destination": "/use-cases/ai_workflow_automation/twitter-chatbot" - }, - { - "source": "/sql/tutorials/twilio-chatbot", - "destination": "/use-cases/ai_workflow_automation/twilio-chatbot" - }, - { - "source": "/sql/tutorials/customer-reviews-notifications", - "destination": "/use-cases/ai_workflow_automation/customer-reviews-notifications" - }, - { - "source": "/sql/tutorials/real-time-trading-forecasts", - "destination": "/use-cases/ai_workflow_automation/real-time-trading-forecasts" - }, - { - "source": "/model-management", - "destination": "/features/model-management" - }, - { - "source": "/ai-integrations", - "destination": "/features/ai-integrations" - }, - { - "source": "/data-integrations", - "destination": "/features/data-integrations" - }, - { - "source": "/automation", - "destination": "/features/automation" - }, - { - "source": "/features/ai-integrations", - "destination": "/integrations/ai-overview" - }, { "source": "/features/data-integrations", "destination": "/integrations/data-overview" diff --git a/docs/features/ai-integrations.mdx 
b/docs/features/ai-integrations.mdx deleted file mode 100644 index c04fb3911fc..00000000000 --- a/docs/features/ai-integrations.mdx +++ /dev/null @@ -1,7 +0,0 @@ ---- -title: AI Integrations -sidebarTitle: AI Integrations -icon: "microchip-ai" ---- - - diff --git a/docs/features/automation.mdx b/docs/features/automation.mdx deleted file mode 100644 index 94b4df3b944..00000000000 --- a/docs/features/automation.mdx +++ /dev/null @@ -1,19 +0,0 @@ ---- -title: Automation -sidebarTitle: Automation -icon: "bolt-auto" ---- - -MindsDB provides mechanisms to automate tasks. These include [jobs](/mindsdb_sql/sql/create/jobs), [triggers](/mindsdb_sql/sql/create/trigger), and [chatbots](/mindsdb_sql/agents/chatbot). - -

- -

- - -Use AI automation to keep your AI systems up-to-date by continuously retraining or fine-tuning them with real-time data. Follow [this use case](/sql/tutorials/real-time-trading-forecasts) to learn how to build it with MindsDB. - - - -Use AI automation to create chatbots. Follow [this use case](/sql/tutorials/twilio-chatbot) to learn how to build custom chatbots with MindsDB. - diff --git a/docs/features/data-integrations.mdx b/docs/features/data-integrations.mdx deleted file mode 100644 index 3930e98fa0c..00000000000 --- a/docs/features/data-integrations.mdx +++ /dev/null @@ -1,7 +0,0 @@ ---- -title: Data Integrations -sidebarTitle: Data Integrations -icon: "database" ---- - - diff --git a/docs/features/model-management.mdx b/docs/features/model-management.mdx deleted file mode 100644 index ddbd428c9b3..00000000000 --- a/docs/features/model-management.mdx +++ /dev/null @@ -1,28 +0,0 @@ ---- -title: Model Management -sidebarTitle: Model Management -icon: "bars-progress" ---- - -MindsDB abstracts AI models, making them accessible from enterprise data environments. - -

- -

- -MindsDB enables you to manage every aspect of AI models. With MindsDB, you can [CREATE MODEL](/mindsdb_sql/sql/create/model), [FINETUNE](/mindsdb_sql/sql/api/finetune), [RETRAIN](/mindsdb_sql/sql/api/retrain), and more. - -* [Deploy](/mindsdb_sql/sql/create/model) -

You can [create, train, and deploy AI models](/mindsdb_sql/sql/create/model) based on popular [AI/ML frameworks](/integrations/ai-overview) within MindsDB. - -* [Fine-tune](/mindsdb_sql/sql/api/finetune) -

You can [fine-tune models](/mindsdb_sql/sql/api/finetune) with data from various [data sources](/integrations/data-overview) connected to MindsDB. Check out [examples here](/use-cases/automated_finetuning/overview). - -* [Automate](/mindsdb_sql/sql/create/jobs) -

You can automate tasks, including retraining or fine-tuning of AI models, to keep your AI system up-to-date. See [examples here](/use-cases/ai_workflow_automation/overview). - - -Go ahead and create an AI model! - -Use [SQL API](/mindsdb_sql/overview), [REST API](/rest/overview), or one of the [SDKs](/sdks/overview) to create and deploy AI models within MindsDB. - diff --git a/docs/generative-ai-tables.mdx b/docs/generative-ai-tables.mdx deleted file mode 100644 index 27a903c2eff..00000000000 --- a/docs/generative-ai-tables.mdx +++ /dev/null @@ -1,267 +0,0 @@ ---- -title: Generative AI Tables -sidebarTitle: AI Tables ---- - -MindsDB empowers organizations to harness the power of AI by abstracting AI models as Generative AI Tables. These tables are capable of learning from the input data and generating predictions from the underlying model upon being queried. This abstraction makes AI highly accessible, enabling development teams to use their existing SQL skills to build applications powered by AI. - - -MindsDB integrates with numerous AI frameworks. [Learn more here](/integrations/ai-overview). - - -

- -

- -## What are Generative AI Tables? - -Generative AI is a subfield of artificial intelligence that trains AI models to create new content, such as realistic text, forecasts, images, and more, by learning patterns from existing data. - -MindsDB revolutionizes machine learning within enterprise databases by introducing the concept of **Generative AI tables**. These essentially abstract AI models as virtual AI tables, capable of producing output when given certain input. - -## How to Use Generative AI Tables - -AI tables, introduced by MindsDB, abstract AI models as virtual tables so you can simply query AI models for predictions. - -With MindsDB, you can join multiple AI tables (that abstract AI models) with multiple data tables (that provide input to the models) to get all predictions at once. - -Let's look at some examples. - -### Deploy AI Models as AI Tables - -You can deploy an AI model as a virtual AI table using the `CREATE MODEL` statement. - -Here we create a model that classifies sentiment of customer reviews as instructed in the prompt template message. The required input is the review and output is the sentiment predicted by the model. - -```sql -CREATE MODEL sentiment_classifier_model -PREDICT sentiment -USING - engine = 'openai_engine', - model_name = 'gpt-4', - prompt_template = 'describe the sentiment of the reviews - strictly as "positive", "neutral", or "negative". - "I love the product":positive - "It is a scam":negative - "{{review}}.":'; -``` - -Next we create a model that generates responses to the reviews. The required input includes review, product name, and sold product quantity, and output is the response generated by the model. 
- -```sql -CREATE MODEL response_generator_model -PREDICT response -USING - engine = 'openai_engine', - model_name = 'gpt-4', - prompt_template = 'briefly respond to the customer review: {{review}}, added by a customer after buying {{product_name}} in quantity {{quantity}}'; -``` - - -Follow [this doc page](/integrations/ai-engines/openai) to configure the OpenAI engine in MindsDB. - - -Now let's look at the data tables that we'll use to provide input data to the AI tables. - -### Prepare Input Data - -The `amazon_reviews` table stores the following columns: - -```sql -+----------------------------+-----------------------------+------------------------+-------------+ -| created_at | product_name | review | customer_id | -+----------------------------+-----------------------------+------------------------+-------------+ -| 2023-10-03 16:30:00.000000 | Power Adapter | It is a great product. | 1 | -| 2023-10-03 16:31:00.000000 | Bluetooth and Wi-Fi Speaker | It is ok. | 2 | -| 2023-10-03 16:32:00.000000 | Kindle eReader | It doesn’t work. | 3 | -+----------------------------+-----------------------------+------------------------+-------------+ -``` - -It provides sufficient input data for the `sentiment_classifier_model`, but not for the `response_generator_model`. - -The `products_sold` table stores the following columns: - -```sql -+----------------------------+-----------------------------+-------------+----------+ -| sale_date | product_name | customer_id | quantity | -+----------------------------+-----------------------------+-------------+----------+ -| 2023-10-03 16:30:00.000000 | Power Adapter | 1 | 20 | -| 2023-10-03 16:31:00.000000 | Bluetooth and Wi-Fi Speaker | 2 | 5 | -| 2023-10-03 16:32:00.000000 | Kindle eReader | 3 | 10 | -+----------------------------+-----------------------------+-------------+----------+ -``` - -The `response_generator_model` requires the two tables to be joined to provide it with sufficient input data. 
- -### Make Predictions - -You can query the AI tables directly or join AI tables with data tables to get the predictions. - -There are two ways you can provide input to the models: - -1. If you query the AI table directly, you can provide input data in the `WHERE` clause, like this: - - ```sql - SELECT review, sentiment - FROM sentiment_classifier_model - WHERE review = 'I like it'; - ``` - -2. You can provide input data to AI tables from the joined data tables, like this: - - ```sql - SELECT inp.product_name, - inp.review, - m1.sentiment, - m2.response - FROM data_integration_conn.amazon_reviews2 AS inp - JOIN data_integration_conn.products_sold AS inp2 - ON inp.customer_id = inp2.customer_id - JOIN sentiment_classifier_model AS m1 - JOIN response_generator_model AS m2; - ``` - - The `sentiment_classifier_model` requires a parameter named `review`, so the data table should contain a column named `review`, which is picked up by the model. - - Note that, when joining data tables, you must provide the `ON` clause condition, which is implemented implicitly when joining the AI tables. - -Moreover, you can combine these two options and provide the input data to the AI tables partially from the data tables and partially from the `WHERE` clause, like this: - -```sql -SELECT inp.product_name, - inp.review, - m1.sentiment, - m2.response -FROM data_integration_conn.amazon_reviews2 AS inp -JOIN sentiment_classifier_model AS m1 -JOIN response_generator_model AS m2 -WHERE m2.quantity = 5; -``` - -Here the `sentiment_classifier_model` takes input data from the `amazon_reviews` table, while the `response_generator_model` takes input data from the `amazon_reviews` table and from the `WHERE` clause. 
- -Furthermore, you can make use of subqueries to provide input data to the models via the `WHERE` clause, like this: - -```sql -SELECT inp.product_name, - inp.review, - m1.sentiment, - m2.response -FROM data_integration_conn.amazon_reviews2 AS inp -JOIN sentiment_classifier_model AS m1 -JOIN response_generator_model AS m2 -WHERE m2.quantity = (SELECT quantity - FROM data_integration_conn.products_sold - WHERE customer_id = 2); -``` - -## Difference between AI Tables and Standard Tables - -To understand the difference, let's go over a simpler example. Here we will see how traditional database tables are designed to give you a deterministic response given some input, and how Generative AI Tables are designed to generate an approximate response given some input. - -Let’s consider the following `income_table` table that stores the `income` and `debt` values. - -```sql -SELECT income, debt -FROM income_table; -``` - -On execution, we get: - -```sql -+------+-----+ -|income|debt | -+------+-----+ -|60000 |20000| -|80000 |25100| -|100000|30040| -|120000|36010| -+------+-----+ -``` - -A simple visualization of the data present in the `income_table` table is as follows: - -![Income vs Debt](/assets/sql/income_vs_debt.png) - -Querying the income table to get the `debt` value for a particular `income` value results in the following: - -```sql -SELECT income, debt -FROM income_table -WHERE income = 80000; -``` - -On execution, we get: - -```sql -+------+-----+ -|income|debt | -+------+-----+ -|80000 |25100| -+------+-----+ -``` - -And here is what we get: - -![Income vs Debt chart](/assets/sql/income_vs_debt_known_value.png) - -But what happens when querying the table for an `income` value that is not -present there? - -```sql -SELECT income, debt -FROM income_table -WHERE income = 90000; -``` - -On execution, we get: - -```sql -Empty set (0.00 sec) -``` - -When the `WHERE` clause condition is not fulfilled for any of the rows, no value is returned. 
- -![Income vs Debt query](/assets/sql/income_vs_debt_unknown_value.png) - -When a table doesn’t have an exact match, the query returns an empty set or null value. This is where the AI Tables come into play! - -Let’s create a `debt_model` model that allows us to approximate the `debt` value for any `income` value. We train the `debt_model` model using the data from the `income_table` table. - -```sql -CREATE MODEL mindsdb.debt_model -FROM income_table -PREDICT debt; -``` - -On execution, we get: - -```sql -Query OK, 0 rows affected (x.xxx sec) -``` - -MindsDB provides the [`CREATE MODEL`](/sql/create/model/) statement. On execution of this statement, the predictive model works in the background, automatically creating a vector representation of the data that can be visualized as follows: - -![Income vs Debt model](/assets/sql/income_vs_debt_predictor.png) - -Let’s now look for the `debt` value of some random `income` value. To get the approximated `debt` value, we query the `mindsdb.debt_model` model instead of the `income_table` table. - -```sql -SELECT income, debt -FROM mindsdb.debt_model -WHERE income = 90000; -``` - -On execution, we get: - -```sql -+------+-----+ -|income|debt | -+------+-----+ -|90000 |27820| -+------+-----+ -``` - -And here is how it looks: - -![Income vs Debt model](/assets/sql/income_vs_debt_prediction.png) diff --git a/docs/integrations/ai-engines/amazon-bedrock.mdx b/docs/integrations/ai-engines/amazon-bedrock.mdx deleted file mode 100644 index 81e9205d399..00000000000 --- a/docs/integrations/ai-engines/amazon-bedrock.mdx +++ /dev/null @@ -1,150 +0,0 @@ ---- -title: Amazon Bedrock -sidebarTitle: Amazon Bedrock ---- - -This documentation describes the integration of MindsDB with [Amazon Bedrock](https://aws.amazon.com/bedrock/), a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies. 
-The integration allows for the deployment of models offered by Amazon Bedrock within MindsDB, providing the models with access to data from various data sources. - -## Prerequisites - -Before proceeding, ensure the following prerequisites are met: - -1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop). -2. To use Amazon Bedrock within MindsDB, install the required dependencies following [this instruction](/setup/self-hosted/docker#install-dependencies). -3. Obtain the AWS credentials for a user with access to the Amazon Bedrock service. - -## Setup - -Create an AI engine from the [Amazon Bedrock handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/bedrock_handler). - -```sql -CREATE ML_ENGINE bedrock_engine -FROM bedrock -USING - aws_access_key_id = 'AQAXEQK89OX07YS34OP', - aws_secret_access_key = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY', - aws_session_token = 'FwoGZXIvYXdzEJr...', - aws_region = 'us-east-1'; -``` - -Required parameters for creating an engine include the following: - -- `aws_access_key_id`: The AWS access key ID for the user. -- `aws_secret_access_key`: The AWS secret access key for the user. -- `aws_region`: The AWS region to use. - -Optional parameters include the following: - -- `aws_session_token`: The AWS session token for the user. This is required when using temporary security credentials. - -Create a model using `bedrock_engine` as an engine. - -```sql -CREATE MODEL bedrock_model -PREDICT answer -USING - engine = 'bedrock_engine', - question_column = 'question', - max_tokens = 100, - temperature = 0.3; -``` - -Required parameters for creating a model include the following: - -* `engine`: The name of the engine created via `CREATE ML_ENGINE`. - -Optional parameters include the following: - -* `mode`: The mode to run inference in. 
The default mode is `default` and the other supported mode is `conversational`. -* `model_id`: The model ID to use for inference. The default model ID is `amazon.titan-text-premier-v1:0` and a list of other supported models can be found at https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html. -* `question_column`: The column that stores the user input. -* `context_column`: The column that stores context to the user input. -* `prompt_template`: A template for the prompt with placeholders to be replaced by the user input. -* `max_tokens`: The maximum number of tokens to be generated in the model's responses. -* `temperature`: The likelihood of the model selecting higher-probability options while generating a response. -* `top_p`: The percentage of most-likely candidates that the model considers for the next token. -* `stop`: A list of tokens that the model should stop generating at. - - -For the `default` and `conversational` modes, one of the following needs to be provided: - * `prompt_template`. - * `question_column`, and an optional `context_column`. - - -## Usage - -### Default Mode - -In the `default` mode, the model will generate a separate response for each input provided. No context is maintained between the inputs. - -```sql -CREATE MODEL bedrock_default_model -PREDICT answer -USING - engine = 'bedrock_engine', - prompt_template = 'Answer the users input in a helpful way: {{question}}'; -``` - -To generate a response for a single input, the following query can be used: - -```sql -SELECT * -FROM bedrock_default_model -WHERE question = 'What is the capital of Sweden?'; -``` - -The response will look like the following: - -| question | answer | -| ------ | -------- | -| What is the capital of Sweden? | The capital of Sweden is Stockholm. Stockholm is the largest city in Sweden, with a population of over 900,000 people in the city proper and over 2 million in the metropolitan area. 
It is known for its beautiful architecture, scenic waterways, and rich cultural heritage. The city is built on 14 islands, which are connected by over 50 bridges, and is home to many museums, galleries, and historic landmarks. Some of the most famous attractions in Stockholm include the Vasa Museum, the Stockholm Palace, and the Old Town (Gamla Stan). | - -To generate responses for multiple inputs, the following query can be used: - -```sql -SELECT * -FROM files.unrelated_questions AS d -JOIN bedrock_default_model AS m -``` - -The response will look like the following: - -| question | answer | -| -------- | ------ | -| What is the capital of Sweden? | The capital of Sweden is Stockholm. Stockholm is the most populated city in Sweden with over 975,000 residents. The city is known for its stunning architecture and beautiful waterways. | -| What is the second planet in the solar system? | The second planet from the sun in our solar system is Venus. Venus is often called Earth's "sister planet" because of their similar size, mass, and density. However, the two planets have very different atmospheres and surface conditions. Venus has a thick, toxic atmosphere composed of carbon dioxide, which traps heat and causes the planet to have surface temperatures that can reach up to 471 degrees Celsius (880 degrees Fahrenheit). Venus also has a highly reflective cloud cover that obscures its surface, making it difficult to study. Despite these challenges, Venus has been the subject of numerous scientific missions, including several orbiters and landers that have provided valuable insights into the planet's geology, atmosphere, and climate. | - - -`files.unrelated_questions` is a simple CSV file containing a `question` column (as expected by the above model) that has been uploaded to MindsDB. It is, however, possible to use any other supported data source in the same manner. 
- - -### Conversational Mode - -In the `conversational` mode, the model will maintain context between the inputs and generate a single response. This response will be placed in the last row of the result set. - -```sql -CREATE MODEL bedrock_conversational_model -PREDICT answer -USING - engine = 'bedrock_engine', - mode = 'conversational', - question_column = 'question'; -``` - -The syntax for generating responses in the `conversational` mode is the same as in the `default` mode. - -However, when generating responses for multiple inputs, the difference between the two modes becomes apparent. As mentioned above, the `conversational` mode maintains context between the inputs and generates a single response, which is placed in the last row of the result set: - -```sql -SELECT * -FROM files.related_questions AS d -JOIN bedrock_conversational_model AS m -``` - -This is what the response will look like: - -| question | answer | -| -------- | ------ | -| What is the capital of Sweden? | [NULL] | -| What are some cool places to visit there? | The capital of Sweden is Stockholm. It’s a beautiful city, with lots of old buildings and a scenic waterfront. You should definitely visit the Royal Palace, which is the largest palace in Scandinavia. You can also visit the Vasa Museum, which has a famous 17th-century warship that sank in Stockholm harbor. And you should definitely check out the ABBA Museum, which is dedicated to the famous pop group. | \ No newline at end of file diff --git a/docs/integrations/ai-engines/anomaly.mdx b/docs/integrations/ai-engines/anomaly.mdx deleted file mode 100644 index b2f91fff228..00000000000 --- a/docs/integrations/ai-engines/anomaly.mdx +++ /dev/null @@ -1,195 +0,0 @@ ---- -title: Anomaly Detection Handler -sidebarTitle: Anomaly Detection ---- - -The Anomaly Detection handler implements supervised, semi-supervised, and unsupervised anomaly detection algorithms using the pyod, catboost, xgboost, and sklearn libraries. 
The models were chosen based on the results in the [ADBench benchmark paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/cf93972b116ca5268827d575f2cc226b-Abstract-Datasets_and_Benchmarks.html). - - -**Additional information** - -- If there is no labelled data, we use an unsupervised learner with the syntax `CREATE ANOMALY DETECTION MODEL ` without specifying the target to predict. MindsDB then adds a column called `outlier` when generating results. - -- If we have labelled data, we use the regular model creation syntax. There is backend logic that chooses between a semi-supervised algorithm (currently XGBOD) vs. a supervised algorithm (currently CatBoost). - -- If multiple models are provided, then we create an ensemble and use majority voting. - -- See the anomaly detection proposal [document](https://docs.google.com/document/d/1Yd7ARZVg_67xlcY-JR2kuO7mak9Ia2YER1Jk0EdpEa0/edit#heading=h.mo4wxsae6t1d) for more information. - - - - -**Context about types of anomaly detection** - -- Supervised: we have inlier/outlier labels, so we can train a classifier the normal way. This is very similar to a standard classification problem. - -- Semi-supervised: we have inlier/outlier labels and perform an unsupervised preprocessing step, and then a supervised classification algorithm. - -- Unsupervised: we don’t have inlier/outlier labels and cannot assume all training data are inliers. These methods construct inlier criteria that will classify some training data as outliers too based on distributional traits. New observations are classified against these criteria. However, it’s not possible to evaluate how well the model detects outliers without labels. - - - - -**Default dispatch logic** - -We propose the following logic to determine the type of learning: -- Use supervised learning if labels are available and the dataset contains at least 3000 samples. -- Use semi-supervised learning if labels are available and the number of samples in the dataset is less than 3000. 
-- If the dataset is unlabelled, use unsupervised learning. - -We’ve chosen 3000 based on the results of the NeurIPS AD Benchmark paper (linked above). The authors report that semi-supervised learning outperforms supervised learning when the number of samples used is less than 5% of the size of the training dataset. The average size of the training datasets in their study is 60,000, therefore this 5% corresponds to 3000 samples on average. - - - - -**Reasoning for default models on each type** - -We refer to the NeurIPS AD Benchmark paper (linked above) to make these choices: -- For supervised learning, use CatBoost. It often outperforms classic algorithms. -- For semi-supervised, XGBod is a good default from PyOD. -- There’s no clear winner for unsupervised methods, it depends on the use case. ECOD is a sensible default with a fast runtime. If we’re not concerned about runtime, we can use an ensemble. - - - -## Prerequisites - -Before proceeding, ensure the following prerequisites are met: - -1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop). -2. To use Anomaly Detection handler within MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies). - -## Setup - -Create an AI engine from the [Anomaly Detection handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/anomaly_detection_handler). - -```sql -CREATE ML_ENGINE anomaly_detection_engine -FROM anomaly_detection; -``` - -Create a model using `anomaly_detection_engine` as an engine. 
- -```sql -CREATE ANOMALY DETECTION MODEL anomaly_detection_model -FROM datasource - (SELECT * FROM data_table) -PREDICT target_column -USING - engine = 'anomaly_detection_engine', -- engine name as created via CREATE ML_ENGINE - ...; -- other parameters shown in usage examples below -``` - -## Usage - -To run example queries, use the data from this [CSV file](https://raw.githubusercontent.com/mindsdb/mindsdb/main/tests/unit/ml_handlers/data/anomaly_detection.csv). - -### Unsupervised detection - -```sql -CREATE ANOMALY DETECTION MODEL mindsdb.unsupervised_ad -FROM files - (SELECT * FROM anomaly_detection) -USING - engine = 'anomaly_detection_engine'; - -DESCRIBE MODEL mindsdb.unsupervised_ad.model; - -SELECT t.class, m.outlier as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.unsupervised_ad as m; -``` - -### Semi-supervised detection - -```sql -CREATE MODEL mindsdb.semi_supervised_ad -FROM files - (SELECT * FROM anomaly_detection) -PREDICT class -USING - engine = 'anomaly_detection_engine'; - -DESCRIBE MODEL mindsdb.semi_supervised_ad.model; - -SELECT t.carat, t.category, t.class, m.class as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.semi_supervised_ad as m; -``` - -### Supervised detection - -```sql -CREATE MODEL mindsdb.supervised_ad -FROM files - (SELECT * FROM anomaly_detection) -PREDICT class -USING - engine = 'anomaly_detection_engine', - type = 'supervised'; - -DESCRIBE MODEL mindsdb.supervised_ad.model; - -SELECT t.carat, t.category, t.class, m.class as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.supervised_ad as m; -``` - -### Specific model - -```sql -CREATE ANOMALY DETECTION MODEL mindsdb.unsupervised_ad_knn -FROM files - (SELECT * FROM anomaly_detection) -USING - engine = 'anomaly_detection_engine', - model_name = 'knn'; - -DESCRIBE MODEL mindsdb.unsupervised_ad_knn.model; - -SELECT t.class, m.outlier as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.unsupervised_ad_knn as m; -``` - -### Specific anomaly 
type - -```sql -CREATE ANOMALY DETECTION MODEL mindsdb.unsupervised_ad_local -FROM files - (SELECT * FROM anomaly_detection) -USING - engine = 'anomaly_detection_engine', - anomaly_type = 'local'; - -DESCRIBE MODEL mindsdb.unsupervised_ad_local.model; - -SELECT t.class, m.outlier as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.unsupervised_ad_local as m; -``` - -### Ensemble - -```sql -CREATE ANOMALY DETECTION MODEL mindsdb.ad_ensemble -FROM files - (SELECT * FROM anomaly_detection) -USING - engine = 'anomaly_detection_engine', - ensemble_models = ['knn','ecod','lof']; - -DESCRIBE MODEL mindsdb.ad_ensemble.model; - -SELECT t.class, m.outlier as anomaly -FROM files.anomaly_detection as t -JOIN mindsdb.ad_ensemble as m; -``` - - - -**Next Steps** - -Watch [demo 1](https://www.loom.com/share/0996e5faa3f7415bacd51a6e8e161d5e?sid=9bacd29a-975b-4a94-b081-de2255b93607) and [demo 2](https://www.loom.com/share/c22335d83cb04ac281e2ef080792f2dd) to see usage examples. - -Go to the [Use Cases](https://docs.mindsdb.com/use-cases/overview) section to see more examples. - diff --git a/docs/integrations/ai-engines/anthropic.mdx b/docs/integrations/ai-engines/anthropic.mdx deleted file mode 100644 index 6e65f85559d..00000000000 --- a/docs/integrations/ai-engines/anthropic.mdx +++ /dev/null @@ -1,107 +0,0 @@ ---- -title: Anthropic -sidebarTitle: Anthropic ---- - -This documentation describes the integration of MindsDB with [Anthropic](https://www.anthropic.com/), an AI research company. -The integration allows for the deployment of Anthropic models within MindsDB, providing the models with access to data from various data sources. - -## Prerequisites - -Before proceeding, ensure the following prerequisites are met: - -1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop). -2. 
To use Anthropic within MindsDB, install the required dependencies by following [these instructions](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies). -3. Obtain the Anthropic API key required to deploy and use Anthropic models within MindsDB. Follow the [instructions for obtaining the API key](https://docs.anthropic.com/claude/docs/getting-access-to-claude). - -## Setup - -Create an AI engine from the [Anthropic handler](https://github.com/mindsdb/mindsdb/tree/main/mindsdb/integrations/handlers/anthropic_handler). - -```sql -CREATE ML_ENGINE anthropic_engine -FROM anthropic -USING - anthropic_api_key = 'your-anthropic-api-key'; -``` - -Create a model using `anthropic_engine` as an engine. - -```sql -CREATE MODEL anthropic_model -PREDICT target_column -USING - engine = 'anthropic_engine', -- engine name as created via CREATE ML_ENGINE - column = 'column_name', -- column that stores input/question to the model - max_tokens = , -- max number of tokens to be generated by the model (default is 100) - model = 'model_name'; -- choose one of 'claude-instant-1.2', 'claude-2.1', 'claude-3-opus-20240229', 'claude-3-sonnet-20240229' (default is 'claude-2.1') -``` - - - -The integration between Anthropic and MindsDB was implemented using the [Anthropic Python SDK](https://github.com/anthropics/anthropic-sdk-python). - - -## Usage - -The following usage examples utilize `anthropic_engine` to create a model with the `CREATE MODEL` statement. - -Create and deploy the Anthropic model within MindsDB to ask any question. 
- -```sql -CREATE MODEL anthropic_model -PREDICT answer -USING - column = 'question', - engine = 'anthropic_engine', - max_tokens = 300, - model = 'claude-2.1'; -- choose one of 'claude-instant-1.2', 'claude-2.1', 'claude-3-opus-20240229', 'claude-3-sonnet-20240229' -``` - -Where: - -| Name | Description | -|-------------------|---------------------------------------------------------------------------| -| `column` | It defines the prompt to the model. | -| `engine` | It defines the Anthropic engine. | -| `max_tokens` | It defines the maximum number of tokens to generate before stopping. | -| `model` | It defines the model that will complete your prompt. | - - - -**Default Model** - -When you create an Anthropic model in MindsDB, it uses the `claude-2.1` model by default. You can use other available models by passing the model name to the `model` parameter in the `USING` clause of the `CREATE MODEL` statement. - - - - -**Default Max Tokens** - -When you create an Anthropic model in MindsDB, it uses 100 tokens as the maximum by default. You can adjust this value by passing it to the `max_tokens` parameter in the `USING` clause of the `CREATE MODEL` statement. - - -Query the model to get predictions. - -```sql -SELECT question, answer -FROM anthropic_model -WHERE question = 'Where is Stockholm located?'; -``` - -Here is the output: - -```sql -+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+ -| question | answer | -+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+ -| Where is Stockholm located? | Stockholm is the capital and largest city of Sweden. It is located on Sweden's south-central east coast, where Lake Mälaren meets the Baltic Sea. 
| -+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+ -``` - - - -**Next Steps** - -Go to the [Use Cases](https://docs.mindsdb.com/use-cases/overview) section to see more examples. - diff --git a/docs/integrations/ai-engines/autokeras.mdx b/docs/integrations/ai-engines/autokeras.mdx deleted file mode 100644 index 05907f673df..00000000000 --- a/docs/integrations/ai-engines/autokeras.mdx +++ /dev/null @@ -1,67 +0,0 @@ ---- -title: AutoKeras -sidebarTitle: AutoKeras ---- - - -AutoKeras is an AutoML package for both regression and classification tasks. -We have integrated both the Regressor and Classifier models from this package. -Our implementation will automatically determine whether the task is regression or classification; the user does not need to specify this. - -Call this handler with -`USING ENGINE='autokeras'` - you can see a full example in https://github.com/mindsdb/mindsdb/pull/4559 - -## Why is this integration useful? - -AutoKeras will build an accurate deep learning model for the end-user, with no prior knowledge of deep learning required. -The handler will automatically search and tune different neural network architectures to find the most accurate one for the given problem. -The ideal use case is a very large dataset with many predictive features, where the user doesn't have strong priors about how these features may affect the target variable. - -Do not use this integration for small datasets, as neural networks are prone to overfitting. - -Do not use this integration if you need to fit a model very quickly, as training time can be long. - -## Are models created with this integration fast and scalable, in general? -Making predictions with these models is very fast. - -However, the model search and auto-training process is slow because AutoKeras uses neural networks. 
Using the default settings, model training may take several hours. We provide an optional setting to reduce training time (see below). - - -## What are the recommended system specifications for models created with this framework? -We recommend training this handler on a machine with a CUDA-enabled GPU. -We would also recommend training on a remote server, or a machine you can leave running, given potentially long training times. - -## To what degree can users control the underlying framework by passing parameters via the USING syntax? -We provide an optional argument, - -``` -USING - ENGINE='autokeras', - train_time={x} -``` - -where x can take values from 0 to 1 (the default is 1). Lower values reduce training time linearly; for example, a value of 0.1 will cut training time to 10% of the default. This comes at the cost of accuracy, as the neural net model search space is reduced by the same factor. - -## Does this integration offer model explainability or insights via the DESCRIBE syntax? -Not implemented yet. - -## Does this integration support fine-tuning pre-existing models (i.e. is the update() method implemented)? Are there any caveats? -Not implemented yet. - -## Are there any other noteworthy aspects to this handler? -AutoKeras automatically splits the data into training and validation sets; the user does not need to do this. -The original Keras library was for computer vision, so this would be a good option if users want to do image analysis. - -Users should set up a Conda environment to use this handler, as Keras depends on TensorFlow rather than PyTorch. -They should follow the instructions at: -1. https://www.tensorflow.org/install/pip -2. https://autokeras.com/install/ - -## Any directions for future work in subsequent versions of the handler? -Implement the DESCRIBE and UPDATE methods. - -If there is user demand for image analysis, implement the image models from AutoKeras. 
- -## Please provide a minimal SQL example that uses this ML engine (pointers to integration tests in the PR also valid) -See the integration test in https://github.com/mindsdb/mindsdb/pull/4559 - diff --git a/docs/integrations/ai-engines/autosklearn.mdx b/docs/integrations/ai-engines/autosklearn.mdx deleted file mode 100644 index 83d51424128..00000000000 --- a/docs/integrations/ai-engines/autosklearn.mdx +++ /dev/null @@ -1,8 +0,0 @@ ---- -title: AutoSklearn -sidebarTitle: AutoSklearn ---- - - -This page is a work in progress. - diff --git a/docs/integrations/ai-engines/clipdrop.mdx b/docs/integrations/ai-engines/clipdrop.mdx deleted file mode 100644 index c0a174272fa..00000000000 --- a/docs/integrations/ai-engines/clipdrop.mdx +++ /dev/null @@ -1,235 +0,0 @@ ---- -title: Clipdrop -sidebarTitle: Clipdrop ---- - -Integrate state-of-the-art image processing AI directly into your products. - - - -To use Clipdrop in MindsDB, you need to sign up for a Clipdrop account and obtain an API key. Learn more [here](https://clipdrop.co/apis). - - - -## Setup - -```sql -CREATE ML_ENGINE clipdrop_engine -FROM clipdrop -USING - clipdrop_api_key = 'your_api_key'; -``` - -## Usage - -### Remove Text from Image - -