Deleting the Trained data from the Pandas Agent #1401

Open
Subhashchandra-P opened this issue Oct 21, 2024 · 28 comments
Labels
bug Something isn't working

@Subhashchandra-P

System Info

OS version: Windows 11
Python version: 3.11.9
PandasAI version: 2.2.14

🐛 Describe the bug

Thanks for the great work!
As part of training the agent (QA training), I started training it on some questions. Now I want to remove the trained data from the agent. How can I do that?
@dosubot

dosubot bot commented Oct 21, 2024

Hey @Subhashchandra-P! 👋 I'm here to help you with any issues or questions you have. Just ask away, and I'll do my best to assist you!

To delete trained data from the PandasAI agent, you work with the agent's vector store rather than the agent itself. Vector stores that support deletion expose the delete_question_and_answers and delete_docs methods, which remove data by vector ID. Here's an example of how you might use these methods:

# Assuming you have access to the vector store instance
vector_store.delete_question_and_answers(ids=["your_vector_id"])
vector_store.delete_docs(ids=["your_vector_id"])

These methods will delete the specified data from the vector store, effectively removing the trained data from the agent [1][2].
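To make the delete-by-ID idea concrete, here is a minimal, self-contained sketch. `InMemoryQAStore` is a hypothetical toy written for illustration only: it borrows the method names mentioned above (`add_question_answer`, `delete_question_and_answers`) but it is not PandasAI's implementation. What it shows is that deletion needs the IDs produced at insertion time, so it pays to keep them when you train.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class InMemoryQAStore:
    """Hypothetical toy store mapping an ID to a (question, code) pair."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[str, str]] = {}

    def add_question_answer(self, queries: List[str], codes: List[str]) -> List[str]:
        # Each query must pair with exactly one code snippet.
        if len(queries) != len(codes):
            raise ValueError("queries and codes must have the same length")
        ids = []
        for query, code in zip(queries, codes):
            item_id = str(uuid.uuid4())  # the caller needs this ID to delete later
            self._items[item_id] = (query, code)
            ids.append(item_id)
        return ids

    def delete_question_and_answers(self, ids: Optional[List[str]] = None) -> bool:
        # Remove only the listed IDs; unknown IDs are silently ignored.
        for item_id in ids or []:
            self._items.pop(item_id, None)
        return True

    def count(self) -> int:
        return len(self._items)


store = InMemoryQAStore()
ids = store.add_question_answer(
    queries=["What is the total revenue?"],
    codes=["result = df['revenue'].sum()"],
)
store.delete_question_and_answers(ids=ids)
print(store.count())  # 0
```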

To continue talking to Dosu, mention @dosu.

dosubot bot added the bug (Something isn't working) label Oct 21, 2024
@Subhashchandra-P
Author

Subhashchandra-P commented Oct 21, 2024

I have used the default vector store (i.e. the Bamboo vector store). In that case, how can we delete the QA pairs from it?
@dosu

dosubot bot commented Oct 21, 2024

Note that delete_question_and_answers, as used here, is a method of the Milvus vector store class, which is a separate store from the default Bamboo vector store; it only applies if your agent was trained with a Milvus instance. The method takes a list of the question-answer IDs you want to delete. Here's an example of how to use it:

# Assuming `milvus_instance` is an instance of the Milvus class
ids_to_delete = ['id1', 'id2', 'id3']  # Replace with the actual IDs you want to delete
milvus_instance.delete_question_and_answers(ids=ids_to_delete)

This will remove the specified question-answer pairs from the QA collection [1].

@Subhashchandra-P
Author

OK, can you give me the code to use the Pinecone vector store for storing the trained data, using OpenAI embeddings?
@dosu

dosubot bot commented Oct 21, 2024

To use the Pinecone vector store for storing trained data with OpenAI embeddings, you can follow these steps:

  1. Initialize the Pinecone Vector Store: Create an instance of the Pinecone class by providing the necessary parameters such as api_key, index, and an embedding_function that generates embeddings using OpenAI.

    from pandasai.ee.vectorstores.pinecone import Pinecone
    from your_embedding_module import your_openai_embedding_function
    
    pinecone_store = Pinecone(
        api_key="your_pinecone_api_key",
        index="your_index_name",
        embedding_function=your_openai_embedding_function
    )
  2. Add Documents: Use the add_docs method to add your documents to the Pinecone vector store. This method will use the provided embedding_function to generate embeddings for the documents and store them in the vector store.

    documents = ["Document 1 text", "Document 2 text"]
    pinecone_store.add_docs(docs=documents)
  3. Add Question and Answer Pairs: If you have question and answer pairs, use the add_question_answer method to store them. This method also uses the embedding_function to generate embeddings for the combined question and answer text.

    questions = ["What is AI?", "Explain machine learning."]
    answers = ["AI is the simulation of human intelligence in machines.", "Machine learning is a subset of AI."]
    pinecone_store.add_question_answer(queries=questions, codes=answers)

Make sure to replace your_openai_embedding_function with the actual function that generates embeddings using OpenAI's API [1].
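One way to build that embedding function is to wrap the official `openai` Python SDK (v1+), whose `client.embeddings.create(model=..., input=...)` call returns one embedding per input string in `response.data`. This sketch assumes the vector store calls `embedding_function` with a list of strings and expects one vector per string back; `make_openai_embedding_function` is a hypothetical helper name and the model name is only an example. The client is passed in, so the wrapper can be exercised without network access.

```python
from typing import Callable, List, Sequence


def make_openai_embedding_function(
    client, model: str = "text-embedding-3-small"
) -> Callable[[Sequence[str]], List[List[float]]]:
    """Return a callable that embeds a batch of texts with an OpenAI client."""

    def embed(texts: Sequence[str]) -> List[List[float]]:
        response = client.embeddings.create(model=model, input=list(texts))
        # One result per input text; sort by `index` so vectors line up with inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        return [item.embedding for item in ordered]

    return embed
```

In practice you would write `from openai import OpenAI` and pass `embedding_function=make_openai_embedding_function(OpenAI())` (the client reads `OPENAI_API_KEY` from the environment). The Pinecone index dimension has to match the model's output size; `text-embedding-3-small` returns 1536-dimensional vectors by default.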

@Subhashchandra-P
Author

If we reinitialize the agent, does the trained data on it get deleted?
Please explain the ways to delete the trained data and bring the agent back to its initial state.
@dosu

dosubot bot commented Oct 21, 2024

Reinitializing the PandasAI agent does not automatically delete the trained data. The trained data is persisted, so you only need to train the model once unless you explicitly delete it [1].

To delete the trained data and bring the agent to its initial state, you can use the following methods:

  1. Delete Trained Data: If you're using the Bamboo vector store, there isn't specific information available on methods for deleting data. However, for other vector stores like Pinecone or Milvus, you can use methods like delete_question_and_answers to remove specific data by their IDs.

  2. Reset Agent State: Use the start_new_conversation method to reset the agent's conversation state. It clears the memory and resets the conversation ID, returning the conversation (though not the trained data in the vector store) to its initial state.

agent.start_new_conversation()

This method will clear the previous conversation context but won't delete the trained data from the vector store [2].

@Subhashchandra-P
Author

What is the API key of the default Bamboo vector store?
Reset Agent State: Use the start_new_conversation method to reset the agent's state. This method clears the memory and resets the conversation ID, effectively bringing the agent to its initial state without affecting the trained data in the vector store.
I want to reset the agent and also erase the training data. Will initializing the agent into a new variable delete it?
@dosu

dosubot bot commented Oct 21, 2024

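The default Bamboo vector store does not have a separate, predefined API key in the codebase. As described later in this thread, the agent instantiates it automatically when the PANDASAI_API_KEY environment variable is set, so in practice the relevant key is your PandasAI API key.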

Reinitializing the agent by creating a new instance does not delete the trained data stored in the vector store. The data persists unless you explicitly delete it using methods like delete_question_and_answers or delete_docs if they are implemented for your vector store. To reset the agent and erase the training data, you need to manually delete the data from the vector store using these methods.
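The difference between conversation memory and the vector store can be sketched with two hypothetical toy classes (not PandasAI code): the agent object owns its conversation memory, while the trained pairs live in an external store that a new agent instance simply reconnects to. Only an explicit delete on the store removes them.

```python
import uuid
from typing import Dict, List


class ToyStore:
    """Hypothetical stand-in for a persistent vector store that outlives any agent."""

    def __init__(self) -> None:
        self.pairs: Dict[str, str] = {}


class ToyAgent:
    """Hypothetical agent: conversation memory is per instance, training data is not."""

    def __init__(self, store: ToyStore) -> None:
        self.store = store
        self.memory: List[str] = []
        self.conversation_id = uuid.uuid4()

    def train(self, query: str, code: str) -> None:
        self.store.pairs[query] = code

    def chat(self, message: str) -> None:
        self.memory.append(message)

    def start_new_conversation(self) -> None:
        # Clears memory and rotates the conversation ID; the store is untouched.
        self.memory.clear()
        self.conversation_id = uuid.uuid4()


store = ToyStore()
agent = ToyAgent(store)
agent.train("total revenue?", "result = df['revenue'].sum()")
agent.chat("hello")
agent.start_new_conversation()
print(len(agent.memory), len(store.pairs))  # 0 1

agent = ToyAgent(store)  # "reinitializing" reconnects to the same store
print(len(store.pairs))  # 1

store.pairs.clear()  # only an explicit delete erases the training data
print(len(store.pairs))  # 0
```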

@Subhashchandra-P
Author

How can I retrieve the Bamboo Vector Store Id with only the API?
@dosu

dosubot bot commented Oct 21, 2024

There isn't specific information available on how to retrieve the Bamboo Vector Store ID using only the API. The documentation or codebase does not provide details on methods or endpoints for retrieving vector store IDs from the Bamboo vector store. If you have access to the API, you might need to explore the available endpoints or consult additional resources to find a way to retrieve the vector store ID.

@Subhashchandra-P
Author

Can you give me the code to use the Pinecone vector store for storing the trained data, using OpenAI embeddings, and store it with both an index name and a namespace name?
@dosu

dosubot bot commented Oct 22, 2024

To use the Pinecone vector store for storing trained data with OpenAI embeddings, specifying both the index name and the namespace name, follow these steps:

  1. Initialize the Pinecone Vector Store: Create an instance of the Pinecone class with the necessary parameters, including api_key, index, and an embedding_function that generates embeddings using OpenAI.

    from pandasai.ee.vectorstores.pinecone import Pinecone
    from your_embedding_module import your_openai_embedding_function
    
    pinecone_store = Pinecone(
        api_key="your_pinecone_api_key",
        index="your_index_name",
        embedding_function=your_openai_embedding_function
    )
  2. Add Data to the Vector Store: Use the add_question_answer or add_docs method to add your data, specifying the namespace parameter.

    # For question and answer pairs
    questions = ["What is AI?", "Explain machine learning."]
    answers = ["AI is the simulation of human intelligence.", "Machine learning is a subset of AI."]
    pinecone_store.add_question_answer(
        queries=questions,
        codes=answers,
        namespace="your_namespace_name"
    )
    
    # For documents
    documents = ["Document 1 content", "Document 2 content"]
    pinecone_store.add_docs(
        docs=documents,
        namespace="your_namespace_name"
    )

Replace "your_pinecone_api_key", "your_index_name", "your_namespace_name", and your_openai_embedding_function with your actual API key, desired index name, namespace name, and the function that generates embeddings, respectively [1].

@Subhashchandra-P
Author

Which module do we need to import Pinecone from?
It is giving me this error:
ModuleNotFoundError: No module named 'chromadb'
@dosu

dosubot bot commented Oct 22, 2024

The Pinecone class lives in PandasAI's pandasai.ee.vectorstores package and depends on the pinecone-client package, which is imported in Python as pinecone. To resolve the "ModuleNotFoundError: No module named 'chromadb'" error, you need to install the chromadb package. You can do this by running the following command:

pip install chromadb

@Subhashchandra-P
Author

So, after installing ChromaDB and qdrant_client, I am still getting the error
{
"name": "ImportError",
"message": "cannot import name 'Pinecone' from 'pandasai.ee.vectorstores' (C:\Users\subhashchandra.p\AppData\Roaming\Python\Python311\site-packages\pandasai\ee\vectorstores\__init__.py)",
"stack": "---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[7], line 1
----> 1 from pandasai.ee.vectorstores import Pinecone

ImportError: cannot import name 'Pinecone' from 'pandasai.ee.vectorstores' (C:\Users\subhashchandra.p\AppData\Roaming\Python\Python311\site-packages\pandasai\ee\vectorstores\__init__.py)"
}
@dosu

dosubot bot commented Oct 22, 2024

The error you're encountering suggests that the Pinecone class is not being correctly imported from the pandasai.ee.vectorstores module. This could be due to a few reasons:

  1. Check Installation: Ensure that you have a version of the pandasai package installed that includes the Pinecone implementation. You might need to update the package to the latest version.

  2. Verify Module Structure: Double-check the __init__.py file in the pandasai/ee/vectorstores directory to ensure that the Pinecone class is properly imported there. The Pinecone class should be defined in pandasai/ee/vectorstores/pinecone.py [1].

  3. Direct Import: Try importing the Pinecone class directly from its file path:

    from pandasai.ee.vectorstores.pinecone import Pinecone

If these steps do not resolve the issue, there might be a problem with the package installation or the specific version you are using.
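Before changing import paths, it can help to confirm which of the relevant packages are actually installed in the interpreter you are running; one possible cause of errors like this is installing into a different environment than the one the notebook kernel uses. The sketch below uses only the standard library. The module names checked (`chromadb`, `qdrant_client`, `pinecone`) are the import names of the packages mentioned in this thread, and `pandasai` is the distribution name used for the version lookup; the helper names are hypothetical.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Dict, Iterable, Optional


def installed_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def distribution_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("pandasai version:", distribution_version("pandasai"))
    for name, found in installed_modules(["chromadb", "qdrant_client", "pinecone"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If `sys.executable` is not the Python you ran `pip install` with, install via that interpreter (for example `python -m pip install ...` using the same executable).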

@Subhashchandra-P
Author

Does the agent's trained data change when we use a different PandasAI API key?
@dosu

dosubot bot commented Oct 23, 2024

There isn't specific information available on whether the trained data of the PandasAI agent changes when using a different API key. The documentation does not provide details on how trained data is managed or isolated based on different API keys. Therefore, it's unclear if switching API keys affects the trained data or if there is a mechanism to segregate data by API key.

@Subhashchandra-P
Author

While running the agent after training it with question-answer pairs, it is still not using the trained data. Why?
Does the trained data expire after a certain period, or does it get overwritten?
Querying without using training data.
Querying without using training docs.
@dosu

dosubot bot commented Oct 29, 2024

The PandasAI agent might not use the trained data during a query due to a few reasons:

  1. Missing Vector Store: If no vector store is provided, the agent cannot be trained on the data. Ensure that your vector store is correctly set up and that your API key is configured properly [1].

  2. Incomplete Training Data: If either queries or codes are provided without the other, the training cannot proceed. Make sure both are provided together during training [1].
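Because training needs queries and codes together, a small guard before calling PandasAI's `agent.train(queries=..., codes=...)` can catch mismatched inputs early. `validate_training_pairs` is a hypothetical helper, not part of PandasAI, and the equal-length check assumes each query is meant to pair with exactly one code snippet.

```python
from typing import List, Sequence, Tuple


def validate_training_pairs(
    queries: Sequence[str], codes: Sequence[str]
) -> List[Tuple[str, str]]:
    """Check queries/codes before training and return them as (query, code) pairs."""
    if not queries or not codes:
        raise ValueError("Provide both queries and codes; training needs the pair.")
    if len(queries) != len(codes):
        raise ValueError(
            f"Got {len(queries)} queries but {len(codes)} codes; they must pair up one-to-one."
        )
    for query, code in zip(queries, codes):
        # Blank strings would be stored but never match anything useful.
        if not query.strip() or not code.strip():
            raise ValueError("Empty query or code string found.")
    return list(zip(queries, codes))
```

A typical call sequence would be `validate_training_pairs(queries, codes)` followed by `agent.train(queries=queries, codes=codes)`.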

Regarding data expiration or overwriting, there isn't specific information available on whether the trained data expires or gets automatically overwritten after a certain period [2]. If you're experiencing issues with the trained data not being used, it might be helpful to check the logs or use debugging tools provided by PandasAI to diagnose the problem [3].

@Subhashchandra-P
Author

What debugging tools does PandasAI provide to monitor the training data and the agent's activities?
@dosu

dosubot bot commented Oct 29, 2024

PandasAI provides a Logger class that can be configured for detailed logging or debugging. You can enable detailed logging by setting the verbose parameter to True when initializing the Logger class. This will log messages to the console. Additionally, you can save logs to a file by setting the save_logs parameter to True [1].

Here is an example of how to configure the logger:

from pandasai.helpers.logger import Logger

# Enable verbose logging to console and save logs to a file
logger = Logger(save_logs=True, verbose=True)
logger.log("This is a detailed log message.")

This configuration will log messages both to the console and to a file named pandasai.log.
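With logs saved to a file, you can search it for the two messages quoted earlier in this thread ("Querying without using training data." and "Querying without using training docs.") to see which queries ran without the trained data. A minimal sketch, assuming the log is plain text in `pandasai.log` in the working directory; `find_lines` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

MARKERS = (
    "Querying without using training data.",
    "Querying without using training docs.",
)


def find_lines(log_path: str, markers: Iterable[str] = MARKERS) -> List[str]:
    """Return every log line that contains any of the given marker strings."""
    path = Path(log_path)
    if not path.exists():
        return []
    markers = tuple(markers)
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if any(m in line for m in markers)]


if __name__ == "__main__":
    for line in find_lines("pandasai.log"):
        print(line)
```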

Additionally, PandasAI allows for logging to a server by configuring the log_server settings in the Agent configuration. You can specify a server_url and an api_key for the log server [2].

These logging capabilities can help you monitor the training data and agent activities effectively.

@Subhashchandra-P
Author

Can you give me all the information you know about the Bamboo vector store in PandasAI agents?
@dosu

dosubot bot commented Oct 30, 2024

The Bamboo vector store in PandasAI agents is implemented as a class named BambooVectorStore, which is a subclass of VectorStore. It provides methods to interact with a vector store, specifically for adding and retrieving question-answer pairs and documents. Here are the key functionalities:

  • Initialization: The constructor allows setting an endpoint URL, API key, logger, and maximum number of samples. If the BambooVectorStore is not explicitly provided during the agent's initialization, it attempts to import and instantiate it using the logger if the PANDASAI_API_KEY environment variable is set.

  • Adding Data:

    • add_question_answer method allows adding question and answer pairs to the training set. This method makes a POST request to the /training-data endpoint with the questions and answers in JSON format.
    • add_docs method allows adding documents to the training set. This method makes a POST request to the /training-docs endpoint with the documents in JSON format.
  • Retrieving Data:

    • get_relevant_qa_documents method retrieves relevant question-answer documents based on a search query. It makes a GET request to the /training-data/qa/relevant-qa endpoint with the query and count as parameters.
    • get_relevant_docs_documents method retrieves relevant documents based on a search query. It makes a GET request to the /training-docs/docs/relevant-docs endpoint with the query and count as parameters.

These methods use a session to make HTTP requests to specific endpoints for managing the training data. Additionally, the BambooVectorStore is integrated into the BaseAgent class, which manages the conversational experience in PandasAI, allowing it to train on data and manage memory effectively. The retrieval methods return an empty list if an error occurs, and the unit tests check both that the correct HTTP requests are made and this error handling [1][2][3].
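Based only on the endpoint description above, a relevant-QA lookup is a query-scoped search (a search string plus a result count), which is also why it cannot be used to list everything. The sketch below only builds the request URL with those two parameters; the base URL is a placeholder, and authentication headers and the response format are not covered in this thread, so they are deliberately left out. `relevant_qa_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def relevant_qa_url(base_url: str, query: str, count: int) -> str:
    """Build (but do not send) the GET URL for the relevant-QA endpoint described above."""
    params = urlencode({"query": query, "count": count})
    return f"{base_url.rstrip('/')}/training-data/qa/relevant-qa?{params}"


print(relevant_qa_url("https://example.invalid/api", "total revenue", 3))
# https://example.invalid/api/training-data/qa/relevant-qa?query=total+revenue&count=3
```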

@Subhashchandra-P
Author

How can I print all the trained QA pairs of the agent? Can we use agent.get_relevant_qa_documents?
@dosu

Copy link

dosubot bot commented Oct 30, 2024

To print all the trained question-answer pairs of the agent, you cannot directly use the get_relevant_qa_documents method, as it is designed to retrieve only relevant documents based on a specific query. Unfortunately, there isn't a method in the BambooVectorStore class that retrieves all stored question-answer pairs without filtering by relevance [1].

If you need to access all the stored data, you would typically need a method that retrieves all entries without a query filter. Since such a method isn't available, you might need to implement a custom solution or check if the vector store provides any alternative ways to access all stored data.
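One such custom solution, sketched under the assumption that you train through PandasAI's `agent.train(queries=..., codes=...)`, is to keep your own local record of every pair at training time, since the store offers no list-all method. `train_and_record` and `load_recorded_pairs` are hypothetical helpers, and JSON Lines is just one convenient file format. The record only covers pairs trained through the helper, not anything trained before you started using it.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence


def train_and_record(agent, queries: Sequence[str], codes: Sequence[str], path: str) -> None:
    """Train the agent, then append each (query, code) pair to a local JSON Lines file."""
    if len(queries) != len(codes):
        raise ValueError("queries and codes must have the same length")
    agent.train(queries=list(queries), codes=list(codes))
    # Record only after training succeeded, so the file mirrors what was sent.
    with Path(path).open("a", encoding="utf-8") as handle:
        for query, code in zip(queries, codes):
            handle.write(json.dumps({"query": query, "code": code}) + "\n")


def load_recorded_pairs(path: str) -> List[Dict[str, str]]:
    """Read back every recorded pair (empty list if nothing was recorded yet)."""
    file = Path(path)
    if not file.exists():
        return []
    with file.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

To print everything trained so far, loop over `load_recorded_pairs("trained_pairs.jsonl")` and print each pair's `query` and `code` fields.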
