
Release/v1.4 #679

Merged: 25 commits merged into main on Oct 25, 2023
Conversation

@gventuri (Collaborator) commented Oct 23, 2023

Summary by CodeRabbit

New Features:

  • Added the ability to create custom functions and add them to the agent's skills, enhancing the software's flexibility and adaptability.
  • Introduced an advanced reasoning framework in PandasAI, which can be enabled or disabled by the user, improving the results of the Language Model for code generation.
  • Implemented a new skill to display a bar chart of employee salaries using the Streamlit library, enhancing data visualization capabilities.
  • Added a new setting to specify a logging server URL and API key, improving data tracking and logging.
  • Introduced a new SqliteConnector to connect to SQLite databases, expanding the range of supported data sources.

Improvements:

  • Enhanced the clarity of questions and instructions in the code and prompt text.
  • Improved code execution tracking and logging with the introduction of the QueryExecTracker class.

Bug Fixes:

  • Fixed issues with code execution context in the CodeManager class.

Tests:

  • Expanded test coverage for new features and improvements, ensuring the reliability of the software.

Documentation:

  • Updated documentation with examples and instructions for new features and improvements, enhancing user understanding and ease of use.

@coderabbitai bot (Contributor) commented Oct 23, 2023

Walkthrough

This comprehensive update enhances the PandasAI library with advanced reasoning capabilities, custom skills management, and SQLite database connectivity. It also improves code execution context, query execution tracking, and introduces new test cases for added functionalities. The changes are aimed at improving the user experience, expanding the library's capabilities, and ensuring robustness.

Changes

Files | Summary
--- | ---
docs/*, examples/* | Updated documentation and examples to reflect new features and changes.
pandasai/agent/*, pandasai/smart_dataframe/*, pandasai/smart_datalake/* | Enhanced with advanced reasoning, custom skills management, and improved code execution context.
pandasai/connectors/* | Introduced SQLite database connectivity.
pandasai/helpers/* | Added query execution tracking and improved code execution context.
pandasai/llm/*, pandasai/prompts/* | Updated to support advanced reasoning.
pandasai/skills/* | Introduced custom skills management.
tests/* | Added new test cases for the added functionalities.

🐇💻

Code is hopping, changes are popping,

With every commit, there's no stopping.

From SQLite to reasoning so advanced,

In the world of AI, we've just danced. 🎉🎈


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

@codecov-commenter commented Oct 23, 2023

Codecov Report

Merging #679 (43efd40) into main (fe45cca) will increase coverage by 0.96%.
The diff coverage is 93.83%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #679      +/-   ##
==========================================
+ Coverage   83.84%   84.80%   +0.96%     
==========================================
  Files          67       70       +3     
  Lines        3206     3547     +341     
==========================================
+ Hits         2688     3008     +320     
- Misses        518      539      +21     
Files | Coverage Δ
--- | ---
pandasai/__init__.py | 90.19% <100.00%> (+0.19%) ⬆️
pandasai/connectors/__init__.py | 100.00% <100.00%> (ø)
pandasai/connectors/base.py | 88.04% <100.00%> (+0.40%) ⬆️
pandasai/exceptions.py | 91.66% <100.00%> (+0.36%) ⬆️
pandasai/helpers/code_manager.py | 91.54% <100.00%> (+0.74%) ⬆️
pandasai/helpers/output_types/_output_types.py | 88.76% <ø> (ø)
pandasai/helpers/skills_manager.py | 100.00% <100.00%> (ø)
pandasai/llm/base.py | 88.88% <100.00%> (+0.86%) ⬆️
pandasai/prompts/generate_python_code.py | 96.55% <100.00%> (+1.31%) ⬆️
pandasai/schemas/df_config.py | 100.00% <100.00%> (ø)

... and 7 more


@coderabbitai bot left a comment

Review Status

Actionable comments generated: 23

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 0d56e93 and e711c89.
Files ignored due to filter (1)
  • mkdocs.yml
Files selected for processing (29)
  • docs/examples.md (1 hunks)
  • docs/getting-started.md (1 hunks)
  • docs/skills.md (1 hunks)
  • examples/skills_example.py (1 hunks)
  • examples/using_pandasai_log_server.py (1 hunks)
  • pandasai/__init__.py (2 hunks)
  • pandasai/agent/__init__.py (6 hunks)
  • pandasai/assets/prompt_templates/advanced_reasoning.tmpl (1 hunks)
  • pandasai/assets/prompt_templates/check_if_relevant_to_conversation.tmpl (1 hunks)
  • pandasai/assets/prompt_templates/generate_python_code.tmpl (1 hunks)
  • pandasai/assets/prompt_templates/simple_reasoning.tmpl (1 hunks)
  • pandasai/exceptions.py (1 hunks)
  • pandasai/helpers/code_manager.py (8 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/helpers/skills_manager.py (1 hunks)
  • pandasai/llm/base.py (3 hunks)
  • pandasai/prompts/base.py (4 hunks)
  • pandasai/prompts/generate_python_code.py (2 hunks)
  • pandasai/schemas/df_config.py (3 hunks)
  • pandasai/skills/__init__.py (1 hunks)
  • pandasai/smart_dataframe/__init__.py (4 hunks)
  • pandasai/smart_datalake/__init__.py (17 hunks)
  • tests/llms/test_base_llm.py (1 hunks)
  • tests/prompts/test_generate_python_code_prompt.py (5 hunks)
  • tests/skills/test_skills.py (1 hunks)
  • tests/test_codemanager.py (7 hunks)
  • tests/test_query_tracker.py (1 hunks)
  • tests/test_smartdataframe.py (5 hunks)
  • tests/test_smartdatalake.py (2 hunks)
Files skipped from review due to trivial changes (8)
  • docs/examples.md
  • pandasai/assets/prompt_templates/check_if_relevant_to_conversation.tmpl
  • pandasai/assets/prompt_templates/simple_reasoning.tmpl
  • pandasai/exceptions.py
  • pandasai/prompts/generate_python_code.py
  • tests/llms/test_base_llm.py
  • tests/test_query_tracker.py
  • tests/test_smartdataframe.py
Additional comments: 64
pandasai/assets/prompt_templates/generate_python_code.tmpl (1)
  • 9-17: The new hunk introduces additional instructions for the user, such as reasoning step-by-step, avoiding the use of technical table names, and considering the last message in the conversation. Ensure that these instructions are clear and understandable to the user. Also, verify that the {skills} and {reasoning} placeholders are correctly replaced with the intended content during runtime.
docs/getting-started.md (1)
  • 193-193: The new setting use_advanced_reasoning_framework has been introduced. Ensure that the implications of enabling this setting are clearly documented, especially in terms of performance and resource usage. Also, consider providing examples of scenarios where this setting would be beneficial.
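For illustration, a minimal sketch of enabling the flag (the flag name comes from this PR; the dataframe, LLM setup, and question below are placeholders, not the project's docs):

```python
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

df = pd.DataFrame({"country": ["Spain", "Italy"], "gdp": [1.4, 2.1]})

sdf = SmartDataframe(
    df,
    config={
        "llm": OpenAI(api_token="YOUR_API_KEY"),
        # Opt in to the advanced reasoning framework introduced in this release
        "use_advanced_reasoning_framework": True,
    },
)

sdf.chat("Which country has the highest GDP?")
```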
pandasai/assets/prompt_templates/advanced_reasoning.tmpl (1)
  • 1-3: The changes in this hunk are related to the instructions for creating advanced reasoning prompts. The instructions are clear and concise, and they provide the necessary guidance for wrapping the reasoning and answer in the appropriate tags, as well as returning the updated analyze_data function. However, it would be beneficial to provide an example for better understanding.

Example:

For instance, if the last step was to calculate the mean of a dataset, your template might look like this:

<reasoning>
To find the average value of the dataset, we calculate the mean. This gives us a single value that represents the center of the dataset.
</reasoning>

<answer>
The mean of the dataset is calculated using the `mean()` function in pandas.
</answer>

```python
def analyze_data(data):
    mean = data.mean()
    return mean


```

docs/skills.md (1)

* 108-108: The function `plot_salaries_using_streamlit` is not defined in the provided code. It seems like a typo, and the function `plot_salaries` should be added as a skill instead. Please verify.
```diff
- agent.add_skills(plot_salaries_using_streamlit)
+ agent.add_skills(plot_salaries)
pandasai/__init__.py (1)
  • 261-268: The __all__ list has been updated to include skill. This means skill will be imported when from pandasai import * is used. Ensure that this is the intended behavior and that skill is designed to be used directly.
examples/using_pandasai_log_server.py (1)
  • 21-23: Environment variables are being set within the script. This is not a good practice as it can lead to security issues and it's not scalable. Environment variables should be set outside the script, in the environment where the script is running. If these lines are for demonstration purposes, consider adding a comment to clarify this.

  • 26, 42: The same API key is used for both the OpenAI instance and the logging server. If these services require different keys, this could be a potential issue. Please verify.
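As a hedged illustration of the environment-variable point, the keys could be read from the environment instead of being set inside the script (LOGGING_SERVER_URL and LOGGING_SERVER_API_KEY are the names this PR reads in QueryExecTracker; the OpenAI key name is an assumption):

```python
import os

from pandasai.llm import OpenAI

# Expected to be exported in the shell, e.g.
#   export OPENAI_API_KEY=sk-...
#   export LOGGING_SERVER_URL=https://logs.example.com
#   export LOGGING_SERVER_API_KEY=...
llm = OpenAI(api_token=os.environ["OPENAI_API_KEY"])

config = {
    "llm": llm,
    "log_server": {
        "server_url": os.environ["LOGGING_SERVER_URL"],
        "api_key": os.environ["LOGGING_SERVER_API_KEY"],
    },
}
```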

examples/skills_example.py (3)
  • 41-41: The Agent is initialized with a memory_size of 10. Ensure that this value is appropriate for your use case. If the Agent needs to remember more or fewer past interactions, adjust this value accordingly.

  • 43-43: The plot_salaries skill is added to the Agent. Ensure that this skill is compatible with the Agent and that it can be used effectively in the chat interface.

  • 46-47: The Agent is used to process a chat command. Ensure that the command is correctly interpreted by the Agent and that the response is as expected.

pandasai/helpers/skills_manager.py (1)
  • 98-99: The setter for used_skills allows for the entire _used_skills list to be replaced. This could potentially remove all previously used skills. If this is not the intended behavior, consider removing this setter or adding additional checks.
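A minimal sketch of the kind of guard this comment has in mind; this is an illustration of validating assignments to used_skills, not the library's actual implementation:

```python
class SkillsManagerSketch:
    def __init__(self):
        self._used_skills: list = []

    @property
    def used_skills(self) -> list:
        return self._used_skills

    @used_skills.setter
    def used_skills(self, value: list) -> None:
        # Validate the assignment instead of silently replacing the tracked skills.
        if not isinstance(value, list):
            raise TypeError("used_skills must be a list of skill names")
        self._used_skills = value
```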
pandasai/prompts/base.py (4)
  • 14-14: The _config attribute has been added to the AbstractPrompt class. Ensure that this attribute is used appropriately in the subclasses and that it is initialized before use to avoid NoneType errors.

  • 31-32: The on_prompt_generation method has been added. This method seems to be a hook that can be overridden by subclasses to perform additional tasks during the prompt generation process. Ensure that this method is used correctly in the subclasses.

  • 65-74: The set_config and get_config methods have been added. These methods are used to set and retrieve the configuration for the prompt. The get_config method can retrieve the entire configuration or a specific key from the configuration. Ensure that these methods are used correctly in the subclasses and that the configuration is set before it is retrieved.

  • 90-101: The on_prompt_generation method is now called at the beginning of the to_string method. This allows subclasses to perform additional tasks during the prompt generation process. Also, the set_vars method is now called with an additional condition to exclude AbstractPrompt instances. This change ensures that nested prompts do not receive other nested prompts as arguments, which could potentially lead to infinite recursion. Ensure that these changes do not introduce any unexpected behavior in the subclasses.

pandasai/llm/base.py (3)
  • 123-142: The _extract_tag_text method uses regular expressions to extract text between specified tags from a response string. This method could potentially return None if no match is found. Ensure that the calling code can handle a None return value. Also, consider precompiling the regular expression for performance improvement if this method is called frequently.
-        match = re.search(
-            f"(<{tag}>)(.*)(</{tag}>)",
-            response,
-            re.DOTALL | re.MULTILINE,
-        )
+        pattern = re.compile(f"(<{tag}>)(.*)(</{tag}>)", re.DOTALL | re.MULTILINE)
+        match = pattern.search(response)
  • 168-173: The _extract_answer method splits the response into sentences and filters out any sentence containing "temp_chart.png". This seems like a very specific condition. Ensure that this condition is valid for all cases where this method will be used. If this is a temporary workaround or a condition specific to a certain use case, consider making this more generic or configurable.

  • 200-211: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [192-208]

The generate_code method's return type has been changed from a single string to a list of strings. This change will affect all places where this method is called. Ensure that all calls to this method throughout the codebase have been updated to handle the new return type. Also, the method now extracts and returns the code, reasoning, and answer from the response. Ensure that this is the intended behavior and that it doesn't break any existing functionality.

-    def generate_code(self, instruction: AbstractPrompt) -> str:
+    def generate_code(self, instruction: AbstractPrompt) -> [str, str, str]:
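To illustrate the impact of the changed return shape, a hedged sketch of an updated call site (the variable names, the unpacking order, and the Tuple annotation are assumptions; the PR's own annotation is shown in the diff above):

```python
from typing import Tuple


def run_generation(llm, instruction) -> Tuple[str, str, str]:
    # generate_code now returns code, reasoning, and answer rather than a single
    # string, so callers unpack three values instead of one.
    code, reasoning, answer = llm.generate_code(instruction)
    return code, reasoning, answer
```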
pandasai/agent/__init__.py (6)
  • 3-4: The import statement for skill is added but it's not clear from the context whether it's a module or a class. If it's a class, it should be capitalized according to Python's naming conventions. Please verify.

  • 46-57: The set_instance_type method is called on the SmartDatalake object to set the instance type. A new method add_skills is added to the Agent class to add skills to the SmartDatalake. This is a good practice as it allows for better modularity and separation of concerns.

  • 85-86: The check_if_related_to_conversation method now returns a boolean value indicating whether the query is related to the conversation. This value is then passed to the is_related_query method of the SmartDatalake object. This is a good practice as it makes the code more readable and understandable.

  • 96-96: The return type of the check_if_related_to_conversation method is now explicitly defined as bool. This is a good practice as it makes the code more readable and understandable.

  • 121-121: The check_if_related_to_conversation method now returns the value of related. This is a good practice as it makes the code more readable and understandable.

  • 189-207: New properties are added to the Agent class to get the last code generated, the last code executed, the last prompt, the last reasoning, and the last answer from the SmartDatalake object. This is a good practice as it allows for better encapsulation and data hiding.
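A hedged usage sketch of these pass-through properties (the property names are taken from this PR; the dataframe, LLM setup, and question are placeholders):

```python
import pandas as pd
from pandasai import Agent
from pandasai.llm import OpenAI

employees_df = pd.DataFrame({"name": ["Ada", "Linus"], "salary": [120, 110]})

agent = Agent(
    [employees_df],
    config={
        "llm": OpenAI(api_token="YOUR_API_KEY"),
        "use_advanced_reasoning_framework": True,
    },
)
agent.chat("Who earns the most?")

print(agent.last_code_generated)
print(agent.last_code_executed)
print(agent.last_prompt)
print(agent.last_reasoning)  # only meaningful with advanced reasoning enabled
print(agent.last_answer)
```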

pandasai/smart_dataframe/__init__.py (4)
  • 29-29: The import statement from pandasai.skills import skill is added. Ensure that the skill module is correctly implemented and contains all the necessary classes or functions that are used in this file. Also, verify that the skill module does not have any side effects upon import.

  • 306-307: The set_instance_type method is called on the SmartDatalake instance. Ensure that this method is correctly implemented in the SmartDatalake class and that it correctly handles the instance type. Also, verify that the instance type is used correctly in the SmartDatalake class.

  • 326-330: The add_skills method is added to the SmartDataframe class. This method calls the add_skills method on the SmartDatalake instance. Ensure that the add_skills method is correctly implemented in the SmartDatalake class and that it correctly handles the skills. Also, verify that the skills are used correctly in the SmartDatalake class.

  • 692-698: The last_reasoning and last_answer properties are added to the SmartDataframe class. These properties return the last_reasoning and last_answer attributes of the SmartDatalake instance, respectively. Ensure that these attributes are correctly set in the SmartDatalake class and that they correctly represent the last reasoning and answer.

tests/test_smartdatalake.py (1)
  • 194-211: This test case is checking if the last_answer and last_reasoning attributes are correctly set after a chat interaction. The test case seems to be well written and covers the necessary assertions. However, it would be beneficial to add a test case for when use_advanced_reasoning_framework is set to False to ensure that the last_answer and last_reasoning attributes behave as expected in that scenario as well.
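A hedged sketch of the suggested additional test (the smart_datalake fixture, the config attribute access, and the exception's import path are assumptions based on this PR's description):

```python
import pytest

from pandasai.exceptions import AdvancedReasoningDisabledError


def test_last_answer_requires_advanced_reasoning(smart_datalake):
    # With the flag disabled, the new accessors should raise rather than
    # return stale reasoning/answer values.
    smart_datalake.config.use_advanced_reasoning_framework = False

    with pytest.raises(AdvancedReasoningDisabledError):
        _ = smart_datalake.last_reasoning

    with pytest.raises(AdvancedReasoningDisabledError):
        _ = smart_datalake.last_answer
```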
pandasai/helpers/code_manager.py (8)
  • 9-9: The SkillsManager class is imported but not used in this hunk. Ensure that it is used in the code sections not included in this hunk, otherwise, it's an unnecessary import.

  • 28-50: The CodeExecutionContext class is introduced to provide additional context during code execution. It includes a prompt_id and a skills_manager. The class is well-structured and follows the best practices of encapsulation with the use of properties.

  • 208-217: The execute_code method now takes an instance of CodeExecutionContext as a parameter instead of prompt_id. This change provides more context during code execution. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 234-240: The execute_code method now uses the prompt_id from the context parameter. Also, it resets the used_skills in the skills_manager before executing the code. This is a good practice as it ensures that the used_skills are reset before each code execution.

  • 242-242: The _clean_code method now takes an instance of CodeExecutionContext as a parameter. This change provides more context during code cleaning. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 257-261: The execute_code method now adds the used skills to the environment before executing the code. This is a good practice as it ensures that the skills are available in the environment during code execution.

  • 411-411: The _clean_code method now takes an instance of CodeExecutionContext as a parameter. This change provides more context during code cleaning. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 435-450: The _clean_code method now checks for used skills inside the analyze_data function and adds them to the skills_manager. This is a good practice as it ensures that the used skills are tracked and can be reused in subsequent code executions.

tests/prompts/test_generate_python_code_prompt.py (6)
  • 54-54: The new variable "skills" is being set to an empty string. Ensure that this is the intended behavior and that the "skills" variable is used appropriately in the rest of the code.

  • 69-69: The instruction about not changing the params of the initial Python function has been updated to include a note about using the right dataframes given the context. This is a good addition as it provides more clarity to the user.

  • 79-79: The instruction for the "Analyze" step has been updated to specify that if the user asks to plot a chart, it must be saved as an image in "temp_chart.png" and not shown. This is a good change as it provides clear instructions to the user about how to handle chart requests.

  • 85-88: New instructions have been added to guide the user on how to approach the task and how to format their answer. These instructions provide additional clarity and guidance to the user, which can help improve the quality of the user's work.

  • 159-159: The instruction for the "Analyze" step in the custom instructions has been updated to match the change made in the default instructions. This ensures consistency between the default and custom instructions.

  • 170-170: The instruction for the "Analyze" step in the test for custom instructions has been updated to match the change made in the default instructions. This ensures that the test is checking the correct behavior.

tests/test_codemanager.py (3)
  • 13-13: The CodeExecutionContext class is now imported and used in the test suite. Ensure that the tests cover all the functionalities of this new class.

  • 74-77: A new fixture exec_context is introduced. It creates a mock object for CodeExecutionContext. This is a good practice as it allows for isolation of tests and makes them more reliable.

  • 190-200: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [182-197]

The exec_context fixture is now used as an argument in the test methods. This change is consistent with the introduction of the CodeExecutionContext class. Ensure that the CodeExecutionContext class is properly mocked and that the tests still cover all the functionalities of the CodeManager class.

pandasai/smart_datalake/__init__.py (16)
  • 20-28: The new imports SkillsManager, skill, and QueryExecTracker are introduced. Ensure these modules are correctly implemented and available in the project structure.

  • 44-48: The new imports CodeExecutionContext and AdvancedReasoningDisabledError are introduced. Ensure these modules are correctly implemented and available in the project structure.

  • 57-64: New instance variables _conversation_id, _skills, _instance, and _query_exec_tracker are introduced. Make sure they are used appropriately throughout the class.

  • 110-111: A SkillsManager instance is being created. Ensure that the SkillsManager class is correctly implemented and that its methods are used appropriately in the SmartDatalake class.

  • 124-130: The _conversation_id is being set to a new UUID, _instance is being set to the class name, and _query_exec_tracker is being initialized with the log server configuration. Ensure that these changes are compatible with the rest of the codebase.

  • 218-222: The add_skills method is introduced to add skills to the SkillsManager. Ensure that the add_skills method of the SkillsManager class is correctly implemented and that it handles the input correctly.

  • 257-266: The prompt is being set with the configuration, dataframes, conversation, and skills. Ensure that these changes are compatible with the rest of the codebase and that the prompt is used correctly.

  • 314-326: The QueryExecTracker is being used to start a new track, add query info, and add dataframes. Ensure that these methods are correctly implemented in the QueryExecTracker class and that they are used correctly in the SmartDatalake class.

  • 338-340: The QueryExecTracker is being used to execute the cache get function. Ensure that the execute_func method of the QueryExecTracker class is correctly implemented and that it handles the input correctly.

  • 356-367: The QueryExecTracker is being used to execute the get_prompt and generate_code methods. Ensure that these methods are correctly implemented and that they handle the input correctly.

  • 393-397: A CodeExecutionContext is being created and passed to the execute_code method. Ensure that the CodeExecutionContext class is correctly implemented and that it is used correctly in the SmartDatalake class.

  • 416-422: The QueryExecTracker is being used to execute the retry_run_code method. Ensure that this method is correctly implemented and that it handles the input correctly.

  • 450-475: The QueryExecTracker is being used to get the execution time, execute the parse method, set the success flag, and publish the results. Ensure that these methods are correctly implemented and that they handle the input correctly.

  • 514-531: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [492-521]

The _retry_run_code method now returns a list. Ensure that this change is compatible with the rest of the codebase.

  • 686-709: The last_reasoning and last_answer properties are introduced. They raise an AdvancedReasoningDisabledError if the advanced reasoning framework is not enabled. Ensure that this exception is handled correctly in the codebase.

  • 734-736: The instance property is introduced. Ensure that it is used correctly in the codebase.

pandasai/schemas/df_config.py (3)
  • 11-15: The new LogServerConfig class is a TypedDict that includes server_url and api_key; a sketch of this shape follows this list. Ensure that the server URL and API key are being properly validated elsewhere in the code, as this TypedDict does not include any validation logic. Also, consider encrypting or securely storing the api_key as it is sensitive information.

  • 16-22: A new configuration option use_advanced_reasoning_framework has been added to the Config class. Ensure that this new configuration option is being used correctly throughout the codebase.

  • 35-35: The log_server attribute of type LogServerConfig has been added to the Config class. Ensure that this new attribute is being used correctly throughout the codebase and that it is being properly initialized.
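Regarding the LogServerConfig point above, a hedged sketch of what such a TypedDict and a conforming config entry might look like (the field names come from the review; this is not the exact source):

```python
from typing import TypedDict


class LogServerConfig(TypedDict):
    server_url: str
    api_key: str


# A config dict that satisfies the shape above
log_server: LogServerConfig = {
    "server_url": "https://logs.example.com",
    "api_key": "YOUR_API_KEY",
}
```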

Comment on lines +4 to +32
def skill(skill_function):
    def wrapped_function(*args, **kwargs):
        return skill_function(*args, **kwargs)

    wrapped_function.name = skill_function.__name__
    wrapped_function.func_def = (
        """def pandasai.skills.{funcion_name}{signature}""".format(
            funcion_name=wrapped_function.name,
            signature=str(inspect.signature(skill_function)),
        )
    )

    doc_string = skill_function.__doc__

    wrapped_function.print = (
        """
<function>
{signature}
{doc_string}
</function>
"""
    ).format(
        signature=wrapped_function.func_def,
        doc_string=""" \"\"\"{0}\n \"\"\"""".format(doc_string)
        if doc_string is not None
        else "",
    )

    return wrapped_function

The skill function is a decorator that wraps a given function, preserving its name and signature, and providing a formatted string representation. However, there's a typo in the string format placeholder on line 10: funcion_name should be function_name. Because the placeholder and the keyword argument are misspelled consistently, the string still formats correctly, but the name should be fixed in both places for readability:

- 10:         """def pandasai.skills.{funcion_name}{signature}""".format(
+ 10:         """def pandasai.skills.{function_name}{signature}""".format(
- 11:             funcion_name=wrapped_function.name,
+ 11:             function_name=wrapped_function.name,

Also, the wrapped_function.print attribute seems to be a string representation of the function, including its signature and docstring. However, it's not clear why this is necessary, as Python already provides the inspect module for this purpose. If there's a specific reason for this, it might be worth documenting it in the code.
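For context, a hedged usage sketch of the decorator (the skill body and docstring are placeholders; the name, func_def, and print attributes are the ones set above):

```python
from pandasai.skills import skill


@skill
def plot_salaries(names: list, salaries: list) -> str:
    """Displays a bar chart of salaries for the given employee names."""
    return "temp_chart.png"


print(plot_salaries.name)      # "plot_salaries"
print(plot_salaries.func_def)  # "def pandasai.skills.plot_salaries(names: list, salaries: list) -> str"
print(plot_salaries.print)     # the <function>...</function> block injected into the prompt
```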

Comment on lines +39 to +46
import matplotlib.pyplot as plt

plt.bar(name, salaries)
plt.xlabel("Employee Name")
plt.ylabel("Salary")
plt.title("Employee Salaries")
plt.xticks(rotation=45)


The matplotlib.pyplot import is inside the function plot_salaries. This could lead to performance issues if the function is called multiple times, as the import statement will be executed each time the function is called. It's a best practice to place all import statements at the top of the file. This makes it easier to see what modules are used in the script and avoids repeated imports.

+ import matplotlib.pyplot as plt
...
- import matplotlib.pyplot as plt

Comment on lines +93 to +100
import matplotlib.pyplot as plt

plt.bar(name, salary)
plt.xlabel("Employee Name")
plt.ylabel("Salary")
plt.title("Employee Salaries")
plt.xticks(rotation=45)
plt.savefig("temp_chart.png")

The same issue as above is present here with the matplotlib.pyplot import inside the function plot_salaries. Move the import to the top of the file.

+ import matplotlib.pyplot as plt
...
- import matplotlib.pyplot as plt

Comment on lines +1 to +3
import os
import pandas as pd
from pandasai import Agent

The os and pd (pandas) modules are imported but not used in this code snippet. If they are not used elsewhere in the code, consider removing these imports to clean up the code.

- import os
- import pandas as pd

Comment on lines +44 to +54
    [employees_df, salaries_df],
    config={
        "llm": llm,
        "enable_cache": True,
        "log_server": {
            "server_url": "SERVER_URL",
            "api_key": "YOUR_API_KEY",
        },
    },
    memory_size=10,
)

The Agent class is instantiated twice with the same parameters. This could be a potential redundancy. If the two instances are not used differently, consider reusing the same instance to adhere to the DRY (Don't Repeat Yourself) principle.

Lines 37 and 57: The same chat query is being made twice. If the responses are expected to be the same, consider removing one of the queries to avoid redundancy.

@@ -147,6 +169,7 @@ def test_exception_handling(
code_manager.execute_code = Mock(
side_effect=NoCodeFoundError("No code found in the answer.")
)
code_manager.execute_code.__name__ = "execute_code"

The __name__ attribute of the execute_code mock object is explicitly set to "execute_code". This does not appear redundant: a plain Mock does not expose __name__, and the new QueryExecTracker.execute_func reads function.__name__ to derive its default tag, so the assignment is presumably required once the mock is invoked through the tracker. A short comment in the test explaining why it is needed would help future readers.

Comment on lines +34 to +43
    def __init__(
        self,
        server_config: Union[dict, None] = None,
    ) -> None:
        self._success = False
        self._start_time = None
        self._server_config = server_config
        self._query_info = {}
        self._is_related_query = True


The QueryExecTracker class is initialized with a server_config parameter which is optional and can be None. However, there is no check to ensure that server_config is a dictionary when it is not None. This could lead to unexpected behavior if a non-dictionary object is passed. Consider adding a type check for server_config in the constructor.

    def __init__(
        self,
        server_config: Union[dict, None] = None,
    ) -> None:
        self._success = False
        self._start_time = None
+        if server_config is not None and not isinstance(server_config, dict):
+            raise ValueError("server_config must be a dictionary or None")
        self._server_config = server_config
        self._query_info = {}
        self._is_related_query = True

Comment on lines +103 to +142
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
        tag = kwargs.pop("tag", function.__name__)

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise

The execute_func method is designed to execute a function and track its execution time. However, it also modifies the function's kwargs by popping the "tag" key if it exists. This could lead to unexpected behavior if the function relies on this key. Consider copying the kwargs before modifying them.

    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
+        kwargs_copy = kwargs.copy()
-        tag = kwargs.pop("tag", function.__name__)
+        tag = kwargs_copy.pop("tag", function.__name__)

        try:
-            result = function(*args, **kwargs)
+            result = function(*args, **kwargs_copy)

Comment on lines +249 to +262
        if self._server_config is None:
            server_url = os.environ.get("LOGGING_SERVER_URL")
            api_key = os.environ.get("LOGGING_SERVER_API_KEY")
        else:
            server_url = self._server_config.get(
                "server_url", os.environ.get("LOGGING_SERVER_URL")
            )
            api_key = self._server_config.get(
                "api_key", os.environ.get("LOGGING_SERVER_API_KEY")
            )

        if api_key is None or server_url is None:
            return


The publish method retrieves the server_url and api_key from the server_config dictionary or from environment variables. However, it does not validate these values before attempting to send a request. If either of these values is None or an empty string, the request will fail. Consider adding validation for these values before sending the request.

        if self._server_config is None:
            server_url = os.environ.get("LOGGING_SERVER_URL")
            api_key = os.environ.get("LOGGING_SERVER_API_KEY")
        else:
            server_url = self._server_config.get(
                "server_url", os.environ.get("LOGGING_SERVER_URL")
            )
            api_key = self._server_config.get(
                "api_key", os.environ.get("LOGGING_SERVER_API_KEY")
            )

+        if not server_url or not api_key:
+            raise ValueError("server_url and api_key must be provided")
        if api_key is None or server_url is None:
            return

Comment on lines +185 to +218
    def _format_response(self, result: ResponseType) -> ResponseType:
        """
        Format output response
        Args:
            result (ResponseType): response returned after execution

        Returns:
            ResponseType: formatted response output
        """
        formatted_result = {}
        if result["type"] == "dataframe":
            formatted_result = {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
            return formatted_result
        elif result["type"] == "plot":
            with open(result["value"], "rb") as image_file:
                image_data = image_file.read()
            # Encode the image data to Base64
            base64_image = (
                f"data:image/png;base64,{base64.b64encode(image_data).decode()}"
            )
            formatted_result = {
                "type": result["type"],
                "value": base64_image,
            }

            return formatted_result
        else:
            return result

The _format_response method handles different types of responses, but it does not handle unexpected types. If the result dictionary contains a type that is not "dataframe" or "plot", the method simply returns the original result. This could lead to inconsistent return types. Consider adding a default case to handle unexpected types.

    def _format_response(self, result: ResponseType) -> ResponseType:
        """
        Format output response
        Args:
            result (ResponseType): response returned after execution

        Returns:
            ResponseType: formatted response output
        """
        formatted_result = {}
        if result["type"] == "dataframe":
            formatted_result = {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
            return formatted_result
        elif result["type"] == "plot":
            with open(result["value"], "rb") as image_file:
                image_data = image_file.read()
            # Encode the image data to Base64
            base64_image = (
                f"data:image/png;base64,{base64.b64encode(image_data).decode()}"
            )
            formatted_result = {
                "type": result["type"],
                "value": base64_image,
            }

            return formatted_result
+        else:
+            raise ValueError(f"Unexpected result type: {result['type']}")
-        else:
-            return result

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between e711c89 and 0fd8d8c.
Files ignored due to filter (1)
  • mkdocs.yml
Files selected for processing (2)
  • pandasai/helpers/code_manager.py (8 hunks)
  • pandasai/smart_dataframe/__init__.py (4 hunks)
Additional comments: 12
pandasai/smart_dataframe/__init__.py (4)
  • 29-29: The import of skill from pandasai.skills is new. Ensure that the skill module and its usage are properly tested and documented. Also, verify that the skill module does not introduce any circular dependencies.

  • 306-307: The set_instance_type method is being called on the SmartDatalake instance. Ensure that this method exists in the SmartDatalake class and that it properly handles the instance type. Also, verify that the instance type is used appropriately within the SmartDatalake class.

  • 326-330: The add_skills method is a new addition to the SmartDataframe class. Ensure that this method is properly tested and that the skills are correctly added to the SmartDatalake instance. Also, verify that the skills are used appropriately within the SmartDatalake class.

  • 695-701: The last_reasoning and last_answer properties are new additions to the SmartDataframe class. Ensure that these properties are properly tested and that they correctly retrieve the last reasoning and answer from the SmartDatalake instance. Also, verify that these properties are used appropriately within the SmartDataframe class.

pandasai/helpers/code_manager.py (8)
  • 9-9: The SkillsManager class is imported but not used in this hunk. Ensure that it is used in the code sections not included in this hunk.

  • 28-50: The CodeExecutionContext class is introduced to provide additional context for code execution. It includes a prompt_id and a skills_manager. The class is well-structured and follows best practices.

  • 208-217: The execute_code method now takes a context parameter of type CodeExecutionContext instead of prompt_id. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 234-240: The file_name for add_save_chart is now derived from context.prompt_id instead of prompt_id. Also, the used_skills of the skills_manager in the context is reset before the code is cleaned and executed. This is a good practice as it ensures that the skills used are tracked accurately for each execution context.

  • 242-242: The _clean_code method now takes a context parameter of type CodeExecutionContext. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 257-262: The environment for code execution now includes the skills used in the current context. This is a good practice as it allows the executed code to use the skills defined in the context.

  • 411-411: The _clean_code method now takes a context parameter of type CodeExecutionContext. Ensure that all calls to this method throughout the codebase have been updated to match the new signature.

  • 435-451: The _clean_code method now checks for used skills inside the analyze_data function. This is a good practice as it allows the method to accurately track the skills used in the code. However, the add_used_skill method could potentially add a function name that is not a skill. Consider checking if the function name is a skill before adding it to used_skills.

- context.skills_manager.add_used_skill(function_name)
+ if context.skills_manager.is_skill(function_name):
+     context.skills_manager.add_used_skill(function_name)

Tanmaypatil123 and others added 2 commits October 25, 2023 00:30
* Added support for sqlite connectors

* Removed linting error

* Added tests for sqlite connector

* chore: style the code

---------

Co-authored-by: Gabriele Venturi <[email protected]>
@coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 0fd8d8c and 5569df0.
Files selected for processing (6)
  • docs/connectors.md (1 hunks)
  • examples/from_sql.py (2 hunks)
  • pandasai/connectors/__init__.py (2 hunks)
  • pandasai/connectors/base.py (1 hunks)
  • pandasai/connectors/sql.py (2 hunks)
  • tests/connectors/test_sqlite.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • pandasai/connectors/__init__.py
  • tests/connectors/test_sqlite.py
Additional comments: 5
pandasai/connectors/base.py (1)
  • 42-49: The new SqliteConnectorConfig class extends SQLBaseConnectorConfig and adds two new attributes: table and database. This class will be used to configure the connection to SQLite databases. Ensure that the table and database attributes are correctly used in the SqliteConnector class and that the user is informed about these new configuration parameters in the documentation.
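A hedged sketch of such a config class, assuming it follows the same pattern as the other connector configs in base.py (only the two new fields are shown; the base class internals are not):

```python
from pandasai.connectors.base import SQLBaseConnectorConfig


class SqliteConnectorConfig(SQLBaseConnectorConfig):
    """Connector configuration for a local SQLite database file."""

    table: str
    database: str  # path to the .db / .sqlite file on disk
```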
examples/from_sql.py (2)
  • 5-5: The import of SqliteConnector is added to the existing imports. Ensure that the SqliteConnector class is implemented correctly and is compatible with the existing codebase.

  • 51-53: The invoice_connector is added to the SmartDatalake instance. Ensure that the SmartDatalake class can handle multiple connectors and that the SqliteConnector is compatible with it.

-        "database": "local_path_to_db",
+        "database": os.getenv("SQLITE_DB_PATH", "local_path_to_db"),
Committable suggestion (Beta)
# With a Sqlite database
invoice_connector = SqliteConnector(
    config={
        "database": os.getenv("SQLITE_DB_PATH", "local_path_to_db"),
        "table": "invoices",
        "where": [["status", "=", "pending"]],
    }
)
pandasai/connectors/sql.py (2)
  • 8-8: The import statement has been updated to include SqliteConnectorConfig. This is a new configuration class for the SqliteConnector class. Ensure that this class is defined and correctly handles the configuration for SQLite connections.

  • 367-450: A new SqliteConnector class has been added. This class inherits from the SQLConnector class and is used to connect to SQLite databases. The class includes methods for initializing the connector, loading the configuration, initializing the database connection, closing the connection, getting the head of the data source, and returning the string representation of the connector.

The __init__ method has been updated to set the dialect to "sqlite" and populate the configuration from environment variables if the configuration is a dictionary. The _load_connector_config method uses the SqliteConnectorConfig class to load the configuration. The _init_connection method creates a SQLAlchemy engine and connection for the SQLite database. The head method runs a SQL query to get the column names and 5 random rows from the data source. The __repr__ method returns a string representation of the connector.

Ensure that the SqliteConnectorConfig class correctly handles the configuration for SQLite connections and that the SqliteConnector class correctly connects to SQLite databases and performs the necessary operations.
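To make the connection flow concrete, a hedged sketch of how the _init_connection step described above is typically done with SQLAlchemy; an illustration, not the PR's exact code:

```python
from sqlalchemy import create_engine


class SqliteConnectionSketch:
    def _init_connection(self, config):
        # SQLite connection URLs take the form sqlite:///<path-to-file>
        self._engine = create_engine(f"sqlite:///{config.database}")
        self._connection = self._engine.connect()

    def close(self):
        # Mirror of the connector's "close the connection" step described above
        self._connection.close()
```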

Comment on lines +41 to +49
# With a Sqlite database

invoice_connector = SqliteConnector(
    config={
        "database": "local_path_to_db",
        "table": "invoices",
        "where": [["status", "=", "pending"]],
    }
)

A new SqliteConnector instance invoice_connector is created. The database path is hardcoded as "local_path_to_db". This might not be the correct path for all environments. Consider making this a configurable parameter.

Comment on lines +92 to +112
### Sqlite connector

Similarly to the PostgreSQL and MySQL connectors, the Sqlite connector allows you to connect to a local Sqlite database file. It is designed to be easy to use, even if you are not familiar with Sqlite or with PandasAI.

To use the Sqlite connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object:

```python
from pandasai.connectors import SqliteConnector

connector = SqliteConnector(config={
"database" : "PATH_TO_DB",
"table" : "actor",
"where" :[
["first_name","=","PENELOPE"]
]
})

df = SmartDataframe(connector)
df.chat('How many records are there ?')
```


The new Sqlite connector is a great addition to the library. However, it's important to note that the "database" parameter in the configuration expects a file path to the SQLite database. This could potentially lead to file path issues if not handled correctly. Ensure that the file path is always valid and the file exists before trying to connect to the database. Also, consider adding error handling for invalid file paths or inaccessible files.

connector = SqliteConnector(config={
    "database" : "PATH_TO_DB",
    "table" : "actor",
    "where" :[
        ["first_name","=","PENELOPE"]
    ]
})
Committable suggestion (Beta): keep the section above, but validate the database path before connecting:
import os

database_path = "PATH_TO_DB"
if not os.path.exists(database_path):
    raise FileNotFoundError(f"The database file {database_path} does not exist.")

connector = SqliteConnector(config={
    "database": database_path,
    "table": "actor",
    "where": [
        ["first_name", "=", "PENELOPE"]
    ]
})

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 5569df0 and 43efd40.
Files selected for processing (2)
  • pandasai/helpers/output_types/_output_types.py (1 hunks)
  • tests/test_smartdataframe.py (5 hunks)
Files skipped from review due to trivial changes (1)
  • pandasai/helpers/output_types/_output_types.py
Additional comments: 3
tests/test_smartdataframe.py (3)
  • 228-233: The instructions for the analyze_data function have been updated to specify that if the user asks to plot a chart, it should be saved as an image in temp_chart.png and not displayed. This is a good change as it ensures that the function behaves consistently across different environments (e.g., a headless server where displaying a chart might not be possible). However, it would be helpful to specify where temp_chart.png should be saved (e.g., in the current directory, in a specific directory, etc.).

  • 243-246: The instructions have been updated to specify that the "technical" names of the tables should not be written in the answer. This is a good change as it encourages the use of user-friendly names in the output, which can improve the user experience. However, it would be helpful to provide some examples of what constitutes a "technical" name and what a suitable replacement might be.

  • 295-298: The instructions have been updated to specify that the updated analyze_data function should be returned wrapped within python . This is a good change as it ensures that the returned function is properly formatted as Python code. However, it would be helpful to provide an example of what this should look like.

@@ -215,7 +215,7 @@ def test_run_with_privacy_enforcement(self, llm):
User: How many countries are in the dataframe?
</conversation>

-This is the initial python function. Do not change the params.
+This is the initial python function. Do not change the params. Given the context, use the right dataframes.

The comment "This is the initial python function. Do not change the params. Given the context, use the right dataframes." is a bit vague. It would be helpful to specify which function this comment is referring to and what parameters should not be changed. Also, it would be beneficial to clarify what is meant by "use the right dataframes" - does this mean to select the appropriate dataframe from a set of dataframes based on the context? If so, what criteria should be used to select the appropriate dataframe?

@@ -275,7 +276,7 @@ def test_run_passing_output_type(self, llm, output_type, output_type_hint):
User: How many countries are in the dataframe?
</conversation>

-This is the initial python function. Do not change the params.
+This is the initial python function. Do not change the params. Given the context, use the right dataframes.

The comment "This is the initial python function. Do not change the params. Given the context, use the right dataframes." is repeated here. As mentioned earlier, it would be helpful to clarify this comment.

Comment on lines +289 to 293
3. Analyze: Conducting the actual analysis (if the user asks to plot a chart you must save it as an image in temp_chart.png and not show the chart.)
At the end, return a dictionary of:
{output_type_hint}
"""
```

The instructions for the analyze_data function are repeated here. It would be helpful to avoid repetition and only include these instructions once, perhaps at the start of the test suite.

@gventuri merged commit aac1e3e into main Oct 25, 2023
9 checks passed